A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2007-August/074121.html below:

[Python-Dev] Regular expressions, Unicode etc.

[Python-Dev] Regular expressions, Unicode etc.Mike Klaas mike.klaas at gmail.com
Wed Aug 8 22:29:50 CEST 2007
In 8-Aug-07, at 12:47 PM, Nick Maclaren wrote:

>
>>> The other approach, which is to stick to true regular expressions,
>>> and wholly or partially convert to DFAs, has already been rendered
>>> impossible by even the limited Perl/PCRE extensions that Python
>>> has adopted.
>>
>> Impossible?  Surely, a sufficiently-competent re engine could detect
>> when a DFA is possible to construct?
>
> I doubt it.  While it isn't equivalent to the halting problem, it IS
> an intractable one!  There are two problems:
>
> Firstly, things like backreferences are an absolute no-no.  They
> are not regular, and REs with them in cannot be converted to DFAs.
> That could be 'solved' by a parser that kicked out such constructions,
> but it would get screams from many users.
>
> Secondly, anything involving explicit or implicit negation can lead
> to (if I recall) a super-exponential explosion in the size of the
> DFA.  That could be 'solved' by imposing a limit, but few people
> would be able to predict when it would bite.

Right.  The analysis I envisioned would be more along the lines of  
"if troublesome RE extensions are used, do not attempt to construct a  
DFA".  It could even be exposed via an alternate api (re.compile_dfa 
()) that admitted a subset of the usual grammar.

> Thirdly, I would require notice of the question of whether capturing
> parentheses could be supported, and what semantic changes would be
> to which were set and how.

Capturing groups are rather integral to the python regex api and, as  
you say, a major difficulty for DFA-based implementations.  Sounds  
like a task best left to a thirdparty package.

-Mike
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4