Showing content from https://mail.python.org/pipermail/python-dev/2017-December/151102.html below:

[Python-Dev] Zero-width matching in regexes

Serhiy Storchaka storchaka at gmail.com
Wed Dec 6 09:15:12 EST 2017
06.12.17 15:37, Paul Moore wrote:
> Behaviour (1) means that we'd get
> 
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
> 'xx xx'
> 
> (because \w* matches the empty string after each word, as well as each
> word itself). I just tested in Perl, and that is indeed what happens
> there as well.

Yes, because in this case you need to use `\w+`, not `\w*`.
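
For illustration, a minimal sketch of the `\w+` variant using the standard `re` module (the variable names are just for this example). Since `\w+` needs at least one word character, it cannot match the empty string, so the result should not depend on which empty-match behaviour is in effect:

```python
import re

text = 'hello world'

# \w+ requires at least one word character, so it never matches the
# empty string; each word is replaced exactly once.
result = re.sub(r'\w+', 'x', text)
print(result)  # x x
```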

No CPython tests would fail if re.sub() were changed to behaviour (2), 
except the tests just added in 3.7 and the one test written specifically 
to guard the old behavior. But I don't know how much third-party code 
would break if this change were made.

> On that basis, I have to say that I find behaviour (2) more intuitive
> and (arguably) "correct":
> 
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
> 'x x'
> >>> re.sub(r'\w*', 'x', 'hello world')
> 'x x'

The actual behavior of re.sub(), and of regex.sub() in VERSION0 mode, was 
a weird behavior (4).

 >>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION0)
'[]h[ello] []w[orld]'
 >>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION1)
'[][hello][] [][world][]'
 >>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.6, behavior (4)
'[]h[ello] []w[orld]'
 >>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.7, behavior (2)
'[][hello] [][world]'
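
To check which behaviour a given interpreter implements, one can list the spans the engine reports instead of reading them off the substituted string. This is only a sketch: `show_matches` is a hypothetical helper name, and it assumes that `re.finditer` reports the same matches that `re.sub` would replace. The output depends on the Python version, so none is shown here.

```python
import re

def show_matches(pattern, text):
    """Return (start, end, matched_text) for every match re.finditer finds."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

spans = show_matches(r'(\b|\w+)', 'hello world')
for start, end, matched in spans:
    # A match is empty (zero-width) when it starts and ends at the same index.
    kind = 'empty' if start == end else 'non-empty'
    print(start, end, repr(matched), kind)
```

An empty match immediately before or after a non-empty one is where the behaviours discussed above differ.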
