On 06.12.17 15:37, Paul Moore wrote:
> Behaviour (1) means that we'd get
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
> 'xx xx'
>
> (because \w* matches the empty string after each word, as well as each
> word itself). I just tested in Perl, and that is indeed what happens
> there as well.

Yes, because in this case you need to use `\w+`, not `\w*`.

No CPython tests would fail if re.sub() were changed to behaviour (2), except the tests just added in 3.7 and the one test specifically intended to guard the old behaviour. But I don't know how much third-party code would be broken by this change.

> On that basis, I have to say that I find behaviour (2) more intuitive
> and (arguably) "correct":
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
> 'x x'
> >>> re.sub(r'\w*', 'x', 'hello world')
> 'x x'

The actual behaviour of re.sub() before 3.7, and of regex.sub() in VERSION0 mode, was the weird behaviour (4):

>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION0)
'[]h[ello] []w[orld]'
>>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION1)
'[][hello][] [][world][]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.6, behaviour (4)
'[]h[ello] []w[orld]'
>>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.7, behaviour (2)
'[][hello] [][world]'
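A minimal sketch of the `\w+` suggestion, using only the standard `re` module (the variable names are illustrative, not from the thread). Because `\w+` cannot match the empty string, none of the empty-match rules (1), (2) or (4) come into play, so these results should not depend on which rule the engine follows.

```python
import re

text = 'hello world'

# \w* accepts the empty string, which is what produces the extra empty
# matches between and after words; \w+ needs at least one word character.
star_matches_empty = re.fullmatch(r'\w*', '') is not None
plus_matches_empty = re.fullmatch(r'\w+', '') is not None
print(star_matches_empty, plus_matches_empty)  # True False

# With \w+ only the words themselves are matched and replaced.
words_replaced = re.sub(r'\w+', 'x', text)
print(words_replaced)  # x x

# The bracketing example from above, minus the empty \b alternative.
bracketed = re.sub(r'(\w+)', r'[\1]', text)
print(bracketed)  # [hello] [world]
```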