On 2017-10-29 12:27, Serhiy Storchaka wrote:
> 27.10.17 18:35, Guido van Rossum wrote:
>> The "why" question is not very interesting -- it probably wasn't in PCRE
>> and nobody was familiar with it when we moved off PCRE (maybe it wasn't
>> even in Perl at the time -- it was ~15 years ago).
>>
>> I didn't understand your description of \G so I googled it and found a
>> helpful StackOverflow article:
>> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
>> From this I understand that when using e.g. findall() it forces
>> successive matches to be adjacent.
>
> This looks too Perlish to me. In Perl regular expressions are the part
> of language syntax, they can contain even Perl expressions. Arguments to
> them are passed implicitly (as well as to Perl's analogs of str.strip()
> and str.split()) and results are saved in global special variables.
> Loops also can be implicit.
>
> It seems to me that \G makes sense only to re.findall() and
> re.finditer(), not to re.match(), re.search() or re.split().
>
> In Python all this is explicit. Compiled regular expressions are
> objects, and you can pass start and end positions to Pattern.match().
> The Python equivalent of \G looks to me like:
>
>     p = re.compile(...)
>     i = 0
>     while True:
>         m = p.match(s, i)
>         if not m: break
>         ...
>         i = m.end()
>
You're correct. \G matches at the start position, so .search(r'\G\w+')
behaves the same as .match(r'\w+'). findall and finditer perform a
series of searches, but with \G at the start they'll perform a series of
matches, each anchored at where the previous one ended.

> The one also can use the undocumented Pattern.scanner() method. Actually
> Pattern.finditer() is implemented as iter(Pattern.scanner().search).
> iter(Pattern.scanner().match) would return an iterator of adjacent matches.
>
> I think it would be more Pythonic (and much easier) to add a boolean
> parameter to finditer() and findall() than introduce a \G operator.
>
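
For anyone following along, here is a minimal sketch of the difference
being discussed. The helper name adjacent_matches, the sample string and
the pattern are my own, purely for illustration. The zero-width guard is
also an assumption about what one would want, not something from the
thread. The last part relies on Pattern.scanner(), which is undocumented
as noted above, so treat it as CPython-specific behaviour.

```python
import re


def adjacent_matches(pattern, s, pos=0):
    """Yield successive matches, each anchored where the previous one ended.

    Hypothetical helper: an explicit-loop stand-in for what \\G would do
    at the start of a pattern used with finditer().
    """
    while True:
        m = pattern.match(s, pos)
        if m is None:
            break
        yield m
        if m.end() == pos:
            # Zero-width match: stop instead of looping forever
            # (an assumed policy, not taken from the thread).
            break
        pos = m.end()


p = re.compile(r'\d+\s*')
s = '12 34 56 x 78'

# Anchored, adjacent matches stop at the first gap (the 'x').
adjacent = [m.group() for m in adjacent_matches(p, s)]

# A plain findall() searches, so it skips over the gap and keeps going.
everything = p.findall(s)

# Same adjacent behaviour through the undocumented scanner object.
via_scanner = [m.group() for m in iter(p.scanner(s).match, None)]

print(adjacent)
print(everything)
print(via_scanner)
```

The explicit loop and the scanner version should give the same list,
while findall() also picks up the number after the gap.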