On Sat, 13 May 2000 14:56:41 +0200, you wrote:

>in the current 're' engine, a newline is chr(10) and nothing
>else.
>
>however, in the new unicode aware engine, I used the new
>LINEBREAK predicate instead, but it turned out to break one
>of the tests in the current test suite:
>
>    sre.match('a\rb', 'a.b') => None
>
>(unicode adds chr(13), chr(28), chr(29), chr(30), and also
>unichr(133), unichr(8232), and unichr(8233) to the list of
>line breaking codes)
>
>what's the best way to deal with this?  I see three
>alternatives:
>
>a) stick to the old definition, and use chr(10) also for
>   unicode strings

In the ORO matcher that comes with jpython, the dot matches all but
chr(10). But that is bad IMO. Unicode should use the LINEBREAK
predicate.

regards,
finn
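
To make the difference between the two definitions concrete, here is a minimal sketch that can be run against a current Python 3 interpreter. It does not reproduce the engines discussed above. The helper names `dot_matches` and `is_linebreak` are hypothetical, and `str.splitlines()` is used only as a stand-in for the LINEBREAK predicate, on the assumption that its set of line boundaries covers the code points listed in the quoted message.

```python
import re

# Code points named in the quoted message as line breaking codes.
CANDIDATES = [10, 13, 28, 29, 30, 133, 8232, 8233]


def dot_matches(cp):
    """Return True if '.' (without re.DOTALL) matches chr(cp)."""
    return re.match('a.b', 'a' + chr(cp) + 'b') is not None


def is_linebreak(cp):
    """Return True if str.splitlines() treats chr(cp) as a line boundary.

    This is only a stand-in for the LINEBREAK predicate (an assumption).
    """
    return len(('a' + chr(cp) + 'b').splitlines()) == 2


if __name__ == '__main__':
    # One row per code point: number, dot result, line break result.
    for cp in CANDIDATES:
        print(cp, dot_matches(cp), is_linebreak(cp))
```

Any row where both columns come out true marks a character that the dot would stop matching if it were switched over to the line break definition.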