> From: "Fredrik Lundh" <effbot@telia.com>
>
> I wrote:
>
> > what's the best way to deal with this?  I see three
> > alternatives:
> >
> > a) stick to the old definition, and use chr(10) also for
> >    unicode strings
> >
> > b) use different definitions for 8-bit strings and unicode
> >    strings; if given an 8-bit string, use chr(10); if given
> >    a 16-bit string, use the LINEBREAK predicate.
> >
> > c) use LINEBREAK in either case.
> >
> > I think (c) is the "right thing", but it's the only that may
> > break existing code...
>
> I'm probably getting old, but I don't remember if anyone followed
> up on this, and I don't have time to check the archives right now.
>
> so for the upcoming "feature complete" release, I've decided to
> stick to (a).
>
> ...
>
> for the next release, I suggest implementing a fourth alternative:
>
> d) add a new unicode flag.  if set, use LINEBREAK.  otherwise,
>    use chr(10).
>
> background: in the current implementation, this decision has to
> be made at compile time, and a compiled expression can be used
> with either 8-bit strings or 16-bit strings.
>
> a fifth alternative would be to use the locale flag to tell the
> difference between unicode and 8-bit characters:
>
> e) if locale is not set, use LINEBREAK.  otherwise, use chr(10).
>
> comments?

I proposed before to see what Perl does -- since we're supposedly
following Perl's RE syntax anyway.

--Guido van Rossum (home page: http://www.python.org/~guido/)
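For illustration, alternative (d) can be sketched in Python as a per-pattern flag that selects the line-break test. This is a minimal sketch under stated assumptions, not SRE's actual code: the names `UNICODE_LINEBREAKS`, `is_linebreak` and `line_starts` are hypothetical, and the character set (the line boundaries recognized by `str.splitlines()`) is only an assumed stand-in for the LINEBREAK predicate.

```python
# Assumed stand-in for the LINEBREAK predicate: the line boundaries that
# str.splitlines() recognizes. Not taken from SRE itself.
UNICODE_LINEBREAKS = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


def is_linebreak(ch, unicode_flag):
    """Hypothetical option (d): a per-pattern flag picks the line-break test."""
    if unicode_flag:
        # Flag set: use the (assumed) Unicode notion of a line break.
        return ch in UNICODE_LINEBREAKS
    # Flag not set: the old definition, where only chr(10) ends a line.
    return ch == chr(10)


def line_starts(text, unicode_flag):
    """Hypothetical helper: offsets where a MULTILINE-style "^" could match.

    A line starts at offset 0 and immediately after every line-break
    character, as judged by is_linebreak() with the given flag.
    """
    starts = [0]
    for i, ch in enumerate(text):
        if is_linebreak(ch, unicode_flag):
            starts.append(i + 1)
    return starts


if __name__ == "__main__":
    text = "ab\u2028cd\nef"
    print(line_starts(text, unicode_flag=False))  # only "\n" splits lines
    print(line_starts(text, unicode_flag=True))   # "\u2028" splits lines too
```

Because the flag is fixed when the pattern is built, the choice of line-break test is made once at compile time and the same compiled expression then behaves identically on 8-bit and 16-bit input, which is the constraint the "background" paragraph describes.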