I wrote:

> what's the best way to deal with this?  I see three
> alternatives:
>
> a) stick to the old definition, and use chr(10) also for
>    unicode strings
>
> b) use different definitions for 8-bit strings and unicode
>    strings; if given an 8-bit string, use chr(10); if given
>    a 16-bit string, use the LINEBREAK predicate.
>
> c) use LINEBREAK in either case.
>
> I think (c) is the "right thing", but it's the only one that
> may break existing code...

I'm probably getting old, but I don't remember if anyone followed
up on this, and I don't have time to check the archives right now.

so for the upcoming "feature complete" release, I've decided to
stick to (a).

...

for the next release, I suggest implementing a fourth alternative:

d) add a new unicode flag.  if set, use LINEBREAK.  otherwise,
   use chr(10).

background: in the current implementation, this decision has to
be made at compile time, and a compiled expression can be used
with either 8-bit strings or 16-bit strings.

a fifth alternative would be to use the locale flag to tell the
difference between unicode and 8-bit characters:

e) if locale is not set, use LINEBREAK.  otherwise, use chr(10).

comments?

</F>

<project name="sre" phase=" complete="97.1%" />
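a rough sketch of alternative (d), for illustration only. the names is_line_end, line_end_positions, unicode_flag and LINEBREAK_CHARS are made up for this example and are not part of the engine, and the character set is only an assumed approximation of what the LINEBREAK predicate accepts:

```python
# Sketch only: LINEBREAK_CHARS is an assumed approximation of a Unicode
# line-break predicate; the set the engine actually uses may differ.
LINEBREAK_CHARS = frozenset("\n\x0b\x0c\r\x1c\x1d\x1e\x85\u2028\u2029")


def is_line_end(ch, unicode_flag=False):
    """Alternative (d): the flag, not the string type, picks the rule."""
    if unicode_flag:
        return ch in LINEBREAK_CHARS
    return ch == chr(10)


def line_end_positions(text, unicode_flag=False):
    """Return the indices in text that count as line ends under the rule."""
    return [i for i, ch in enumerate(text) if is_line_end(ch, unicode_flag)]


sample = "one\ntwo\u2028three"
print(line_end_positions(sample))                     # chr(10) only
print(line_end_positions(sample, unicode_flag=True))  # LINEBREAK rule
```

since the rule depends only on the flag, the choice can be fixed when the expression is compiled, and the same compiled expression behaves the same way on 8-bit and 16-bit strings.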