Fredrik Lundh wrote:
>
> M.-A. Lemburg wrote:
> ...
> > > background: in the current implementation, this decision has to
> > > be made at compile time, and a compiled expression can be used
> > > with either 8-bit strings or 16-bit strings.
> ...
> > For Unicode objects you should really default to using the
> > Py_UNICODE_ISLINEBREAK() macro which defines all line break
> > characters (note that CRLF should be interpreted as a
> > single line break; see PyUnicode_Splitlines()). The reason
> > here is that Unicode defines how to handle line breaks
> > and we should try to stick to the standard as close as possible.
> > All other possibilities could still be made available via new
> > flags.
> >
> > For 8-bit strings I'd suggest sticking to the re definition.
>
> guess my background description wasn't clear:
>
> Once a pattern has been compiled, it will always handle line
> endings in the same way. The parser doesn't really care if the
> pattern is a unicode string or an 8-bit string (unicode strings
> can contain "wide" characters, but that's the only difference).

Ok.

> At the other end, the same compiled pattern can be applied
> to either 8-bit or unicode strings. It's all just characters to
> the engine...

Doesn't the engine remember whether the pattern was a string
or Unicode?

> Now, I can of course change the engine so that it always uses
> chr(10) on 8-bit strings and LINEBREAK on 16-bit strings, but the
> result is that
>
>     pattern.match(widestring)
>
> won't necessarily match the same thing as
>
>     pattern.match(str(widestring))
>
> even if the wide string only contains plain ASCII.

Hmm, I wouldn't mind, as long as the engine does the right
thing for Unicode, which is to respect the line break
standard defined in Unicode TR13.

Thinking about this some more: I wouldn't even mind if
the engine would use LINEBREAK for all strings :-). It would
certainly make life easier whenever you have to deal with
file input from different platforms, e.g. Mac, Unix and
Windows.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
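
To make the concern about 8-bit and wide strings disagreeing on plain ASCII more concrete, here is a minimal sketch. It assumes a present-day Python 3 interpreter, where str is the Unicode type and bytes is the 8-bit type; it is not the engine discussed in this thread. The sample text and variable names are made up for illustration. It only shows that the Unicode-aware splitlines() and the "\n"-only anchors of the re module can already see different line boundaries in ASCII-only input.

```python
import re

# ASCII-only sample (hypothetical): CRLF, a form feed (\x0c) and a bare LF.
text = "one\r\ntwo\x0cthree\nfour"
data = text.encode("ascii")

# str.splitlines() follows the Unicode notion of a line break:
# CRLF counts as one break, and form feed also ends a line.
unicode_lines = text.splitlines()

# bytes.splitlines() only recognises \n, \r and \r\n.
byte_lines = data.splitlines()

# With re.MULTILINE, ^ and $ treat only "\n" as a line boundary,
# so a trailing "\r" or an embedded form feed stays part of the line.
pattern = re.compile(r"^\w+$", re.MULTILINE)
regex_lines = pattern.findall(text)

print(unicode_lines)
print(byte_lines)
print(regex_lines)
```

Under these assumptions the three views differ: the str split yields four lines, the bytes split yields three, and the regular expression matches only the last line, because "one" is followed by "\r" and "two" by a form feed rather than "\n".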