M.-A. Lemburg wrote:
>>Here's a rough draft:
>>
>> def textopen(name, mode="r", encoding=None):
>>     if "U" not in mode:
>>         mode += "U"
>
> The "U" is not needed when opening files using codecs -
> these always break lines using .splitlines() which
> breaks lines according to the Unicode rules and also
> knows about the various line break variants on different
> platforms.

Still, codecs typically don't implement universal newlines
correctly. If you specify 'U', then do .read(), you deserve to get
\n (U+000A) as the line separator; with most codecs, you get
whatever line breaks were in the file.

Passing 'U' to the underlying stream is wrong as well: if the
stream is double-byte oriented (e.g. UTF-16), the 'U' filtering
will rarely do anything, and when it does do something, the result
will be wrong.

I agree that it would be desirable to have textopen always
default to universal newlines; however, this is difficult to
implement.

Regards,
Martin
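To illustrate the UTF-16 point from the message above, here is a minimal sketch in present-day Python. The helpers translate_bytes and translate_text are hypothetical names made up for this example; they only approximate what 'U' filtering does and are not the actual implementation of 'U' mode or of any codec.

```python
def translate_bytes(data: bytes) -> bytes:
    """Rough stand-in for 'U' filtering applied to the raw byte stream."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def translate_text(text: str) -> str:
    """The same translation, applied after decoding to text."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "one\r\ntwo\r\n"
raw = sample.encode("utf-16-le")

# In UTF-16 each CR and LF is followed by a zero byte, so the byte pair
# b"\r\n" never matches. Only the lone-CR rule fires, and every CRLF
# line ending turns into two line feeds.
filtered = translate_bytes(raw).decode("utf-16-le")

# Decoding first and translating afterwards gives the expected result.
decoded_first = translate_text(raw.decode("utf-16-le"))

# When the byte filter does match, it can hit a non-newline character:
# U+0D0A encodes to the bytes 0D 0A in UTF-16-BE.
letter = "\u0d0a".encode("utf-16-be")
mangled = translate_bytes(letter)  # a single byte, no longer valid UTF-16

print(repr(filtered))
print(repr(decoded_first))
print(repr(mangled))
```

The sketch suggests why newline translation would have to happen on the decoded text rather than on the underlying byte stream.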