Neil Hodgson wrote:
> Glenn Linderman:
>
>> and perhaps other things (and
>> are there new Unicode control characters that could be used for line
>> endings?),
>
>    Unicode includes Line Separator U+2028 and Paragraph Separator
> U+2029 but they are rarely supported and very rarely used. They are a
> pain to work with since they are 3 byte sequences in UTF-8. Visual
> Studio does support them.
>
>    Python does not currently support these line separators such as in
> this example which only reads 2 lines rather than 3:
>
> with open("x.txt", "wb") as f:
>     f.write("a\nb\u2029c\n".encode('utf-8'))
> with open("x.txt", "r") as f:
>     n = 1
>     for l in f.readlines():
>         print(n, repr(l))
>         n += 1

Please file a bug report for this. f.readlines() (or rather
the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
for detecting line break characters.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 06 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
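
For anyone wanting to reproduce the mismatch without touching the file system, here is a minimal sketch. It assumes Python 3 and uses io.StringIO as a stand-in for the text file (my substitution, not part of the original report); the names io_lines and str_lines are only illustrative. It compares how the io layer splits the sample string against str.splitlines(), which does treat U+2028 and U+2029 as line boundaries.

```python
import io

# Same content as in the quoted example: "\n" after "a", U+2029 after "b".
text = "a\nb\u2029c\n"

# The io layer splits on "\n" only, so U+2029 stays inside the second line.
io_lines = io.StringIO(text).readlines()

# str.splitlines() also breaks at U+2029 (Paragraph Separator).
str_lines = text.splitlines(keepends=True)

print(len(io_lines), [ascii(line) for line in io_lines])
print(len(str_lines), [ascii(line) for line in str_lines])

# U+2029 is a three-byte sequence in UTF-8, as noted in the quoted message.
print("\u2029".encode("utf-8"))
```

The io-based read should yield two lines and splitlines() three, which is the gap the requested bug report is about.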