Hye-Shik Chang wrote:

> On Tue, 27 Jul 2004 22:39:45 +0200, Walter Dörwald
> <walter at livinglogic.de> wrote:
>
>>Pythons unicode machinery currently has problems when decoding
>>incomplete input.
>>
>>When codecs.StreamReader.read() encounters a decoding error it
>>reads more bytes from the input stream and retries decoding.
>>This is broken for two reasons:
>>1) The error might be due to a malformed byte sequence in the input,
>>   a problem that can't be fixed by reading more bytes.
>>2) There may be no more bytes available at this time. Once more
>>   data is available decoding can't continue because bytes from
>>   the input stream have already been read and thrown away.
>>(sio.DecodingInputFilter has the same problems)
>
> StreamReaders and -Writers from CJK codecs are not suffering from
> this problems because they have internal buffer for keeping states
> and incomplete bytes of a sequence. In fact, CJK codecs has its
> own implementation for UTF-8 and UTF-16 on base of its multibytecodec
> system. It provides a "working" StreamReader/Writer already. :)

Seems you had the same problems with the builtin stream readers! ;)

BTW, how do you solve the problem that incomplete byte sequences
are retained in the middle of a stream, but should generate errors
at the end?

>>I've uploaded a patch that fixes these problems to SF:
>>http://www.python.org/sf/998993
>>
>>The patch implements a few additional features:
>>- read() has an additional argument chars that can be used to
>>  specify the number of characters that should be returned.
>>- readline() is supported on all readers derived from
>>  codecs.StreamReader().
>
> I have no comment for these, yet.
>
>>- readline() and readlines() have an additional option
>>  for dropping the u"\n".
>
> +1
>
> I wonder whether we need to add optional argument for writelines()
> to add newline characters for each lines, then.
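For comparison, later Python versions answer the "retain in the middle, error at the end" question with incremental decoders: codecs.getincrementaldecoder() returns an object that buffers an incomplete trailing byte sequence between decode() calls and only reports it as an error when decode() is called with final=True. The sketch below is an illustration using today's standard library, not code from the patch discussed in this thread; it shows the cases described above (a truncated sequence in mid-stream, a malformed sequence, and a truncated sequence at the end of input).

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

data = "Dörwald".encode("utf-8")    # 'ö' encodes to the two bytes b'\xc3\xb6'
first, second = data[:2], data[2:]  # split in the middle of 'ö'

# Incomplete input in mid-stream: the trailing b'\xc3' is buffered, not thrown away.
part1 = decoder.decode(first)               # 'D'
part2 = decoder.decode(second, final=True)  # 'örwald'
print(part1 + part2)

# Malformed input: reading more bytes cannot help, so the error is raised at once.
decoder.reset()
try:
    decoder.decode(b"\xff")
    malformed_error = None
except UnicodeDecodeError as exc:
    malformed_error = exc.reason

# Truncated input at the end of the stream: final=True turns leftovers into an error.
decoder.reset()
try:
    decoder.decode(b"D\xc3", final=True)
    truncated_error = None
except UnicodeDecodeError as exc:
    truncated_error = exc.reason

print(malformed_error, "/", truncated_error)
```

The key design point is the final flag: the decoder itself cannot know whether more bytes are coming, so the caller signals end of input explicitly.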
This would probably be a nice, convenient additional feature, but of
course you could always pass a generator expression to writelines():

   stream.writelines(line+u"\n" for line in lines)

Bye,
   Walter Dörwald
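A small round trip shows the two halves working together: read lines without their terminators, then add the newline back with a generator expression passed to writelines(). This sketch assumes current Python 3, where codecs.StreamReader.readlines() accepts a keepends parameter; that name comes from today's standard library, not necessarily from the patch discussed here, and the u"" prefix is no longer needed.

```python
import codecs
import io

# Wrap an in-memory UTF-8 byte stream in a codecs StreamReader.
raw = io.BytesIO("eins\nzwei\ndrei\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# keepends=False drops the line terminators from each line.
lines = reader.readlines(keepends=False)

# Re-add the newlines with a generator expression instead of a new writelines() option.
out = io.BytesIO()
writer = codecs.getwriter("utf-8")(out)
writer.writelines(line + "\n" for line in lines)

print(lines)
print(out.getvalue())
```

Since StreamWriter.writelines() joins whatever iterable it is given, any generator expression works here without changes to the writer.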