On Tue, 27 Jul 2004 22:39:45 +0200, Walter Dörwald <walter at livinglogic.de> wrote:
> Python's unicode machinery currently has problems when decoding
> incomplete input.
>
> When codecs.StreamReader.read() encounters a decoding error it
> reads more bytes from the input stream and retries decoding.
> This is broken for two reasons:
> 1) The error might be due to a malformed byte sequence in the input,
>    a problem that can't be fixed by reading more bytes.
> 2) There may be no more bytes available at this time. Once more
>    data is available decoding can't continue because bytes from
>    the input stream have already been read and thrown away.
> (sio.DecodingInputFilter has the same problems)

The StreamReaders and -Writers from the CJK codecs do not suffer from
these problems, because they keep an internal buffer holding the decoder
state and any incomplete bytes of a multibyte sequence. In fact, the CJK
codecs have their own implementations of UTF-8 and UTF-16 on top of their
multibytecodec system, so they already provide a "working"
StreamReader/Writer. :)

> I've uploaded a patch that fixes these problems to SF:
> http://www.python.org/sf/998993
>
> The patch implements a few additional features:
> - read() has an additional argument chars that can be used to
>   specify the number of characters that should be returned.
> - readline() is supported on all readers derived from
>   codecs.StreamReader().

I have no comments on these yet.

> - readline() and readlines() have an additional option
>   for dropping the u"\n".

+1

I wonder whether we should then also add an optional argument to
writelines() that appends a newline character to each line.

Hye-Shik
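
[Editor's note: a minimal sketch of the buffering idea described above, in
terms of the incremental decoder API that later grew out of this discussion
(codecs.getincrementaldecoder, added in Python 2.5). It is not the patch
under review, only an illustration of keeping incomplete bytes in a buffer
instead of throwing them away.]

import codecs

# A stateful UTF-8 decoder: trailing bytes of an incomplete character
# are kept in its internal buffer between calls instead of being lost.
decoder = codecs.getincrementaldecoder("utf-8")()

# u"\xe9" is two bytes in UTF-8 (0xC3 0xA9); split them across two
# simulated reads from the byte stream.
chunks = [b"caf\xc3", b"\xa9 au lait"]

result = u""
for chunk in chunks:
    # Returns only the characters that are complete so far.
    result += decoder.decode(chunk)

# Flush at end of stream; raises if undecodable bytes are left over.
result += decoder.decode(b"", final=True)

assert result == u"caf\xe9 au lait"

A genuinely malformed byte sequence, by contrast, raises
UnicodeDecodeError immediately rather than waiting for more input, which
is exactly the distinction the retry-by-reading-more approach cannot make.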