Walter Dörwald wrote:
> Python's unicode machinery currently has problems when decoding
> incomplete input.
>
> When codecs.StreamReader.read() encounters a decoding error it
> reads more bytes from the input stream and retries decoding.
> This is broken for two reasons:
> 1) The error might be due to a malformed byte sequence in the input,
>    a problem that can't be fixed by reading more bytes.
> 2) There may be no more bytes available at this time. Once more
>    data is available decoding can't continue because bytes from
>    the input stream have already been read and thrown away.
> (sio.DecodingInputFilter has the same problems)
>
> I've uploaded a patch that fixes these problems to SF:
> http://www.python.org/sf/998993
>
> The patch implements a few additional features:
> - read() has an additional argument chars that can be used to
>   specify the number of characters that should be returned.
> - readline() is supported on all readers derived from
>   codecs.StreamReader().
> - readline() and readlines() have an additional option
>   for dropping the u"\n".
>
> The patch is still missing changes to the escape codecs
> ("unicode_escape" and "raw_unicode_escape") and I haven't
> touched the CJK codecs, but it has test cases that check
> the new functionality for all affected codecs
> (UTF-7, UTF-8, UTF-16, UTF-16-LE, UTF-16-BE).
>
> Could someone take a look at the patch?

Just did... please see the comments in the SF tracker.

I like the idea, but don't think the implementation is
the right way to do it. Instead, I'd suggest using a new error
handling strategy "break" ( = break processing as soon as
errors are found).

The advantage of this approach is twofold:

 * no new APIs or API changes are required

 * other codecs (including third-party ones) can easily
   implement the same strategy

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 27 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
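
To make the "break" idea concrete, here is a minimal sketch of what "stop at the first error and hand back the rest" could look like from the caller's side. It is not the patch from the tracker and not an existing error handler: the helper name decode_until_error is hypothetical, the code is written for current Python 3 rather than the 2.x codecs discussed above, and it assumes a stateless encoding such as UTF-8.

```python
def decode_until_error(data, encoding="utf-8"):
    """Hypothetical helper: decode up to the first error.

    Returns (text, rest), where rest holds the bytes starting at the
    first position that could not be decoded. Nothing is thrown away,
    so the caller can retry once more input has arrived.
    """
    try:
        return data.decode(encoding), b""
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return data[:exc.start].decode(encoding), data[exc.start:]


# Incomplete input: the two-byte sequence for U+00E9 is cut in half.
text, rest = decode_until_error(b"caf\xc3")    # -> ('caf', b'\xc3')

# Once the missing byte arrives, decoding resumes from the kept bytes.
more, rest = decode_until_error(rest + b"\xa9")  # -> ('\xe9', b'')
```

A malformed sequence is reported the same way (for example b"ab\xffcd" yields "ab" plus the remaining bytes), so the caller still has to decide whether to wait for more data or to treat the leftover bytes as an error.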