Walter Dörwald wrote:
> But then a file that contains the two bytes 0x61, 0xc3
> will never generate an error when read via an UTF-8 reader.
> The trailing 0xc3 will just be ignored.
>
> Another option we have would be to add a final() method
> to the StreamReader, that checks if all bytes have been
> consumed.

Alternatively, we could add a .buffer() method that returns
any data that are still pending (either a Unicode string or
a byte string).

> Maybe this should be done by StreamReader.close()?

No. There is nothing wrong with only reading a part of a file.

> Now
> inShift counts the number of characters (and the shortcut
> for a "+-" sequence appearing together has been removed.

Ok. I didn't actually check the correctness of the individual
methods.

OTOH, I think time spent on UTF-7 is wasted, anyway.

> Would a version of the patch without a final argument but
> with a feed() method be accepted?

I don't see the need for a feed method. .read() should just
block until data are available, and that's it.

> I'm imagining implementing an XML parser that uses Python's
> unicode machinery and supports the
> xml.sax.xmlreader.IncrementalParser interface.

I think this is out of scope of this patch. The incremental
parser could implement a regular .read on a StringIO file
that also supports .feed.

> Without the feed method(), we need the following:
>
> 1) A StreamQueue class that

Why is that? I thought we are talking about "Decoding
incomplete unicode"?

Regards,
Martin
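
For illustration only: a minimal sketch of the 0x61, 0xc3 case
discussed above, written against the incremental decoder that
today's codecs module provides. It is not the patch under
discussion and not the StreamReader API. The helper names
decode_chunks and pending_bytes are hypothetical; pending_bytes
only approximates the proposed .buffer() idea by reading the
decoder's saved state.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks that may split a character across chunks.

    Hypothetical helper: the closing decode(b"", final=True) call is
    what turns leftover bytes into an error instead of dropping them.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Signal end of input; raises UnicodeDecodeError if bytes are pending.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def pending_bytes(chunks, encoding="utf-8"):
    """Return the bytes the decoder is still holding back.

    Hypothetical helper: for buffered decoders such as UTF-8, the first
    item of getstate() is the undecoded input kept for the next call.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        decoder.decode(chunk)
    return decoder.getstate()[0]
```

With the input split as b"a\xc3" followed by b"\xa4", decode_chunks
returns the two-character string "a\xe4". With b"a\xc3" alone,
pending_bytes reports b"\xc3" and decode_chunks raises
UnicodeDecodeError, so the trailing byte is not silently ignored.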