Showing content from https://mail.python.org/pipermail/python-dev/2004-August/048026.html below:

[Python-Dev] Decoding incomplete unicode

"Martin v. Löwis" martin at v.loewis.de
Wed Aug 18 23:57:22 CEST 2004
Walter Dörwald wrote:
> But then a file that contains the two bytes 0x61, 0xc3
> will never generate an error when read via an UTF-8 reader.
> The trailing 0xc3 will just be ignored.
> 
> Another option we have would be to add a final() method
> to the StreamReader, that checks if all bytes have been
> consumed. 

Alternatively, we could add a .buffer() method that returns
any data that are still pending (either a Unicode string or
a byte string).
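A minimal sketch of the .buffer() idea. It is written against the incremental decoder API that later Python versions provide (codecs.getincrementaldecoder), not the 2004 StreamReader. The class PendingAwareDecoder and its buffer() method are hypothetical, and this version only reports pending bytes, not the pending-Unicode case also mentioned above. It reuses Walter's example of the two bytes 0x61, 0xc3.

```python
import codecs


class PendingAwareDecoder:
    """Hypothetical wrapper illustrating a .buffer() method (not a stdlib API)."""

    def __init__(self, encoding="utf-8"):
        self._dec = codecs.getincrementaldecoder(encoding)()

    def decode(self, data: bytes) -> str:
        # final=False: an incomplete trailing sequence is held back, not an error
        return self._dec.decode(data, final=False)

    def buffer(self) -> bytes:
        # For the UTF-8 incremental decoder, getstate() returns (pending_bytes, flag)
        pending, _flag = self._dec.getstate()
        return pending
```

With d = PendingAwareDecoder(), d.decode(b"\x61\xc3") gives "a" and d.buffer() then gives b"\xc3". The caller, not the reader, decides whether leftover bytes at the end of input are an error.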

> Maybe this should be done by StreamReader.close()?

No. There is nothing wrong with only reading a part of a file.
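This distinction can be sketched with the final flag of present-day incremental decoders, assuming that checking for truncated input is an explicit step the caller opts into rather than something close() does. decode_chunks is a hypothetical helper.

```python
import codecs


def decode_chunks(chunks, final):
    """Hypothetical helper: decode an iterable of byte chunks as UTF-8.

    final=False: a truncated trailing sequence is silently held back,
    which is fine when only part of a file is being read.
    final=True: the input is declared complete, so leftover bytes raise.
    """
    dec = codecs.getincrementaldecoder("utf-8")()
    parts = [dec.decode(chunk) for chunk in chunks]
    # Signal end of input explicitly; this is where an error can surface
    parts.append(dec.decode(b"", final=final))
    return "".join(parts)
```

decode_chunks([b"a", b"\xc3"], final=False) returns "a", while the same call with final=True raises UnicodeDecodeError for the dangling 0xc3.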

> Now
> inShift counts the number of characters (and the shortcut
> for a "+-" sequence appearing together has been removed).

Ok. I didn't actually check the correctness of the individual
methods.

OTOH, I think time spent on UTF-7 is wasted, anyway.
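For reference on the "+-" shortcut in the quoted patch notes: in UTF-7, "+-" encodes a literal "+". A decoder that sees "+" at the end of a chunk therefore cannot yet tell whether a base64 shift sequence is starting. The sketch below uses the UTF-7 incremental decoder from later Python versions, not the patch under discussion. decode_in_pieces is a hypothetical helper.

```python
import codecs


def decode_in_pieces(pieces, encoding="utf-7"):
    """Hypothetical helper: decode byte pieces one call at a time and
    record the text each call produces (plus a final flush)."""
    dec = codecs.getincrementaldecoder(encoding)()
    outputs = [dec.decode(p) for p in pieces]
    outputs.append(dec.decode(b"", final=True))
    return outputs


# "+-" split across two chunks: the lone "+" is held back until "-" arrives
print(decode_in_pieces([b"a+", b"-b"]))  # ['a', '+b', '']
```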

> Would a version of the patch without a final argument but
> with a feed() method be accepted?

I don't see the need for a feed method. .read() should just
block until data are available, and that's it.
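A sketch of the pull model described here, assuming a pipe as the data source. The consumer calls .read() on a codecs StreamReader, which blocks on the underlying file until the producer closes it, so no feed() method is involved. produce and read_all are hypothetical names, and the thread with a short sleep only simulates data arriving over time.

```python
import codecs
import os
import threading
import time


def produce(fd, pieces, delay=0.01):
    # Hypothetical producer: writes bytes into the pipe one piece at a time
    with os.fdopen(fd, "wb", buffering=0) as out:
        for piece in pieces:
            out.write(piece)
            time.sleep(delay)


def read_all(pieces):
    r, w = os.pipe()
    writer = threading.Thread(target=produce, args=(w, pieces))
    writer.start()
    reader = codecs.getreader("utf-8")(os.fdopen(r, "rb"))
    try:
        # Blocks until the producer closes its end of the pipe (EOF)
        return reader.read()
    finally:
        writer.join()
        reader.close()  # delegated to the underlying file object
```

read_all([b"caf\xc3", b"\xa9!"]) returns "café!" even though the two bytes of "é" are written separately.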

> I'm imagining implementing an XML parser that uses Python's
> unicode machinery and supports the
> xml.sax.xmlreader.IncrementalParser interface.

I think this is out of scope of this patch. The incremental
parser could implement a regular .read on a StringIO file
that also supports .feed.
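One possible reading of this suggestion, sketched under assumptions: a hypothetical file-like FeedableStream whose feed() accepts raw bytes (as the feed() of an xml.sax IncrementalParser would) and whose read() returns the text decoded so far. It is built on codecs.getincrementaldecoder from later Python versions; the class itself is not an existing stdlib API.

```python
import codecs


class FeedableStream:
    """Hypothetical file-like object: bytes are pushed in with feed()
    and decoded text is pulled out with read()."""

    def __init__(self, encoding="utf-8"):
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._text = []

    def feed(self, data: bytes) -> None:
        # An incomplete trailing sequence stays inside the decoder
        # until a later feed() supplies the rest of it
        self._text.append(self._decoder.decode(data))

    def read(self) -> str:
        # Return everything decoded so far and empty the text buffer
        result = "".join(self._text)
        self._text.clear()
        return result
```

Feeding b"<a>caf\xc3" and then reading gives "<a>caf"; the dangling 0xc3 waits in the decoder until feed(b"\xa9</a>") completes the "é".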

> Without the feed() method, we need the following:
> 
> 1) A StreamQueue class that

Why is that? I thought we are talking about "Decoding
incomplete unicode"?

Regards,
Martin
