Showing content from https://mail.python.org/pipermail/python-dev/2005-April/052533.html below:

[Python-Dev] Unicode byte order mark decoding

Evan Jones ejones at uwaterloo.ca
Tue Apr 5 21:53:05 CEST 2005
On Apr 5, 2005, at 15:33, Walter Dörwald wrote:
> The stateful decoder has a little problem: At least three bytes
> have to be available from the stream until the StreamReader
> decides whether these bytes are a BOM that has to be skipped.
> This means that if the file only contains "ab", the user will
> never see these two characters.

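Shouldn't the decoder be capable of doing a partial match and quitting
early? After all, "ab" is encoded in UTF-8 as <61> <62> but the BOM is
<ef> <bb> <bf>. As soon as the first byte fails to match <ef>, the decoder
knows there is no BOM and can return the data. If it did this type of
partial matching, the issue would only remain in the rare case where the
available bytes are a proper prefix of the BOM (<ef> or <ef> <bb>).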
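
To make the partial-match idea concrete, here is a minimal sketch in Python. The class name BomSkipper and its feed() method are hypothetical, invented for illustration only; this is not the StreamReader code under discussion, just one way the early exit could work.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


class BomSkipper:
    """Hypothetical helper: strips a leading UTF-8 BOM from a byte stream."""

    def __init__(self):
        self._pending = b""     # bytes held back while the BOM check is undecided
        self._decided = False   # True once we know whether a BOM was present

    def feed(self, data):
        """Return the bytes that are safe to hand to a plain UTF-8 decoder."""
        if self._decided:
            return data
        self._pending += data
        if len(self._pending) < len(BOM) and BOM.startswith(self._pending):
            # Still a proper prefix of the BOM: wait for more input.
            return b""
        # Either a mismatch was found or enough bytes have arrived to decide.
        self._decided = True
        out, self._pending = self._pending, b""
        if out.startswith(BOM):
            return out[len(BOM):]
        return out
```

With this sketch, feed(b"ab") returns b"ab" immediately, because <61> already differs from <ef>. Only a stream that stops after <ef> or <ef> <bb> leaves bytes held back, which is the rare case mentioned above.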

> A solution for this would be to add an argument named final to
> the decode and read methods that tells the decoder that the
> stream has ended and the remaining buffered bytes have to be
> handled now.

This functionality is provided by a flush() method on similar objects, 
such as the zlib compression objects.
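
For comparison, a short sketch of the flush() pattern on a zlib compression object, next to the "final" style from the quoted proposal. The second half assumes a later Python version in which the codecs module provides incremental decoders and a "utf-8-sig" codec; neither is part of the discussion in this message.

```python
import codecs
import zlib

# zlib style: feed data with compress(), then call flush() at the end
# to obtain whatever the object is still buffering.
compressor = zlib.compressobj()
chunks = [compressor.compress(b"ab")]
chunks.append(compressor.flush())
compressed = b"".join(chunks)
roundtrip = zlib.decompress(compressed)

# "final" style (assumes a Python version with incremental decoders):
# the last call carries final=True so buffered bytes are handled now.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
text = decoder.decode(b"\xef\xbb\xbf", final=False)
text += decoder.decode(b"ab", final=True)
```

Both styles answer the same question, namely how the caller tells a buffering object that no more input will arrive. flush() is a separate call, while final is a flag on the last regular call.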

Evan Jones
