
Showing content from https://mail.python.org/pipermail/python-dev/2001-January/011985.html below:

Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))

Martin von Loewis loewis@informatik.hu-berlin.de
Mon, 22 Jan 2001 15:46:39 +0100 (MET)


> This has nothing to do with Python. UTF-8 marks the codes 
> from 128-191 as illegal prefix. 
[...]
> Perhaps the parser should catch the UnicodeError and
> instead return a not-wellformed exception ?!

Right on both counts. If no encoding is specified, and if the
document appears not to be UTF-16 in any endianness, an XML processor
shall assume it is UTF-8. As Marc-Andre explains, your document is not
proper UTF-8, hence the error.
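
To illustrate the point about codes 128-191, here is a minimal sketch in
Python. The helper name is_valid_utf8 is made up for this example; it
only relies on the standard strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes 128-191 (0x80-0xBF) are continuation bytes in UTF-8, so none of
# them may start a character on its own.
bad_lead_bytes = [b for b in range(128, 192) if not is_valid_utf8(bytes([b]))]

# A Latin-1 encoded character is a typical way to end up with such input.
latin1_ok = is_valid_utf8("\u00e9".encode("latin-1"))
utf8_ok = is_valid_utf8("\u00e9".encode("utf-8"))
```

A document saved as Latin-1 but carrying no encoding declaration fails
this check in the same way as the test file discussed here.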

The confusing thing is that expat itself does not care about it not
being UTF-8; that is only detected when the callback is invoked in
pyexpat, and therefore conversion to a Unicode object is attempted.

The right solution probably would be to change expat so that it
determines correctness of the encoding for each string it gets as part
of the wellformedness analysis, and produces illformedness exceptions
when an encoding error occurs. Patches are welcome, although they
probably should go to sourceforge.net/projects/expat.
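
Until something like that exists in expat, the idea can be approximated
on the Python side. The following is only a sketch, not a patch to expat
or pyexpat: NotWellFormedError and check_utf8_wellformed are hypothetical
names, and it assumes the document has no encoding declaration, so that
UTF-8 is the default unless a UTF-16 byte order mark is present.

```python
import codecs


class NotWellFormedError(ValueError):
    """Hypothetical error type standing in for a not-wellformed exception."""


def check_utf8_wellformed(data: bytes) -> None:
    """Raise NotWellFormedError if data is not valid UTF-8.

    Assumption: the document carries no encoding declaration. Input that
    starts with a UTF-16 byte order mark is left alone.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report the encoding problem as a wellformedness problem,
        # keeping the byte offset where decoding failed.
        raise NotWellFormedError(
            f"invalid UTF-8 at byte offset {exc.start}"
        ) from exc
```

Calling check_utf8_wellformed on the raw bytes before handing them to
the parser turns the UnicodeError into a single, parser-style error.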

Regards,
Martin

