Showing content from https://mail.python.org/pipermail/python-dev/2001-January/012070.html below:

Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))

uche.ogbuji@fourthought.com
Tue, 23 Jan 2001 10:28:18 -0700

> > This has nothing to do with Python. UTF-8 marks the codes 
> > from 128-191 as illegal prefix. 
> [...]
> > Perhaps the parser should catch the UnicodeError and
> > instead return a not-wellformed exception ?!
> 
> Right on both accounts. If no encoding is specified, and if the
> document appears not to be UTF-16 in any endianness, an XML processor
> shall assume it is UTF-8. As Marc-Andre explains, your document is not
> proper UTF-8, hence the error.
> 
> The confusing thing is that expat itself does not care about it not
> being UTF-8; that is only detected when the callback is invoked in
> pyexpat, and therefore conversion to a Unicode object is attempted.
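
To illustrate the point quoted above about codes 128-191, here is a minimal sketch using Python's built-in UTF-8 codec. The helper name `decodes_as_utf8` and the `<a>...</a>` wrapper are only illustrative assumptions, not anything from expat or pyexpat. Bytes in that range are continuation bytes, so none of them can legally start a sequence.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (hypothetical helper for illustration)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Place each byte from 128 to 191 on its own between ASCII characters,
# so that it is forced into the lead-byte position of a sequence.
failures = [
    value
    for value in range(128, 192)
    if not decodes_as_utf8(b"<a>" + bytes([value]) + b"</a>")
]

# Every byte in the range is rejected as a start byte.
print(len(failures))
```

A document saved in a single-byte encoding with no encoding declaration can easily contain such a lone byte, which is presumably how the failing test data ended up with one.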

Pyexpat violates the XML spec here.  XML parsers are not allowed to "recover" 
from well-formedness errors.  And I would classify blithely reporting the 
character data as "recovery".
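
As a rough sketch of the behaviour suggested in the quoted text (catch the UnicodeError and report a well-formedness error instead), something along these lines could sit at the point where raw bytes are converted. `NotWellFormedError` and `decode_character_data` are hypothetical names made up for this example; this is not how pyexpat is actually implemented.

```python
class NotWellFormedError(ValueError):
    """Hypothetical stand-in for a parser's well-formedness error."""


def decode_character_data(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode character data, treating bad bytes as a fatal error.

    Assumption: the caller stops parsing when this raises, rather than
    passing the offending data on to the application.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Translate the codec error into a well-formedness error and
        # keep the original exception attached for debugging.
        raise NotWellFormedError(
            f"not well-formed: invalid {encoding} data at byte offset {exc.start}"
        ) from exc
```

The design choice here is only that the error is fatal and carries the byte offset; whether the check belongs in the binding or in the underlying parser is the open question.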

However, I'm amazed that this wouldn't have come up before, considering the 
pedigree of expat.

I'll poke around, and raise a bug on the expat site if need be.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
