

[Python-Dev] Unicode entities in XML cause problems :-(

Martin v. Loewis martin@v.loewis.de
28 Apr 2002 01:28:09 +0200
"Matthias Urlichs" <smurf@noris.de> writes:

> >>> import sys
> >>> import xml.dom.minidom as md
> >>> d=md.parseString("<foo>b&#2000;</foo>")
> >>> d.writexml(sys.stdout)
> ...
> UnicodeError: ASCII encoding error: ordinal not in range(128)
[...]
> Thus, my proposal (which I'm going to implement since I need it...) is to
> write such a codec. For simplicity, I propose to accept &uuml; and &euro;
> and friends, but to emit them as &#1234; (or whatever).
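The output half of this proposal (writing non-ASCII characters as numeric character references) can be sketched with the `xmlcharrefreplace` error handler that Python has provided since 2.3. This is a modern illustration under that assumption, not the codec Matthias describes, and it does nothing about named entities such as `&uuml;` on input.

```python
# Sketch: write non-ASCII characters as decimal character references.
text = "b\u07d0"  # U+07D0 is code point 2000, as in &#2000; above

# "xmlcharrefreplace" replaces each character the target codec cannot
# represent with &#NNNN; (decimal) and leaves ASCII characters untouched.
encoded = text.encode("ascii", "xmlcharrefreplace")
print(encoded)  # b'b&#2000;'
```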

The proper fix, IMO, is to have writexml accept an encoding argument,
and, by default, write the output as UTF-8. Then there is no need for
character or entity references.
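As a sketch of this approach in present-day Python 3, where `Document.writexml()` does accept an `encoding` argument: that argument only fills in the XML declaration, so the stream passed as the writer must perform the actual UTF-8 encoding. Wrapping a byte buffer in `io.TextIOWrapper` is one assumed way to arrange that.

```python
import io
import xml.dom.minidom as md

doc = md.parseString("<foo>b&#2000;</foo>")

# writexml() writes encoding="utf-8" into the declaration; the
# TextIOWrapper is what actually encodes the text as UTF-8 bytes.
buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding="utf-8")
doc.writexml(writer, encoding="utf-8")
writer.flush()
data = buf.getvalue()

# U+07D0 appears as raw UTF-8 bytes, not as a character reference.
print(data)
```

`doc.toxml(encoding="utf-8")` returns the same bytes in one call.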

In any case, emitting &uuml; and &euro; in XML is wrong: you cannot
use them unless your document type provides them - you should not
assume that all XML files use the HTML DTD.
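The point about document types can be checked with a minimal sketch, assuming Python 3's `xml.dom.minidom` (which parses with expat): a named entity without a declaration is a well-formedness error, a numeric character reference always works, and `&uuml;` becomes usable only once the document's own DTD declares it.

```python
import xml.dom.minidom as md
from xml.parsers.expat import ExpatError

# Without a DTD, &uuml; is simply an undefined entity in XML.
undefined_error = None
try:
    md.parseString("<foo>&uuml;</foo>")
except ExpatError as exc:
    undefined_error = exc
print("rejected:", undefined_error)

# A numeric character reference needs no DTD at all.
doc = md.parseString("<foo>&#252;</foo>")
print(doc.documentElement.firstChild.data)

# A named entity works once the document type declares it.
doc2 = md.parseString(
    '<!DOCTYPE foo [<!ENTITY uuml "&#252;">]><foo>&uuml;</foo>'
)
print(doc2.documentElement.firstChild.data)
```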

> After this codec is written, all occurrences of string.replace('&','&amp;')
> (and vice versa) within the standard library can be replaced with the
> appropriate encode/decode methods. 
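In present-day Python 3 both halves of the `string.replace('&','&amp;')` pattern are covered by the standard library without a dedicated codec: `xml.sax.saxutils.escape` handles markup characters and the `xmlcharrefreplace` error handler handles non-ASCII characters. The sketch below assumes that combination; `xml_text_ascii` is a hypothetical helper name, not a library function.

```python
from xml.sax.saxutils import escape, unescape

def xml_text_ascii(text: str) -> bytes:
    """Hypothetical helper: serialize character data as ASCII-only XML.

    Markup characters are escaped first; only then are non-ASCII
    characters turned into character references. Reversing the order
    would escape the '&' of every generated '&#NNNN;'.
    """
    return escape(text).encode("ascii", "xmlcharrefreplace")

print(xml_text_ascii("Fish & Chips \u20ac5"))  # b'Fish &amp; Chips &#8364;5'

# unescape() reverses escape(); it does not decode &#NNNN; references.
print(unescape("a &amp; b &lt; c"))  # a & b < c
```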

Please see http://python.org/sf/432401. Walter is working on such a
codec.

Regards,
Martin



