Showing content from http://mail.python.org/pipermail/python-dev/2002-April/023721.html below:

[Python-Dev] Unicode entities in XML cause problems :-(

Matthias Urlichs smurf@noris.de
Sat, 27 Apr 2002 21:30:57 +0200

Playing around with xml.dom.minidom, I noticed that this beast is
perfectly able to read XML which it can't print:

>>> import sys
>>> import xml.dom.minidom as md
>>> d = md.parseString("<foo>b&#2000;</foo>")
>>> d.writexml(sys.stdout)
...
UnicodeError: ASCII encoding error: ordinal not in range(128)

Ouch.

Scanning the sources revealed various ways to replace
'&' with '&amp;' but no generic codec for [ht|x]ml-escaped character
entities.

Thus, my proposal (which I'm going to implement since I need it...) is to
write such a codec. For simplicity, I propose to accept &uuml; and &euro;
and friends, but to emit them as &#1234; (or whatever).
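
To make the proposed behaviour concrete, here is a minimal sketch of the
two directions using tools from a present-day Python 3 standard library
(html.unescape and the "xmlcharrefreplace" error handler). This is an
assumption about how such a codec could behave, not the codec proposed in
this message; the function names decode_entities and encode_entities are
hypothetical.

```python
import html
from xml.sax.saxutils import escape


def decode_entities(text: str) -> str:
    """Turn named and numeric character references into characters."""
    # Hypothetical helper: accepts &uuml;, &euro;, &#2000; and friends.
    return html.unescape(text)


def encode_entities(text: str) -> bytes:
    """Escape markup characters, then emit non-ASCII as &#NNNN; references."""
    # escape() runs first on the text ('&', '<', '>'); the numeric references
    # are produced afterwards, so their leading '&' is not escaped again.
    return escape(text).encode("ascii", "xmlcharrefreplace")


decoded = decode_entities("&uuml; &euro; &#2000;")
encoded = encode_entities("b\u07d0 & \u00fc")
print(decoded)
print(encoded)
```

Named entities are accepted on input only; on output everything outside
ASCII is written as a decimal reference, as suggested above.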

After this codec is written, all occurrences of string.replace('&','&amp;')
(and vice versa) within the standard library can be replaced with the
appropriate encode/decode methods. 
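
As an illustration of what an encode step might look like on the minidom
example above, here is a sketch for a present-day Python 3. It assumes the
"xmlcharrefreplace" error handler, which is not part of the library
discussed in this message; the variable names are made up for the example.

```python
import xml.dom.minidom as md

doc = md.parseString("<foo>b&#2000; &amp; c</foo>")

# toxml() returns text; minidom has already escaped '&' in the text node,
# but U+07D0 is still a literal non-ASCII character at this point.
text = doc.documentElement.toxml()

# Encode to pure ASCII, writing unencodable characters as &#NNNN;.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))

# The ASCII form parses back to the same character data.
round_trip = md.parseString(ascii_bytes).documentElement.firstChild.data
```

The result can be written to an ASCII-only stream without raising the
UnicodeError shown earlier.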

Thoughts? Or am I totally blind, such a codec already exists, and I
have missed it?

--
Matthias Urlichs     |     noris network AG     |     http://smurf.noris.de/
