Showing content from http://mail.python.org/pipermail/python-dev/2002-April/023734.html below:

[Python-Dev] Unicode entities in XML cause problems :-(

Matthias Urlichs smurf@noris.de
Sun, 28 Apr 2002 06:16:10 +0200

Hi,

Martin v. Loewis:
> The proper fix, IMO, is to have writexml accept an encoding argument,
> and, by default, write the output as UTF-8. Then there is no need for
> character or entity references.
> 
The encoding should probably default to the one from the document header
(UTF-8 if that isn't given).
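
A minimal sketch of that default, using today's xml.dom.minidom rather than
the 2002 API under discussion. The helper name serialize() is hypothetical;
it assumes the parsed Document exposes the declared encoding as doc.encoding
(None when the header gives none).

```python
from xml.dom import minidom


def serialize(doc, encoding=None):
    """Return the document as bytes (serialize() is a hypothetical helper).

    Encoding preference: explicit argument, then the encoding declared in
    the document header, then UTF-8.
    """
    chosen = encoding or doc.encoding or "utf-8"
    return doc.toxml(encoding=chosen)


# Document whose header declares ISO-8859-1; the body holds a u-umlaut.
latin = minidom.parseString(
    b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xfc</a>'
)
latin_bytes = serialize(latin)

# Document without a header: falls back to UTF-8.
plain = minidom.parseString("<a>\u00fc</a>")
plain_bytes = serialize(plain)
```

An explicit encoding argument still wins, so a caller can force UTF-8 for a
document that was read as Latin-1.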

> In any case, emitting &uuml; and &euro; in XML is wrong: you cannot
> use them unless your document type provides them - you should not
> assume that all XML files use the HTML DTD.
> 
Good point. On the other hand, I didn't plan to do that anyway. ;-)
(Are &#1234; and friends OK with any DTD?)
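
A quick check with a present-day expat-based parser suggests so: a numeric
character reference parses with no DTD at all, while a named entity such as
&uuml; is rejected as undefined. This is only a sketch; the variable names
are illustrative.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Numeric character reference, no DTD: expected to parse.
doc = minidom.parseString("<p>&#1234;</p>")
text = doc.documentElement.firstChild.data

# Named entity with no DTD declaring it: expected to be rejected.
try:
    minidom.parseString("<p>&uuml;</p>")
    named_ok = True
except ExpatError:
    named_ok = False
```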

> Please see http://python.org/sf/432401. Walter is working on such a
> codec.
> 
Thank you.

For XML escaping, the approach suggested by this patch would be to use
xmlcharrefreplace() (see the test script) as the error handler.
But that doesn't help with &, <, > and ". Personally, I rather dislike
having to do a separate replace() for these.
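
To make the two steps concrete, here is a sketch against current Python,
where the handler is available under the error-handler name
"xmlcharrefreplace". The function name xml_escape_bytes() is hypothetical,
and the separate step for the markup characters is done with
xml.sax.saxutils.escape().

```python
from xml.sax.saxutils import escape


def xml_escape_bytes(text, encoding="ascii"):
    """Escape markup characters, then encode (hypothetical helper).

    Step 1: the separate replacement pass for &, <, > and ".
    Step 2: the error handler turns anything the target encoding cannot
    represent into a numeric character reference.
    """
    escaped = escape(text, {'"': "&quot;"})
    return escaped.encode(encoding, "xmlcharrefreplace")


ascii_out = xml_escape_bytes('a < b & "\u00fc\u20ac"')
latin_out = xml_escape_bytes("\u00fc\u20ac", "iso-8859-1")
```

With ISO-8859-1 as the target, the u-umlaut is written as a plain byte and
only the euro sign becomes a character reference.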

One approach would be to use character maps which have strategic holes
where &, <, > and possibly " live, so that these end up in the error
handler as well..?
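
For comparison, a sketch of a different way to get a single pass over the
markup characters: not a codec character map with holes, but a str.translate()
table applied before encoding. The names _MARKUP and escape_and_encode() are
hypothetical.

```python
# Translation table for the characters that are markup in XML.
# (_MARKUP and escape_and_encode are hypothetical names.)
_MARKUP = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
})


def escape_and_encode(text, encoding="ascii"):
    """One translate() pass for markup, then character references for the rest."""
    return text.translate(_MARKUP).encode(encoding, "xmlcharrefreplace")


result = escape_and_encode('<a href="x">\u20ac & more</a>')
```

Because translate() visits each input character once, the & introduced by
&lt; or &quot; is never escaped a second time.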

-- 
Matthias Urlichs     |     noris network AG     |     http://smurf.noris.de/
