Hi, Martin v. Loewis: > The proper fix, IMO, is to have writexml accept an encoding argument, > and, by default, write the output as UTF-8. Then there is no need for > character or entity references. > The encoding should probably default to the one from the document header (UTF-8 if that isn't given). > In any case, emitting ü and € in XML is wrong: you cannot > use them unless your document type provides them - you should not > assume that all XML files use the HTML DTD. > Good point. On the other hand, I didn't plan to do that anyway. ;-) (Are Ӓ and friends OK with any DTD?) > Please see http://python.org/sf/432401. Walter is working on such a > codec. > Thank you. For XML escaping, the approach suggested by this patch would be to use xmlcharrefreplace() (see the test script) as the error handler. But that doesn't help with &<>". Personally, I rather dislike having to do a separate replace() for these. One approach would be to use character maps which have strategic holes where & < > and possibly " live..? -- Matthias Urlichs | noris network AG | http://smurf.noris.de/
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4