"Matthias Urlichs" <smurf@noris.de> writes: > >>> import xml.dom.minidom as md > >>> d=md.parseString("<foo>bߐ</foo>")) > >>> d.writexml(sys.stdout) > ... > UnicodeError: ASCII encoding error: ordinal not in range(128) [...] > Thus, my proposal (which I'm going to implement since I need it...) is to > write such a codec. For simplicity, I propose to accept ü and € > and friends, but to emit them as Ӓ (or whatever). The proper fix, IMO, is to have writexml accept an encoding argument, and, by default, write the output as UTF-8. Then there is no need for character or entity references. In any case, emitting ü and € in XML is wrong: you cannot use them unless your document type provides them - you should not assume that all XML files use the HTML DTD. > After this codec is written, all occurrences of string.replace('&','&') > (and vice versa) within the standard library can be replaced with the > appropriate encode/decode methods. Please see http://python.org/sf/432401. Walter is working on such a codec. Regards, Martin