> str.encode()
> str.decode()
> uni.encode()
> #uni.decode() # still missing

It's not missing. str.decode and uni.encode go through a single codec;
that's easy. str.encode is somewhat more confusing, because it really
is unicode(str).encode. Now, you are not proposing that uni.decode is
str(uni).decode, are you?

If not that, what else would it mean? And if it means something else,
it is clearly not symmetric to str.encode, so it is not "missing".

> One very useful application for this method is XML unescaping
> which turns numeric XML entities into Unicode chars.

Ok. Please show me how that would work. More precisely, please write a
PEP describing the rationale for this feature, including use case
examples and precise semantics of the proposed addition.

> The key argument for these interfaces is that they provide
> an extensible transformation mechanism for string and binary
> data.

That is too general for me to understand; I need to see detailed
examples that solve real-world problems.

Regards,
Martin

P.S. I don't think that unescaping XML character entities into Unicode
characters is a useful application in itself. This is normally done by
the XML parser, which has to deal not only with character entities, but
also with general entities and a lot of other markup. Very few people
write XML parsers, and they are using the string methods and the sre
module successfully (if the parser is written in Python - a C parser
would do the unescaping before even passing the text to Python).
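To illustrate the P.S., here is a minimal sketch, not part of the original message, of unescaping numeric character references with a regular expression alone and no codec. It assumes Python 3 syntax (the re module, successor to sre), and the names _CHAR_REF and unescape_char_refs are hypothetical. It handles only numeric references; named and general entities are left to a real XML parser.

```python
import re

# Decimal (&#8364;) and hexadecimal (&#x20AC;) character references.
# XML allows only a lowercase "x" in the hexadecimal form.
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def unescape_char_refs(text):
    """Replace numeric XML character references with their characters.

    Hypothetical helper for illustration only: named entities such as
    &amp; are deliberately left untouched.
    """
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: keep the reference as-is.
            return match.group(0)

    return _CHAR_REF.sub(_replace, text)


if __name__ == "__main__":
    print(unescape_char_refs("price: 10 &#8364; (&#x20AC;) &amp; tax"))
```

The function maps text to text, so it can be applied directly to an already-decoded string.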