"Martin v. Loewis" wrote:
>
>...
>
> Why is that? An encoding, by nature, is something that produces a byte
> sequence from some input. So you can only decode byte sequences, not
> character strings.

According to this logic, it is not logical to "encode" a Unicode string
into a base64'd Unicode string or "decode" a Unicode string from a
base64'd Unicode string. But I have seen circumstances where one XML
document is base64'd into another. In that circumstance, it would be
useful to say node.nodeValue.decode("base64").

Let me turn the argument around. What would be the *harm* in having
8-bit strings and Unicode strings behave similarly in this manner?

>...
> Not at all. Byte strings and character strings are as different as are
> byte strings and lists of DOM child nodes (i.e. the only common thing
> is that they are sequences).

8-bit strings are not purely byte strings. They are also "character
strings". That's why they have methods like "capitalize", "isalpha",
"lower", "swapcase", "title" and so forth. DOM nodes and byte strings
have virtually no methods in common.

We could argue angels on the head of a pin until the cows come home, but
90% of all Python users think of 8-bit strings as strings of characters.
So arguments based on the idea that they are not "really" character
strings are wishful thinking.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook