[Guido]
> > And this is exactly why encodings will remain important: entities
> > encoded in ISO-2022-JP have no compelling reason to be recoded
> > permanently into ISO10646, and there are lots of forces that make it
> > convenient to keep it encoded in ISO-2022-JP (like existing tools).

[Paul]
> You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is
> a character *set* and not an encoding. ISO-2022-JP says how you should
> represent characters in terms of bits and bytes. ISO10646 defines a
> mapping from integers to characters.

OK. I really meant recoding in UTF-8 -- I maintain that there are
lots of forces that prevent recoding most ISO-2022-JP documents in
UTF-8.

> They are both important, but separate. I think that this automagical
> re-encoding conflates them.

Who is proposing any automagical re-encoding?

Are you sure you understand what we are arguing about?

*I* am not even sure what we are arguing about.

I am simply saying that 8-bit strings (literals or otherwise) in
Python have always been able to contain encoded strings.

Earlier, you quoted some reference documentation that defines 8-bit
strings as containing characters. That's taken out of context -- this
was written in a time when there was (for most people anyway) no
difference between characters and bytes, and I really meant bytes.
There's plenty of use of 8-bit Python strings for non-character uses
so your "proof" that 8-bit strings should contain "characters"
according to your definition is invalid.

--Guido van Rossum (home page: http://www.python.org/~guido/)
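
A small illustration of the character set versus encoding distinction discussed above, as a sketch only. It assumes a modern Python 3 interpreter, where `str` holds code points and `bytes` holds 8-bit data; that split did not exist when this message was written. The sample text and variable names are made up for the example and do not come from the thread.

```python
# Three Japanese characters, written as escapes so the source stays ASCII.
# (Sample text is an assumption for illustration.)
text = "\u65e5\u672c\u8a9e"

# The character set view: each character maps to an integer (code point).
code_points = [ord(ch) for ch in text]

# The encoding view: the same characters represented as bytes in two ways.
iso2022jp_bytes = text.encode("iso2022_jp")
utf8_bytes = text.encode("utf-8")

# "Recoding" a document means decoding from one encoding and
# encoding into another; the characters themselves do not change.
recoded = iso2022jp_bytes.decode("iso2022_jp").encode("utf-8")

print(code_points)
print(iso2022jp_bytes)
print(utf8_bytes)
print(recoded == utf8_bytes)
```

The two byte strings differ even though they decode to the same characters, which is why a byte string on its own does not say which characters it holds.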