Guido van Rossum wrote:
> > [Paul Prescod]
> > I think that maybe an important point is getting lost here. I could be
> > wrong, but it seems that all of this emphasis on encodings is misplaced.
>
> In practical applications that manipulate text, encodings creep up all
> the time.

I'm not saying that encodings are unimportant. I'm saying that they are *different* from what Fredrik was talking about. He was talking about a coherent logical model for characters and character strings based on the conventions of more modern languages and systems than C and Python.

> > How can we
> > make the transition to a "binary goops are not strings" world easiest?
>
> I'm afraid that's a bigger issue than we can solve for Python 1.6.

I understand that we can't fix the problem now. I just think that we shouldn't go out of our way to make it worse. If we make byte-array strings "magically" cast themselves into character strings, people will expect that behavior forever.

> > It doesn't meet the definition of string used in the Unicode spec., nor
> > in XML, nor in Java, nor at the W3C nor in most other up and coming
> > specifications.
>
> OK, so that's a good indication of where you're coming from. Maybe
> you should spend a little more time in the trenches and a little less
> in standards bodies. Standards are good, but sometimes disconnected
> from reality (remember ISO networking? :-).

As far as I know, XML and Java are used a fair bit in the real world... even somewhat in Asia. In fact, there is a book titled "XML and Java" written by three Japanese men.

> And this is exactly why encodings will remain important: entities
> encoded in ISO-2022-JP have no compelling reason to be recoded
> permanently into ISO10646, and there are lots of forces that make it
> convenient to keep it encoded in ISO-2022-JP (like existing tools).

You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is a character *set* and not an encoding.
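To make the set/encoding distinction concrete, here is a minimal sketch in modern Python 3 terms (an anachronism relative to the Python 1.6 under discussion). It assumes the standard `iso2022_jp` and `utf-8` codecs that ship with Python 3; the helper names `code_points`, `same_characters` and `mixing_is_rejected` are hypothetical, chosen for illustration. The characters themselves are integers drawn from the character set, and each encoding is just one way of writing those integers out as bytes.

```python
from typing import List

# Hypothetical helpers illustrating "character set" vs. "encoding" in Python 3.


def code_points(text: str) -> List[int]:
    """Return the ISO 10646 / Unicode code point (an integer) of each character."""
    return [ord(ch) for ch in text]


def same_characters(text: str, encodings: List[str]) -> bool:
    """Encode text under each encoding and check that every byte form decodes back to it."""
    byte_forms = [text.encode(enc) for enc in encodings]
    return all(b.decode(enc) == text for b, enc in zip(byte_forms, encodings))


def mixing_is_rejected(data: bytes, text: str) -> bool:
    """True if Python refuses to silently combine bytes with str."""
    try:
        data + text  # no implicit decoding of the bytes happens here
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    text = "日本語"
    print(code_points(text))           # the characters as integers in the set
    print(text.encode("iso2022_jp"))   # one byte representation of them
    print(text.encode("utf-8"))        # a different byte representation
    print(same_characters(text, ["iso2022_jp", "utf-8"]))
    print(mixing_is_rejected(b"abc", "def"))
```

The two byte strings differ, but both decode to the same three characters. Python 3 also refuses to concatenate `bytes` with `str`, which is one way of keeping "binary goops" and character strings apart instead of coercing one into the other.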
ISO-2022-JP says how you should represent characters in terms of bits and bytes. ISO10646 defines a mapping from integers to characters. They are both important, but separate. I think that this automagical re-encoding conflates them.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
        - http://www.cs.yale.edu/~perlis-alan/quotes.html