Wed Nov 24 01:22:23 CET 2010 · http://mail.python.org/pipermail/python-dev/2010-November/105940.html

On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
> Maybe Python should have used UTF-8 as its internal unicode
> representation. Then people who were foolish enough to assume
> one character per string item would have their programs break
> rather soon under only light unicode testing. :-)

You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints.
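To make the trade-off concrete, here is a minimal sketch (not anything CPython does) of what reaching the Nth codepoint costs when the string is stored as UTF-8: you have to walk the lead bytes from the start, so access is linear rather than constant time. The function name `nth_codepoint` is hypothetical, and the code assumes its input is already well-formed UTF-8.

```python
def nth_codepoint(data: bytes, n: int) -> str:
    """Return the n-th codepoint (0-based) of well-formed UTF-8 by linear scan."""
    if n < 0:
        raise IndexError(n)
    i = 0       # byte offset into data
    index = 0   # codepoint index reached so far
    while i < len(data):
        lead = data[i]
        # The lead byte alone tells us how many bytes this codepoint occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("unexpected continuation or invalid byte at offset %d" % i)
        if index == n:
            return data[i:i + width].decode("utf-8")
        i += width
        index += 1
    raise IndexError(n)


# 'a', U+00E9, U+20AC, U+1F600, 'z' take 1, 2, 3, 4 and 1 bytes respectively.
text = "a\u00e9\u20ac\U0001F600z"
data = text.encode("utf-8")
print(len(text), len(data))
print(nth_codepoint(data, 3) == "\U0001F600")
```

The item count and the byte count disagree as soon as any non-ASCII character shows up, which is exactly the early breakage Greg describes.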

Instead, provide bidirectional iterators which can traverse the string by byte, by codepoint, or by grapheme (that is: a base character plus the combining characters that go with it, making up one thing which a human would think of as a character).
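A rough sketch of the three views, in Python for illustration only. The function names are hypothetical, the iterators are forward-only rather than bidirectional, and the grapheme grouping is a deliberate simplification: it only attaches characters with a nonzero canonical combining class (via `unicodedata.combining`) to the preceding base character, which is not the full extended grapheme cluster algorithm from Unicode UAX #29.

```python
import unicodedata
from typing import Iterator


def iter_bytes(text: str) -> Iterator[int]:
    """Yield the UTF-8 encoded bytes of text, one integer at a time."""
    yield from text.encode("utf-8")


def iter_codepoints(text: str) -> Iterator[str]:
    """Yield codepoints; a Python 3 str already iterates this way."""
    yield from text


def iter_graphemes_approx(text: str) -> Iterator[str]:
    """Yield a base character together with the combining marks that follow it.

    Approximation only: real grapheme segmentation (UAX #29) has more rules.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # A non-combining character starts a new cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# 'e' + COMBINING ACUTE ACCENT, then 'a'
sample = "e\u0301a"
print(len(list(iter_bytes(sample))))             # 4 bytes
print(len(list(iter_codepoints(sample))))        # 3 codepoints
print(len(list(iter_graphemes_approx(sample))))  # 2 graphemes
```

The same short string has three different lengths depending on which unit you count, which is why making the unit explicit in the iterator is attractive.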

James

[Python-Dev] len(chr(i)) = 2?