On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote:

> This is why I am not especially enamored of Unicode and the prospect of
> Python becoming married to it. It is heavily weighted in favor of
> efficiently representing Chinese and inefficiently representing English.
> To give English equivalent treatment, the 20,000 or so most common
> words, roots, prefixes, and suffixes would each get its own codepoint.

Of course it is perfectly possible to have the Python unicode
implementation choose to represent some unicode strings with only 8 bits
per character. There is no (conceptual) reason it could not represent
(u'a' * 8) with 8 bytes plus the class header overhead. That is simply an
implementation detail and has nothing to do with Unicode itself.

It would also be possible to use UTF-8 for string storage, although this
has the trade-off that indexing an element takes time linear in its
position instead of constant time.

James
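To make the trade-off concrete, here is a minimal sketch in modern Python (the class names CompactUnicode and Utf8Unicode are hypothetical illustrations, not part of any actual implementation). The first type picks an 8-bit backing store whenever every code point fits in one byte, so (u'a' * 8) costs 8 bytes plus object overhead and indexing stays constant time; the second stores UTF-8 and has to scan forward to locate the i-th character.

class CompactUnicode:
    """Store 1 byte per character when possible, else 4 bytes per character."""

    def __init__(self, text):
        if all(ord(c) < 256 for c in text):
            self._data = text.encode('latin-1')    # 1 byte per character
            self._width = 1
        else:
            self._data = text.encode('utf-32-le')  # 4 bytes per character
            self._width = 4

    def __len__(self):
        return len(self._data) // self._width

    def __getitem__(self, i):
        # Fixed-width storage: the i-th character starts at i * width.
        start = i * self._width
        chunk = self._data[start:start + self._width]
        return chunk.decode('latin-1' if self._width == 1 else 'utf-32-le')


class Utf8Unicode:
    """Store UTF-8; indexing must walk the variable-width sequences."""

    def __init__(self, text):
        self._data = text.encode('utf-8')

    def __getitem__(self, i):
        pos = 0
        for _ in range(i):   # skip i complete characters from the start
            lead = self._data[pos]
            pos += 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        lead = self._data[pos]
        size = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        return self._data[pos:pos + size].decode('utf-8')


s = CompactUnicode(u'a' * 8)
print(len(s._data), s[3])    # -> 8 a

CPython eventually adopted essentially the first strategy: PEP 393's flexible string representation (Python 3.3) stores 1, 2, or 4 bytes per character depending on the widest code point in the string, keeping constant-time indexing.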