On Thu, 4 May 2000 22:22:38 +0100, Just van Rossum <just@letterror.com> wrote:

>(Boy, is it quiet here all of a sudden ;-)
>
>Sorry for the duplication of stuff, but I'd like to reiterate my points, to
>separate them from my implementation proposal, as that's just what it is:
>an implementation detail.
>
>These things are important to me:
>- get rid of the Unicode-ness of wide strings, in order to
>- make narrow and wide strings as similar as possible
>- implicit conversion between narrow and wide strings should
>  happen purely on the basis of the character codes; no
>  assumption at all should be made about the encoding, ie.
>  what the character code _means_.
>- downcasting from wide to narrow may raise OverflowError if
>  there are characters in the wide string that are > 255
>- str(s) should always return s if s is a string, whether narrow
>  or wide
>- file objects need to be responsible for handling wide strings
>- the above two points should make it possible for
>- if no encoding is known, Unicode is the default, whether
>  narrow or wide
>
>The above points seem to have the following consequences:
>- the 'u' in \uXXXX notation no longer makes much sense,
>  since it is not neccesary for the character to be a Unicode
>  code point: it's just a 2-byte int. \wXXXX might be an option.
>- the u"" notation is no longer neccesary: if a string literal
>  contains a character > 255 the string should automatically
>  become a wide string.
>- narrow strings should also have an encode() method.
>- the builtin unicode() function might be redundant if:
>  - it is possible to specify a source encoding. I'm not sure if
>    this is best done through an extra argument for encode()
>    or that it should be a new method, eg. transcode().
>  - s.encode() or s.transcode() are allowed to output a wide
>    string, as in aNarrowString.encode("UCS-2") and
>    s.transcode("Mac-Roman", "UCS-2").
One other pleasant consequence:

- String comparisons work character-by-character, even if the
  representations of those characters have different widths.

>My proposal to extend the "old" string type to be able to contain wide
>strings is of course largely unrelated to all this. Yet it may provide some
>additional C compatibility (especially now that silent conversion to utf-8
>is out) as well as a workaround for the
>str()-having-to-return-a-narrow-string bottleneck.

Toby Dickenson
tdickenson@geminidataloggers.com
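
The conversion and comparison rules discussed above can be sketched roughly as follows. This is only an illustration, not the proposed implementation: it uses present-day Python 3 `bytes` and `str` as stand-ins for narrow and wide strings, and the helper names `widen`, `narrow` and `same_characters` are made up for this example.

```python
# Illustrative sketch only. Assumptions: Python 3 `bytes` stands in for a
# narrow string and `str` for a wide string; all function names are
# hypothetical and do not come from the proposal.

def widen(narrow_string: bytes) -> str:
    """Widen purely by character code: byte value n becomes code n."""
    return "".join(chr(code) for code in narrow_string)


def narrow(wide_string: str) -> bytes:
    """Downcast by character code; fail if any code is above 255."""
    for ch in wide_string:
        if ord(ch) > 255:
            raise OverflowError(
                "character code %d does not fit in a narrow string" % ord(ch)
            )
    return bytes(ord(ch) for ch in wide_string)


def character_codes(s):
    """Return the list of character codes, whatever the storage width."""
    if isinstance(s, bytes):
        return list(s)  # iterating over bytes yields ints
    return [ord(ch) for ch in s]


def same_characters(a, b) -> bool:
    """Compare character by character, ignoring representation width."""
    return character_codes(a) == character_codes(b)
```

Under these assumptions, `same_characters(b"abc", "abc")` is true because only the character codes are compared, and `narrow()` refuses any string containing a code above 255 instead of guessing at an encoding.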