Guido van Rossum wrote, about how to represent strings:

> Paul, we're both just saying the same thing over and over without
> convincing each other. I'll wait till someone who wasn't in this
> debate before chimes in.

I'm with Paul and Federick on this one - at least about characters being the atoms of a string. We **have** to be able to refer to **characters** in a string, and without guessing. Otherwise, how could you ever construct a test like theString[3]==[a particular Japanese ideograph]?

If we do it by having a "string" datatype, which is really a byte list, and a "unicodeString" datatype, which is a list of abstract characters, I'd say everyone could get used to working with them. We'd have to supply conversion functions, of course.

This route might be the easiest for users to understand. We'd have to be very clear about what file.read() would return, for example, and likewise for all the similar read and write functions. And we'd have to work out how real 8-bit calls (like writing to a socket?) would play with the new types.

For extra clarity, we could leave string the way it is and introduce stringU (Unicode string) **and** string8 (Latin-1 or byte list, whichever seems to be the best equivalent to the current string). Then we would deprecate string in favor of string8. Then if Tcl and Perl go to Unicode strings we pass them a stringU, and if they go some other way, we pass them something else.

Come to think of it, we need some data type that will continue to work with C and C++. Would that be string8, or would we keep string for that purpose?

Clarity and ease of use for the user should come first, fast implementations next. If we didn't care about ease of use and clarity, we could all use Scheme or C; let's not lose sight of that.

I'd suggest we create some use cases or scenarios for this area. That needs input from those who know encodings and low-level Python internals better than I do. Then we could examine more systematically how well the various approaches would work out.
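
As one possible scenario of the kind suggested above, here is a minimal sketch of the theString[3] test done both ways. It assumes present-day Python 3, which postdates this message and where str is a list of abstract characters and bytes is a byte list; it is not the stringU/string8 proposal itself. The names char_matches and byte_matches and the sample ideograph are hypothetical, chosen only for illustration.

```python
# Sketch only: Python 3's str and bytes stand in for the proposed
# "unicodeString" (abstract characters) and "string" (byte list) types.

IDEOGRAPH = "\u65e5"  # a sample Japanese ideograph, picked for illustration


def char_matches(text: str, index: int, char: str) -> bool:
    """Character-level test: the atoms of the string are characters."""
    return text[index] == char


def byte_matches(data: bytes, offset: int, char: str, encoding: str) -> bool:
    """Byte-level test: the caller must know the encoding and the byte offset."""
    encoded = char.encode(encoding)
    return data[offset:offset + len(encoded)] == encoded


text = "abc" + IDEOGRAPH + "def"      # 7 characters
utf8 = text.encode("utf-8")           # explicit conversion function
utf16 = text.encode("utf-16-le")      # same characters, different bytes

# With characters as atoms, index 3 is the ideograph, no guessing needed.
found_by_char = char_matches(text, 3, IDEOGRAPH)

# With bytes, offset 3 happens to work for UTF-8 (three 1-byte characters
# precede it) but not for UTF-16, where the ideograph starts at offset 6.
found_in_utf8 = byte_matches(utf8, 3, IDEOGRAPH, "utf-8")
found_in_utf16_at_3 = byte_matches(utf16, 3, IDEOGRAPH, "utf-16-le")
found_in_utf16_at_6 = byte_matches(utf16, 6, IDEOGRAPH, "utf-16-le")
```

The point of the sketch is only that the character-level test needs no knowledge of the encoding, while the byte-level test depends on both the encoding and the byte offset. That is the "guessing" a byte-list-only string would force on users.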

Regards,
Tom Passin