Guido van Rossum wrote:

> > <PEP: 261>
> >
> > The problem I have with this PEP is that it is a compile time option
> > which makes it hard to work with both 32 bit and 16 bit strings in one
> > program. Can not the 32 bit string type be introduced as an additional type?
>
> Not without an outrageous amount of additional coding (every place in
> the code that currently uses PyUnicode_Check() would have to be
> bifurcated in a 16-bit and a 32-bit variant).

Alternatively, a Unicode object could *internally* be either 8, 16 or 32
bits wide (to be clear: not per character, but per string). Also a lot of
work, but it'll be a lot less wasteful.

> I doubt that the desire to work with both 16- and 32-bit characters in
> one program is typical for folks using Unicode -- that's mostly
> limited to folks writing conversion tools. Python will offer the
> necessary codecs so you shouldn't have this need very often.

Not a lot of people will want to work with 16 or 32 bit chars directly,
but I think a less wasteful solution to the surrogate pair problem *will*
be desired by people. Why use 32 bits for all strings in a program when
only a tiny percentage actually *needs* more than 16? (Or even 8...)

> > Iteration through the code units in a string is a problem waiting to bite
> > you and string APIs should encourage behaviour which is correct when faced
> > with variable width characters, both DBCS and UTF style.
>
> But this is not the Unicode philosophy. All the variable-length
> character manipulation is supposed to be taken care of by the codecs,
> and then the application can deal in arrays of characters.

Right: this is the way it should be.

My difficulty with PEP 261 is that I'm afraid few people will actually
enable 32-bit support (*what*?! all unicode strings become 32 bits wide?
no way!), therefore making programs non-portable in very subtle ways.

Just