[GvR, on string.encoding]

> Marc-Andre took this idea a bit further, but I think it's not
> practical given the current implementation: there are too many places
> where the C code would have to be changed in order to propagate the
> string encoding information,

I may be missing something, but the encoding attr just travels with the
string object, no? Like I said in my reply to MAL, I think it's
undesirable to do *anything* with the encoding attr except in
combination with a unicode string.

> and there are too many sources of strings
> with unknown encodings to make it very useful.

That's why the default encoding must be settable as well, as Fredrik
suggested.

> Plus, it would slow down 8-bit string ops.

Not if you ignore it most of the time, and just pass it along when
concatenating.

> I have a better idea: rather than carrying around 8-bit strings with
> an encoding, use Unicode literals in your source code.

Explain that to newbies... My guess is that they will want simple 8-bit
strings in their native encoding. Dunno.

> If the source
> encoding is known, these will be converted using the appropriate
> codec.
>
> If you object to having to write u"..." all the time, we could say
> that "..." is a Unicode literal if it contains any characters with the
> top bit on (of course the source file encoding would be used just like
> for u"...").

Only if "\377" would still yield an 8-bit string, for binary goop...

Just
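To make the idea under discussion concrete, here is a minimal Python 2
style sketch of an 8-bit string that carries its encoding along. The
EncodedString class and the module-level default_encoding are
hypothetical names for illustration, not any real Python API; the point
is just that the attribute travels with the object and is only
consulted when converting to unicode:

    default_encoding = 'latin-1'   # settable, per Fredrik's suggestion

    class EncodedString(str):
        """Hypothetical 8-bit string carrying an encoding attribute."""

        def __new__(cls, value, encoding=None):
            self = str.__new__(cls, value)
            self.encoding = encoding or default_encoding
            return self

        def __add__(self, other):
            # Concatenation just passes the encoding along; plain
            # 8-bit string ops are otherwise untouched.
            return EncodedString(str.__add__(self, other), self.encoding)

        def __unicode__(self):
            # The encoding is only consulted here, i.e. when the
            # string is combined with (converted to) unicode.
            return unicode(str(self), self.encoding)

    s = EncodedString('caf\xe9', 'latin-1')   # e-acute in Latin-1
    u = u'menu: ' + unicode(s)                # decodes via s.encoding

Whether the attribute should also survive slicing, %-formatting and the
rest is exactly the kind of C-level propagation Guido is worried about
above.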