The recent discussion about repr() et al. brought up the idea of a locale-based string encoding again.

A support module for querying the encoding used in the current locale, together with the experimental hook to set the string encoding, could yield a compromise which satisfies ASCII, Latin-1 and UTF-8 proponents.

The idea is to use the site.py module to customize the interpreter from within Python (rather than making the encoding a compile-time option). This is easily doable using the (yet to be written) support module and the sys.setstringencoding() hook.

The default encoding would be 'ascii' and could then be changed to whatever the user or administrator wants it to be on a per-site basis. Furthermore, the encoding should be settable on a per-thread basis inside the interpreter (Python threads do not seem to inherit any per-thread globals, so the encoding would have to be set for all new threads; a short sketch of this follows at the end of this message).

E.g. a site.py module could look like this:

"""
import locale, sys

# Get encoding, defaulting to 'ascii' in case it cannot be
# determined
defenc = locale.get_encoding('ascii')

# Set main thread's string encoding
sys.setstringencoding(defenc)
"""

This would result in the Unicode implementation assuming defenc as the encoding of strings.

Minor nit: due to the implementation, the C parser markers "s" and "t" and the hash() value calculation will still have to work with a fixed encoding, which remains UTF-8. C APIs which want to support Unicode should be fixed to use "es" or to query the object directly and then apply a proper, possibly OS-dependent, conversion.

Before starting off on implementing the above, I'd like to hear some comments...

Thanks,
--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
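[Sketch for the per-thread point above: this is only a rough illustration, assuming the proposed sys.setstringencoding() hook and the yet-to-be-written locale.get_encoding() helper behave as described in the message; the EncodingThread subclass is a hypothetical name used purely for illustration.]

"""
import locale, sys, threading

# Determine the site-wide default, falling back to 'ascii' when
# the locale does not specify an encoding (proposed helper).
defenc = locale.get_encoding('ascii')

# Set the main thread's string encoding (proposed hook).
sys.setstringencoding(defenc)

class EncodingThread(threading.Thread):
    # Hypothetical wrapper: since new threads do not inherit
    # per-thread globals, each thread sets the string encoding
    # itself before running its target.
    def run(self):
        sys.setstringencoding(defenc)
        threading.Thread.run(self)
"""

Each new thread would then start out with the site-wide default and could later switch to a different encoding without affecting other threads.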