Guido van Rossum wrote:
>
> > Umm... maybe I missed something, but I thought there was pretty broad
> > feelings *against* having a global like this. This kind of thing is just
> > nasty.
> >
> >   1) Python modules can't change it, nor can they rely on it being a
> >      particular value
> >   2) a mutable, global variable is just plain wrong. The InterpreterState
> >      and ThreadState structures were created *specifically* to avoid adding
> >      crap variables like this.
> >   3) allowing a default other than utf-8 is sure to cause gotchas and
> >      surprises. Some code is going to rightly assume that the default is
> >      just that, but be horribly broken when an application changes it.

Hmm, the patch notice says it all I guess:

    This patch fixes a few bugglets and adds an experimental
    feature which allows setting the string encoding assumed
    by the Unicode implementation at run-time.

    The current implementation uses a process global for
    the string encoding. This should subsequently be changed
    to a thread state variable, so that the setting can
    be done on a per thread basis.

    Note that only the coercions from strings to Unicode
    are affected by the encoding parameter. The "s" parser
    marker still returns UTF-8. (str(unicode) also returns
    the string encoding -- unlike what I wrote in the original
    patch notice.)

    The main intent of this patch is to provide a test
    bed for the ongoing Unicode debate, e.g. to have the
    implementation use 'latin-1' as default string encoding,
    put

        import sys
        sys.set_string_encoding('latin-1')

    in your site.py file.

> > Somebody please say this is hugely experimental. And then say why it isn't
> > just a private patch, rather than sitting in CVS.
>
> Watch your language.
>
> Marc did this at my request. It is my intention that the encoding be
> hardcoded at compile time. But while there's a discussion going about
> what the hardcoded encoding should *be*, it would seem handy to have a
> quick way to experiment.

Right and that's what the intent was behind adding a global
and some APIs to change it first... there are a few ways this
could one day get finalized:

1. hardcode the encoding (UTF-8 was previously hard-coded)
2. make the encoding a compile time option
3. make the encoding a per-process option
4. make the encoding a per-thread option
5. make the encoding a per-process setting which is deduced
   from env. vars such as LC_ALL, LC_CTYPE, LANG or system
   APIs which can be used to get at the currently
   active local encoding

Note that I have named the APIs sys.get/set_string_encoding()...
I've done that on purpose, because I have a feeling that
changing the conversion from Unicode to strings from UTF-8
to an encoding not capable of representing all Unicode
characters won't get us very far. Also, changing this is
rather tricky due to the way the buffer API works.

The other way around needs some experimenting though and this
is what the patch implements: it allows you to change the
string encoding assumption to test various
possibilities, e.g. ascii, latin-1, unicode-escape,
<your favourite local encoding> etc. without having to
recompile the interpreter every time.

Have fun with it :-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
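
To make the "string encoding assumption" concrete, here is a minimal sketch of how the same byte string coerces to Unicode under different assumed defaults. It is an illustration only, written against present-day Python, where sys.set_string_encoding() from the experimental patch does not exist; explicit bytes.decode() calls stand in for the implicit string-to-Unicode coercion. The helper coerce() and the sample data RAW are hypothetical names introduced for the example. The last line shows one way option 5 (deducing the encoding from the locale) might be approached with the standard locale module.

```python
import locale

# Hypothetical sample data: the text 'café' encoded as Latin-1.
RAW = b"caf\xe9"


def coerce(raw, assumed_encoding):
    """Decode raw bytes under an assumed default string encoding.

    Stand-in for the implicit string-to-Unicode coercion discussed above.
    Returns the decoded text, or None if the bytes are not valid in the
    assumed encoding.
    """
    try:
        return raw.decode(assumed_encoding)
    except UnicodeDecodeError:
        return None


# Try the candidate defaults mentioned in the thread.
results = {enc: coerce(RAW, enc) for enc in ("ascii", "latin-1", "utf-8")}
for enc, text in results.items():
    print(enc, "->", repr(text))

# Option 5 sketch: ask the platform for the locale's preferred encoding
# (derived from LC_ALL / LC_CTYPE / LANG on POSIX systems).
locale_default = locale.getpreferredencoding(False)
print("locale default:", locale_default)
```

With this input only the latin-1 assumption succeeds: the byte 0xE9 lies outside ASCII and is an incomplete sequence in UTF-8, so the other two fail. That is the kind of difference the run-time switch was meant to make easy to observe.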