Fredrik Lundh wrote:
>
> mal wrote:
> > > Given the new 7-bit-ASCII-as-default-encoding-for-8-bit-strings
> > > convention, shouldn't just hashing the character values work
> > > fine? That is, hash('abc') should == hash(u'abc'), no conversion
> > > required.
> >
> > Yes, and it does so already for pure ASCII values. The problem
> > comes from the fact that the default encoding can be changed to
> > a locale specific value (site.py does the lookup for you), e.g.
> > given you have defined LANG to be us_en, Python will default
> > to Latin-1 as default encoding.
>
> footnote: in practice, this is a Unix-only feature.
>
> I suggest adding code to the _locale module (or maybe sys is
> better?) which can be used to dig up a suitable encoding for
> non-Unix platforms. On Windows, the code page should be
> "cp%d" % GetACP().
>
> I'll look into this later today.

Could you add code to the _locale module which interfaces to
GetACP() on win32? locale.get_default could then make use of
this API to figure out the encoding.

Ideally, there would also be a second win32 API for querying the
active language (it would have to return an RFC 1766 language
code, or we could add aliases to the locale_alias database).

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
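A minimal sketch, in present-day Python 3, of the two points in this thread: why a locale-dependent default encoding breaks hash equality for non-ASCII 8-bit strings, and how a Windows code page name can be derived as "cp%d" % GetACP(). Python 3 no longer coerces 8-bit strings to Unicode when comparing, so `coerced_hash` takes the default encoding as an explicit parameter to simulate the coercion rule discussed here. The names `codepage_name`, `guess_default_encoding` and `coerced_hash` are hypothetical, and the win32 branch reaches GetACP() through ctypes rather than through the `_locale` addition being proposed.

```python
import codecs
import locale
import sys


def codepage_name(acp):
    """Map a Windows ANSI code page number to a codec name, e.g. 1252 -> 'cp1252'."""
    return "cp%d" % acp


def guess_default_encoding():
    """Hypothetical helper: pick a locale-derived encoding name for this platform."""
    if sys.platform == "win32":
        import ctypes  # kernel32.GetACP() returns the active ANSI code page number
        return codepage_name(ctypes.windll.kernel32.GetACP())
    # Elsewhere, fall back to the locale settings (LANG, LC_CTYPE, ...).
    return locale.getpreferredencoding(False)


def coerced_hash(data, encoding):
    """Hypothetical helper: hash an 8-bit string the way it would have to hash
    if it compared equal to its Unicode form decoded with `encoding`."""
    return hash(data.decode(encoding))


# Pure ASCII: the hash does not depend on which default encoding is active.
assert coerced_hash(b"abc", "ascii") == coerced_hash(b"abc", "latin-1") == hash("abc")

# Non-ASCII: the same byte decodes to different characters under different
# encodings, so a hash tied to the default encoding changes with the locale.
latin1_text = b"caf\xe9".decode("latin-1")  # ends in U+00E9 (e with acute)
cp437_text = b"caf\xe9".decode("cp437")     # ends in U+0398 (Greek capital theta)
assert latin1_text != cp437_text

# Whatever the platform reports should be a codec Python can look up.
codecs.lookup(guess_default_encoding())
```

The ASCII case is why the 7-bit default makes hash('abc') == hash(u'abc') safe; the second case is the Latin-1 (or any non-ASCII locale) problem raised above.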