There is currently a problem with the Unicode objects which I'd like
to resolve:

Since Unicode objects are comparable to strings, they should have the
same hash value as their string correspondents (the 8-bit strings which
compare equal -- this can depend on the default encoding, which in turn
depends on the locale setting).

Previously, Unicode used UTF-8 as the basis for calculating the hash
value (the Unicode object created a UTF-8 string object and delegated
the hash value calculation to it, caching the result and the string
for future use).

Since I would like to make the internal encoding cache use the default
encoding instead, I have two problems to solve:

1. It is sometimes not possible to encode the Unicode value using the
   default encoding. A different strategy for calculating the hash
   value would have to be used in that case.

2. In some locales 'äöü' == u'äöü' is true, while in others it is
   not. If the two do compare equal, their hash values must match.

How serious is the need for objects which compare equal to have the
same hash value?

(I would much prefer to calculate the hash value using the internal
UTF-16 buffer rather than first creating an encoded string.)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
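
For illustration only, here is a minimal sketch of why the hash question
matters for dictionary lookups. It is written for present-day Python 3,
where bytes stands in for the 8-bit string; the Text class, its encoding
argument (standing in for the locale-dependent default encoding) and the
consistent_hash flag are hypothetical names invented for this example,
not part of the actual Unicode implementation.

```python
class Text:
    """Hypothetical stand-in for a Unicode object that compares equal
    to its encoded 8-bit form under a given default encoding."""

    def __init__(self, value, encoding, consistent_hash):
        self.value = value
        self.encoding = encoding                # assumed default encoding
        self.consistent_hash = consistent_hash  # toggle for the demo

    def _encoded(self):
        # Problem 1: the value may not be encodable at all.
        try:
            return self.value.encode(self.encoding)
        except UnicodeEncodeError:
            return None

    def __eq__(self, other):
        if isinstance(other, bytes):
            return self._encoded() == other
        if isinstance(other, Text):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        encoded = self._encoded()
        if self.consistent_hash and encoded is not None:
            # Same hash as the 8-bit string that compares equal.
            return hash(encoded)
        # Otherwise hash a fixed internal representation instead.
        return hash(self.value.encode("utf-16-le"))


key = "äöü".encode("latin-1")
table = {key: "found"}

good = Text("äöü", "latin-1", consistent_hash=True)
bad = Text("äöü", "latin-1", consistent_hash=False)
unencodable = Text("\u20ac", "latin-1", consistent_hash=True)

print(good == key, table.get(good))  # equal, and the lookup succeeds
print(bad == key, table.get(bad))    # equal, yet the lookup misses
print(hash(unencodable) is not None) # still hashable via the fallback
```

In this sketch, a value that cannot be encoded can never compare equal
to an 8-bit string, so falling back to a hash of the internal buffer
costs nothing there. The conflict only arises for values that do
compare equal under the current default encoding.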