> MAL wrote:
>> Bill wrote:
>> u'\ud800' causes the interpreter to crash
>> example:
>> print u'\ud800'
>> What happens:
>> The code fails to compile because while adding the constant, the unicode_hash
>> function is called which for some reason requires the UTF-8 string format.
>
> The reasoning at the time was that dictionaries should accept
> Unicode objects as keys which match their string equivalents
> as the same key, e.g. 'abc' works just as well as u'abc'.
> UTF-8 was the default encoding back then. I'm not sure how
> to fix the hash value given the new strategy w/r/t the
> default encoding...
>
> According to the docs, objects comparing equal should have the
> same hash value, yet this would require the hash value to be
> calculated using the default encoding and that
> would not only cause huge performance problems, but could
> effectively render Unicode useless, because not all default
> encodings are lossless (ok, one could work around this by
> falling back to some other way of calculating the hash
> value in case the conversion fails).

Yeah, yeah, yeah. I know all that, just never liked it. :)

The current problem is that the UTF-8 codec can't round-trip surrogate
characters atm. This is easy to fix, so I'm doing a patch to fix this
oversight, unless you beat me to it.

Anything else is slightly more annoying to fix.

Bill
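
P.S. For anyone who wants to poke at the round-trip problem, here is a minimal sketch. It assumes a modern Python 3 interpreter, not the interpreter and codec discussed above, and the helper name roundtrips is made up for illustration. It only shows what "can't round-trip a lone surrogate" means, and that the "surrogatepass" error handler is one way to let such a code point through; it is not the patch mentioned above.

```python
# Sketch only: Python 3 semantics, not the interpreter discussed in this thread.
LONE_SURROGATE = "\ud800"  # an unpaired high surrogate


def roundtrips(text, errors="strict"):
    """Return True if text survives a UTF-8 encode/decode cycle unchanged.

    `errors` is passed to both encode() and decode(); "strict" is the
    default handler, "surrogatepass" allows lone surrogates through.
    """
    try:
        return text.encode("utf-8", errors).decode("utf-8", errors) == text
    except UnicodeError:
        # The codec refused to encode or decode the text.
        return False


print(roundtrips("abc"))                            # True
print(roundtrips(LONE_SURROGATE))                   # False: strict UTF-8 rejects it
print(roundtrips(LONE_SURROGATE, "surrogatepass"))  # True
```

A hash that depends on encoding the string first inherits exactly this failure mode, which is why the constant in the original report could not even be added.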