Fredrik Lundh wrote:
> > mal wrote:
> > > *  change hash value calculation to work on the Py_UNICODE data
> > >    instead of creating a default encoded cached object (what
> > >    now is .utf8str)
>
> is this what you had in mind?
>
>     static long
>     unicode_hash(PyUnicodeObject *self)
>     {
>         register int len;
>         register Py_UNICODE *p;
>         register long x;
>
>         if (self->hash != -1)
>             return self->hash;
>         len = PyUnicode_GET_SIZE(self);
>         p = PyUnicode_AS_UNICODE(self);
>         x = *p << 7;
>         while (--len >= 0)
>             x = (1000003*x) ^ *p++;
>         x ^= self->ob_size;
>         if (x == -1)
>             x = -2;
>         self->hash = x;
>         return x;
>     }
>
> </F>

Well, sort of. It should be done in such a way that Unicode strings
which only use the lower byte produce the same hash value as normal
8-bit strings -- is this the case for the above code?

My first idea was to apply a kind of two-pass scan which first uses
only the lower byte and then the higher byte to calculate a hash
value. Both passes would use the same algorithm as the one for
8-bit strings.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/