M.-A. Lemburg wrote:

>> (I suppose it's too late for 2.4, but it would probably be a good
>> idea to switch to this algorithm in 2.5)
>
> Here's a reference that might be interesting for you:
>
> http://citeseer.ist.psu.edu/boldi02compact.html
>
> They use statistical approaches to dealing with the problem of
> large alphabets. Their motivation is making Java's Unicode string
> implementation faster... sounds familiar, eh :-)

thanks for the reference. but I have to admit that I found the following
paper by the same authors to be more interesting ...

    http://citeseer.ist.psu.edu/boldi03rethinking.html

... both because they've looked into efficient designs for mutable strings,
and because of how they use a 32-bit "bloom filter" hashed by the least
significant bits in the Unicode characters...

oh well, there are never any new ideas ;-)

</F>
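
For readers unfamiliar with the trick, here is a minimal sketch of what a 32-bit "bloom filter" keyed on the low bits of each character might look like. It is an illustration only, not the code from the paper or from any Python release. The names char_mask and may_contain are hypothetical, and the choice of exactly the low 5 bits (giving 32 buckets) is an assumption that follows from the 32-bit width.

```python
def char_mask(text):
    """Build a 32-bit mask with one bit set per distinct (code point & 31).

    Hypothetical helper: the low 5 bits of each character select one
    of 32 bit positions in the mask.
    """
    mask = 0
    for ch in text:
        mask |= 1 << (ord(ch) & 0x1F)
    return mask


def may_contain(mask, ch):
    """Return False if ch is definitely absent from the masked text.

    A True result only means "possibly present": different characters
    that share the same low 5 bits collide on the same mask bit.
    """
    return bool(mask & (1 << (ord(ch) & 0x1F)))


mask = char_mask("abc")
print(may_contain(mask, "a"))  # character is in the text
print(may_contain(mask, "z"))  # low bits differ from a, b and c
print(may_contain(mask, "A"))  # shares low bits with "a": false positive
```

The point of such a filter is that a negative answer is cheap and exact, so a search loop can rule out characters that cannot occur in the pattern without scanning it. A positive answer still needs a real comparison.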