Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>> You mean: a compressed shift table for Unicode patterns ?
>> I'll have a look.
>
> It's a lossy compression: the entire delta1 table is represented as
> two 32-bit values, independent of the size of the source alphabet.
> Works amazingly well, at least when combined with the BM-variant
> it was designed for...
>
> (I suppose it's too late for 2.4, but it would probably be a good
> idea to switch to this algorithm in 2.5)

Here's a reference that might be interesting for you:

    http://citeseer.ist.psu.edu/boldi02compact.html

They use statistical approaches to dealing with the problem
of large alphabets. Their motivation is making Java's Unicode
string implementation faster... sounds familiar, eh :-)

Their motivation was based on work done for the "Managing
Gigabytes" project:

    http://www.cs.mu.oz.au/mg/
and
    http://www.mds.rmit.edu.au/mg/

Too bad their code is GPLed, but I suppose getting some
ideas is OK ;-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 13 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
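
For readers following the thread, one way to picture the "two 32-bit
values" mentioned in the quoted text is a 32-bit bitmask recording which
characters occur in the pattern (each character folded to 5 bits, so
collisions are possible and the table is lossy), plus a single shift
distance for the pattern's last character. The sketch below illustrates
that reading only. It is an assumption, not the code under discussion,
and bloom_find and its variable names are made up for the example.

```python
def bloom_find(haystack: str, needle: str) -> int:
    """Return the lowest index of needle in haystack, or -1 (hypothetical helper)."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1

    mlast = m - 1
    last = needle[mlast]

    # Value 1 (assumed): a 32-bit mask of the characters present in needle.
    # Different characters can share a bit, which is what makes it lossy.
    mask = 0
    # Value 2 (assumed): extra shift to apply when the last character
    # matched but the rest of the window did not.
    skip = mlast
    for i in range(mlast):
        mask |= 1 << (ord(needle[i]) & 31)
        if needle[i] == last:
            skip = mlast - i - 1
    mask |= 1 << (ord(last) & 31)

    def maybe_in_needle(ch: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return bool((mask >> (ord(ch) & 31)) & 1)

    i = 0
    while i <= n - m:
        nxt = i + m  # index of the character just past the current window
        if haystack[i + mlast] == last:
            if haystack[i:i + m] == needle:
                return i
            if nxt < n and not maybe_in_needle(haystack[nxt]):
                i += m      # no match can overlap that character
            else:
                i += skip   # realign on the previous occurrence of last
        elif nxt < n and not maybe_in_needle(haystack[nxt]):
            i += m
        i += 1
    return -1
```

A false positive in the mask only costs a shorter shift, never a missed
match, so the result is the same as a plain search whatever the size of
the alphabet.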