Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>>>(google for "stringlib" for some work I'm doing in this area)
>>
>>Ah, now I know where you're coming from :-) Shift tables
>>don't work well in the Unicode world with its large alphabet.
>
> since most real-life text use characters from only a small number of regions
> in that alphabet, compressed shift tables work extremely well (the algorithm
> on the stringlib page shows one way to do that, in constant space and O(m)
> time).

You mean: a compressed shift table for Unicode patterns ?
I'll have a look.

>>BTW, you might want to look at the BMS implementation I did
>>for mxTextTools.
>
> did you ever get around to add Unicode support to mxTextTools ?

Yes in egenix-mx-base 2.1.0. It's not yet released, but Google
will find the most recent snapshot :-) The package has been
available as beta for more than a year now; just haven't
found time to cut a release.

The search functions from 2.0 were replaced with search objects
that can deal with both 8-strings and Unicode. However, the
Unicode search implementation uses a rather naive approach
due to the shift table problem (and my lack of time).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 11 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
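
For readers following the thread, here is a minimal sketch of what a "compressed shift table" for Unicode patterns could look like. It is not the algorithm from the stringlib page and not the mxTextTools code; it is a plain Horspool-style search where code points are folded into a fixed number of slots, so the table takes constant space and is built in O(m) time. The names `TABLE_SIZE`, `build_shift_table` and `find` are made up for this illustration, and the table size of 256 is an arbitrary assumption.

```python
TABLE_SIZE = 256  # assumed size; must be a power of two for the mask below


def build_shift_table(pattern):
    """Build a fixed-size Horspool shift table in O(len(pattern)) time."""
    m = len(pattern)
    # Slots no pattern character maps to allow the full shift of m.
    table = [m] * TABLE_SIZE
    for i, ch in enumerate(pattern[:-1]):
        slot = ord(ch) & (TABLE_SIZE - 1)
        # Code points that collide in a slot share the smallest shift,
        # so a collision can only cost speed, never a missed match.
        table[slot] = min(table[slot], m - 1 - i)
    return table


def find(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    table = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the entry for the last character of the current window.
        pos += table[ord(text[pos + m - 1]) & (TABLE_SIZE - 1)]
    return -1
```

Because colliding characters fall back to the smaller shift, text drawn from only a few regions of the alphabet still gets long skips, which is the point made in the quoted reply.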