Showing content from http://mail.python.org/pipermail/python-dev/2000-August/007855.html below:

[Python-Dev] SRE 0.9.8 benchmarks

M.-A. Lemburg mal@lemburg.com
Thu, 03 Aug 2000 15:31:34 +0200

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Just for compares: would you mind running the search
> > routines in mxTextTools on the same machine ?
> 
> > > searching for "spam" in a string padded with "spaz" (1000 bytes on
> > > each side of the target):
> > >
> > > string.find     0.112 ms
> 
> texttools.find    0.080 ms
> 
> > > sre8.search     0.059
> > > pre.search      0.122
> > >
> > > unicode.find    0.130
> > > sre16.search    0.065
> > >
> > > same test, without any false matches (padded with "-"):
> > >
> > > string.find     0.035 ms
> 
> texttools.find    0.083 ms
> 
> > > sre8.search     0.050
> > > pre.search      0.116
> > >
> > > unicode.find    0.031
> > > sre16.search    0.055
> >
> > Those results are probably due to the fact that string.find
> > does a brute force search. If it would do a last match char
> > first search or even Boyer-Moore (this only pays off for long
> > search targets) then it should be a lot faster than [s|p]re.
> 
> does the TextTools algorithm work with arbitrary character
> set sizes, btw?

The find function creates a Boyer-Moore search object
for the search string (on every call). It compares characters
either one-to-one or through a translation table that is applied
to the searched text before comparing it to the search
string (this enables things like case-insensitive
search and character sets, but is about 45% slower). Real-life
usage would be to create the search objects once per process
and then reuse them. The Boyer-Moore table calculation takes
some time...
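
To illustrate the "create once, reuse" pattern, here is a minimal
sketch in plain Python. It is not the mxTextTools API: the class name
HorspoolSearch, its find method and the translate parameter are all
hypothetical, and it uses the simpler Horspool variant of Boyer-Moore
(bad-character shifts only). It assumes 8-bit data passed as bytes
objects, and it expects the pattern to be given in already-translated
form (e.g. lower case) when a translation table is used.

```python
class HorspoolSearch:
    """Hypothetical reusable search object (illustration only)."""

    def __init__(self, pattern, translate=None):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        # Optional 256-byte table in the format used by bytes.translate().
        self.translate = translate
        m = len(pattern)
        # Shift table, built once: for each byte value, how far the window
        # may move when that byte is the last one in the current window.
        self.shift = [m] * 256
        for i, byte in enumerate(pattern[:-1]):
            self.shift[byte] = m - 1 - i

    def find(self, text, start=0):
        """Return the index of the first match at or after start, or -1."""
        if self.translate is not None:
            # Translate the searched text before comparing (extra pass).
            text = text.translate(self.translate)
        pattern, shift = self.pattern, self.shift
        m, n = len(pattern), len(text)
        pos = start
        while pos + m <= n:
            if text[pos:pos + m] == pattern:
                return pos
            pos += shift[text[pos + m - 1]]
        return -1


# Build the search object once, then reuse it for many texts.
spam = HorspoolSearch(b"spam")
padded = b"spaz" * 250 + b"spam" + b"spaz" * 250
index = spam.find(padded)

# Case-insensitive variant: map A-Z to a-z before comparing.
LOWER = bytes(range(256)).lower()
spam_nocase = HorspoolSearch(b"spam", translate=LOWER)
index_nocase = spam_nocase.find(b"--SPAM--")
```

The table is built in the constructor, so repeated find calls only
pay for the scan itself; the translated variant pays for an extra
pass over the text on each call.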

But to answer your question: mxTextTools is 8-bit throughout.
A Unicode aware version will follow by the end of this year.

Thanks for checking,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
