Some comments...

On 4/9/2012 11:09 AM, antoine.pitrou wrote:
> http://hg.python.org/cpython/rev/704630a9c5d5
> changeset:   76179:704630a9c5d5
> user:        Antoine Pitrou <solipsis at pitrou.net>
> date:        Mon Apr 09 17:03:32 2012 +0200
> summary:
>    Issue #13165: stringbench is now available in the Tools/stringbench folder.
...
> diff --git a/Tools/stringbench/stringbench.py b/Tools/stringbench/stringbench.py
> new file mode 100755
> --- /dev/null
> +++ b/Tools/stringbench/stringbench.py
> @@ -0,0 +1,1483 @@
> +

Did you mean to start with a blank line?

> +# Various microbenchmarks comparing unicode and byte string performance
> +# Please keep this file both 2.x and 3.x compatible!

Which versions of 2.x? In particular

> +dups = {}
> +        dups[f.__name__] = 1

Is the use of a dict as a set a holdover that could be updated? Or is it intentional, for backward compatibility with 2.whatever and earlier?

> +# Try with regex
> +@uses_re
> +@bench('s="ABC"*33; re.compile(s+"D").search((s+"D")*300+s+"E")',
> +       "late match, 100 characters", 100)
> +def re_test_slow_match_100_characters(STR):
> +    m = STR("ABC"*33)
> +    d = STR("D")
> +    e = STR("E")
> +    s1 = (m+d)*300 + m+e
> +    s2 = m+e
> +    pat = re.compile(s2)
> +    search = pat.search
> +    for x in _RANGE_100:
> +        search(s1)

regex may be added to the stdlib as something other than a replacement for re. If so, we might want an option to use it instead of, or in addition to, the current re.

> +#### Benchmark join
> +
> +def get_bytes_yielding_seq(STR, arg):
> +    if STR is BYTES and sys.version_info >= (3,):
> +        raise UnsupportedType
> +    return STR(arg)
> +@bench('"A".join("")',
> +       "join empty string, with 1 character sep", 100)

I am puzzled by this. Does str.join(iterable) internally branch on whether the iterable is a str? If so, these timings might differ from equivalent timings with a list of strings.

What might be interesting, especially for 3.3, is timing with non-ASCII BMP and non-BMP chars, both as the joiner and as the joined items.

tjr
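One way to probe the join questions above is to time sep.join() on a str and on the equivalent list of one-character strings. This can be done for ASCII, non-ASCII BMP and non-BMP characters, used both as separator and as items. This is a minimal sketch assuming Python 3.3+; `time_join`, `run` and `SAMPLES` are hypothetical names, not part of stringbench.

```python
import timeit

# Hypothetical sample characters (not from stringbench), one per storage class in 3.3+
SAMPLES = {
    "ascii": "A",
    "bmp": "\u0100",            # non-ASCII character inside the BMP
    "non-bmp": "\U0001F600",    # astral character outside the BMP
}


def time_join(sep, seq, number=10000):
    """Hypothetical helper: total seconds for `number` calls of sep.join(seq)."""
    return timeit.timeit(lambda: sep.join(seq), number=number)


def run(n=100, number=10000):
    """Time joining a str vs. a list of 1-char strs for every separator/item pairing."""
    results = {}
    for sep_kind, sep in SAMPLES.items():
        for item_kind, ch in SAMPLES.items():
            as_str = ch * n
            as_list = list(as_str)
            # Both forms must produce identical output; only the timing may differ.
            assert sep.join(as_str) == sep.join(as_list)
            results[(sep_kind, item_kind, "str")] = time_join(sep, as_str, number)
            results[(sep_kind, item_kind, "list")] = time_join(sep, as_list, number)
    return results


if __name__ == "__main__":
    for key, secs in sorted(run().items()):
        print("%-8s %-8s %-4s %.4f" % (key + (secs,)))
```

A consistent gap between the "str" and "list" rows for the same pairing would suggest the iterable type matters to str.join(). This sketch only measures it and does not inspect the implementation.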