John Machin wrote:
> Hi Matthew,
>
> Your post in c.l.py about your re rewrite didn't mention where to report
> bugs etc so I dug this address out of Google Groups ...
>
> Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex
> from the Python bugtracker.
>
> Problem is repeated calls of e.g. compiled_pattern.search(some_text) --
> Task Manager performance panel shows increasing memory usage with regex
> but not with re. It appears to be cumulative i.e. changing to another
> pattern or text doesn't release memory.
>
> Example:
>
> 8<-- regex_timer.py
> import sys
> import time
> if sys.platform == 'win32':
>     timer = time.clock
> else:
>     timer = time.time
> module = __import__(sys.argv[1])
> count = int(sys.argv[2])
> pattern = sys.argv[3]
> expected = sys.argv[4]
> text = 80 * '~' + 'qwerty'
> rx = module.compile(pattern)
> t0 = timer()
> for i in xrange(count):
>     assert rx.search(text).group(0) == expected
> t1 = timer()
> print "%d iterations in %.6f seconds" % (count, t1 - t0)
> 8<---
>
> Here are the results of running this (plus observed difference between
> peak memory usage and base memory usage):
>
> dos-prompt>\python26\python regex_timer.py regex 1000000 "~" "~"
> 1000000 iterations in 3.811500 seconds [60 Mb]
>
> dos-prompt>\python26\python regex_timer.py regex 2000000 "~" "~"
> 2000000 iterations in 7.581335 seconds [128 Mb]
>
> dos-prompt>\python26\python regex_timer.py re 2000000 "~" "~"
> 2000000 iterations in 2.549738 seconds [3 Mb]
>
> This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t",
> ...
>

Thanks for that, John. I should've kept an eye on the Task Manager! :-)

Now fixed. It's surprising how much time and effort is needed just to manage the memory!
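The Task Manager check described in the report can be approximated from inside Python. What follows is only a minimal sketch, not the method used in this thread: it assumes Python 3.4 or later (the report used Python 2.6.2), it uses the standard-library tracemalloc module, and the helper name memory_growth is hypothetical. tracemalloc only sees memory requested through Python's own allocators, so an extension module that calls the C malloc directly would not show up here and would still need an OS-level view such as Task Manager.

```python
import re
import tracemalloc


def memory_growth(search, text, iterations):
    """Return the net change (in bytes) of traced memory over repeated calls.

    `search` is any callable taking the text, e.g. a compiled pattern's
    bound `search` method. This helper is hypothetical, not part of re or regex.
    """
    tracemalloc.start()
    try:
        search(text)  # warm-up call, so one-time allocations are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            search(text)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Same text as in the report's regex_timer.py, with one of its patterns.
text = 80 * "~" + "qwerty"
rx = re.compile("[a-z]+t")
growth = memory_growth(rx.search, text, 20000)
print("net growth over 20000 searches: %d bytes" % growth)
```

A leak-free search should show growth that stays roughly flat as the iteration count rises, whereas a cumulative leak like the one reported would be expected to scale with the number of calls.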