> Lexers are painful in Python. They hit the language in a weak spot
> created by the immutability of strings. I've found this an obstacle
> more than once, but then I'm a battle-scarred old compiler jock who
> attacks *everything* with lexers and parsers.

I think you're exaggerating the problem, or at least underestimating
the re module. The re module is pretty fast!

Reading a file line-by-line is very fast in Python 2.3 with the new
"for line in open(filename)" idiom. I just scanned nearly a megabyte
of ugly data (a Linux kernel) in 0.6 seconds using the regex '\w+',
finding 177,000 words.

The regex (?:\d+|[a-zA-Z_]+) took 1 second, finding 190,000 words.

I expect that the list creation (one hit at a time) took more time
than the matching.

--Guido van Rossum (home page: http://www.python.org/~guido/)
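For readers who want to try the measurement described above, here is a minimal sketch in present-day Python rather than 2.3. The names scan_lines and scan_file are hypothetical helpers made up for illustration, not anything from the original test, and timings will differ by machine and input.

```python
import re

# The two patterns mentioned in the message.
WORD = re.compile(r"\w+")
NUM_OR_NAME = re.compile(r"(?:\d+|[a-zA-Z_]+)")

def scan_lines(lines, pattern):
    """Collect every match of pattern, visiting the input one line at a time."""
    hits = []
    for line in lines:
        # findall builds a list of matches for this line only
        hits.extend(pattern.findall(line))
    return hits

def scan_file(filename, pattern):
    """Hypothetical helper: apply scan_lines to a file read line by line."""
    # errors="replace" is an assumption so binary-ish input does not raise
    with open(filename, errors="replace") as f:
        return scan_lines(f, pattern)

sample = ["int x1 = 42;\n", "foo_bar(7)\n"]
words = scan_lines(sample, WORD)
tokens = scan_lines(sample, NUM_OR_NAME)
```

On this sample the second pattern splits a mixed token such as x1 into a name and a number, so it returns one more hit than '\w+'. That may be part of why the second count in the message is higher, though the original data is not available to confirm it.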