> > Rewritten using the tokenize module, which gives us a real tokenizer
> > rather than a number of approximating regular expressions.
> > Alas, it is 3-4 times slower.  Let that be a challenge for the
> > tokenize module.
>
> Was this just for purity, or did it fix a bug?  The regexps there
> were close to being heroically careful, and even so it was sometimes
> uncomfortably slow using the class browser in IDLE (based on
> pyclbr), even on a fast machine.  A factor of 3 or 4 might make
> that unbearable.
>
> If it was for purity, note that tokenize is also based on mounds of
> regexp tricks <wink>.

It was for purity, with an eye towards future improvements (I want to
teach it more about packages and import-aliasing).  While tokenize uses
regexp tricks, they are much closer to 100% correct than those in
pyclbr.  E.g. the pyclbr regexps don't cope with continuation
backslashes (which often occur in long import statements), or with
comments or expressions inside the list of superclasses.  They also
didn't cope well with 'import M as N', which is showing up more and
more frequently.  I think there are still bugs in that area, but they
will be much simpler to fix now.

I was going to use this as an excuse to learn how to use the hotshot
profiler to find out if there are any bottlenecks in the tokenize
module.  pyclbr.readmodule_ex('Tkinter') takes under 1.2 seconds on my
home machine now.  I find that acceptable (it's a lot quicker than IDLE
takes to colorize Tkinter.py :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)
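As an illustration of the constructs mentioned above: the snippet below
is a minimal sketch, not from the original post (the sample source
string is made up, and it uses Python 2-era idioms to match the
period).  Fed through tokenize, a continuation backslash, a comment
inside the superclass list, and an 'import M as N' clause all come out
as ordinary tokens, where a line-at-a-time regexp has to anticipate
each case separately:

    import tokenize
    from StringIO import StringIO

    # A made-up source fragment with the three troublesome constructs:
    # a continuation backslash in an import, 'import M as N', and a
    # comment inside the list of superclasses.
    source = (
        "import os, \\\n"
        "    sys\n"
        "import Tkinter as Tk\n"
        "class C(A,  # comment inside the superclass list\n"
        "        B):\n"
        "    pass\n"
    )

    # generate_tokens() handles line joining itself, so the consumer
    # only ever sees clean (type, string) pairs.
    for tok in tokenize.generate_tokens(StringIO(source).readline):
        print tokenize.tok_name[tok[0]], repr(tok[1])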
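And a sketch of the kind of hotshot run alluded to in the last
paragraph, assuming the Python 2-era hotshot module (the "pyclbr.prof"
log file name is arbitrary; on modern Pythons, where hotshot was
removed, cProfile fills the same role):

    import hotshot
    import hotshot.stats
    import pyclbr

    # Profile the tokenize-based module browser on Tkinter.
    prof = hotshot.Profile("pyclbr.prof")
    prof.runcall(pyclbr.readmodule_ex, "Tkinter")
    prof.close()

    # Load the timing log and show the 20 most expensive functions.
    stats = hotshot.stats.load("pyclbr.prof")
    stats.sort_stats("time", "calls")
    stats.print_stats(20)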