> I looked at it a bit when Tcl 8.1 was in beta; it derives from
> Henry Spencer's 1998-vintage code, which seems to try to do a lot of
> optimization and analysis.  It may even compile DFAs instead of NFAs
> when possible, though it's hard for me to be sure.  This might give it
> a substantial speed advantage over engines that do less analysis, but
> I haven't benchmarked it.  The code is easy to read, but difficult to
> understand because the theory underlying the analysis isn't explained
> in the comments; one feels there should be an accompanying paper to
> explain how everything works, and it's why I'm not sure if it really
> is producing DFAs for some expressions.
>
> Tcl seems to represent everything as UTF-8 internally, so
> there's only one regex engine; there's .

Hmm...  I looked when Tcl 8.1 was in alpha, and I *think* that at that
point the regex engine was compiled twice, once for 8-bit chars and
once for 16-bit chars.  But this may have changed.

I've noticed that Perl is taking the same position (everything is
UTF-8 internally).  On the other hand, Java distinguishes 16-bit chars
from 8-bit bytes.  Python is currently in the Java camp.  This might
be a good time to make sure that we're still convinced that this is
the right thing to do!

> The code is scattered over
> more files:
>
> amarok generic>ls re*.[ch]
> regc_color.c   regc_locale.c  regcustom.h    regerrs.h      regfree.c
> regc_cvec.c    regc_nfa.c     rege_dfa.c     regex.h        regfronts.c
> regc_lex.c     regcomp.c      regerror.c     regexec.c      regguts.h
> amarok generic>wc -l re*.[ch]
>     742 regc_color.c
>     170 regc_cvec.c
>    1010 regc_lex.c
>     781 regc_locale.c
>    1528 regc_nfa.c
>    2124 regcomp.c
>      85 regcustom.h
>     627 rege_dfa.c
>      82 regerror.c
>      18 regerrs.h
>     308 regex.h
>     952 regexec.c
>      25 regfree.c
>      56 regfronts.c
>     388 regguts.h
>    8896 total
> amarok generic>
>
> This would be an issue for using it with Python, since all
> these files would wind up scattered around the Modules directory.  For
> comparison, pypcre.c is around 4700 lines of code.

I'm sure that if it's good code, we'll find a way.  Perhaps a more
interesting question is whether it is Perl5 compatible.  I contacted
Henry Spencer at the time and he was willing to let us use his code.

--Guido van Rossum (home page: http://www.python.org/~guido/)