On Wed, Jun 28, 2000 at 10:53:16AM -0700, Paul Prescod wrote:
>Note that the document doesn't yet cover the regular expression engine
>or the "PerlInterpreter".

The regex engine's pretty hard to read, mostly because comments are
infrequent and not very helpful, and disentangling it from the rest of
Perl would require a skilled wizard.  (PCRE, if slower, is at least
much clearer and easier to understand, though the compile() function
is pretty ugly.)  A while ago I saw a p5p post from Ilya Zakharevich,
who did most of the recent regex hacking; he drew attention to one flag
variable in the code and said basically "I don't know what this flag
means; I think it's some sort of UTF-8 setting, but Larry didn't
explain it."

>I can't think of a disclaimer that doesn't sound like it is tongue in
>cheek but I do feel bad about beating up on a design which, in its own
>way, has a certain kind of quality (just not one I happen to prefer).

Agreed; it could be made much simpler, but maybe at a performance
cost.  (Though performance is tricky, and maybe the extra work costs
more than it saves.)  For example, note the flag bits in SvNULL, which
have values like GMAGICAL.  You could imagine a Python implementation
that added flag bits to every object, and set a bit if there was a
__getattr__ method defined; code could then do
'if (obj->flags & GMAGICAL) ...' instead of the more complicated
'if (PyObject_HasAttrString(obj, "__getattr__")) ...'.

It would be interesting to know if Topaz, Chip Salzenberg's
experimental C++ implementation, preserves this complexity or aims to
cut it away.

The use of several levels of C structs is also reminiscent of the way
you do OO in C, as in X toolkits.  You can also see the importance of
text processing in the SvPVBM type, for attaching a Boyer-Moore
related table to a string and speeding up regex searches.

--amk
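
The flag-bit idea in the message above can be modelled in a few lines. This is only a rough sketch in Python rather than C, and it is not how CPython or Perl actually lay out their objects: the `Flagged` base class, the `flags` attribute, and the two helper functions are hypothetical names invented for illustration, and `GMAGICAL` is borrowed from the Perl flag only as a label. The flag is computed once, when a class is created, so later checks are a single bit test instead of an attribute lookup by name.

```python
# Hypothetical flag bit; the name is borrowed from the Perl flag mentioned
# in the message, but the value and meaning here are illustrative only.
GMAGICAL = 0x01


class Flagged:
    """Base class that caches, per class, whether __getattr__ is defined."""

    flags = 0

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Do the by-name lookup once, at class creation time.
        cls.flags = GMAGICAL if hasattr(cls, "__getattr__") else 0


def has_getattr_hook_slow(obj):
    """Look the method up by name every time (the HasAttrString style)."""
    return hasattr(type(obj), "__getattr__")


def has_getattr_hook_fast(obj):
    """Test the cached bit (the 'obj->flags & GMAGICAL' style)."""
    return bool(type(obj).flags & GMAGICAL)


class Plain(Flagged):
    pass


class Magic(Flagged):
    def __getattr__(self, name):
        return 42
```

With these definitions, `has_getattr_hook_fast(Magic())` and `has_getattr_hook_slow(Magic())` agree, as they do for `Plain()`. The sketch also shows where the extra work goes: if `__getattr__` were assigned to a class after it was created, the cached bit would be stale unless something updated it, which is the kind of bookkeeping that may cost more than it saves.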