Eric Abrahamsen <eric@ericabrahamsen.net> writes:

>> Is there any progress merging peg.el to Emacs?
>> I do not see any obvious blockers in the discussion, but the merge never
>> happened?
>
> I will say that I tried to use PEG to resolve some gruesome text-parsing
> issues in EBDB very recently, and failed to make it work in the hour or
> two I'd allotted to the problem. The file-comment docs are pretty good,
> but I think they would need to be expanded in a few crucial ways,
> particularly to help those who don't necessarily know how PEGs work.
>
> Specifically, it is not obvious (to me) the ways in which PEGs (or maybe
> just peg.el) are not fully declarative. It doesn't backtrack, and I
> suspect it won't ever backtrack or isn't even supposed to, which means
> users should be made explicitly aware of the ways in which their rules
> can fail, and the ways in which declaration order matter. The comment
> for the `or' construct reads:
>
>   Prioritized Choice
>
> And that's about the only hint you get.

As the comment in peg.el states, the definitions are adapted from the
original PEG paper. The comment even links to the paper and to a
presentation explaining how PEG works. I strongly advise you to read
them; Prioritized Choice is explained there.

> I was trying to parse a
> multiword name like
>
>   Eric Edwin Abrahamsen
>
> into the structure
>
>   (("Eric" "Edwin") "Abrahamsen")
>
> using rules like
>
>   (plain-name (substring (+ [word])) (* [space]))
>   (full-name (list (+ plain-name) plain-name)
>     `(names -- (list (butlast names) (car (last names)))))
>
> Which always fails to match because (+ plain-name) is greedy and eats up
> all the words. It doesn't ever try leaving out the last word in an
> attempt to make the rule match.
One way is

    (with-peg-rules
        ((name (substring (+ [word])) (* [blank]))
         (given-name name (not (eol)))
         (last-name name (and (eol)))
         (full-name (list (+ given-name) last-name)
                    `(names -- (list (butlast names) (car (last names))))))
      (peg-run (peg full-name)))

A simple-minded non-greedy version would be ambiguous; you must
necessarily indicate the end of input. A more appropriate, unambiguous
non-greedy formulation would involve `or' (which you admittedly did not
understand):

    (with-peg-rules
        ((name (substring (+ [word])) (* [blank]))
         (given-name name)
         (last-name name (and (eol)))
         (full-name (list (+ (or last-name given-name)) (and (eol)))
    ;;;;;;;;;;;;;;;;;;;;;;;;;;^^
                    `(names -- (list (butlast names) (car (last names))))))
      (peg-run (peg full-name)))

> I'm happy to write the docs (should it have its own info manual
> section?), if we really think there are no other necessary
> fixes/improvements.

I find PEG to be a nice addition for cases where regexps are not enough
for the parsing at hand, but Bovine or tree-sitter would be overkill.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>