Tim Peters <tim.one@home.com>:
> If you throw almost everything out of Unix diff, that's what you'll be left
> with. Offhand I don't know of unencumbered, industrial-strength C source; a
> problem is that writing a program to compute this is a std homework exercise
> (it's a common first "dynamic programming" example), so you can find tons of
> bad C source.

I found some formal descriptions of the algorithm and some unencumbered
Oberon source. I'm coding up C now. It's not complicated if you're
willing to hold the cost matrix in memory, which is reasonable for a
string comparator in a way it wouldn't be for a file diff.

> Caution: many people want small variations of "edit distance", usually via
> assigning different weights to insertions, replacements and deletions. A
> less common but still popular variant is to say that a transposition ("xy"
> vs "yx") is less costly than a delete plus an insert. Etc. "edit distance"
> is really a family of algorithms.

Which just about collapse into one if your function has three weight
arguments for insert/replace/delete weights, as mine does. It don't get
more general than that -- I can see that by looking at the formal
description.

OK, so I'll give you that I don't weight transpositions separately, but
neither do any of the other variants I found on the web, nor do the formal
descriptions. A fourth optional weight argument someday, maybe :-).

> God forbid that core Python may lose the commercial OCR developer market
> <wink>. It's not accepted that for every field F, core Python needs to
> supply the algorithms F uses heavily.

That's not my point -- I don't see OCR as a big Python market either.
My point in observing that OCR uses Ratcliff/Obershelp heavily was simply
to show that it's a well-established algorithm, not `controversial'.

> Heck, core Python doesn't even ship
> with an FFT! Doesn't bother the folks working in signal processing.

It probably won't surprise you that I considered writing an FFT extension
module at one point :-).
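A minimal sketch of the approach described above, written in Python rather than the C under discussion: a weighted edit distance that holds the full cost matrix in memory, with separate insert, delete and replace weights. The function name `edit_distance`, its parameter names, and the optional `transpose_cost` (a stand-in for the "fourth optional weight", implemented here as the optimal-string-alignment variant) are assumptions for illustration, not the author's code.

```python
def edit_distance(a, b, insert_cost=1, delete_cost=1, replace_cost=1,
                  transpose_cost=None):
    """Weighted edit distance turning string a into string b.

    transpose_cost is a hypothetical fourth weight: when None, adjacent
    transpositions get no special treatment (plain weighted Levenshtein);
    when given, swapping two adjacent characters costs transpose_cost
    (the "optimal string alignment" variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the cheapest cost of turning a[:i] into b[:j];
    # the whole matrix is kept in memory.
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * delete_cost      # delete every character of a[:i]
    for j in range(1, cols):
        d[0][j] = j * insert_cost      # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + delete_cost,                       # drop a[i-1]
                d[i][j - 1] + insert_cost,                       # add b[j-1]
                d[i - 1][j - 1] + (0 if same else replace_cost), # match/replace
            )
            # Hypothetical extension: "xy" vs "yx" as a single transposition.
            if (transpose_cost is not None and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose_cost)
    return d[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("xy", "yx", transpose_cost=1))
```

Filling the matrix takes time and memory proportional to `len(a) * len(b)`, which is cheap for words and short strings but is exactly why a file diff would not hold the whole matrix.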
> > Tim, this isn't true. Any time you need to validate user input
> > against a controlled vocabulary and give feedback on probable right
> > choices,
>
> Which is something few apps need anyway

I fundamentally disagree. Few application designers *know* they need
it, but user interfaces would get a hell of a lot better if the technique
were more commonly applied -- and that's why I want it in the Python
library, so doing the right thing in Python will be a minimum-effort
proposition.
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

What if you were an idiot, and what if you were a member of Congress?
But I repeat myself.
	-- Mark Twain
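For the controlled-vocabulary case, a minimal sketch using the standard library's `difflib.get_close_matches`, whose `SequenceMatcher` the difflib documentation describes as a refinement of the Ratcliff/Obershelp "gestalt pattern matching" algorithm. The `suggest` helper and the sample vocabulary are hypothetical; the `n` and `cutoff` values shown are difflib's defaults.

```python
import difflib

# Hypothetical controlled vocabulary that user input is validated against.
VOCABULARY = ["ape", "apple", "peach", "puppy"]


def suggest(word, vocabulary=VOCABULARY, n=3, cutoff=0.6):
    """Accept an exact entry, otherwise return likely intended entries.

    Candidates scoring below `cutoff` (a similarity ratio in [0, 1]) are
    dropped; at most `n` are returned, best match first.
    """
    if word in vocabulary:
        return [word]
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=cutoff)


if __name__ == "__main__":
    print(suggest("appel"))   # misspelling: offer the closest entries
    print(suggest("peach"))   # exact entry: accepted as-is
```

An empty result means nothing in the vocabulary is close enough, so the interface can fall back to listing all valid choices.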