Tim Peters <tim.one@home.com>:
> All agreed, and it should be a straightforward task then. I'm assuming it
> will work with Unicode strings too <wink>.

Thought about that. Want to get it working for 8 bits first.

> Guido will depart from you at a different point. I depart here: it's not
> "the right thing". It's a bunch of hacks that appeal not because they solve
> a problem, but because they're cute algorithms that are pretty easy to
> implement and kinda solve part of a problem.

Again, my experience says differently. I have actually *used*
Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What
I Mean) -- and had it work very well for non-geek users. That's why I
want other Python programmers to have easy access to the capability.

> Working six years in commercial speech recog really hammered that home to
> me: 95% solutions are on the margin of unsellable, because an error one try
> in 20 is intolerable for real people. Developers writing for developers get
> "whoa! cool!" where my sisters walk away going "what good is that?". Edit
> distance doesn't get within screaming range of 95% in real life.

I suspect your speech recognition experience has given you an
unhelpful bias. For English, what you say is certainly true -- but
that's a gross worst-case application of R/O and Levenshtein that I'm
not interested in pursuing. Nor do I expect Python hackers to use my
module for that.

Where techniques like Ratcliff-Obershelp really shine (and what I
expect the module to be used for) is with controlled vocabularies such
as command interfaces. These tend to have better orthogonality than
NL, so antinoise filtering by R/O or Levenshtein distance (a kindred
technique I somehow didn't learn until today -- there are
disadvantages to being an autodidact) can really go to town on them.

(Actually, my gut after thinking about both algorithms hard is that
R/O is still a better technique than Levenshtein for the kind of
application I have in mind. But I also suspect the difference is
marginal.)

(Other good uses for algorithms in this class include cladistics and
genomic analysis.)

> Even for most developers, it would be better to package up the single best
> approach you've got (f(list, word) -> list of possible matches sorted in
> confidence order), instead of a module with 6 (or so) functions they don't
> understand and a pile of equally mysterious knobs.

That's why good documentation, with motivating usage hints, is
important. I write good documentation, Tim.

> PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS,
> DELETIONS AND GENERALIZED TRANSPOSITIONS
> B. J. Oommen and R. K. S. Loke
> http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF

Thanks for the pointer; I've downloaded it and will read it. If the
description of Oommen's algorithm is good enough, I'll implement it
and add it to the module.
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact
amount of injustice and wrong which will be imposed upon them; and these will
continue until they are resisted with either words or blows, or with both.
The limits of tyrants are prescribed by the endurance of those whom they
oppress.
	-- Frederick Douglass, August 4, 1857
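
To make the controlled-vocabulary case discussed above concrete, here is a minimal sketch. It is not the module proposed in this message. It assumes Python's standard difflib, whose SequenceMatcher is documented as being based on the Ratcliff-Obershelp "gestalt pattern matching" idea, and it adds a plain Levenshtein distance for comparison. The names suggest, levenshtein and COMMANDS, and the 0.6 cutoff, are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance counting single-character insertions,
    deletions and substitutions (standard dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def suggest(word, vocabulary, cutoff=0.6):
    """Return vocabulary entries similar to word, best match first.

    Similarity is SequenceMatcher.ratio(), a value in [0, 1];
    entries scoring below cutoff (an arbitrary assumption) are dropped.
    """
    scored = [(SequenceMatcher(None, word, candidate).ratio(), candidate)
              for candidate in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for score, candidate in scored if score >= cutoff]


# A hypothetical command vocabulary for a "Tell Me What I Mean" prompt.
COMMANDS = ["commit", "checkout", "status", "branch", "merge"]

if __name__ == "__main__":
    print(suggest("comit", COMMANDS))
    print(levenshtein("comit", "commit"))
```

This has the single-function shape Tim asks for, f(list, word) -> list of possible matches sorted in confidence order. How well it behaves on a real command set depends on the vocabulary and the cutoff, so treat both as knobs to tune rather than recommended values.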