Very quick (swamped):

> I think you've just made an argument for replacing your
> SequenceMatcher with simil.ratcliff.

Actually, I'm certain they're the same algorithm now, except the C is
showing through in ratcliff to the floating-point eye <wink>.

For demonstration, I *always* printed the top three scorers (that's logic
in the little driver I posted, not in SequenceMatcher), without any notion
of cutoff (ndiff does use a cutoff).  Add this line before the return (in
the posted driver) to see the actual scores:

    print scores[:numchoices]

For example:

    Module name? browser
    [(0.82352941176470584, 'webbrowser'), (0.55555555555555558, 'robotparser'), (0.54545454545454541, 'user')]
    Hmm.  My best guesses are webbrowser, robotparser, user
    Module name?

On this example you reported:

    >>> simil.ratcliff("browser", "webbrowser")
    0.82352942228317261
    >>> simil.ratcliff("browser", "robotparser")
    0.55555558204650879
    >>> simil.ratcliff("browser", "user")
    0.54545456171035767

which strongly suggests you're using C floats instead of Python floats to
compute the final score.  I didn't try every example in your email, but
it's the same story on the three I did try (scores identical modulo
simil.ratcliff dropping about 30 of the low-order result bits -- which is
about the difference between a C double and a C float on most boxes).

> Mine's even documented. :-).

Which I appreciate!  I dreamt up the SequenceMatcher algorithm going on 20
years ago for a friendly diff generator, and never even considered using it
for other purposes.  But then I may have mentioned that these other
purposes never come up in my apps <wink>.

or-at-least-they-haven't-in-contexts-where-r/o-would-have-been-strong-enough-ly y'rs  - tim
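
The float-versus-double point can be checked with a minimal sketch, assuming a modern Python 3 and a platform where a C float is a 32-bit IEEE single. The helper name `to_c_float` is hypothetical, not part of the posted driver or of simil; it rounds each Python score through C float precision using the standard `struct` module, so the results can be compared by eye with the simil.ratcliff figures quoted above.

```python
import struct


def to_c_float(x):
    """Round a Python float (a C double) to the nearest C float and back.

    Hypothetical helper for illustration only: struct's "f" format packs
    the value as a 4-byte C float, discarding the low-order bits.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


# Scores printed by the driver for "browser", copied from the message.
python_scores = [
    (0.82352941176470584, "webbrowser"),
    (0.55555555555555558, "robotparser"),
    (0.54545454545454541, "user"),
]

for score, name in python_scores:
    # Left: the full double.  Right: the same value squeezed through a float.
    print(name, repr(score), repr(to_c_float(score)))
```

If the right-hand column agrees with the simil.ratcliff output, that supports the guess that the two differ only in the precision of the final division.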