Search Toolkit Book for CDictionaryUtil
Standard dictionary utility functions.
#include <util/dictionary_util.hpp>
Definition at line 44 of file dictionary_util.hpp.
◆ anonymous enum
◆ EDistanceMethod
Return the Levenshtein edit distance between two words.
Two possible methods of computation are supported - an exact method with quadratic complexity and a method suitable for similar words with a near-linear complexity. The similar algorithm is suitable for almost all words we would encounter; it will render inaccuracies if the number of consecutive differences is greater than three.
Enumerator
eEditDistance_Exact
This method performs an exhaustive search, and has an algorithmic complexity of O(n x m), where n = length of str1 and m = length of str2.
eEditDistance_Similar
This method performs a simpler search, looking for the distance between similar words.
Words with more than two consecutively different characters will be scored incorrectly.
Definition at line 110 of file dictionary_util.hpp.
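The exact method described above can be illustrated with the classic dynamic-programming formulation of Levenshtein distance. The following is a minimal sketch, not the toolkit's implementation: the function name ExactEditDistance is hypothetical, and the two-row layout is one common way to obtain the O(n x m) behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of CDictionaryUtil: classic
// dynamic-programming Levenshtein distance using two rolling rows.
// Runs in O(n x m) time, where n = str1.size() and m = str2.size().
std::size_t ExactEditDistance(const std::string& str1, const std::string& str2)
{
    const std::size_t n = str1.size();
    const std::size_t m = str2.size();

    // prev[j] is the distance between the first i-1 characters of str1
    // and the first j characters of str2; curr[j] is the same for i.
    std::vector<std::size_t> prev(m + 1);
    std::vector<std::size_t> curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) {
        prev[j] = j;  // an empty prefix needs j insertions
    }

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = i;  // i deletions turn the prefix into an empty string
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (str1[i - 1] == str2[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // match or substitution
        }
        std::swap(prev, curr);
    }
    return prev[m];
}
```

For example, ExactEditDistance("kitten", "sitting") evaluates to 3 (two substitutions and one insertion).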
◆ GetEditDistance()
◆ GetMetaphone()
Compute the Metaphone key for a given word.
Metaphone is a more advanced algorithm than Soundex; instead of matching simple letters, Metaphone matches diphthongs.
The rules are complex, and try to match how languages are pronounced. The implementation here borrows some options from Double Metaphone; the modifications from the traditional Metaphone algorithm include:
Definition at line 47 of file dictionary_util.cpp.
References _ASSERT, CTempString::find(), in(), ITERATE, out(), and tolower().
Referenced by CSimpleDictionary::AddWord(), CStringMatching::CStringMatching(), CStringMatching::MatchString(), CSimpleDictionary::Read(), Score(), and CSimpleDictionary::SuggestAlternates().
◆ GetSoundex()
Compute the Soundex key for a given word.
The Soundex key is defined as:
The final step is non-standard; the usual pad is ' '
Definition at line 332 of file dictionary_util.cpp.
References in(), int, ITERATE, out(), string, and toupper().
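For orientation, the sketch below follows the commonly described American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent identical codes, and pad to a fixed length. It is an assumption-laden illustration, not the toolkit's GetSoundex(): the name SoundexSketch and the parameters max_chars and pad_char are hypothetical, and the default pad of '0' is chosen here only for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not the CDictionaryUtil implementation.
// Builds a Soundex-style key of max_chars characters, padded with pad_char.
std::string SoundexSketch(const std::string& word,
                          std::size_t max_chars = 4,
                          char pad_char = '0')
{
    // Digit for each letter 'a'..'z'; '0' marks letters that carry no code
    // (vowels plus h, w and y).
    static const char kCodes[] = "01230120022455012623010202";

    std::string key;
    char last_code = '0';
    for (char raw : word) {
        const unsigned char uc = static_cast<unsigned char>(raw);
        const char lower = static_cast<char>(std::tolower(uc));
        if (lower < 'a' || lower > 'z') {
            continue;  // skip anything that is not an ASCII letter
        }
        const char code = kCodes[lower - 'a'];
        if (key.empty()) {
            // The first letter is kept verbatim, in upper case.
            key += static_cast<char>(std::toupper(uc));
        } else if (code != '0' && code != last_code) {
            key += code;  // new consonant class: emit its digit
        }
        // 'h' and 'w' do not separate two consonants with the same code;
        // vowels do, because they reset last_code to '0'.
        if (lower != 'h' && lower != 'w') {
            last_code = code;
        }
        if (key.size() >= max_chars) {
            break;
        }
    }
    // Pad short keys to the requested length.
    if (!key.empty() && key.size() < max_chars) {
        key.append(max_chars - key.size(), pad_char);
    }
    return key;
}
```

Under these rules, "Robert" and "Rupert" both map to R163, which is the kind of phonetic collision Soundex is designed to produce.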
◆ Score() [1/2]
◆ Score() [2/2]
◆ Stem()
Compute the Porter stem for a given word.
Porter's stemming algorithm is one of many automated stemming algorithms; unlike most, Porter's stemming algorithm is a widely accepted standard algorithm for generating word stems.
A description of the algorithm is available at
http://www.tartarus.org/~martin/PorterStemmer/def.txt
The essence of the algorithm is to repeatedly strip likely word suffixes such as -ed, -es, -s, -ess, -ness, -ability, -ly, and so forth, leaving a residue of a word that can be compared with other stem sources. The goal is to permit comparison of such words as:
compare comparable comparability comparably
since they all contain approximately the same meaning.
This algorithm assumes that word case has already been adjusted to lower case.
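To give a flavor of the suffix stripping, the sketch below covers only step 1a of Porter's published algorithm (plural endings), applied to an already lower-cased word. It is an illustration under that narrow scope, not the toolkit's Stem(): the names EndsWith and PorterStep1a are hypothetical, and the remaining steps, which depend on the word's consonant/vowel measure, are omitted.

```cpp
#include <string>

// Hypothetical helper: true when `word` ends with `suffix`.
static bool EndsWith(const std::string& word, const std::string& suffix)
{
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(),
                        suffix.size(), suffix) == 0;
}

// Hypothetical sketch of Porter step 1a only. The rules are tried in order
// and the first (longest) matching suffix wins:
//   sses -> ss, ies -> i, ss -> ss, s -> (removed)
// The input is assumed to be lower case already.
std::string PorterStep1a(std::string word)
{
    if (EndsWith(word, "sses")) {
        word.erase(word.size() - 2);   // caresses -> caress
    } else if (EndsWith(word, "ies")) {
        word.erase(word.size() - 2);   // ponies -> poni
    } else if (EndsWith(word, "ss")) {
        // caress -> caress (unchanged)
    } else if (EndsWith(word, "s")) {
        word.erase(word.size() - 1);   // cats -> cat
    }
    return word;
}
```

The rule order matters: testing "ss" before "s" is what keeps a word like "caress" from losing its final letter.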
Definition at line 804 of file dictionary_util.cpp.
References eConsonant, eVowel, NULL, s_EndsWith(), s_FindFirstVowel(), s_GetCharType(), s_MeasureWord(), s_ReplaceEnding(), s_TruncateEnding(), and str().
Referenced by CTextUtil::GetStemFrequencies(), and CTextUtil::GetWordFrequencies().
The documentation for this class was generated from the following files: