A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/classCDictionaryUtil.html below:

NCBI C++ ToolKit: CDictionaryUtil Class Reference

Search Toolkit Book for CDictionaryUtil

Standard dictionary utility functions. More...

#include <util/dictionary_util.hpp>

static void  GetSoundex (const string &in, string *out, size_t max_chars=eMaxSoundex, char pad_char='0')   Compute the Soundex key for a given word The Soundex key is defined as: More...
  static void  GetMetaphone (const string &in, string *out, size_t max_chars=eMaxMetaphone)   Compute the Metaphone key for a given word Metaphone is a more advanced algorithm than Soundex; instead of matching simple letters, Metaphone matches diphthongs. More...
  static void  Stem (const string &in_str, string *out_str)   Compute the Porter stem for a given word. More...
  static size_t  GetEditDistance (const string &str1, const string &str2, EDistanceMethod method=eEditDistance_Exact)   static int  Score (const string &word1, const string &word2, size_t max_metaphone=eMaxMetaphone)   Compute a nearness score for two different words or phrases. More...
  static int  Score (const string &word1, const string &meta1, const string &word2, const string &meta2, EDistanceMethod method=eEditDistance_Similar)   Compute a nearness score based on metaphone as well as raw distance. More...
 

Standard dictionary utility functions.

Definition at line 44 of file dictionary_util.hpp.

◆ anonymous enum ◆ EDistanceMethod

Return the Levenshtein edit distance between two words.

Two possible methods of computation are supported - an exact method with quadratic complexity and a method suitable for similar words with a near-linear complexity. The similar algorithm is suitable for almost all words we would encounter; it will render inaccuracies if the number of consecutive differences is greater than three.

Enumerator eEditDistance_Exact 

This method performs an exhausive search, and has an algorithmic complexity of O(n x m), where n = length of str1 and m = length of str2.

eEditDistance_Similar 

This method performs a simpler search, looking for the distance between similar words.

Words with more than two consecutively different characters will be scored incorrectly.

Definition at line 110 of file dictionary_util.hpp.

◆ GetEditDistance() ◆ GetMetaphone()

Compute the Metaphone key for a given word Metaphone is a more advanced algorithm than Soundex; instead of matching simple letters, Metaphone matches diphthongs.

The rules are complex, and try to match how languages are pronounced. The implementation here borrows some options from Double Metaphone; the modifications from the traditional Metaphone algorithm include:

Definition at line 47 of file dictionary_util.cpp.

References _ASSERT, CTempString::find(), in(), ITERATE, out(), and tolower().

Referenced by CSimpleDictionary::AddWord(), CStringMatching::CStringMatching(), CStringMatching::MatchString(), CSimpleDictionary::Read(), Score(), and CSimpleDictionary::SuggestAlternates().

◆ GetSoundex()

Compute the Soundex key for a given word The Soundex key is defined as:

The final step is non-standard; the usual pad is ' '

Definition at line 332 of file dictionary_util.cpp.

References in(), int, ITERATE, out(), string, and toupper().

◆ Score() [1/2] ◆ Score() [2/2] ◆ Stem()

Compute the Porter stem for a given word.

Porter's stemming algorithm is one of many automated stemming algorithms; unlike most, Porter's stemming algorithm is a widely accepted standard algorithm for generating word stems.

A description of the algorithm is available at

http://www.tartarus.org/~martin/PorterStemmer/def.txt

The essence of the algorithm is to repeatedly strip likely word suffixes such as -ed, -es, -s, -ess, -ness, -ability, -ly, and so forth, leaving a residue of a word that can be compared with other stem sources. The goal is to permit comparison of socuh words as:

compare comparable comparability comparably

since they all contain approximately the same meaning.

This algorithm assumes that word case has already been adjusted to lower case.

Definition at line 804 of file dictionary_util.cpp.

References eConsonant, eVowel, NULL, s_EndsWith(), s_FindFirstVowel(), s_GetCharType(), s_MeasureWord(), s_ReplaceEnding(), s_TruncateEnding(), and str().

Referenced by CTextUtil::GetStemFrequencies(), and CTextUtil::GetWordFrequencies().

The documentation for this class was generated from the following files:


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4