ICU 77.1: icu::StringSearch Class Reference
StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status) Creating a StringSearch instance using the argument locale language rule set.
StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status) Creating a StringSearch instance using the argument collator language rule set.
StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status) Creating a StringSearch instance using the argument locale language rule set.
StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status) Creating a StringSearch instance using the argument collator language rule set.
StringSearch (const StringSearch &that) Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.
virtual ~StringSearch () Destructor.
StringSearch * clone () const Clone this object.
StringSearch & operator= (const StringSearch &that) Assignment operator.
virtual bool operator== (const SearchIterator &that) const override Equality operator.
virtual void setOffset (int32_t position, UErrorCode &status) override Sets the index to point to the given position, and clears any state that's affected.
virtual int32_t getOffset () const override Return the current index in the text being searched.
virtual void setText (const UnicodeString &text, UErrorCode &status) override Set the target text to be searched.
virtual void setText (CharacterIterator &text, UErrorCode &status) override Set the target text to be searched.
RuleBasedCollator * getCollator () const Gets the collator used for the language rules.
void setCollator (RuleBasedCollator *coll, UErrorCode &status) Sets the collator used for the language rules.
void setPattern (const UnicodeString &pattern, UErrorCode &status) Sets the pattern used for matching.
const UnicodeString & getPattern () const Gets the search pattern.
virtual void reset () override Reset the iteration.
virtual StringSearch * safeClone () const override Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.
virtual UClassID getDynamicClassID () const override ICU "poor man's RTTI", returns a UClassID for the actual class.
SearchIterator (const SearchIterator &other) Copy constructor that creates a SearchIterator instance with the same behavior, and iterating over the same text.
virtual ~SearchIterator () Destructor.
void setAttribute (USearchAttribute attribute, USearchAttributeValue value, UErrorCode &status) Sets the text searching attributes located in the enum USearchAttribute with values from the enum USearchAttributeValue.
USearchAttributeValue getAttribute (USearchAttribute attribute) const Gets the text searching attributes.
int32_t getMatchedStart () const Returns the index to the match in the text string that was searched.
int32_t getMatchedLength () const Returns the length of text in the string which matches the search pattern.
void getMatchedText (UnicodeString &result) const Returns the text that was matched by the most recent call to first, next, previous, or last.
void setBreakIterator (BreakIterator *breakiter, UErrorCode &status) Set the BreakIterator that will be used to restrict the points at which matches are detected.
const BreakIterator * getBreakIterator () const Returns the BreakIterator that is used to restrict the points at which matches are detected.
const UnicodeString & getText () const Return the string text to be searched.
bool operator!= (const SearchIterator &that) const Not-equal operator.
int32_t first (UErrorCode &status) Returns the first index at which the string text matches the search pattern.
int32_t following (int32_t position, UErrorCode &status) Returns the first index equal or greater than position at which the string text matches the search pattern.
int32_t last (UErrorCode &status) Returns the last index in the target text at which it matches the search pattern.
int32_t preceding (int32_t position, UErrorCode &status) Returns the first index less than position at which the string text matches the search pattern.
int32_t next (UErrorCode &status) Returns the index of the next point at which the text matches the search pattern, starting from the current position. The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found.
int32_t previous (UErrorCode &status) Returns the index of the previous point at which the string text matches the search pattern, starting at the current position.
virtual ~UObject () Destructor.
StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.
StringSearch ensures that language eccentricity can be handled, e.g. for the German collator, characters ß and SS will be matched if case is chosen to be ignored. See the "ICU Collation Design Document" for more information.
There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end [start, end].
A pattern string P matches a text string S at the offsets [start, end] if
  option 1. Some canonical equivalent of P matches some canonical equivalent of S'
  option 2. P matches S' and if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S respectively.
Option 2 will be the default.
This search has APIs similar to those of other text iteration mechanisms such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. This search iterator allows changing of direction by calling reset followed by next or previous. Though a direction change can occur without calling reset first, this operation comes with some speed penalty. Matches found in the forward direction are the same as those found in the backward direction, in reverse order.
SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Since the starting position will be set as it is specified, please note that there are some danger points at which the search may return incorrect results:
- The midst of a substring that requires normalization.
- If the following match is to be found, the position should not be the second character which requires to be swapped with the preceding character. Vice versa, if the preceding match is to be found, the position to search from should not be the first character which requires to be swapped with the next character. E.g., certain Thai and Lao characters require swapping.
- If a following pattern match is to be found, any position within a contracting sequence except the first will fail. Vice versa, if a preceding pattern match is to be found, an invalid starting point would be any character within a contracting sequence except the last.
A BreakIterator can be used if only matches at logical breaks are desired. Using a BreakIterator will only give you results that exactly match the boundaries given by the break iterator. For instance, the pattern "e" will not be found in the string "\u00e9" if a character break iterator is used.
Options are provided to handle overlapping matches. E.g., in English, overlapping matches produce the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matches only produce the result 0.
Though collator attributes will be taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator and using the APIs in coll.h. Lastly, to update StringSearch to the new collator attributes, reset has to be called.
Restriction:
Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch does not guarantee the results for option 1.
Consult the SearchIterator documentation for information on and examples of how to use instances of this class to implement text searching.
UnicodeString target("The quick brown fox jumps over the lazy dog.");
UnicodeString pattern("fox");
UErrorCode error = U_ZERO_ERROR;
// Pass the same UErrorCode variable that the loop below uses.
StringSearch iter(pattern, target, Locale::getUS(), nullptr, error);
// first() and next() return USEARCH_DONE when no further match exists.
for (int32_t pos = iter.first(error);
     pos != USEARCH_DONE;
     pos = iter.next(error))
{
    printf("Found match at %d pos, length is %d\n", pos, iter.getMatchedLength());
}
Note, StringSearch is not to be subclassed.
See also: SearchIterator, RuleBasedCollator
Since: ICU 2.0
Definition at line 135 of file stsearch.h.