UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to that of the Java String and StringBuffer/StringBuilder classes.
#include <unistr.h>
UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to that of the Java String and StringBuffer/StringBuilder classes.
It is a concrete implementation of the abstract class Replaceable (for transliteration).
The UnicodeString equivalent of std::string’s clear() is remove().
A UnicodeString may "alias" an external array of characters (that is, point to it, rather than own the array) whose lifetime must then at least match the lifetime of the aliasing object. This aliasing may be preserved when returning a UnicodeString by value, depending on the compiler and the function implementation, via Return Value Optimization (RVO) or the move assignment operator. (However, the copy assignment operator does not preserve aliasing.) For details see the description of storage models at the end of the class API docs and in the User Guide chapter linked from there.
The UnicodeString class is not suitable for subclassing.
For an overview of Unicode strings in C and C++ see the User Guide Strings chapter.
In ICU, a Unicode string consists of 16-bit Unicode code units. A Unicode character may be stored with either one code unit (the most common case) or with a matched pair of special code units ("surrogates"). The data type for code units is char16_t. For single-character handling, a Unicode character code point is a value in the range 0..0x10ffff. ICU uses the UChar32 type for code points.
Indexes and offsets into and lengths of strings always count code units, not code points. This is the same as with multi-byte char* strings in traditional string handling. Operations on partial strings typically do not test for code point boundaries. If necessary, the user needs to take care of such boundaries by testing for the code unit values or by using functions like UnicodeString::getChar32Start() and UnicodeString::getChar32Limit() (or, in C, the equivalent macros U16_SET_CP_START() and U16_SET_CP_LIMIT(), see utf.h).
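Because offsets count code units, an arbitrary index can land in the middle of a surrogate pair. The following is a minimal sketch of the boundary test described above, written against std::u16string so it has no ICU dependency. The helper names isLead, isTrail and codePointStart are hypothetical; the sketch illustrates the idea behind getChar32Start() and is not ICU's implementation.

```cpp
#include <cstddef>
#include <string>

// True for a lead (high) surrogate code unit, 0xD800..0xDBFF.
inline bool isLead(char16_t c) { return (c & 0xFC00) == 0xD800; }

// True for a trail (low) surrogate code unit, 0xDC00..0xDFFF.
inline bool isTrail(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Hypothetical helper: if `offset` points at the trail half of a matched
// surrogate pair, move it back to the lead half; otherwise leave it alone.
inline std::size_t codePointStart(const std::u16string &s, std::size_t offset) {
    if (offset > 0 && offset < s.size() &&
        isTrail(s[offset]) && isLead(s[offset - 1])) {
        return offset - 1;
    }
    return offset;
}
```

For the string u"a\U0001F600b", which occupies four code units, offset 2 is the trail surrogate of U+1F600, so the helper moves it back to offset 1.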
UnicodeString methods are more lenient with regard to input parameter values than other ICU APIs. In particular:
In string comparisons, two UnicodeString objects that are both "bogus" compare equal (to be transitive and prevent endless loops in sorting), and a "bogus" string compares less than any non-"bogus" one.
Const UnicodeString methods are thread-safe. Multiple threads can use const methods on the same UnicodeString object simultaneously, but non-const methods must not be called concurrently (in multiple threads) with any other (const or non-const) methods.
Similarly, const UnicodeString & parameters are thread-safe. One object may be passed in as such a parameter concurrently in multiple threads. This includes the const UnicodeString & parameters for copy construction, assignment, and cloning.
UnicodeString uses several storage methods. String contents can be stored inside the UnicodeString object itself, in an allocated and shared buffer, or in an outside buffer that is "aliased". Most of this is done transparently, but careful aliasing in particular provides significant performance improvements. Also, the internal buffer is accessible via special functions. For details see the User Guide Strings chapter.
Definition at line 295 of file unistr.h.
◆ EInvariant
Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.
Use the macro US_INV instead of the full qualification for this value.
Definition at line 307 of file unistr.h.
◆ UnicodeString() [1/25] icu::UnicodeString::UnicodeString ( ) inline
◆ UnicodeString() [2/25] icu::UnicodeString::UnicodeString ( int32_t capacity, UChar32 c, int32_t count )
Construct a UnicodeString with capacity to hold capacity char16_ts.
Single char16_t (code unit) constructor.
It is recommended to mark this constructor "explicit" by -DUNISTR_FROM_CHAR_EXPLICIT=explicit on the compiler command line or similar.
Single UChar32 (code point) constructor.
It is recommended to mark this constructor "explicit" by -DUNISTR_FROM_CHAR_EXPLICIT=explicit on the compiler command line or similar.
nullptr_t constructor.
Effectively the same as the default constructor, makes an empty string object.
It is recommended to mark this constructor "explicit" by -DUNISTR_FROM_STRING_EXPLICIT=explicit on the compiler command line or similar.
Definition at line 4186 of file unistr.h.
◆ UnicodeString() [6/25] icu::UnicodeString::UnicodeString ( const char16_t * text, int32_t textLength )
char16_t* constructor.
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString str(u"literal");
if (str == u"other literal") { ... }
text
to copy.
uint16_t * constructor.
Delegates to UnicodeString(const char16_t *, int32_t).
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString str(u"literal");
if (str == u"other literal") { ... }
Definition at line 3225 of file unistr.h.
◆ UnicodeString() [8/25] icu::UnicodeString::UnicodeString ( const wchar_t * text, int32_t textLength ) inline
wchar_t * constructor.
(Only defined if U_SIZEOF_WCHAR_T==2.) Delegates to UnicodeString(const char16_t *, int32_t).
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString str(u"literal");
if (str == u"other literal") { ... }
Definition at line 3247 of file unistr.h.
◆ UnicodeString() [9/25] icu::UnicodeString::UnicodeString ( const std::nullptr_t text, int32_t textLength ) inline
nullptr_t constructor.
Effectively the same as the default constructor, makes an empty string object.
Definition at line 4190 of file unistr.h.
◆ UnicodeString() [10/25]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
Constructor from text which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
The string is bogus if the string view is too long.
If you need a UnicodeString but need not copy the string view contents, then you can call the UnicodeString::readOnlyAlias() function instead of this constructor.
Definition at line 3274 of file unistr.h.
◆ UnicodeString() [11/25] icu::UnicodeString::UnicodeString ( UBool isTerminated, ConstChar16Ptr text, int32_t textLength )
Readonly-aliasing char16_t* constructor.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has copy-on-write semantics: When the string is modified, then the buffer is first copied into newly allocated memory. The aliased buffer is never modified.
In an assignment to another UnicodeString, when using the copy constructor or the assignment operator, the text will be copied. When using fastCopyFrom(), the text will be aliased again, so that both strings then alias the same readonly-text.
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString alias = UnicodeString::readOnlyAlias(u"literal");
if (str == u"other literal") { ... }
isTerminated Specifies whether text is NUL-terminated. This must be true if textLength==-1.
text The characters to alias for the UnicodeString.
textLength The number of Unicode characters in text to alias. If -1, then this constructor will determine the length by calling u_strlen().
Writable-aliasing char16_t* constructor.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has write-through semantics: For as long as the capacity of the buffer is sufficient, write operations will directly affect the buffer. When more capacity is necessary, then a new buffer will be allocated and the contents copied as with regularly constructed strings. In an assignment to another UnicodeString, the buffer will be copied. The extract(Char16Ptr dst) function detects whether the dst pointer is the same as the string buffer itself and will in this case not copy the contents.
buffLength The number of Unicode characters in buffer to alias.
buffCapacity The size of buffer in char16_ts.
Writable-aliasing uint16_t * constructor.
Delegates to UnicodeString(const char16_t *, int32_t, int32_t).
Definition at line 3343 of file unistr.h.
◆ UnicodeString() [14/25] icu::UnicodeString::UnicodeString ( wchar_t * buffer, int32_t buffLength, int32_t buffCapacity ) inline
Writable-aliasing wchar_t * constructor.
(Only defined if U_SIZEOF_WCHAR_T==2.) Delegates to UnicodeString(const char16_t *, int32_t, int32_t).
Definition at line 3357 of file unistr.h.
◆ UnicodeString() [15/25] icu::UnicodeString::UnicodeString ( std::nullptr_t buffer, int32_t buffLength, int32_t buffCapacity ) inline
Writable-aliasing nullptr_t constructor.
Effectively the same as the default constructor, makes an empty string object.
Definition at line 4194 of file unistr.h.
◆ UnicodeString() [16/25]
char* constructor.
Uses the default converter (and thus depends on the ICU conversion code) unless U_CHARSET_IS_UTF8 is set to 1.
For ASCII (really "invariant character") strings it is more efficient to use the constructor that takes a US_INV (for its enum EInvariant).
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString str(u"literal");
if (str == u"other literal") { ... }
It is recommended to mark this constructor "explicit" by -DUNISTR_FROM_STRING_EXPLICIT=explicit on the compiler command line or similar.
char* constructor.
Uses the default converter (and thus depends on the ICU conversion code) unless U_CHARSET_IS_UTF8 is set to 1.
codepageData
.
char* constructor.
codepage the encoding of codepageData. The special value 0 for codepage indicates that the text is in the platform's default codepage.
If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. Recommendation: For invariant-character strings use the constructor UnicodeString(const char *src, int32_t length, enum EInvariant inv) because it avoids object code dependencies of UnicodeString on the conversion code.
char* constructor.
codepageData.
codepage the encoding of codepageData. The special value 0 for codepage indicates that the text is in the platform's default codepage. If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. Recommendation: For invariant-character strings use the constructor UnicodeString(const char *src, int32_t length, enum EInvariant inv) because it avoids object code dependencies of UnicodeString on the conversion code.
char * / UConverter constructor.
This constructor uses an existing UConverter object to convert the codepage string to Unicode and construct a UnicodeString from that.
The converter is reset at first. If the error code indicates a failure before this constructor is called, or if an error occurs during conversion or construction, then the string will be bogus.
This function avoids the overhead of opening and closing a converter if multiple strings are constructed.
Constructs a Unicode string from an invariant-character char * string.
About invariant characters see utypes.h. This constructor has no runtime dependency on conversion code and is therefore recommended over ones taking a charset name string (where the empty string "" indicates invariant-character conversion).
Use the macro US_INV as the third, signature-distinguishing parameter.
For example:
void fn(const char *s) {
  UnicodeString ustr(s, -1, US_INV);
  // use ustr ...
}
Note, for string literals: Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:
UnicodeString str(u"literal");
if (str == u"other literal") { ... }
Copy constructor.
Starting with ICU 2.4, the assignment operator and the copy constructor allocate a new buffer and copy the buffer contents even for readonly aliases. By contrast, the fastCopyFrom() function implements the old, more efficient but less safe behavior of making this string also a readonly alias to the same buffer.
If the source object has an "open" buffer from getBuffer(minCapacity), then the copy is an empty string.
Move constructor; might leave src in bogus state.
This string will have the same contents and state that the source string had.
'Substring' constructor from tail of source string.
src
at which to start copying.
'Substring' constructor from subrange of source string.
src
at which to start copying. srcLength The number of characters from src
to copy.
Append the code unit srcChar to the UnicodeString object.
Definition at line 4981 of file unistr.h.
◆ append() [2/7] UnicodeString & icu::UnicodeString::append ( const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) inline
Append the characters in srcChars in the range [srcStart, srcStart + srcLength) to the UnicodeString object.
srcChars is not modified.
srcStart the offset into srcChars where new characters will be obtained
srcLength the number of characters in srcChars in the append string; can be -1 if srcChars is NUL-terminated
Definition at line 4970 of file unistr.h.
◆ append() [3/7]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
Appends the characters in src, which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view, to the UnicodeString object.
Definition at line 2300 of file unistr.h.
◆ append() [4/7]
Append the characters in srcText to the UnicodeString object.
srcText is not modified.
Definition at line 4966 of file unistr.h.
References length().
◆ append() [5/7]
Append the characters in srcText in the range [srcStart, srcStart + srcLength) to the UnicodeString object.
srcText is not modified.
srcStart the offset into srcText where new characters will be obtained
srcLength the number of characters in srcText in the append string
Definition at line 4960 of file unistr.h.
Referenced by icu::Transliterator::setID().
◆ append() [6/7]
Append the characters in srcChars to the UnicodeString object.
srcChars is not modified.
srcLength the number of characters in srcChars; can be -1 if srcChars is NUL-terminated
Definition at line 4976 of file unistr.h.
◆ append() [7/7]
Append the code point srcChar to the UnicodeString object.
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(text.foldCase(options)).
Definition at line 4443 of file unistr.h.
References length().
◆ caseCompare() [2/6] int8_t icu::UnicodeString::caseCompare ( ConstChar16Ptr srcChars, int32_t srcLength, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
Definition at line 4456 of file unistr.h.
◆ caseCompare() [3/6] int8_t icu::UnicodeString::caseCompare ( int32_t start, int32_t length, const char16_t * srcChars, int32_t srcStart, int32_t srcLength, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
Definition at line 4481 of file unistr.h.
◆ caseCompare() [4/6] int8_t icu::UnicodeString::caseCompare ( int32_t start, int32_t length, const char16_t * srcChars, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcChars.foldCase(options)).
Definition at line 4473 of file unistr.h.
◆ caseCompare() [5/6] int8_t icu::UnicodeString::caseCompare ( int32_t start, int32_t length, const UnicodeString & srcText, int32_t srcStart, int32_t srcLength, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).
Definition at line 4463 of file unistr.h.
◆ caseCompare() [6/6] int8_t icu::UnicodeString::caseCompare ( int32_t start, int32_t length, const UnicodeString & srcText, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compare(srcText.foldCase(options)).
Definition at line 4448 of file unistr.h.
References length().
◆ caseCompareBetween() int8_t icu::UnicodeString::caseCompareBetween ( int32_t start, int32_t limit, const UnicodeString & srcText, int32_t srcStart, int32_t srcLimit, uint32_t options ) const inline
Compare two strings case-insensitively using full case folding.
This is equivalent to this->foldCase(options).compareBetween(text.foldCase(options)).
Definition at line 4491 of file unistr.h.
◆ char32At() UChar32 icu::UnicodeString::char32At ( int32_t offset ) const
Return the code point that contains the code unit at offset offset.
If the offset is not valid (0..length()-1) then U+ffff is returned.
offset
or 0xffff if the offset is not valid for this string
Referenced by icu::DecimalFormatSymbols::setSymbol().
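The lookup described for char32At() can be sketched as follows: take the code unit at the offset, and if it is half of a matched surrogate pair, combine it with its partner. This is a minimal standard-library sketch on std::u16string; the name codePointAt is hypothetical and the handling of unpaired surrogates (returned as-is) is an assumption, so treat it as an illustration of the idea rather than ICU's code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a char32At()-style lookup on a std::u16string.
// Returns 0xFFFF for an invalid offset, as the text above describes.
inline char32_t codePointAt(const std::u16string &s, std::size_t offset) {
    if (offset >= s.size()) {
        return 0xFFFF;
    }
    const char32_t c = s[offset];
    if ((c & 0xFC00) == 0xD800) {
        // Lead surrogate: combine with a following trail surrogate, if any.
        if (offset + 1 < s.size() && (s[offset + 1] & 0xFC00) == 0xDC00) {
            const char32_t trail = s[offset + 1];
            return 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    } else if ((c & 0xFC00) == 0xDC00) {
        // Trail surrogate: combine with a preceding lead surrogate, if any.
        if (offset > 0 && (s[offset - 1] & 0xFC00) == 0xD800) {
            const char32_t lead = s[offset - 1];
            return 0x10000 + ((lead - 0xD800) << 10) + (c - 0xDC00);
        }
    }
    // BMP code unit, or an unpaired surrogate returned unchanged (assumption).
    return c;
}
```

Both halves of a surrogate pair map to the same code point, which is why the description says "the code point that contains the code unit".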
◆ charAt() char16_t icu::UnicodeString::charAt ( int32_t offset ) const inline
Return the code unit at offset offset.
If the offset is not valid (0..length()-1) then U+ffff is returned.
offset
or 0xffff if the offset is not valid for this string
Definition at line 4855 of file unistr.h.
◆ clone()
◆ compare() [1/6] int8_t icu::UnicodeString::compare ( const UnicodeString & text ) const inline
Compare the characters bitwise in this UnicodeString to the characters in text.
Returns 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.
Definition at line 4320 of file unistr.h.
References length().
◆ compare() [2/6] int8_t icu::UnicodeString::compare ( ConstChar16Ptr srcChars, int32_t srcLength ) const inline
Compare the characters bitwise in this UnicodeString with the first srcLength characters in srcChars.
srcLength the number of characters in srcChars to compare
Returns 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.
Definition at line 4330 of file unistr.h.
◆ compare() [3/6] int8_t icu::UnicodeString::compare ( int32_t start, int32_t length, const char16_t * srcChars ) const inline
Compare the characters bitwise in the range [start, start + length) with the first length characters in srcChars.
Returns 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.
Definition at line 4343 of file unistr.h.
◆ compare() [4/6] int8_t icu::UnicodeString::compare ( int32_t start, int32_t length, const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) const inline
Compare the characters bitwise in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).
srcStart the offset into srcChars to start comparison
srcLength the number of characters in srcChars to compare
Returns 0 if this contains the same characters as srcChars, -1 if the characters in this are bitwise less than the characters in srcChars, +1 if the characters in this are bitwise greater than the characters in srcChars.
Definition at line 4349 of file unistr.h.
◆ compare() [5/6] int8_t icu::UnicodeString::compare ( int32_t start, int32_t length, const UnicodeString & srcText, int32_t srcStart, int32_t srcLength ) const inline
Compare the characters bitwise in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).
srcStart the offset into srcText to start comparison
srcLength the number of characters in srcText to compare
Returns 0 if this contains the same characters as srcText, -1 if the characters in this are bitwise less than the characters in srcText, +1 if the characters in this are bitwise greater than the characters in srcText.
Definition at line 4335 of file unistr.h.
◆ compare() [6/6] int8_t icu::UnicodeString::compare ( int32_t start, int32_t length, const UnicodeString & text ) const inline
Compare the characters bitwise in the range [start, start + length) with the characters in the entire string text.
(The parameters "start" and "length" are not applied to the other text "text".)
Returns 0 if this contains the same characters as text, -1 if the characters in this are bitwise less than the characters in text, +1 if the characters in this are bitwise greater than the characters in text.
Definition at line 4324 of file unistr.h.
References length().
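The point that "start" and "length" select a range only in this string, while the other text is always compared whole, may be easier to see in a small sketch. The following uses std::u16string and a hypothetical helper compareRange; clamping out-of-range arguments to the string bounds is an assumption made for the sketch, and the result is normalized to -1, 0 or +1 to match the return values described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: compare s[start, start + length) against all of `text`
// by code unit value, returning -1, 0 or +1.
inline int compareRange(const std::u16string &s, std::size_t start,
                        std::size_t length, const std::u16string &text) {
    // Assumption for this sketch: clamp the range to the bounds of s.
    start = std::min(start, s.size());
    length = std::min(length, s.size() - start);
    const int r = s.compare(start, length, text);
    return (r > 0) - (r < 0);
}
```

With this helper, comparing the range (6, 5) of u"hello world" against u"world" yields 0, while comparing the range (0, 2) of u"hello" against the whole of u"hello" yields -1 because the selected prefix is shorter.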
◆ compareBetween() int8_t icu::UnicodeString::compareBetween ( int32_t start, int32_t limit, const UnicodeString & srcText, int32_t srcStart, int32_t srcLimit ) const inline
Compare the characters bitwise in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).
srcStart the offset into srcText to start comparison
srcLimit the offset into srcText to limit comparison
Returns 0 if this contains the same characters as srcText, -1 if the characters in this are bitwise less than the characters in srcText, +1 if the characters in this are bitwise greater than the characters in srcText.
Definition at line 4357 of file unistr.h.
◆ compareCodePointOrder() [1/6] int8_t icu::UnicodeString::compareCodePointOrder ( const UnicodeString & text ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4381 of file unistr.h.
References length().
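The difference between code unit order and code point order can be reproduced with two short strings. The sketch below uses only the standard library: it compares std::u16string values directly (code unit order) and after decoding to UTF-32 (code point order). The helper names toCodePoints, compareCodeUnits and compareCodePoints are hypothetical, and decoding first is simply the most direct way to show the effect; it is not claimed to be how ICU implements the comparison.

```cpp
#include <cstddef>
#include <string>

// Decode well-formed UTF-16 into code points; unpaired surrogates are
// passed through unchanged.
inline std::u32string toCodePoints(const std::u16string &s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if ((c & 0xFC00) == 0xD800 && i + 1 < s.size() &&
            (s[i + 1] & 0xFC00) == 0xDC00) {
            const char32_t trail = s[i + 1];
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
            ++i;  // consume the trail surrogate as well
        }
        out.push_back(c);
    }
    return out;
}

inline int sign(int v) { return (v > 0) - (v < 0); }

// Code unit order: surrogates (0xD800..0xDFFF) sort below U+E000..U+FFFF.
inline int compareCodeUnits(const std::u16string &a, const std::u16string &b) {
    return sign(a.compare(b));
}

// Code point order: supplementary characters sort above all BMP characters.
inline int compareCodePoints(const std::u16string &a, const std::u16string &b) {
    return sign(toCodePoints(a).compare(toCodePoints(b)));
}
```

For U+FEFF versus U+10000, the code unit comparison reports U+FEFF as greater (0xFEFF is above the lead surrogate 0xD800), while the code point comparison reports it as less.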
◆ compareCodePointOrder() [2/6] int8_t icu::UnicodeString::compareCodePointOrder ( ConstChar16Ptr srcChars, int32_t srcLength ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4391 of file unistr.h.
◆ compareCodePointOrder() [3/6] int8_t icu::UnicodeString::compareCodePointOrder ( int32_t start, int32_t length, const char16_t * srcChars ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4404 of file unistr.h.
◆ compareCodePointOrder() [4/6] int8_t icu::UnicodeString::compareCodePointOrder ( int32_t start, int32_t length, const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4410 of file unistr.h.
◆ compareCodePointOrder() [5/6] int8_t icu::UnicodeString::compareCodePointOrder ( int32_t start, int32_t length, const UnicodeString & srcText ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4385 of file unistr.h.
References length().
◆ compareCodePointOrder() [6/6] int8_t icu::UnicodeString::compareCodePointOrder ( int32_t start, int32_t length, const UnicodeString & srcText, int32_t srcStart, int32_t srcLength ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4396 of file unistr.h.
◆ compareCodePointOrderBetween() int8_t icu::UnicodeString::compareCodePointOrderBetween ( int32_t start, int32_t limit, const UnicodeString & srcText, int32_t srcStart, int32_t srcLimit ) const inline
Compare two Unicode strings in code point order.
The result may be different from the results of compare(), operator<, etc. if supplementary characters are present:
In UTF-16, supplementary characters (with code points U+10000 and above) are stored with pairs of surrogate code units. These have values from 0xd800 to 0xdfff, which means that they compare as less than some other BMP characters like U+feff. This function compares Unicode strings in code point order. If either of the UTF-16 strings is malformed (i.e., it contains unpaired surrogates), then the result is not defined.
Definition at line 4418 of file unistr.h.
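The difference between code unit order and code point order can be illustrated with a minimal standalone sketch. It does not call ICU and is not taken from the ICU sources: the names fixupForCodePointOrder, compareCodeUnitOrder and compareCodePointOrderSketch are hypothetical, the comparison works on std::u16string_view, and well-formed UTF-16 input is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not an ICU function): remaps a UTF-16 code unit so that
// surrogates (0xd800..0xdfff) sort above every other BMP code unit.
inline std::uint16_t fixupForCodePointOrder(char16_t c) {
    const std::uint16_t u = static_cast<std::uint16_t>(c);
    if (u >= 0xe000) {
        return static_cast<std::uint16_t>(u - 0x800);   // 0xe000..0xffff -> 0xd800..0xf7ff
    }
    if (u >= 0xd800) {
        return static_cast<std::uint16_t>(u + 0x2000);  // 0xd800..0xdfff -> 0xf800..0xffff
    }
    return u;
}

// Plain code unit (bitwise) comparison; returns -1, 0 or +1.
int compareCodeUnitOrder(std::u16string_view a, std::u16string_view b) {
    const int r = a.compare(b);
    return r < 0 ? -1 : (r > 0 ? 1 : 0);
}

// Code point order comparison, assuming both inputs are well-formed UTF-16.
// Only the first differing code unit needs the fix-up.
int compareCodePointOrderSketch(std::u16string_view a, std::u16string_view b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return fixupForCodePointOrder(a[i]) < fixupForCodePointOrder(b[i]) ? -1 : 1;
        }
    }
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```

With these helpers, U+10000 (stored as 0xd800 0xdc00) compares as less than U+feff in code unit order but as greater in code point order.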
◆ copy() virtual void icu::UnicodeString::copy ( int32_t start, int32_t limit, int32_t dest ) override virtual
Copy a substring of this object, retaining attribute (out-of-band) information.
This method is used to duplicate or reorder substrings. The destination index must not overlap the source range.
start the beginning index, inclusive; 0 <= start <= limit.
limit the ending index, exclusive; start <= limit <= length().
dest the destination index. The characters from start..limit-1 will be copied to dest. Implementations of this method may assume that dest <= start || dest >= limit.
Implements icu::Replaceable.
◆ countChar32() int32_t icu::UnicodeString::countChar32 ( int32_t start = 0, int32_t length = INT32_MAX ) const
Count Unicode code points in the length char16_t code units of the string.
A code point may occupy either one or two char16_t code units. Counting code points involves reading all code units.
This function is essentially the inverse of moveIndex32().
Referenced by icu::DecimalFormatSymbols::setSymbol().
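As a rough illustration of why counting code points requires reading every code unit, here is a standalone sketch that does not use ICU. The function name countCodePoints is hypothetical, and treating an unpaired surrogate as one code point is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for the idea behind countChar32(): counts the code
// points stored in s. A lead surrogate followed by a trail surrogate counts
// once; every other code unit (including an unpaired surrogate) counts once.
std::int32_t countCodePoints(std::u16string_view s) {
    std::int32_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        ++count;
        const bool isLead = s[i] >= 0xd800 && s[i] <= 0xdbff;
        if (isLead && i + 1 < s.size() && s[i + 1] >= 0xdc00 && s[i + 1] <= 0xdfff) {
            ++i;  // skip the trail surrogate of this pair
        }
    }
    return count;
}
```

For a string such as u"a\U00010000b" the sketch returns 3, although the string occupies 4 code units.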
◆ endsWith() [1/4] UBool icu::UnicodeString::endsWith ( const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) const inline
Determine if this ends with the characters in srcChars in the range [srcStart, srcStart + srcLength).
srcChars the characters to match
srcStart the offset into srcChars to start matching
srcLength the number of characters in srcChars to match
Returns true if this ends with the characters in srcChars, false otherwise
Definition at line 4716 of file unistr.h.
References u_strlen().
◆ endsWith() [2/4] UBool icu::UnicodeString::endsWith ( const UnicodeString & srcText, int32_t srcStart, int32_t srcLength ) const inline
Determine if this ends with the characters in srcText in the range [srcStart, srcStart + srcLength).
srcText the text to match
srcStart the offset into srcText to start matching
srcLength the number of characters in srcText to match
Returns true if this ends with the characters in srcText, false otherwise
Definition at line 4698 of file unistr.h.
◆ endsWith() [3/4]
Determine if this ends with the characters in text.
Returns true if this ends with the characters in text, false otherwise
Definition at line 4693 of file unistr.h.
References length().
◆ endsWith() [4/4]
Determine if this ends with the characters in srcChars.
Returns true if this ends with the characters in srcChars, false otherwise
Definition at line 4707 of file unistr.h.
References u_strlen().
◆ extract() [1/8] int32_t icu::UnicodeString::extract ( char * dest, int32_t destCapacity, UConverter * cnv, UErrorCode & errorCode ) const
Convert the UnicodeString into a codepage string using an existing UConverter.
The output string is NUL-terminated if possible.
This function avoids the overhead of opening and closing a converter if multiple strings are extracted.
Copy the contents of the string into dest.
This is a convenience function that checks if there is enough space in dest, extracts the entire string if possible, and NUL-terminates dest if possible.
If the string fits into dest but cannot be NUL-terminated (length()==destCapacity) then the error code is set to U_STRING_NOT_TERMINATED_WARNING. If the string itself does not fit into dest (length()>destCapacity) then the error code is set to U_BUFFER_OVERFLOW_ERROR.
If the string aliases to dest itself as an external buffer, then extract() will not copy the contents.
0
) const inline
Copy the characters in the range [start, start + length) into the array dst, beginning at dstStart.
If the string aliases to dst itself as an external buffer, then extract() will not copy the contents.
The length of dst must be at least (dstStart + length).
dstStart the offset in dst where the first character will be extracted
Definition at line 4802 of file unistr.h.
◆ extract() [4/8] void icu::UnicodeString::extract ( int32_t start, int32_t length, UnicodeString & target ) const inline
Copy the characters in the range [start, start + length) into the UnicodeString target.
Definition at line 4809 of file unistr.h.
◆ extract() [5/8] int32_t icu::UnicodeString::extract ( int32_t start, int32_t startLength, char * target, const char * codepage = nullptr ) const inline
Copy the characters in the range [start, start + startLength) into an array of characters in a specified codepage.
The output string is NUL-terminated.
Recommendation: For invariant-character strings use extract(int32_t start, int32_t length, char *target, int32_t targetCapacity, enum EInvariant inv) const because it avoids object code dependencies of UnicodeString on the conversion code.
If codepage is an empty string (""), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target is nullptr, then the number of bytes required for target is returned. It is assumed that the target is big enough to fit all of the characters.
Definition at line 4817 of file unistr.h.
◆ extract() [6/8] int32_t icu::UnicodeString::extract ( int32_t start, int32_t startLength, char * target, int32_t targetCapacity, enum EInvariant inv ) const
Copy the characters in the range [start, start + startLength) into an array of characters.
All characters must be invariant (see utypes.h). Use US_INV as the last, signature-distinguishing parameter.
This function does not write any more than targetCapacity characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.
Copy the characters in the range [start
, start + length
) into an array of characters in the platform's default codepage.
This function does not write any more than targetLength
characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.
target
is nullptr, then the number of bytes required for target
is returned.
Copy the characters in the range [start
, start + length
) into an array of characters in a specified codepage.
This function does not write any more than targetLength
characters but returns the length of the entire output string so that one can allocate a larger buffer and call the function again if necessary. The output string is NUL-terminated if possible.
Recommendation: For invariant-character strings use extract(int32_t start, int32_t length, char *target, int32_t targetCapacity, enum EInvariant inv) const because it avoids object code dependencies of UnicodeString on the conversion code.
codepage
is an empty string (""
), then a simple conversion is performed on the codepage-invariant subset ("invariant characters") of the platform encoding. See utypes.h. If target
is nullptr, then the number of bytes required for target
is returned.
0
) const inline
Copy the characters in the range [start
, limit
) into the array dst
, beginning at dstStart
.
dst
must be at least (dstStart + (limit - start)
). dstStart the offset in dst
where the first character will be extracted
Definition at line 4830 of file unistr.h.
◆ extractBetween() [2/2] virtual void icu::UnicodeString::extractBetween ( int32_t start, int32_t limit, UnicodeString & target ) const override virtual
Copy the characters in the range [start, limit) into the UnicodeString target.
Replaceable API.
Implements icu::Replaceable.
◆ fastCopyFrom()
Almost the same as the assignment operator.
Replace the characters in this UnicodeString with the characters from srcText.
This function works the same as the assignment operator for all strings except for ones that are readonly aliases.
Starting with ICU 2.4, the assignment operator and the copy constructor allocate a new buffer and copy the buffer contents even for readonly aliases. This function implements the old, more efficient but less safe behavior of making this string also a readonly alias to the same buffer.
The fastCopyFrom function must be used only if it is known that the lifetime of this UnicodeString does not exceed the lifetime of the aliased buffer including its contents, for example for strings from resource bundles or aliases to string constants.
If the source object has an "open" buffer from getBuffer(minCapacity), then the copy is an empty string.
Replace all occurrences of characters in oldText with the characters in newText.
Definition at line 4779 of file unistr.h.
References length().
◆ findAndReplace() [2/3]Replace all occurrences of characters in oldText with characters in newText in the range [start
, start + length
).
Definition at line 4785 of file unistr.h.
References length().
◆ findAndReplace() [3/3] UnicodeString& icu::UnicodeString::findAndReplace ( int32_t start, int32_t length, const UnicodeString & oldText, int32_t oldStart, int32_t oldLength, const UnicodeString & newText, int32_t newStart, int32_t newLength )
Replace all occurrences of characters in oldText in the range [oldStart, oldStart + oldLength) with the characters in newText in the range [newStart, newStart + newLength) in the range [start, start + length).
oldStart the start of the search range in oldText
oldLength the length of the search range in oldText
newText the text containing the replacement text
newStart the start of the replacement range in newText
newLength the length of the replacement range in newText
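The replace-all behavior described above can be sketched on a std::u16string without ICU. The function name findAndReplaceAll is hypothetical, the sketch always works on the whole string rather than a [start, start + length) range, and returning the string unchanged for an empty search text is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in (not the ICU implementation): replaces every
// occurrence of oldText in s with newText, scanning left to right.
// Text that was just inserted is never searched again.
std::u16string& findAndReplaceAll(std::u16string& s,
                                  std::u16string_view oldText,
                                  std::u16string_view newText) {
    if (oldText.empty()) {
        return s;  // assumption of this sketch: nothing to search for
    }
    std::size_t pos = 0;
    while ((pos = s.find(oldText, pos)) != std::u16string::npos) {
        s.replace(pos, oldText.size(), newText);
        pos += newText.size();  // continue after the replacement
    }
    return s;
}
```

Advancing past the inserted text keeps the loop finite even when newText itself contains oldText.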
0
)
Case-folds the characters in this string.
Case-folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.
The result may be longer or shorter than the original.
Create a UnicodeString from a UTF-32 string.
Illegal input is replaced with U+FFFD. Otherwise, errors result in a bogus string. Calls u_strFromUTF32WithSub().
Get a read-only pointer to the internal buffer.
This can be called at any time on a valid UnicodeString.
It returns 0 if the string is bogus, or during an "open" getBuffer(minCapacity).
It can be called as many times as desired. The pointer that it returns will remain valid until the UnicodeString object is modified, at which time the pointer is semantically invalidated and must not be used any more.
The capacity of the buffer can be determined with getCapacity(). The part after length() may or may not be initialized and valid, depending on the history of the UnicodeString object.
The buffer contents are (probably) not NUL-terminated. You can check whether they are with (s.length() < s.getCapacity() && buffer[s.length()]==0). (See getTerminatedBuffer().)
The buffer may reside in read-only memory. Its contents must not be modified.
Definition at line 4245 of file unistr.h.
◆ getBuffer() [2/2] char16_t* icu::UnicodeString::getBuffer ( int32_t minCapacity )
Get a read/write pointer to the internal buffer.
The buffer is guaranteed to be large enough for at least minCapacity char16_ts, writable, and is still owned by the UnicodeString object. Calls to getBuffer(minCapacity) must not be nested, and must be matched with calls to releaseBuffer(newLength). If the string buffer was read-only or shared, then it will be reallocated and copied.
An attempted nested call will return 0, and will not further modify the state of the UnicodeString object. It also returns 0 if the string is bogus.
The actual capacity of the string buffer may be larger than minCapacity. getCapacity() returns the actual capacity. For many operations, the full capacity should be used to avoid reallocations.
While the buffer is "open" between getBuffer(minCapacity) and releaseBuffer(newLength), the following applies:
Referenced by icu::Normalizer::compare(), icu::UnicodeSet::span(), and icu::UnicodeSet::spanBack().
◆ getCapacity() int32_t icu::UnicodeString::getCapacity ( ) const inline
Return the capacity of the internal buffer of the UnicodeString object.
This is useful together with the getBuffer functions. See there for details.
Definition at line 4219 of file unistr.h.
◆ getChar32At() virtual UChar32 icu::UnicodeString::getChar32At ( int32_t offset ) const override protected virtual
◆ getChar32Limit() int32_t icu::UnicodeString::getChar32Limit ( int32_t offset ) const
Adjust a random-access offset so that it points behind a Unicode character.
The offset that is passed in points behind any code unit of a code point, while the returned offset will point behind the last code unit of the same code point. In UTF-16, if the input offset points behind the first surrogate (i.e., to the second surrogate) of a surrogate pair, then the returned offset will point behind the second surrogate (i.e., to the code unit that follows the pair).
Adjust a random-access offset so that it points to the beginning of a Unicode character.
The offset that is passed in points to any code unit of a code point, while the returned offset will point to the first code unit of the same code point. In UTF-16, if the input offset points to a second surrogate of a surrogate pair, then the returned offset will point to the first surrogate.
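The two offset adjustments described above can be sketched without ICU as follows. The names char32Start and char32Limit are hypothetical, offsets are assumed to lie within 0..size(), and the behavior of the ICU functions for out-of-range offsets is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLeadSurrogate(char16_t c)  { return c >= 0xd800 && c <= 0xdbff; }
inline bool isTrailSurrogate(char16_t c) { return c >= 0xdc00 && c <= 0xdfff; }

// Hypothetical sketch: if offset points at the trail surrogate of a pair,
// move it back to the lead surrogate; otherwise leave it unchanged.
std::int32_t char32Start(std::u16string_view s, std::int32_t offset) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (offset > 0 && offset < len &&
        isTrailSurrogate(s[offset]) && isLeadSurrogate(s[offset - 1])) {
        return offset - 1;
    }
    return offset;
}

// Hypothetical sketch: if offset points between the two surrogates of a pair,
// move it forward behind the trail surrogate; otherwise leave it unchanged.
std::int32_t char32Limit(std::u16string_view s, std::int32_t offset) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (offset > 0 && offset < len &&
        isLeadSurrogate(s[offset - 1]) && isTrailSurrogate(s[offset])) {
        return offset + 1;
    }
    return offset;
}
```

For u"a\U00010000b" (code units a, 0xd800, 0xdc00, b), offset 2 lies inside the surrogate pair: char32Start moves it to 1 and char32Limit moves it to 3.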
ICU "poor man's RTTI", returns a UClassID for the actual class.
Reimplemented from icu::UObject.
◆ getLength() virtual int32_t icu::UnicodeString::getLength ( ) const overrideprotectedvirtual ◆ getStaticClassID() static UClassID icu::UnicodeString::getStaticClassID ( ) staticICU "poor man's RTTI", returns a UClassID for this class.
Get a read-only pointer to the internal buffer, making sure that it is NUL-terminated.
This can be called at any time on a valid UnicodeString.
It returns 0 if the string is bogus, or during an "open" getBuffer(minCapacity), or if the buffer cannot be NUL-terminated (because memory allocation failed).
It can be called as many times as desired. The pointer that it returns will remain valid until the UnicodeString object is modified, at which time the pointer is semantically invalidated and must not be used any more.
The capacity of the buffer can be determined with getCapacity(). The part after length()+1 may or may not be initialized and valid, depending on the history of the UnicodeString object.
The buffer contents are guaranteed to be NUL-terminated. getTerminatedBuffer() may reallocate the buffer if a terminating NUL is written. For this reason, this function is not const, unlike getBuffer(). Note that a UnicodeString may also contain NUL characters as part of its contents.
The buffer may reside in read-only memory. Its contents must not be modified.
Replace a substring of this object with the given text.
start the beginning index, inclusive; 0 <= start <= limit.
limit the ending index, exclusive; start <= limit <= length().
text the text to replace characters start to limit - 1
Implements icu::Replaceable.
◆ hashCode() int32_t icu::UnicodeString::hashCode ( ) const inline
◆ hasMetaData() virtual UBool icu::UnicodeString::hasMetaData ( ) const override virtual
◆ hasMoreChar32Than() UBool icu::UnicodeString::hasMoreChar32Than ( int32_t start, int32_t length, int32_t number ) const
Check if the length char16_t code units of the string contain more Unicode code points than a certain number.
This is more efficient than counting all code points in this part of the string and comparing that number with a threshold. This function may not need to scan the string at all if the length falls within a certain range, and never needs to count more than 'number+1' code points. Logically equivalent to (countChar32(start, length)>number). A Unicode code point may occupy either one or two char16_t code units.
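The shortcut described above can be sketched in standalone C++. This is an illustration under stated assumptions, not the ICU implementation: the name hasMoreCodePointsThan is hypothetical and the sketch always examines the whole string instead of a [start, start + length) range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical sketch: true if s holds more than `number` code points.
bool hasMoreCodePointsThan(std::u16string_view s, std::int32_t number) {
    if (number < 0) {
        return true;
    }
    const std::size_t len = s.size();
    const std::size_t n = static_cast<std::size_t>(number);
    // A code point takes at most two code units, so len code units hold at
    // least (len + 1) / 2 code points: no scan needed if that exceeds n.
    if ((len + 1) / 2 > n) {
        return true;
    }
    // A code point takes at least one code unit, so there are at most len.
    if (len <= n) {
        return false;
    }
    // Otherwise count, stopping as soon as n + 1 code points have been seen.
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (++count > n) {
            return true;
        }
        const bool isLead = s[i] >= 0xd800 && s[i] <= 0xdbff;
        if (isLead && i + 1 < len && s[i + 1] >= 0xdc00 && s[i + 1] <= 0xdfff) {
            ++i;  // skip the trail surrogate of this pair
        }
    }
    return false;
}
```

Only strings whose length lies strictly between number and 2 * number need to be scanned at all, which is the range the description refers to.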
Locate in this the first occurrence of the BMP code point c
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4561 of file unistr.h.
◆ indexOf() [2/13] int32_t icu::UnicodeString::indexOf ( char16_t c, int32_t start ) const inlineLocate in this the first occurrence of the BMP code point c
, starting at offset start
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4569 of file unistr.h.
◆ indexOf() [3/13] int32_t icu::UnicodeString::indexOf ( char16_t c, int32_t start, int32_t length ) const inlineLocate in this the first occurrence of the BMP code point c
in the range [start
, start + length
), using bitwise comparison.
c
, or -1 if not found.
Definition at line 4549 of file unistr.h.
◆ indexOf() [4/13] int32_t icu::UnicodeString::indexOf ( const char16_t * srcChars, int32_t srcLength, int32_t start ) const inlineLocate in this the first occurrence of the characters in srcChars
starting at offset start
, using bitwise comparison.
srcChars
to match start the offset into this at which to start matching
text
, or -1 if not found.
Definition at line 4534 of file unistr.h.
◆ indexOf() [5/13] int32_t icu::UnicodeString::indexOf ( const char16_t * srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length ) constLocate in this the first occurrence in the range [start
, start + length
) of the characters in srcChars
in the range [srcStart
, srcStart + srcLength
), using bitwise comparison.
srcChars
at which to start matching srcLength the number of characters in srcChars
to match start the offset into this at which to start matching length the number of characters in this to search
text
, or -1 if not found.
Locate in this the first occurrence in the range [start
, start + length
) of the characters in srcText
in the range [srcStart
, srcStart + srcLength
), using bitwise comparison.
srcText
at which to start matching srcLength the number of characters in srcText
to match start the offset into this at which to start matching length the number of characters in this to search
text
, or -1 if not found.
Definition at line 4501 of file unistr.h.
References isBogus().
◆ indexOf() [7/13] int32_t icu::UnicodeString::indexOf ( const UnicodeString & text ) const inlineLocate in this the first occurrence of the characters in text
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4517 of file unistr.h.
References length().
◆ indexOf() [8/13] int32_t icu::UnicodeString::indexOf ( const UnicodeString & text, int32_t start ) const inlineLocate in this the first occurrence of the characters in text
starting at offset start
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4521 of file unistr.h.
References length().
◆ indexOf() [9/13] int32_t icu::UnicodeString::indexOf ( const UnicodeString & text, int32_t start, int32_t length ) const inlineLocate in this the first occurrence in the range [start
, start + length
) of the characters in text
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4528 of file unistr.h.
References length().
◆ indexOf() [10/13] int32_t icu::UnicodeString::indexOf ( ConstChar16Ptr srcChars, int32_t srcLength, int32_t start, int32_t length ) const inlineLocate in this the first occurrence in the range [start
, start + length
) of the characters in srcChars
, using bitwise comparison.
srcChars
start The offset at which searching will start. length The number of characters to search
srcChars
, or -1 if not found.
Definition at line 4542 of file unistr.h.
◆ indexOf() [11/13] int32_t icu::UnicodeString::indexOf ( UChar32 c ) const inline
Locate in this the first occurrence of the code point c, using bitwise comparison.
Returns the offset into this of c, or -1 if not found.
Definition at line 4565 of file unistr.h.
◆ indexOf() [12/13] int32_t icu::UnicodeString::indexOf ( UChar32 c, int32_t start ) const inline
Locate in this the first occurrence of the code point c starting at offset start, using bitwise comparison.
Returns the offset into this of c, or -1 if not found.
Definition at line 4576 of file unistr.h.
◆ indexOf() [13/13] int32_t icu::UnicodeString::indexOf ( UChar32 c, int32_t start, int32_t length ) const inline
Locate in this the first occurrence of the code point c in the range [start, start + length), using bitwise comparison.
Returns the offset into this of c, or -1 if not found.
Definition at line 4555 of file unistr.h.
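The difference between searching for a BMP code unit and searching for a full code point can be illustrated with a standalone sketch that does not use ICU. The name indexOfCodePoint is hypothetical. A supplementary code point is encoded as a surrogate pair before searching; a lone surrogate passed as c may match half of a pair, which this sketch does not guard against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical sketch: returns the code unit offset of the first occurrence
// of code point c in s, or -1 if c does not occur or is not a code point.
std::int32_t indexOfCodePoint(std::u16string_view s, char32_t c) {
    std::size_t pos = std::u16string_view::npos;
    if (c < 0x10000) {
        // BMP code point: a single code unit.
        pos = s.find(static_cast<char16_t>(c));
    } else if (c <= 0x10ffff) {
        // Supplementary code point: search for its surrogate pair.
        const char16_t pair[2] = {
            static_cast<char16_t>(0xd7c0 + (c >> 10)),    // lead surrogate
            static_cast<char16_t>(0xdc00 | (c & 0x3ff))   // trail surrogate
        };
        pos = s.find(std::u16string_view(pair, 2));
    }
    return pos == std::u16string_view::npos ? -1 : static_cast<std::int32_t>(pos);
}
```

The returned offset is counted in code units, consistent with the other offsets in this class.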
◆ insert() [1/6] UnicodeString & icu::UnicodeString::insert ( int32_t start, char16_t srcChar ) inlineInsert the code unit srcChar
into the UnicodeString object at offset start
.
Definition at line 5023 of file unistr.h.
◆ insert() [2/6] UnicodeString & icu::UnicodeString::insert ( int32_t start, const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) inlineInsert the characters in srcChars
in the range [srcStart
, srcStart + srcLength
) into the UnicodeString object at offset start
.
srcChars
is not modified.
srcChars
where new characters will be obtained srcLength the number of characters in srcChars
in the insert string
Definition at line 5010 of file unistr.h.
◆ insert() [3/6]Insert the characters in srcText
into the UnicodeString object at offset start
.
srcText
is not modified.
Definition at line 5005 of file unistr.h.
References length().
◆ insert() [4/6]Insert the characters in srcText
in the range [srcStart
, srcStart + srcLength
) into the UnicodeString object at offset start
.
srcText
is not modified.
srcText
where new characters will be obtained srcLength the number of characters in srcText
in the insert string
Definition at line 4998 of file unistr.h.
◆ insert() [5/6]Insert the characters in srcChars
into the UnicodeString object at offset start
.
srcChars
is not modified.
Definition at line 5017 of file unistr.h.
◆ insert() [6/6]Insert the code point srcChar
into the UnicodeString object at offset start
.
Definition at line 5028 of file unistr.h.
◆ isBogus() UBool icu::UnicodeString::isBogus ( ) const inline
◆ isEmpty() UBool icu::UnicodeString::isEmpty ( ) const inline
Determine if this string is empty.
Definition at line 4863 of file unistr.h.
◆ lastIndexOf() [1/13] int32_t icu::UnicodeString::lastIndexOf ( char16_t c ) const inlineLocate in this the last occurrence of the BMP code point c
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4644 of file unistr.h.
◆ lastIndexOf() [2/13] int32_t icu::UnicodeString::lastIndexOf ( char16_t c, int32_t start ) const inlineLocate in this the last occurrence of the BMP code point c
starting at offset start
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4653 of file unistr.h.
◆ lastIndexOf() [3/13] int32_t icu::UnicodeString::lastIndexOf ( char16_t c, int32_t start, int32_t length ) const inlineLocate in this the last occurrence of the BMP code point c
in the range [start
, start + length
), using bitwise comparison.
c
, or -1 if not found.
Definition at line 4631 of file unistr.h.
◆ lastIndexOf() [4/13] int32_t icu::UnicodeString::lastIndexOf ( const char16_t * srcChars, int32_t srcLength, int32_t start ) const inlineLocate in this the last occurrence of the characters in srcChars
starting at offset start
, using bitwise comparison.
srcChars
to match start the offset into this at which to start matching
text
, or -1 if not found.
Definition at line 4590 of file unistr.h.
◆ lastIndexOf() [5/13] int32_t icu::UnicodeString::lastIndexOf ( const char16_t * srcChars, int32_t srcStart, int32_t srcLength, int32_t start, int32_t length ) constLocate in this the last occurrence in the range [start
, start + length
) of the characters in srcChars
in the range [srcStart
, srcStart + srcLength
), using bitwise comparison.
srcChars
at which to start matching srcLength the number of characters in srcChars
to match start the offset into this at which to start matching length the number of characters in this to search
text
, or -1 if not found.
Locate in this the last occurrence in the range [start
, start + length
) of the characters in srcText
in the range [srcStart
, srcStart + srcLength
), using bitwise comparison.
srcText
at which to start matching srcLength the number of characters in srcText
to match start the offset into this at which to start matching length the number of characters in this to search
text
, or -1 if not found.
Definition at line 4598 of file unistr.h.
References isBogus().
◆ lastIndexOf() [7/13] int32_t icu::UnicodeString::lastIndexOf ( const UnicodeString & text ) const inlineLocate in this the last occurrence of the characters in text
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4627 of file unistr.h.
References length().
◆ lastIndexOf() [8/13] int32_t icu::UnicodeString::lastIndexOf ( const UnicodeString & text, int32_t start ) const inlineLocate in this the last occurrence of the characters in text
starting at offset start
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4620 of file unistr.h.
References length().
◆ lastIndexOf() [9/13] int32_t icu::UnicodeString::lastIndexOf ( const UnicodeString & text, int32_t start, int32_t length ) const inlineLocate in this the last occurrence in the range [start
, start + length
) of the characters in text
, using bitwise comparison.
text
, or -1 if not found.
Definition at line 4614 of file unistr.h.
References length().
◆ lastIndexOf() [10/13] int32_t icu::UnicodeString::lastIndexOf ( ConstChar16Ptr srcChars, int32_t srcLength, int32_t start, int32_t length ) const inlineLocate in this the last occurrence in the range [start
, start + length
) of the characters in srcChars
, using bitwise comparison.
srcChars
start The offset at which searching will start. length The number of characters to search
srcChars
, or -1 if not found.
Definition at line 4583 of file unistr.h.
◆ lastIndexOf() [11/13] int32_t icu::UnicodeString::lastIndexOf ( UChar32 c ) const inlineLocate in this the last occurrence of the code point c
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4648 of file unistr.h.
◆ lastIndexOf() [12/13] int32_t icu::UnicodeString::lastIndexOf ( UChar32 c, int32_t start ) const inlineLocate in this the last occurrence of the code point c
starting at offset start
, using bitwise comparison.
c
, or -1 if not found.
Definition at line 4660 of file unistr.h.
◆ lastIndexOf() [13/13] int32_t icu::UnicodeString::lastIndexOf ( UChar32 c, int32_t start, int32_t length ) const inlineLocate in this the last occurrence of the code point c
in the range [start
, start + length
), using bitwise comparison.
c
, or -1 if not found.
Definition at line 4637 of file unistr.h.
◆ length() int32_t icu::UnicodeString::length ( ) const inline
Return the length of the UnicodeString object.
The length is the number of char16_t code units in the UnicodeString. If you want the number of code points, please use countChar32().
Definition at line 4214 of file unistr.h.
Referenced by append(), caseCompare(), icu::Normalizer::compare(), compare(), compareCodePointOrder(), endsWith(), findAndReplace(), indexOf(), insert(), lastIndexOf(), operator+=(), operator<(), operator<=(), operator==(), operator>(), operator>=(), replace(), replaceBetween(), setTo(), icu::UnicodeSet::span(), icu::UnicodeSet::spanBack(), and startsWith().
◆ moveIndex32() int32_t icu::UnicodeString::moveIndex32 ( int32_t index, int32_t delta ) const
Move the code unit index along the string by delta code points.
Interpret the input index as a code unit-based offset into the string, move the index forward or backward by delta code points, and return the resulting index. The input index should point to the first code unit of a code point, if there is more than one.
Both input and output indexes are code unit-based as for all string indexes/offsets in ICU (and other libraries, like MBCS char*). If delta<0 then the index is moved backward (toward the start of the string). If delta>0 then the index is moved forward (toward the end of the string).
This behaves like CharacterIterator::move32(delta, kCurrent).
Behavior for out-of-bounds indexes: moveIndex32 pins the input index to 0..length(), i.e., if the input index<0 then it is pinned to 0; if it is index>length() then it is pinned to length(). Afterwards, the index is moved by delta code points forward or backward, but no further backward than to 0 and no further forward than to length(). The resulting index return value will be between 0 and length(), inclusive.
Examples:
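UnicodeString s(u"a\U00010000b\U0010ffff\u2029");  // example string: 'a' U+10000 'b' U+10ffff U+2029
int32_t index=1;                                   // initial index: position of U+10000
// Each call below results in index==4, the position of U+10ffff.
index=s.moveIndex32(index, 2);                     // skips U+10000 and 'b'
index=s.moveIndex32(0, 3);                         // skips 'a', U+10000, and 'b'
index=s.moveIndex32(s.length(), -2);               // backward-skips U+2029 and U+10ffff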
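The pinning and stepping rules described for moveIndex32() can be modeled in a standalone sketch that does not use ICU. The name moveIndexByCodePoints is hypothetical, and stepping over an unpaired surrogate as a single code point is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

inline bool isLeadSurrogate(char16_t c)  { return c >= 0xd800 && c <= 0xdbff; }
inline bool isTrailSurrogate(char16_t c) { return c >= 0xdc00 && c <= 0xdfff; }

// Hypothetical sketch of the documented behavior: pin index to 0..size(),
// then move it by delta code points without leaving that range.
std::int32_t moveIndexByCodePoints(std::u16string_view s,
                                   std::int32_t index, std::int32_t delta) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    if (index < 0) {
        index = 0;
    } else if (index > len) {
        index = len;
    }
    while (delta > 0 && index < len) {   // forward, one code point per step
        const bool pair = isLeadSurrogate(s[index]) && index + 1 < len &&
                          isTrailSurrogate(s[index + 1]);
        index += pair ? 2 : 1;
        --delta;
    }
    while (delta < 0 && index > 0) {     // backward, one code point per step
        const bool pair = isTrailSurrogate(s[index - 1]) && index - 1 > 0 &&
                          isLeadSurrogate(s[index - 2]);
        index -= pair ? 2 : 1;
        ++delta;
    }
    return index;
}
```

Applied to the string 'a' U+10000 'b' U+10ffff U+2029 (7 code units), moving from 1 by 2, from 0 by 3, or from 7 by -2 all yield 4.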
Converts to a std::u16string_view.
Definition at line 3035 of file unistr.h.
References icu::Replaceable::length().
◆ operator std::wstring_view() icu::UnicodeString::operator std::wstring_view ( ) const inline
Converts to a std::wstring_view.
Note: This should remain draft until C++ standard plans about char16_t vs. wchar_t become clearer.
Definition at line 3049 of file unistr.h.
References icu::Replaceable::length(), and U_ALIASING_BARRIER.
◆ operator!=() [1/2]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
bool icu::UnicodeString::operator!= ( const S & text ) const inline
Inequality operator.
Performs only bitwise comparison with text, which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
For performance, you can use std::u16string_view literals with compile-time length determination:
#include <string_view>
using namespace std::string_view_literals;
if (str != u"literal"sv) { ... }
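The phrase "compile-time length determination" can be illustrated with the standard library alone. In this sketch the names kLiteral, kWithNul and differsFromLiteral are hypothetical, and a std::u16string_view parameter stands in for the UnicodeString operand.

```cpp
#include <cassert>
#include <string_view>

using namespace std::string_view_literals;

// The sv suffix produces a std::u16string_view whose length is known at
// compile time, so no run-time scan for a terminating NUL is needed.
constexpr std::u16string_view kLiteral = u"literal"sv;
static_assert(kLiteral.size() == 7);

// Because the length comes from the literal, embedded NULs are preserved.
constexpr std::u16string_view kWithNul = u"a\0b"sv;
static_assert(kWithNul.size() == 3);

// Hypothetical helper: bitwise inequality against the literal.
bool differsFromLiteral(std::u16string_view str) {
    return str != u"literal"sv;
}
```

A plain const char16_t pointer, by contrast, carries no length, so a view built from it has to be measured at run time.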
Returns false if text contains the same characters as this one, true otherwise.
Definition at line 382 of file unistr.h.
References icu::operator==().
◆ operator!=() [2/2] bool icu::UnicodeString::operator!= ( const UnicodeString & text ) const inline
Inequality operator.
Performs only bitwise comparison.
Returns false if text contains the same characters as this one, true otherwise.
Definition at line 4300 of file unistr.h.
◆ operator+=() [1/4]Append operator.
Append the code unit ch
to the UnicodeString object.
Definition at line 4985 of file unistr.h.
◆ operator+=() [2/4]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
Append operator.
Appends the characters in src
which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view, to the UnicodeString object.
Definition at line 2227 of file unistr.h.
◆ operator+=() [3/4]Append operator.
Append the characters in srcText
to the UnicodeString object. srcText
is not modified.
Definition at line 4994 of file unistr.h.
References length().
◆ operator+=() [4/4]Append operator.
Append the code point ch
to the UnicodeString object.
Definition at line 4989 of file unistr.h.
◆ operator<()Less than operator.
Performs only bitwise comparison.
text
, false otherwise
Definition at line 4307 of file unistr.h.
References length().
◆ operator<=()Less than or equal operator.
Performs only bitwise comparison.
text
, false otherwise
Definition at line 4315 of file unistr.h.
References length().
◆ operator=() [1/5]Assignment operator.
Replace the characters in this UnicodeString with the code unit ch
.
Definition at line 4906 of file unistr.h.
◆ operator=() [2/5]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
Assignment operator.
Replaces the characters in this UnicodeString with a copy of the characters from the src
which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
Definition at line 1960 of file unistr.h.
References icu::Replaceable::length().
◆ operator=() [3/5]Assignment operator.
Replace the characters in this UnicodeString with the characters from srcText
.
Starting with ICU 2.4, the assignment operator and the copy constructor allocate a new buffer and copy the buffer contents even for readonly aliases. By contrast, the fastCopyFrom() function implements the old, more efficient but less safe behavior of making this string also a readonly alias to the same buffer.
If the source object has an "open" buffer from getBuffer(minCapacity), then the copy is an empty string.
◆ operator=() [4/5]Assignment operator.
Replace the characters in this UnicodeString with the code point ch.
Definition at line 4910 of file unistr.h.
◆ operator=() [5/5]Move assignment operator; might leave src in bogus state.
This string will have the same contents and state that the source string had. The behavior is undefined if *this and src are the same object.
◆ operator==() [1/2]template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
bool icu::UnicodeString::operator== ( const S & text ) const inlineEquality operator.
Performs only bitwise comparison with text, which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
For performance, you can use UTF-16 string literals with compile-time length determination:
if (str == u"literal") { ... }
Returns true if text contains the same characters as this one, false otherwise.
Definition at line 347 of file unistr.h.
References icu::Replaceable::length().
◆ operator==() [2/2] bool icu::UnicodeString::operator== ( const UnicodeString & text ) const inlineEquality operator.
Performs only bitwise comparison.
Returns true if text contains the same characters as this one, false otherwise.
Definition at line 4289 of file unistr.h.
References isBogus(), and length().
◆ operator>()Greater than operator.
Performs only bitwise comparison.
Returns true if the characters in this are bitwise greater than the characters in text, false otherwise.
Definition at line 4304 of file unistr.h.
References length().
◆ operator>=()Greater than or equal operator.
Performs only bitwise comparison.
Returns true if the characters in this are bitwise greater than or equal to the characters in text, false otherwise.
Definition at line 4312 of file unistr.h.
References length().
◆ operator[]() char16_t icu::UnicodeString::operator[] ( int32_t offset ) const inlineReturn the code unit at offset offset.
If the offset is not valid (0..length()-1) then U+FFFF is returned.
Definition at line 4859 of file unistr.h.
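The out-of-range contract described above (a sentinel value instead of undefined behavior) can be modeled on a standard string view. This is a hedged sketch with a hypothetical helper name, not the ICU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Valid offsets are 0..length()-1; anything else yields U+FFFF.
char16_t code_unit_at(std::u16string_view s, std::int32_t offset) {
    if (offset < 0 || static_cast<std::size_t>(offset) >= s.size()) {
        return 0xFFFF;
    }
    return s[static_cast<std::size_t>(offset)];
}
```

Because U+FFFF is a noncharacter, a caller can usually treat it as "no code unit here", but a string that really contains U+FFFF is indistinguishable from an invalid offset without also checking the length.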
◆ padLeading() UBool icu::UnicodeString::padLeading ( int32_t targetLength, char16_t padChar =0x0020
)
Pad the start of this UnicodeString with the character padChar.
If the length of this UnicodeString is less than targetLength, targetLength - length() copies of padChar will be added to the beginning of this UnicodeString.
◆ padTrailing() UBool icu::UnicodeString::padTrailing ( int32_t targetLength, char16_t padChar =0x0020
)
Pad the end of this UnicodeString with the character padChar.
If the length of this UnicodeString is less than targetLength, targetLength - length() copies of padChar will be added to the end of this UnicodeString.
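The number of pad characters added is the shortfall between the target length and the current length. The sketch below models both padding directions on std::u16string. The function names are hypothetical, and the boolean result (true if padding was added) is an assumption of this sketch, since the return value is not described in the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prepend (targetLength - length) copies of padChar when the string is short.
bool pad_leading(std::u16string& s, std::int32_t targetLength,
                 char16_t padChar = 0x0020) {
    const auto len = static_cast<std::int32_t>(s.size());
    if (len >= targetLength) return false;  // already long enough
    s.insert(s.begin(), static_cast<std::size_t>(targetLength - len), padChar);
    return true;
}

// Append (targetLength - length) copies of padChar when the string is short.
bool pad_trailing(std::u16string& s, std::int32_t targetLength,
                  char16_t padChar = 0x0020) {
    const auto len = static_cast<std::int32_t>(s.size());
    if (len >= targetLength) return false;
    s.append(static_cast<std::size_t>(targetLength - len), padChar);
    return true;
}
```

For example, padding "42" to length 5 with '0' adds 5 - 2 = 3 characters and gives "00042".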
template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
static UnicodeString icu::UnicodeString::readOnlyAlias ( const S & text ) inlinestaticReadonly-aliasing factory method.
Aliases the same buffer as the input text, which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view. The string is bogus if the string view is too long.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has copy-on-write semantics: When the string is modified, then the buffer is first copied into newly allocated memory. The aliased buffer is never modified.
In an assignment to another UnicodeString, when using the copy constructor or the assignment operator, the text will be copied. When using fastCopyFrom(), the text will be aliased again, so that both strings then alias the same readonly-text.
Definition at line 3600 of file unistr.h.
◆ readOnlyAlias() [2/2]Readonly-aliasing factory method.
Aliases the same buffer as the input text.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has copy-on-write semantics: When the string is modified, then the buffer is first copied into newly allocated memory. The aliased buffer is never modified.
In an assignment to another UnicodeString, when using the copy constructor or the assignment operator, the text will be copied. When using fastCopyFrom(), the text will be aliased again, so that both strings then alias the same readonly-text.
Definition at line 3623 of file unistr.h.
◆ releaseBuffer() void icu::UnicodeString::releaseBuffer ( int32_t newLength =-1
)
Release a read/write buffer on a UnicodeString object with an "open" getBuffer(minCapacity).
This function must be called in a matched pair with getBuffer(minCapacity). releaseBuffer(newLength) must be called if and only if a getBuffer(minCapacity) is "open".
It will set the string length to newLength, at most to the current capacity. If newLength==-1 then it will set the length according to the first NUL in the buffer, or to the capacity if there is no NUL.
After calling releaseBuffer(newLength) the UnicodeString is back to normal operation.
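The length rule for releaseBuffer(newLength) can be written out as a small function. This is a sketch of the rule as stated above, with a hypothetical helper name; it assumes newLength >= -1 and does not touch a real UnicodeString.

```cpp
#include <cassert>
#include <cstdint>

// Length the string would have after releaseBuffer(newLength), given the
// open buffer and its capacity (both supplied by the caller in this model).
std::int32_t released_length(const char16_t* buffer, std::int32_t capacity,
                             std::int32_t newLength = -1) {
    if (newLength == -1) {
        // Scan for the first NUL; fall back to the capacity if there is none.
        std::int32_t i = 0;
        while (i < capacity && buffer[i] != 0) ++i;
        return i;
    }
    // An explicit length is limited to the current capacity.
    return newLength > capacity ? capacity : newLength;
}
```

Passing an explicit newLength avoids the scan and is the only option when the written text itself contains NUL code units.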
static_cast<int32_t>(INT32_MAX)
) inline
Remove the characters in the range [start, start + length) from the UnicodeString object.
Definition at line 5046 of file unistr.h.
References INT32_MAX.
◆ removeBetween() UnicodeString & icu::UnicodeString::removeBetween ( int32_t start, int32_t limit =static_cast<int32_t>(INT32_MAX)
) inline
Remove the characters in the range [start, limit) from the UnicodeString object.
Definition at line 5057 of file unistr.h.
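remove() and removeBetween() differ only in how the range is given: (start, length) versus (start, limit), both with an exclusive end. The sketch below models the two forms on std::u16string. The function names are hypothetical, and clamping out-of-range indices to the string bounds is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Remove [start, limit); indices are clamped into [0, size].
std::u16string& remove_between(std::u16string& s, std::int32_t start,
                               std::int32_t limit = INT32_MAX) {
    const auto len = static_cast<std::int32_t>(s.size());
    start = std::clamp(start, std::int32_t{0}, len);
    limit = std::clamp(limit, start, len);
    s.erase(static_cast<std::size_t>(start),
            static_cast<std::size_t>(limit - start));
    return s;
}

// Remove [start, start + length); the length is clamped so that
// start + length cannot overflow or run past the end.
std::u16string& remove_range(std::u16string& s, std::int32_t start,
                             std::int32_t length = INT32_MAX) {
    const auto len = static_cast<std::int32_t>(s.size());
    start = std::clamp(start, std::int32_t{0}, len);
    length = std::clamp(length, std::int32_t{0}, len - start);
    return remove_between(s, start, start + length);
}
```

On "abcdef", remove_range(s, 1, 2) drops "bc", while remove_between(s, 1, 2) drops only "b". The INT32_MAX defaults mean "to the end of the string".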
◆ replace() [1/6] UnicodeString & icu::UnicodeString::replace ( int32_t start, int32_t length, char16_t srcChar ) inlineReplace the characters in the range [start, start + length) with the code unit srcChar.
The character at start + length is not modified.
Parameters: srcChar, the new code unit.
Definition at line 4759 of file unistr.h.
◆ replace() [2/6] UnicodeString & icu::UnicodeString::replace ( int32_t start, int32_t length, const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) inlineReplace the characters in the range [start, start + length) with the characters in srcChars in the range [srcStart, srcStart + srcLength).
srcChars is not modified. The character at start + length is not modified.
Parameters: srcChars, the source for the new characters; srcStart, the offset into srcChars where new characters will be obtained; srcLength, the number of characters in srcChars in the replace string.
Definition at line 4751 of file unistr.h.
◆ replace() [3/6]Replace the characters in the range [start, start + length) with the characters in srcText.
srcText is not modified. The character at start + length is not modified.
Parameters: srcText, the source for the new characters.
Definition at line 4730 of file unistr.h.
References length().
◆ replace() [4/6] UnicodeString & icu::UnicodeString::replace ( int32_t start, int32_t length, const UnicodeString & srcText, int32_t srcStart, int32_t srcLength ) inlineReplace the characters in the range [start, start + length) with the characters in srcText in the range [srcStart, srcStart + srcLength).
srcText is not modified. The character at start + length is not modified.
Parameters: srcText, the source for the new characters; srcStart, the offset into srcText where new characters will be obtained; srcLength, the number of characters in srcText in the replace string.
Definition at line 4736 of file unistr.h.
◆ replace() [5/6]Replace the characters in the range [start, start + length) with the characters in srcChars.
srcChars is not modified. The character at start + length is not modified.
Parameters: srcChars, the source for the new characters; srcLength, the number of Unicode characters in srcChars.
Definition at line 4744 of file unistr.h.
◆ replace() [6/6]Replace the characters in the range [start, start + length) with the code point srcChar.
The character at start + length is not modified.
Parameters: srcChar, the new code point.
◆ replaceBetween() [1/2]Replace the characters in the range [start, limit) with the characters in srcText.
srcText is not modified.
Definition at line 4765 of file unistr.h.
References length().
◆ replaceBetween() [2/2] UnicodeString & icu::UnicodeString::replaceBetween ( int32_t start, int32_t limit, const UnicodeString & srcText, int32_t srcStart, int32_t srcLimit ) inlineReplace the characters in the range [start, limit) with the characters in srcText in the range [srcStart, srcLimit).
srcText is not modified.
Parameters: srcStart, the offset into srcText where new characters will be obtained; srcLimit, the offset immediately following the range to copy in srcText.
Definition at line 4771 of file unistr.h.
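The replace() overloads take (start, length) pairs, while replaceBetween() takes (start, limit) pairs with exclusive limits, for both the target and the source range. A minimal model of the two conventions on std::u16string follows. The function names are hypothetical, and the sketch assumes all indices are already valid (0 <= start <= limit <= size), so it performs no pinning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// (start, length) form: the target range is [start, start + length).
std::u16string& replace_range(std::u16string& s, std::int32_t start,
                              std::int32_t length, const std::u16string& src) {
    s.replace(static_cast<std::size_t>(start),
              static_cast<std::size_t>(length), src);
    return s;
}

// (start, limit) form: the target is [start, limit) and the source is
// src[srcStart, srcLimit).
std::u16string& replace_between(std::u16string& s, std::int32_t start,
                                std::int32_t limit, const std::u16string& src,
                                std::int32_t srcStart, std::int32_t srcLimit) {
    s.replace(static_cast<std::size_t>(start),
              static_cast<std::size_t>(limit - start), src,
              static_cast<std::size_t>(srcStart),
              static_cast<std::size_t>(srcLimit - srcStart));
    return s;
}
```

Both calls below produce "Hello, ICU" from "Hello, world": replace_range(s, 7, 5, u"ICU") and replace_between(s, 7, 12, u"xxICUxx", 2, 5).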
◆ retainBetween()Retain only the characters in the range [start, limit) from the UnicodeString object.
Removes characters before start and at and after limit.
Definition at line 5062 of file unistr.h.
◆ reverse() [1/2]
◆ reverse() [2/2] UnicodeString & icu::UnicodeString::reverse ( int32_t start, int32_t length ) inlineReverse the range [start, start + length) in this UnicodeString.
Definition at line 5087 of file unistr.h.
◆ setCharAt() UnicodeString& icu::UnicodeString::setCharAt ( int32_t offset, char16_t ch )Set the character at the specified offset to the specified character.
◆ setTo() [1/8]Aliasing setTo() function, analogous to the writable-aliasing char16_t* constructor.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has write-through semantics: For as long as the capacity of the buffer is sufficient, write operations will directly affect the buffer. When more capacity is necessary, then a new buffer will be allocated and the contents copied as with regularly constructed strings. In an assignment to another UnicodeString, the buffer will be copied. The extract(Char16Ptr dst) function detects whether the dst pointer is the same as the string buffer itself and will in this case not copy the contents.
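Parameters: buffer, the characters to alias for the UnicodeString; buffLength, the number of Unicode characters in buffer to alias; buffCapacity, the size of buffer in char16_ts.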
◆ setTo() [2/8]Set the characters in the UnicodeString object to the code unit srcChar.
Definition at line 4946 of file unistr.h.
◆ setTo() [3/8] UnicodeString & icu::UnicodeString::setTo ( const char16_t * srcChars, int32_t srcLength ) inlineSet the characters in the UnicodeString object to the characters in srcChars.
srcChars is not modified.
Definition at line 4938 of file unistr.h.
◆ setTo() [4/8]Set the text in the UnicodeString object to the characters in srcText.
srcText is not modified.
Definition at line 4932 of file unistr.h.
◆ setTo() [5/8]
◆ setTo() [6/8]Set the text in the UnicodeString object to the characters in srcText in the range [srcStart, srcStart + srcLength).
srcText is not modified.
Parameters: srcStart, the offset into srcText where new characters will be obtained; srcLength, the number of characters in srcText in the replace string.
Definition at line 4914 of file unistr.h.
◆ setTo() [7/8]Aliasing setTo() function, analogous to the readonly-aliasing char16_t* constructor.
The text will be used for the UnicodeString object, but it will not be released when the UnicodeString is destroyed. This has copy-on-write semantics: When the string is modified, then the buffer is first copied into newly allocated memory. The aliased buffer is never modified.
In an assignment to another UnicodeString, when using the copy constructor or the assignment operator, the text will be copied. When using fastCopyFrom(), the text will be aliased again, so that both strings then alias the same readonly-text.
Parameters: isTerminated, specifies if text is NUL-terminated (this must be true if textLength==-1); text, the characters to alias for the UnicodeString; textLength, the number of Unicode characters in text to alias. If -1, then this function will determine the length by calling u_strlen().
◆ setTo() [8/8]Set the characters in the UnicodeString object to the code point srcChar.
Definition at line 4953 of file unistr.h.
◆ setToBogus() void icu::UnicodeString::setToBogus ( )Make this UnicodeString object invalid.
The string will test true with isBogus().
A bogus string has no value. It is different from an empty string. It can be used to indicate that no string value is available. getBuffer() and getTerminatedBuffer() return nullptr, and length() returns 0.
This utility function is used throughout the UnicodeString implementation to indicate that a UnicodeString operation failed, and may be used in other functions, especially but not exclusively when such functions do not take a UErrorCode for simplicity.
The following methods, and no others, will clear a string object's bogus flag:
- remove()
- remove(0, INT32_MAX)
- truncate(0)
- operator=() (assignment operator)
- setTo(...)
The simplest way to turn a bogus string into an empty one is to use the remove() function. Examples of other functions that are equivalent to "set to empty string":
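if(s.isBogus()) {
  // Any one of the following calls is sufficient; they are alternatives.
  s.remove();       // set to an empty string (remove all), or
  s.truncate(0);    // set to an empty string (complementary to isBogus()), or
  s.setTo(u"", 0);  // set to an empty C Unicode string
}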
Referenced by icu::ures_getNextUnicodeString(), icu::ures_getUnicodeString(), icu::ures_getUnicodeStringByIndex(), and icu::ures_getUnicodeStringByKey().
◆ startsWith() [1/4] UBool icu::UnicodeString::startsWith ( const char16_t * srcChars, int32_t srcStart, int32_t srcLength ) const inlineDetermine if this starts with the characters in srcChars in the range [srcStart, srcStart + srcLength).
Parameters: srcStart, the offset into srcChars to start matching; srcLength, the number of characters in srcChars to match.
Returns true if this starts with the characters in srcChars, false otherwise.
Definition at line 4685 of file unistr.h.
References u_strlen().
◆ startsWith() [2/4] UBool icu::UnicodeString::startsWith ( const UnicodeString & srcText, int32_t srcStart, int32_t srcLength ) const inlineDetermine if this starts with the characters in srcText in the range [srcStart, srcStart + srcLength).
Parameters: srcStart, the offset into srcText to start matching; srcLength, the number of characters in srcText to match.
Returns true if this starts with the characters in srcText, false otherwise.
Definition at line 4671 of file unistr.h.
◆ startsWith() [3/4]Determine if this starts with the characters in text.
Returns true if this starts with the characters in text, false otherwise.
Definition at line 4667 of file unistr.h.
References length().
◆ startsWith() [4/4]Determine if this starts with the characters in srcChars.
Returns true if this starts with the characters in srcChars, false otherwise.
Definition at line 4677 of file unistr.h.
References u_strlen().
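The ranged startsWith() overloads compare the beginning of this string against a slice [srcStart, srcStart + srcLength) of the source, again by code units. A hedged standard-library model follows; the function names are hypothetical, and the sketch assumes the source range is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Does s begin with src[srcStart, srcStart + srcLength)?
bool starts_with_range(std::u16string_view s, std::u16string_view src,
                       std::int32_t srcStart, std::int32_t srcLength) {
    const std::u16string_view prefix =
        src.substr(static_cast<std::size_t>(srcStart),
                   static_cast<std::size_t>(srcLength));
    return s.size() >= prefix.size() &&
           s.compare(0, prefix.size(), prefix) == 0;
}

// Whole-source form: the prefix is all of text.
bool starts_with_text(std::u16string_view s, std::u16string_view text) {
    return starts_with_range(s, text, 0,
                             static_cast<std::int32_t>(text.size()));
}
```

For instance, "unicode" starts with the slice [2, 5) of "xxunixx", which is "uni".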
◆ swap()Swap strings.
◆ tempSubString()Create a temporary substring for the specified range.
Unlike the substring constructor and setTo() functions, the object returned here will be a read-only alias (using getBuffer()) rather than copying the text. As a result, this substring operation is much faster but requires that the original string not be modified or deleted during the lifetime of the returned substring object.
Referenced by icu::MessagePattern::getSubstring().
◆ tempSubStringBetween()Create a temporary substring for the specified range.
Same as tempSubString(start, length) except that the substring range is specified as a (start, limit) pair (with an exclusive limit index) rather than a (start, length) pair.
Definition at line 4840 of file unistr.h.
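A std::u16string_view is a reasonable standard-library analogue of these temporary substrings: it aliases the original storage without copying, so it has the same lifetime requirement. The sketch below shows the (start, length) and (start, limit) forms side by side; the function names are hypothetical and valid indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// (start, length) form: a non-owning view of s[start, start + length).
std::u16string_view temp_sub_string(std::u16string_view s, std::int32_t start,
                                    std::int32_t length) {
    return s.substr(static_cast<std::size_t>(start),
                    static_cast<std::size_t>(length));
}

// (start, limit) form: the same view, with an exclusive limit index.
std::u16string_view temp_sub_string_between(std::u16string_view s,
                                            std::int32_t start,
                                            std::int32_t limit) {
    return temp_sub_string(s, start, limit - start);
}
```

The returned view points into the original buffer, so the original string must not be modified or destroyed while the view is in use.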
◆ toLower() [1/2]Convert the characters in this to lower case following the conventions of the default locale.
◆ toLower() [2/2]Convert the characters in this to lower case following the conventions of a specific locale.
◆ toTitle() [1/3]Titlecase this string, convenience function using the default locale.
Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others.
The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. It may be more efficient to always provide an iterator to avoid opening and closing one for each string. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
This function uses only the setText(), first() and next() methods of the provided break iterator.
◆ toTitle() [2/3]Titlecase this string.
Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others.
The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. It may be more efficient to always provide an iterator to avoid opening and closing one for each string. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
This function uses only the setText(), first() and next() methods of the provided break iterator.
◆ toTitle() [3/3]Titlecase this string, with options.
Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with options.)
The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. It may be more efficient to always provide an iterator to avoid opening and closing one for each string. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
This function uses only the setText(), first() and next() methods of the provided break iterator.
◆ toUpper() [1/2]Convert the characters in this to UPPER CASE following the conventions of the default locale.
◆ toUpper() [2/2]Convert the characters in this to UPPER CASE following the conventions of a specific locale.
◆ toUTF32()Convert the UnicodeString to UTF-32.
Unpaired surrogates are replaced with U+FFFD. Calls u_strToUTF32WithSub().
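The substitution rule can be illustrated with a small decoder: well-formed surrogate pairs become one supplementary code point, and any lead or trail surrogate without a partner becomes U+FFFD. This is a standard-library sketch of that behavior with a hypothetical function name; it does not call u_strToUTF32WithSub().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::u32string to_utf32_with_sub(std::u16string_view s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        const bool lead = c >= 0xD800 && c <= 0xDBFF;
        const bool trail = c >= 0xDC00 && c <= 0xDFFF;
        if (lead && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Well-formed pair: combine into one supplementary code point.
            const char32_t t = s[i + 1];
            out.push_back(0x10000 + ((c - 0xD800) << 10) + (t - 0xDC00));
            ++i;
        } else if (lead || trail) {
            out.push_back(0xFFFD);  // unpaired surrogate
        } else {
            out.push_back(c);       // ordinary BMP code unit
        }
    }
    return out;
}
```

The input 'a', D83D, DE00, D800, 'b' therefore decodes to four code points: U+0061, U+1F600, U+FFFD, U+0062.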
◆ toUTF8String()template<typename StringClass >
StringClass& icu::UnicodeString::toUTF8String ( StringClass & result ) const inlineConvert the UnicodeString to UTF-8 and append the result to a standard string.
Unpaired surrogates are replaced with U+FFFD. Calls toUTF8().
Definition at line 1777 of file unistr.h.
References icu::Replaceable::length().
◆ trim()Trims leading and trailing whitespace from this UnicodeString.
◆ unescape()Unescape a string of characters and return a string containing the result.
The following escape sequences are recognized:
\uhhhh       4 hex digits; h in [0-9A-Fa-f]
\Uhhhhhhhh   8 hex digits
\xhh         1-2 hex digits
\ooo         1-3 octal digits; o in [0-7]
\cX          control-X; X is masked with 0x1F
as well as the standard ANSI C escapes:
\a => U+0007, \b => U+0008, \t => U+0009, \n => U+000A, \v => U+000B, \f => U+000C, \r => U+000D, \e => U+001B, \" => U+0022, \' => U+0027, \? => U+003F, \\ => U+005C
Anything else following a backslash is generically escaped. For example, "[a\\-z]" returns "[a-z]".
If an escape sequence is ill-formed, this method returns an empty string. An example of an ill-formed sequence is "\\u" followed by fewer than 4 hex digits.
This function is similar to u_unescape() but not identical to it. The latter takes a source char*, so it does escape recognition and also invariant conversion.
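To make the rules concrete, here is a deliberately reduced model that handles only \uhhhh, \n, \t, and the generic escape, and returns an empty string for ill-formed input. It is a sketch on std::u16string with hypothetical function names, not the ICU implementation; treating a trailing lone backslash as ill-formed is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if c is not one.
int hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

std::u16string unescape_subset(std::u16string_view s) {
    std::u16string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != u'\\') { out.push_back(s[i]); continue; }
        if (++i == s.size()) return {};          // nothing after the backslash
        const char16_t e = s[i];
        if (e == u'u') {
            if (i + 4 >= s.size()) return {};    // fewer than 4 digits left
            unsigned value = 0;
            for (int k = 1; k <= 4; ++k) {
                const int h = hex_value(s[i + k]);
                if (h < 0) return {};            // non-hex digit: ill-formed
                value = value * 16 + static_cast<unsigned>(h);
            }
            out.push_back(static_cast<char16_t>(value));
            i += 4;
        } else if (e == u'n') {
            out.push_back(u'\n');
        } else if (e == u't') {
            out.push_back(u'\t');
        } else {
            out.push_back(e);                    // generic escape: keep as-is
        }
    }
    return out;
}
```

Note the ambiguity inherited from the documented contract: an empty result can mean either an empty input or an ill-formed escape, so callers that need to tell the two apart should check the input first.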
◆ unescapeAt()Unescape a single escape sequence and return the represented character.
See unescape() for a listing of the recognized escape sequences. The character at offset-1 is assumed (without checking) to be a backslash. If the escape sequence is ill-formed, or the offset is out of range, U_SENTINEL=-1 is returned.
Non-member UnicodeString swap function.
Definition at line 1990 of file unistr.h.
The documentation for this class was generated from the following file: