C API: Unicode Character Iteration.
U_CAPI UChar32 uiter_current32(UCharIterator *iter)
Helper function for UCharIterator to get the code point at the current index.
Definition in file uiter.h.
◆ UITER_NO_STATE
#define UITER_NO_STATE ((uint32_t)0xffffffff)
Constant for UCharIterator getState() indicating an error or an unknown state.
Returned by uiter_getState()/UCharIteratorGetState when an error occurs. Also, some UCharIterator implementations may not be able to return a valid state for each position. This will be clearly documented for each such iterator (none of the public ones here).
Definition at line 86 of file uiter.h.
◆ UCharIterator
◆ UCharIteratorCurrent
Function type declaration for UCharIterator.current().
Return the code unit at the current position, or U_SENTINEL if there is none (index is at the limit).
Definition at line 188 of file uiter.h.
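As an illustration of the current() contract, here is a minimal toy model in plain C. It is not ICU's implementation: `ToyIter`, `toy_current`, and `TOY_SENTINEL` are hypothetical names, and `TOY_SENTINEL` merely stands in for ICU's `U_SENTINEL` (which ICU defines as -1). The key property is that current() reads without moving the index.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

/* Hypothetical toy iterator over an array of UTF-16 code units. */
typedef struct {
    const uint16_t *text; /* UTF-16 code units */
    int32_t index;        /* current position */
    int32_t limit;        /* end of the iteration range */
} ToyIter;

/* Like UCharIterator.current(): return the code unit at index without
   changing index, or the sentinel when index has reached the limit. */
static int32_t toy_current(const ToyIter *it) {
    if (it->index < it->limit) {
        return it->text[it->index];
    }
    return TOY_SENTINEL;
}
```

Calling `toy_current` twice in a row returns the same unit, which is what distinguishes it from next().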
◆ UCharIteratorGetIndex
Function type declaration for UCharIterator.getIndex().
Gets the current position, or the start or limit of the iteration range.
This function may perform slowly for UITER_CURRENT after setState() was called, or for UITER_LENGTH, because an iterator implementation may have to count UChars if the underlying storage is not UTF-16.
Definition at line 107 of file uiter.h.
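To show why getIndex(UITER_LENGTH) can be slow when the storage is not UTF-16, the toy sketch below keeps UTF-8 bytes and counts UTF-16 units only on first request, caching the result. The names (`ToyOrigin`, `TOY_LENGTH`, `toy_get_index`, ...) are hypothetical stand-ins for the UITER_* origins; the toy assumes well-formed UTF-8 and treats the limit as the end of the whole text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical origins mirroring UITER_ZERO, UITER_START, etc. */
typedef enum { TOY_ZERO, TOY_START, TOY_CURRENT, TOY_LIMIT, TOY_LENGTH } ToyOrigin;

typedef struct {
    const uint8_t *utf8;  /* underlying storage is UTF-8, not UTF-16 */
    int32_t byteLength;   /* number of bytes in utf8 */
    int32_t start, index; /* UTF-16 positions tracked while iterating */
    int32_t length;       /* UTF-16 length; -1 until first requested */
} ToyUtf8Iter;

/* O(n) walk: each non-trail byte starts a code point; a 4-byte lead
   (>= 0xF0) needs a surrogate pair, i.e. two UTF-16 units. */
static int32_t toy_count_utf16(const uint8_t *s, int32_t n) {
    int32_t units = 0;
    for (int32_t i = 0; i < n; ++i) {
        if ((s[i] & 0xC0) == 0x80) continue; /* trail byte */
        units += (s[i] >= 0xF0) ? 2 : 1;
    }
    return units;
}

static int32_t toy_get_index(ToyUtf8Iter *it, ToyOrigin origin) {
    switch (origin) {
    case TOY_ZERO:    return 0;
    case TOY_START:   return it->start;
    case TOY_CURRENT: return it->index;
    case TOY_LIMIT:   /* toy: range ends at the end of the text */
    case TOY_LENGTH:
        if (it->length < 0) { /* pay for the walk only when asked */
            it->length = toy_count_utf16(it->utf8, it->byteLength);
        }
        return it->length;
    }
    return -1;
}
```

Caching the length means only the first LENGTH query is linear in the text size.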
◆ UCharIteratorGetState
typedef uint32_t UCharIteratorGetState(const UCharIterator *iter)
Function type declaration for UCharIterator.getState().
Get the "state" of the iterator in the form of a single 32-bit word. It is recommended that the state value be calculated to be as small as is feasible. For strings with limited lengths, fewer than 32 bits may be sufficient.
This is used together with setState()/UCharIteratorSetState to save and restore the iterator position more efficiently than with getIndex()/move().
The iterator state is defined as a uint32_t value because it is designed for use in ucol_nextSortKeyPart() which provides 32 bits to store the state of the character iterator.
With some UCharIterator implementations (e.g., UTF-8), getting and setting the UTF-16 index with existing functions (getIndex(UITER_CURRENT) followed by move(pos, UITER_ZERO)) is possible but relatively slow because the iterator has to "walk" from a known index to the requested one. This takes more time the farther it needs to go.
An opaque state value allows an iterator implementation to provide an internal index (UTF-8: the source byte array index) for fast, constant-time restoration.
After calling setState(), a getIndex(UITER_CURRENT) may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.
Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return UITER_NO_STATE instead. This will be clearly documented for each such iterator (none of the public ones here).
Definition at line 281 of file uiter.h.
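The save/restore idea can be sketched with a toy UTF-8 iterator whose state word is just its byte index. Restoring is then constant-time, but the UTF-16 index is lost and marked unknown, matching the trade-off described above. `ToyUtf8Iter`, `toy_get_state`, `toy_set_state`, and `TOY_NO_STATE` are hypothetical; this toy ignores the mid-surrogate case that ICU's UTF-8 iterator also has to encode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_STATE ((uint32_t)0xffffffff) /* mirrors UITER_NO_STATE */

typedef struct {
    const uint8_t *utf8;
    int32_t byteLength;
    int32_t byteIndex;  /* internal position in the UTF-8 bytes */
    int32_t utf16Index; /* UTF-16 index, or -1 if not known */
} ToyUtf8Iter;

/* Save: the byte index alone is enough to resume iteration. */
static uint32_t toy_get_state(const ToyUtf8Iter *it) {
    if (it == NULL) return TOY_NO_STATE;
    return (uint32_t)it->byteIndex;
}

/* Restore in constant time. The UTF-16 index cannot be recovered from the
   state, so it is marked unknown; a later getIndex would have to count. */
static void toy_set_state(ToyUtf8Iter *it, uint32_t state) {
    it->byteIndex = (int32_t)state;
    it->utf16Index = -1;
}
```

The state may be applied to a different iterator object, as long as it iterates over the same bytes.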
◆ UCharIteratorHasNext
Function type declaration for UCharIterator.hasNext().
Check if current() and next() can still return another code unit.
Definition at line 159 of file uiter.h.
◆ UCharIteratorHasPrevious
◆ UCharIteratorMove
Function type declaration for UCharIterator.move().
Use iter->move(iter, index, UITER_ZERO) like CharacterIterator::setIndex(index).
Moves the current position relative to the start or limit of the iteration range, or relative to the current position itself. The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta. Out-of-bounds movement is pinned to the start or limit.
This function may perform slowly for moving relative to UITER_LENGTH because an iterator implementation may have to count the rest of the UChars if the native storage is not UTF-16.
When moving relative to the limit or length, or relative to the current position after setState() was called, move() may return UITER_UNKNOWN_INDEX (-2) to avoid an inefficient determination of the actual UTF-16 index. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary. See UITER_UNKNOWN_INDEX for details.
Definition at line 144 of file uiter.h.
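A toy version of move() makes the origin arithmetic and the pinning rule concrete. It works on a plain UTF-16 model where every index is known, so it never needs to return an unknown index. `ToyIter`, `ToyOrigin`, and `toy_move` are hypothetical names mirroring the UITER_* origins.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_ZERO, TOY_START, TOY_CURRENT, TOY_LIMIT, TOY_LENGTH } ToyOrigin;

/* UTF-16 positions with 0 <= start <= index <= limit <= length. */
typedef struct {
    int32_t start, index, limit, length;
} ToyIter;

/* Move by delta code units relative to origin, pin the result to
   [start, limit], store it as the new index, and return it. */
static int32_t toy_move(ToyIter *it, int32_t delta, ToyOrigin origin) {
    int32_t pos;
    switch (origin) {
    case TOY_ZERO:    pos = delta; break;
    case TOY_START:   pos = it->start + delta; break;
    case TOY_CURRENT: pos = it->index + delta; break;
    case TOY_LIMIT:   pos = it->limit + delta; break;
    case TOY_LENGTH:  pos = it->length + delta; break;
    default:          return -1;
    }
    if (pos < it->start) pos = it->start;
    else if (pos > it->limit) pos = it->limit;
    it->index = pos;
    return pos;
}
```

Here `toy_move(&it, i, TOY_ZERO)` plays the role of `CharacterIterator::setIndex(i)`, and a delta that overshoots in either direction lands on the nearest boundary.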
◆ UCharIteratorNext
Function type declaration for UCharIterator.next().
Return the code unit at the current index and increment the index (post-increment, like s[i++]), or return U_SENTINEL if there is none (index is at the limit).
Definition at line 204 of file uiter.h.
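Together with hasNext(), next() supports the usual forward scan. The toy sketch below uses hypothetical names (`ToyIter`, `toy_has_next`, `toy_next`, `toy_count_unit`) over a UTF-16 array; `TOY_SENTINEL` stands in for `U_SENTINEL`. It counts code units, not code points.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint16_t *text;
    int32_t index, limit;
} ToyIter;

/* Like hasNext(): can current()/next() still return a code unit? */
static int toy_has_next(const ToyIter *it) { return it->index < it->limit; }

/* Post-increment: return text[index++], or the sentinel at the limit. */
static int32_t toy_next(ToyIter *it) {
    if (it->index < it->limit) return it->text[it->index++];
    return TOY_SENTINEL;
}

/* Forward scan counting code units equal to target. */
static int32_t toy_count_unit(ToyIter *it, uint16_t target) {
    int32_t n = 0;
    while (toy_has_next(it)) {
        if (toy_next(it) == target) ++n;
    }
    return n;
}
```

After the loop the index sits at the limit, so a further next() returns the sentinel.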
◆ UCharIteratorOrigin
◆ UCharIteratorPrevious
Function type declaration for UCharIterator.previous().
Decrement the index and return the code unit from there (pre-decrement, like s[--i]), or return U_SENTINEL if there is none (index is at the start).
Definition at line 220 of file uiter.h.
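The pre-decrement semantics of previous() lead naturally to a backward loop that stops at the sentinel. In this toy sketch (hypothetical names `ToyIter`, `toy_previous`, `toy_reverse_copy`), units are copied in reverse order; because it works on code units, a surrogate pair would come out with its trail before its lead.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint16_t *text;
    int32_t start, index;
} ToyIter;

/* Pre-decrement: return text[--index], or the sentinel at the start. */
static int32_t toy_previous(ToyIter *it) {
    if (it->index > it->start) return it->text[--it->index];
    return TOY_SENTINEL;
}

/* Copy up to cap code units in reverse order into out; return the count. */
static int32_t toy_reverse_copy(ToyIter *it, uint16_t *out, int32_t cap) {
    int32_t n = 0, c;
    while (n < cap && (c = toy_previous(it)) != TOY_SENTINEL) {
        out[n++] = (uint16_t)c;
    }
    return n;
}
```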
◆ UCharIteratorReserved
typedef int32_t UCharIteratorReserved(UCharIterator *iter, int32_t something)
◆ UCharIteratorSetState
Function type declaration for UCharIterator.setState().
Restore the "state" of the iterator using a state word from a getState() call. The iterator object need not be the same one as for which getState() was called, but it must be of the same type (set up using the same uiter_setXYZ function) and it must iterate over the same string (binary identical regardless of memory address). For more about the state word see UCharIteratorGetState.
After calling setState(), a getIndex(UITER_CURRENT) may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.
Definition at line 309 of file uiter.h.
◆ anonymous enum
Constants for UCharIterator.
UITER_UNKNOWN_INDEX (-2): constant value that may be returned by UCharIteratorMove indicating that the final UTF-16 index is not known, but that the move succeeded.
This can occur when moving relative to limit or length, or when moving relative to the current index after a setState() when the current UTF-16 index is not known.
It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary.
Definition at line 56 of file uiter.h.
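The toy sketch below shows the UITER_UNKNOWN_INDEX idea on UTF-8 storage: after a state restore the UTF-16 index is unknown, a forward move reports an unknown-index value instead of counting, and the index is counted from the beginning only when it is explicitly asked for. All names are hypothetical (`TOY_UNKNOWN_INDEX` mirrors UITER_UNKNOWN_INDEX), and for brevity the toy moves by whole code points rather than by UTF-16 code units, assuming well-formed UTF-8.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_UNKNOWN_INDEX (-2) /* mirrors UITER_UNKNOWN_INDEX */

typedef struct {
    const uint8_t *utf8;
    int32_t byteLength;
    int32_t byteIndex;  /* always valid; points at a lead byte */
    int32_t utf16Index; /* -1 when unknown, e.g. after a state restore */
} ToyUtf8Iter;

/* Slow path: count UTF-16 units in bytes [0, end). */
static int32_t toy_count_utf16(const uint8_t *s, int32_t end) {
    int32_t units = 0;
    for (int32_t i = 0; i < end; ++i) {
        if ((s[i] & 0xC0) == 0x80) continue; /* trail byte */
        units += (s[i] >= 0xF0) ? 2 : 1;
    }
    return units;
}

/* Advance by n code points. Returns the new UTF-16 index, or
   TOY_UNKNOWN_INDEX if the index was unknown before the move. */
static int32_t toy_forward(ToyUtf8Iter *it, int32_t n) {
    int32_t units = 0;
    while (n-- > 0 && it->byteIndex < it->byteLength) {
        uint8_t b = it->utf8[it->byteIndex];
        int32_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        it->byteIndex += len;
        units += (len == 4) ? 2 : 1; /* 4-byte sequence = surrogate pair */
    }
    if (it->utf16Index < 0) return TOY_UNKNOWN_INDEX;
    it->utf16Index += units;
    return it->utf16Index;
}

/* Like getIndex(UITER_CURRENT): count only when the index is unknown. */
static int32_t toy_current_index(ToyUtf8Iter *it) {
    if (it->utf16Index < 0) {
        it->utf16Index = toy_count_utf16(it->utf8, it->byteIndex);
    }
    return it->utf16Index;
}
```

Once the index has been counted, later moves report real indexes again.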
◆ UCharIteratorOrigin
◆ uiter_current32()
Helper function for UCharIterator to get the code point at the current index.
Return the code point that includes the code unit at the current position, or U_SENTINEL if there is none (index is at the limit). If the current code unit is a lead or trail surrogate, then the following or preceding surrogate is used to form the code point value.
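A toy sketch of the uiter_current32() behavior on a plain UTF-16 array: a lead surrogate is combined with the following trail, a trail with the preceding lead, and the index is left unchanged. Names (`ToyIter`, `toy_current32`, ...) are hypothetical; in this toy an unpaired surrogate is returned as-is. The combination formula is the standard UTF-16 decoding.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint16_t *text;
    int32_t start, index, limit;
} ToyIter;

static int toy_is_lead(int32_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
static int toy_is_trail(int32_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

/* Standard UTF-16 decoding of a surrogate pair. */
static int32_t toy_supplementary(int32_t lead, int32_t trail) {
    return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
}

/* Code point containing the unit at index; index is not changed. */
static int32_t toy_current32(const ToyIter *it) {
    if (it->index >= it->limit) return TOY_SENTINEL;
    int32_t c = it->text[it->index];
    if (toy_is_lead(c) && it->index + 1 < it->limit &&
        toy_is_trail(it->text[it->index + 1])) {
        return toy_supplementary(c, it->text[it->index + 1]);
    }
    if (toy_is_trail(c) && it->index > it->start &&
        toy_is_lead(it->text[it->index - 1])) {
        return toy_supplementary(it->text[it->index - 1], c);
    }
    return c;
}
```

With U+1F600 stored as D83D DE00, the result is 0x1F600 whether the index points at the lead or at the trail.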
◆ uiter_getState()
Get the "state" of the iterator in the form of a single 32-bit word.
This is a convenience function that calls iter->getState(iter) if iter->getState is not NULL; if it is NULL or any other error occurs, then UITER_NO_STATE is returned.
Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return UITER_NO_STATE instead. This will be clearly documented for each such iterator (none of the public ones here).
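The convenience-wrapper pattern (call the function pointer if present, otherwise report "no state") can be sketched in plain C. `ToyIter`, `toy_get_state`, and `toy_index_state` are hypothetical; the toy also treats a NULL iterator as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_STATE ((uint32_t)0xffffffff) /* mirrors UITER_NO_STATE */

typedef struct ToyIter ToyIter;
typedef uint32_t ToyGetState(const ToyIter *iter);

struct ToyIter {
    int32_t index;
    ToyGetState *getState; /* may be NULL if state is not supported */
};

/* Wrapper in the style of uiter_getState(). */
static uint32_t toy_get_state(const ToyIter *iter) {
    if (iter == NULL || iter->getState == NULL) return TOY_NO_STATE;
    return iter->getState(iter);
}

/* A provider whose state is simply the current index. */
static uint32_t toy_index_state(const ToyIter *iter) {
    return (uint32_t)iter->index;
}
```

Callers can then test the result against the no-state constant instead of checking the function pointer themselves.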
◆ uiter_next32()
Helper function for UCharIterator to get the next code point.
Return the code point at the current index and increment the index (post-increment, like s[i++]), or return U_SENTINEL if there is none (index is at the limit).
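A toy sketch of next32-style forward iteration by code point over a UTF-16 array: one unit is consumed normally, two when a lead surrogate is followed by a trail. Names are hypothetical; unpaired surrogates are returned as single units in this toy.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint16_t *text;
    int32_t index, limit;
} ToyIter;

/* Post-increment by one code point (one or two UTF-16 units). */
static int32_t toy_next32(ToyIter *it) {
    if (it->index >= it->limit) return TOY_SENTINEL;
    int32_t c = it->text[it->index++];
    if (c >= 0xD800 && c <= 0xDBFF && it->index < it->limit) {
        int32_t t = it->text[it->index];
        if (t >= 0xDC00 && t <= 0xDFFF) {
            ++it->index; /* consume the trail surrogate too */
            return ((c - 0xD800) << 10) + (t - 0xDC00) + 0x10000;
        }
    }
    return c;
}
```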
◆ uiter_previous32()
Helper function for UCharIterator to get the previous code point.
Decrement the index and return the code point from there (pre-decrement, like s[--i]), or return U_SENTINEL if there is none (index is at the start).
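The backward counterpart can be sketched the same way: step back one unit, and if that is a trail surrogate preceded by a lead, step back once more and combine them. Names are hypothetical and unpaired surrogates are returned as-is in this toy.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint16_t *text;
    int32_t start, index;
} ToyIter;

/* Pre-decrement by one code point (one or two UTF-16 units). */
static int32_t toy_previous32(ToyIter *it) {
    if (it->index <= it->start) return TOY_SENTINEL;
    int32_t c = it->text[--it->index];
    if (c >= 0xDC00 && c <= 0xDFFF && it->index > it->start) {
        int32_t l = it->text[it->index - 1];
        if (l >= 0xD800 && l <= 0xDBFF) {
            --it->index; /* step back over the lead surrogate too */
            return ((l - 0xD800) << 10) + (c - 0xDC00) + 0x10000;
        }
    }
    return c;
}
```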
◆ uiter_setCharacterIterator()
Set up a UCharIterator to wrap around a C++ CharacterIterator.
Sets the UCharIterator function pointers for iteration using the CharacterIterator charIter.
The CharacterIterator pointer charIter is set into UCharIterator.context without copying or cloning the CharacterIterator object. The other "protected" UCharIterator fields are set to 0 and will be ignored. The iteration index and boundaries are controlled by the CharacterIterator.
getState() simply returns the current index. move() will always return the final index.
◆ uiter_setReplaceable()
Set up a UCharIterator to iterate over a C++ Replaceable.
Sets the UCharIterator function pointers for iteration over the Replaceable rep with iteration boundaries start=index=0 and length=limit=rep->length(). The "provider" may set the start, index, and limit values at any time within the range 0..length=rep->length(). The length field will be ignored.
The Replaceable pointer rep is set into UCharIterator.context without copying or cloning/reallocating the Replaceable object.
getState() simply returns the current index. move() will always return the final index.
◆ uiter_setState()
Restore the "state" of the iterator using a state word from a getState() call.
This is a convenience function that calls iter->setState(iter, state, pErrorCode) if iter->setState is not NULL; if it is NULL, then U_UNSUPPORTED_ERROR is set.
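The setState wrapper pattern with an in/out error code can be sketched as follows. `ToyErr`, `TOY_UNSUPPORTED`, `ToyIter`, and `toy_set_state` are hypothetical stand-ins (not ICU's UErrorCode values); like ICU-style functions, the toy does nothing if an earlier error is already recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_OK = 0, TOY_UNSUPPORTED = 1 } ToyErr; /* hypothetical codes */

typedef struct ToyIter ToyIter;
typedef void ToySetState(ToyIter *iter, uint32_t state, ToyErr *pErr);

struct ToyIter {
    int32_t index;
    ToySetState *setState; /* NULL if restoring state is not supported */
};

/* Wrapper in the style of uiter_setState(). */
static void toy_set_state(ToyIter *iter, uint32_t state, ToyErr *pErr) {
    if (pErr == NULL || *pErr != TOY_OK) return; /* keep an earlier error */
    if (iter->setState == NULL) {
        *pErr = TOY_UNSUPPORTED;
        return;
    }
    iter->setState(iter, state, pErr);
}

/* A provider whose state word is simply the index. */
static void toy_index_set_state(ToyIter *iter, uint32_t state, ToyErr *pErr) {
    (void)pErr; /* this provider cannot fail */
    iter->index = (int32_t)state;
}
```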
◆ uiter_setString()
Set up a UCharIterator to iterate over a string.
Sets the UCharIterator function pointers for iteration over the string s with iteration boundaries start=index=0 and length=limit=string length. The "provider" may set the start, index, and limit values at any time within the range 0..length. The length field will be ignored.
The string pointer s is set into UCharIterator.context without copying or reallocating the string contents.
getState() simply returns the current index. move() will always return the final index.
◆ uiter_setUTF16BE()
Set up a UCharIterator to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per UChar).
Everything works just like with a normal UChar iterator (uiter_setString), except that UChars are assembled from byte pairs, and that the length argument here indicates an even number of bytes.
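Assembling each UChar from a big-endian byte pair is a shift and an OR: unit i comes from bytes 2i (high) and 2i+1 (low). The toy sketch below uses hypothetical names (`ToyBEIter`, `toy_set_utf16be`, `toy_be_next`) and assumes the caller passes an even byte length, as the text requires.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SENTINEL (-1) /* stand-in for ICU's U_SENTINEL */

typedef struct {
    const uint8_t *bytes; /* big-endian byte pairs */
    int32_t index, limit; /* in UChars, i.e. byte offset / 2 */
} ToyBEIter;

/* UChar at UTF-16 index i: high byte first, then low byte. */
static uint16_t toy_be_unit(const uint8_t *b, int32_t i) {
    return (uint16_t)((b[2 * i] << 8) | b[2 * i + 1]);
}

static int32_t toy_be_next(ToyBEIter *it) {
    if (it->index >= it->limit) return TOY_SENTINEL;
    return toy_be_unit(it->bytes, it->index++);
}

/* Caller must pass an even byteLength. */
static ToyBEIter toy_set_utf16be(const uint8_t *s, int32_t byteLength) {
    ToyBEIter it = { s, 0, byteLength / 2 };
    return it;
}
```

Apart from this assembly step, the iterator logic is the same as for an ordinary UChar array.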
getState() simply returns the current index. move() will always return the final index.
◆ uiter_setUTF8()
Set up a UCharIterator to iterate over a UTF-8 string.
Sets the UCharIterator function pointers for iteration over the UTF-8 string s with UTF-8 iteration boundaries 0 and length. The implementation counts the UTF-16 index on the fly and lazily evaluates the UTF-16 length of the text.
The start field is used as the UTF-8 offset, the limit field as the UTF-8 length. When the reservedField is not 0, then it contains a supplementary code point and the UTF-16 index is between the two corresponding surrogates. At that point, the UTF-8 index is behind that code point.
The UTF-8 string pointer s is set into UCharIterator.context without copying or reallocating the string contents.
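getState() returns a state value consisting of the current UTF-8 source byte index (bits 31..1) and a flag (bit 0) that indicates whether the UChar position is in the middle of a surrogate pair (from a 4-byte UTF-8 sequence for the corresponding supplementary code point).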
getState() cannot also encode the UTF-16 index in the state value. move(relative to limit or length), or move(relative to current) after setState(), may return UITER_UNKNOWN_INDEX.
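A sketch of packing and unpacking such a UTF-8 state word, assuming the layout of byte index in bits 31..1 and a "between the surrogates" flag in bit 0. The function names are hypothetical; the toy assumes the byte index fits in 31 bits. Because both fields are used up, no room remains for the UTF-16 index.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 31..1 = UTF-8 byte index, bit 0 = the UTF-16
   position lies between the two surrogates of a supplementary code
   point (the byte index then points past that code point). */
static uint32_t toy_pack_utf8_state(int32_t byteIndex, int midSurrogate) {
    return ((uint32_t)byteIndex << 1) | (midSurrogate ? 1u : 0u);
}

static void toy_unpack_utf8_state(uint32_t state, int32_t *byteIndex,
                                  int *midSurrogate) {
    *byteIndex = (int32_t)(state >> 1);
    *midSurrogate = (int)(state & 1u);
}
```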