Selection constants for Unicode properties.
These constants are used in functions like u_hasBinaryProperty to select one of the Unicode properties.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
Important: If ICU is built with UCD files from a Unicode version earlier than the one that introduced a property (e.g., earlier than 3.2), then properties marked with "new in Unicode 3.2" are unavailable or only partially available. Check u_getUnicodeVersion to be sure.
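The version check recommended above can be sketched in plain C. This is a minimal illustration, assuming the version has already been read (for example with u_getUnicodeVersion) into a four-field array of uint8_t laid out like ICU's UVersionInfo; the helper name version_at_least and the constant VERSION_FIELDS are hypothetical and not part of ICU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the length of ICU's UVersionInfo (major, minor, milli, micro).
   Assumption: real code would use UVersionInfo from unicode/uversion.h. */
enum { VERSION_FIELDS = 4 };

/* Hypothetical helper: returns true if `have` is the same as or later than
   `need`, comparing the fields from most to least significant. */
static bool version_at_least(const uint8_t have[VERSION_FIELDS],
                             const uint8_t need[VERSION_FIELDS]) {
    for (size_t i = 0; i < VERSION_FIELDS; ++i) {
        if (have[i] != need[i]) {
            return have[i] > need[i];
        }
    }
    return true; /* all fields are equal */
}
```

A caller would compare the reported version against {3, 2, 0, 0} before relying on a property marked "new in Unicode 3.2".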
Enumerator
Binary property Alphabetic (UCHAR_ALPHABETIC).
Same as u_isUAlphabetic, different from u_isalpha. Lu+Ll+Lt+Lm+Lo+Nl+Other_Alphabetic
First constant for binary Unicode properties.
Binary property ASCII_Hex_Digit.
0-9 A-F a-f
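The set listed for ASCII_Hex_Digit is small enough to write out directly. The following is a stand-alone sketch of that membership test, not ICU's implementation; the function name is_ascii_hex_digit is hypothetical. In an ICU program the same question would normally be asked through u_hasBinaryProperty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone check mirroring the listed set 0-9 A-F a-f. */
static bool is_ascii_hex_digit(uint32_t c) {
    return (c >= 0x30 && c <= 0x39)    /* 0-9 */
        || (c >= 0x41 && c <= 0x46)    /* A-F */
        || (c >= 0x61 && c <= 0x66);   /* a-f */
}
```

Fullwidth forms such as U+FF21 are deliberately rejected here; per the descriptions in this list, the broader Hex_Digit property covers characters commonly used for hexadecimal numbers.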
Binary property Bidi_Control.
Format controls which have specific functions in the Bidi Algorithm.
Binary property Bidi_Mirrored.
Characters that may change display in RTL text. Same as u_isMirrored. See Bidi Algorithm, UTR 9.
Binary property Dash.
Variations of dashes.
Binary property Default_Ignorable_Code_Point (new in Unicode 3.2).
Ignorable in most processing. <2060..206F, FFF0..FFFB, E0000..E0FFF>+Other_Default_Ignorable_Code_Point+(Cf+Cc+Cs-White_Space)
Binary property Deprecated (new in Unicode 3.2).
The usage of deprecated characters is strongly discouraged.
Binary property Diacritic.
Characters that linguistically modify the meaning of another character to which they apply.
Binary property Extender.
Extend the value or shape of a preceding alphabetic character, e.g., length and iteration marks.
Binary property Full_Composition_Exclusion.
CompositionExclusions.txt+Singleton Decompositions+ Non-Starter Decompositions.
Binary property Grapheme_Base (new in Unicode 3.2).
For programmatic determination of grapheme cluster boundaries. [0..10FFFF]-Cc-Cf-Cs-Co-Cn-Zl-Zp-Grapheme_Link-Grapheme_Extend-CGJ
Binary property Grapheme_Extend (new in Unicode 3.2).
For programmatic determination of grapheme cluster boundaries. Me+Mn+Mc+Other_Grapheme_Extend-Grapheme_Link-CGJ
Binary property Grapheme_Link (new in Unicode 3.2).
For programmatic determination of grapheme cluster boundaries.
Binary property Hex_Digit.
Characters commonly used for hexadecimal numbers.
Binary property Hyphen.
Dashes used to mark connections between pieces of words, plus the Katakana middle dot.
Binary property ID_Continue.
Characters that can continue an identifier. DerivedCoreProperties.txt also says "NOTE: Cf characters should be filtered out." ID_Start+Mn+Mc+Nd+Pc
Binary property ID_Start.
Characters that can start an identifier. Lu+Ll+Lt+Lm+Lo+Nl
Binary property Ideographic.
CJKV ideographs.
Binary property IDS_Binary_Operator (new in Unicode 3.2).
For programmatic determination of Ideographic Description Sequences.
Binary property IDS_Trinary_Operator (new in Unicode 3.2).
For programmatic determination of Ideographic Description Sequences.
Binary property Join_Control.
Format controls for cursive joining and ligation.
Binary property Logical_Order_Exception (new in Unicode 3.2).
Characters that do not use logical order and require special handling in most processing.
Binary property Lowercase.
Same as u_isULowercase, different from u_islower. Ll+Other_Lowercase
Binary property Math.
Sm+Other_Math
Binary property Noncharacter_Code_Point.
Code points that are explicitly defined as illegal for the encoding of characters.
Binary property Quotation_Mark.
Binary property Radical (new in Unicode 3.2).
For programmatic determination of Ideographic Description Sequences.
Binary property Soft_Dotted (new in Unicode 3.2).
Characters with a "soft dot", like i or j. An accent placed on these characters causes the dot to disappear.
Binary property Terminal_Punctuation.
Punctuation characters that generally mark the end of textual units.
Binary property Unified_Ideograph (new in Unicode 3.2).
For programmatic determination of Ideographic Description Sequences.
Binary property Uppercase.
Same as u_isUUppercase, different from u_isupper. Lu+Other_Uppercase
Binary property White_Space.
Same as u_isUWhiteSpace, different from u_isspace and u_isWhitespace. Space characters+TAB+CR+LF-ZWSP-ZWNBSP
Binary property XID_Continue.
ID_Continue modified to allow closure under normalization forms NFKC and NFKD.
Binary property XID_Start.
ID_Start modified to allow closure under normalization forms NFKC and NFKD.
Binary property Case_Sensitive.
Characters that are either the source or the target of a case mapping. Not the same as the general category Cased_Letter.
Binary property STerm (new in Unicode 4.0.1).
Sentence Terminal. Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/)
Binary property Variation_Selector (new in Unicode 4.0.1).
Indicates all those characters that qualify as Variation Selectors. For details on the behavior of these characters, see StandardizedVariants.html and 15.6 Variation Selectors.
Binary property NFD_Inert.
ICU-specific property for characters that are inert under NFD, i.e., they do not interact with adjacent characters. See the documentation for the Normalizer2 class and the Normalizer2::isInert() method.
Binary property NFKD_Inert.
ICU-specific property for characters that are inert under NFKD, i.e., they do not interact with adjacent characters. See the documentation for the Normalizer2 class and the Normalizer2::isInert() method.
Binary property NFC_Inert.
ICU-specific property for characters that are inert under NFC, i.e., they do not interact with adjacent characters. See the documentation for the Normalizer2 class and the Normalizer2::isInert() method.
Binary property NFKC_Inert.
ICU-specific property for characters that are inert under NFKC, i.e., they do not interact with adjacent characters. See the documentation for the Normalizer2 class and the Normalizer2::isInert() method.
Binary property Segment_Starter.
ICU-specific property for characters that are starters in terms of Unicode normalization and combining character sequences. They have ccc=0 and do not occur in non-initial position of the canonical decomposition of any character (like a-umlaut in NFD and a Jamo T in an NFD(Hangul LVT)). ICU uses this property for segmenting a string for generating a set of canonically equivalent strings, e.g. for canonical closure while processing collation tailoring rules.
Binary property Pattern_Syntax (new in Unicode 4.1).
See UAX #31 Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/)
Binary property Pattern_White_Space (new in Unicode 4.1).
See UAX #31 Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/)
Binary property alnum (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the uchar.h file documentation.
Binary property blank (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the uchar.h file documentation.
Binary property graph (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the uchar.h file documentation.
Binary property print (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the uchar.h file documentation.
Binary property xdigit (a C/POSIX character class).
Implemented according to the UTS #18 Annex C Standard Recommendation. See the uchar.h file documentation.
Binary property Cased.
For Lowercase, Uppercase and Titlecase characters.
Binary property Case_Ignorable.
Used in context-sensitive case mappings.
Binary property Changes_When_Lowercased.
Binary property Changes_When_Uppercased.
Binary property Changes_When_Titlecased.
Binary property Changes_When_Casefolded.
Binary property Changes_When_Casemapped.
Binary property Changes_When_NFKC_Casefolded.
Binary property Emoji.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property Emoji_Presentation.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property Emoji_Modifier.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property Emoji_Modifier_Base.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property Emoji_Component.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property Regional_Indicator.
Binary property Prepended_Concatenation_Mark.
Binary property Extended_Pictographic.
See http://www.unicode.org/reports/tr51/#Emoji_Properties
Binary property of strings Basic_Emoji.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings Emoji_Keycap_Sequence.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings RGI_Emoji_Modifier_Sequence.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings RGI_Emoji_Flag_Sequence.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings RGI_Emoji_Tag_Sequence.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings RGI_Emoji_ZWJ_Sequence.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property of strings RGI_Emoji.
See https://www.unicode.org/reports/tr51/#Emoji_Sets
Binary property IDS_Unary_Operator.
For programmatic determination of Ideographic Description Sequences.
Binary property ID_Compat_Math_Start.
Used in mathematical identifier profile in UAX #31.
Binary property ID_Compat_Math_Continue.
Used in mathematical identifier profile in UAX #31.
Binary property Modifier_Combining_Mark.
Used by the AMTRA algorithm in UAX #53.
One more than the last constant for binary Unicode properties.
Enumerated property Bidi_Class.
Same as u_charDirection, returns UCharDirection values.
First constant for enumerated/integer Unicode properties.
Enumerated property Block.
Same as ublock_getCode, returns UBlockCode values.
Enumerated property Canonical_Combining_Class.
Same as u_getCombiningClass, returns 8-bit numeric values.
Enumerated property Decomposition_Type.
Returns UDecompositionType values.
Enumerated property East_Asian_Width.
See http://www.unicode.org/reports/tr11/. Returns UEastAsianWidth values.
Enumerated property General_Category.
Same as u_charType, returns UCharCategory values.
Enumerated property Joining_Group.
Returns UJoiningGroup values.
Enumerated property Joining_Type.
Returns UJoiningType values.
Enumerated property Line_Break.
Returns ULineBreak values.
Enumerated property Numeric_Type.
Returns UNumericType values.
Enumerated property Script.
Same as uscript_getScript, returns UScriptCode values.
Enumerated property Hangul_Syllable_Type (new in Unicode 4).
Returns UHangulSyllableType values.
Enumerated property NFD_Quick_Check.
Returns UNormalizationCheckResult values.
Enumerated property NFKD_Quick_Check.
Returns UNormalizationCheckResult values.
Enumerated property NFC_Quick_Check.
Returns UNormalizationCheckResult values.
Enumerated property NFKC_Quick_Check.
Returns UNormalizationCheckResult values.
Enumerated property Lead_Canonical_Combining_Class.
ICU-specific property for the ccc of the first code point of the decomposition, or lccc(c)=ccc(NFD(c)[0]). Useful for checking for canonically ordered text; see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD. Returns 8-bit numeric values like UCHAR_CANONICAL_COMBINING_CLASS.
Enumerated property Trail_Canonical_Combining_Class.
ICU-specific property for the ccc of the last code point of the decomposition, or tccc(c)=ccc(NFD(c)[last]). Useful for checking for canonically ordered text; see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD. Returns 8-bit numeric values like UCHAR_CANONICAL_COMBINING_CLASS.
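One way to use lccc and tccc values is the pairwise FCD test described in the technical note cited above: text passes if, for every adjacent pair of code points, the tccc of the first is not greater than the lccc of the second, unless that lccc is 0. The following is a minimal sketch in plain C, assuming the lccc and tccc values have already been looked up (for example with u_getIntPropertyValue) and stored in parallel arrays. The function name is_fcd and the array-based interface are hypothetical, not part of ICU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lccc[i] and tccc[i] hold the lead and trail
   combining classes of the i-th code point of the text. */
static bool is_fcd(const uint8_t *lccc, const uint8_t *tccc, size_t n) {
    uint8_t prev_tccc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* A nonzero lead class must not be lower than the previous trail class. */
        if (lccc[i] != 0 && prev_tccc > lccc[i]) {
            return false;
        }
        prev_tccc = tccc[i];
    }
    return true;
}
```

For example, "a" followed by U+0323 (ccc 220) and then U+0301 (ccc 230) passes, while the same marks in the opposite order fail.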
Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns UGraphemeClusterBreak values.
Enumerated property Sentence_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns USentenceBreak values.
Enumerated property Word_Break (new in Unicode 4.1).
Used in UAX #29: Text Boundaries (http://www.unicode.org/reports/tr29/). Returns UWordBreakValues values.
Enumerated property Bidi_Paired_Bracket_Type (new in Unicode 6.3).
Used in UAX #9: Unicode Bidirectional Algorithm (http://www.unicode.org/reports/tr9/). Returns UBidiPairedBracketType values.
Enumerated property Indic_Positional_Category.
New in Unicode 6.0 as provisional property Indic_Matra_Category; renamed and changed to informative in Unicode 8.0. See http://www.unicode.org/reports/tr44/#IndicPositionalCategory.txt
Enumerated property Indic_Syllabic_Category.
New in Unicode 6.0 as provisional; informative since Unicode 8.0. See http://www.unicode.org/reports/tr44/#IndicSyllabicCategory.txt
Enumerated property Vertical_Orientation.
Used for UAX #50 Unicode Vertical Text Layout (https://www.unicode.org/reports/tr50/). New as a UCD property in Unicode 10.0.
Enumerated property Identifier_Status.
Used for UTS #39 General Security Profile for Identifiers (https://www.unicode.org/reports/tr39/#General_Security_Profile).
Enumerated property Indic_Conjunct_Break.
Used in the grapheme cluster break algorithm in UAX #29.
One more than the last constant for enumerated/integer Unicode properties.
Bitmask property General_Category_Mask.
This is the General_Category property returned as a bit mask. When used in u_getIntPropertyValue(c), same as U_MASK(u_charType(c)), returns bit masks for UCharCategory values where exactly one bit is set. When used with u_getPropertyValueName() and u_getPropertyValueEnum(), a multi-bit mask is used for sets of categories like "Letters". Mask values should be cast to uint32_t.
First constant for bit-mask Unicode properties.
One more than the last constant for bit-mask Unicode properties.
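The relationship between a single category value and its bit mask can be illustrated without ICU. The sketch below assumes local stand-in constants that mirror the first few UCharCategory values (Lu=1 through Lo=5) and a CATEGORY_MASK macro shaped like U_MASK; all names here are hypothetical, and real code would use the ICU enum and macros directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the first UCharCategory values (assumption:
   these match uchar.h; real code should use the ICU enum). */
enum {
    CAT_UNASSIGNED       = 0,
    CAT_UPPERCASE_LETTER = 1,  /* Lu */
    CAT_LOWERCASE_LETTER = 2,  /* Ll */
    CAT_TITLECASE_LETTER = 3,  /* Lt */
    CAT_MODIFIER_LETTER  = 4,  /* Lm */
    CAT_OTHER_LETTER     = 5   /* Lo */
};

/* Exactly one bit set per category, cast to uint32_t as the text advises. */
#define CATEGORY_MASK(x) ((uint32_t)1 << (x))

/* Multi-bit mask for the set of categories "Letters". */
#define LETTER_MASK (CATEGORY_MASK(CAT_UPPERCASE_LETTER) | \
                     CATEGORY_MASK(CAT_LOWERCASE_LETTER) | \
                     CATEGORY_MASK(CAT_TITLECASE_LETTER) | \
                     CATEGORY_MASK(CAT_MODIFIER_LETTER)  | \
                     CATEGORY_MASK(CAT_OTHER_LETTER))

/* Hypothetical helper: tests one category value against the letter set. */
static bool category_is_letter(int category) {
    return (CATEGORY_MASK(category) & LETTER_MASK) != 0;
}
```

Testing membership in a set of categories then costs one AND operation instead of a chain of comparisons.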
Double property Numeric_Value.
Corresponds to u_getNumericValue.
First constant for double Unicode properties.
One more than the last constant for double Unicode properties.
String property Age.
Corresponds to u_charAge.
First constant for string Unicode properties.
String property Bidi_Mirroring_Glyph.
Corresponds to u_charMirror.
String property Case_Folding.
Corresponds to u_strFoldCase in ustring.h.
Deprecated string property ISO_Comment.
Corresponds to u_getISOComment.
String property Lowercase_Mapping.
Corresponds to u_strToLower in ustring.h.
String property Name.
Corresponds to u_charName.
String property Simple_Case_Folding.
Corresponds to u_foldCase.
String property Simple_Lowercase_Mapping.
Corresponds to u_tolower.
String property Simple_Titlecase_Mapping.
Corresponds to u_totitle.
String property Simple_Uppercase_Mapping.
Corresponds to u_toupper.
String property Titlecase_Mapping.
Corresponds to u_strToTitle in ustring.h.
String property Unicode_1_Name.
This property is of little practical value. Beginning with ICU 49, ICU APIs return an empty string for this property. Corresponds to u_charName(U_UNICODE_10_CHAR_NAME).
String property Uppercase_Mapping.
Corresponds to u_strToUpper in ustring.h.
String property Bidi_Paired_Bracket (new in Unicode 6.3).
Corresponds to u_getBidiPairedBracket.
One more than the last constant for string Unicode properties.
Miscellaneous property Script_Extensions (new in Unicode 6.0).
Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/. Corresponds to uscript_hasScript and uscript_getScriptExtensions in uscript.h.
First constant for Unicode properties with unusual value types.
Miscellaneous property Identifier_Type.
Used for UTS #39 General Security Profile for Identifiers (https://www.unicode.org/reports/tr39/#General_Security_Profile).
Corresponds to u_hasIDType() and u_getIDTypes().
Each code point maps to a set of UIdentifierType values.
One more than the last constant for Unicode properties with unusual value types.
Represents a nonexistent or invalid property or property value.
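The "first constant" and "one more than the last constant" markers above divide the property constants into ranges by value type. The sketch below shows how a caller might pick the range for a given constant. It assumes the range start values 0, 0x1000, 0x2000, 0x3000, 0x4000 and 0x7000 and the value -1 for the invalid code; the enum and function names are hypothetical. It only identifies the candidate range: whether a property is actually supported still has to be checked against the corresponding limit constant, which depends on the ICU version.

```c
#include <stdint.h>

/* Hypothetical labels for the value-type ranges described in this list. */
typedef enum {
    RANGE_INVALID,
    RANGE_BINARY,
    RANGE_INT,
    RANGE_MASK,
    RANGE_DOUBLE,
    RANGE_STRING,
    RANGE_OTHER
} PropertyRange;

/* Maps a property constant to its candidate range using the assumed
   range start values. Does not check the version-dependent limits. */
static PropertyRange property_range_of(int32_t property) {
    if (property < 0)      return RANGE_INVALID; /* e.g. the invalid code -1 */
    if (property < 0x1000) return RANGE_BINARY;
    if (property < 0x2000) return RANGE_INT;
    if (property < 0x3000) return RANGE_MASK;
    if (property < 0x4000) return RANGE_DOUBLE;
    if (property < 0x7000) return RANGE_STRING;
    return RANGE_OTHER;
}
```

A dispatcher could use the result to decide between u_hasBinaryProperty, u_getIntPropertyValue, and the per-property functions named in the descriptions above.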