struct text_encoding;
(since C++26)The class text_encoding
provides a mechanism for identifying character encodings. It is used to determine the ordinary character literal encoding of the translation environment at compile-time and the character encoding of the execution environment at runtime.
Each text_encoding
object encapsulates a character encoding scheme, uniquely identified by an enumerator in text_encoding::id
and a corresponding name represented by a null-terminated byte string. These can be accessed through the mib() and name() member functions, respectively. The determination of whether an object represents a character encoding scheme implemented in the translation or execution environment is implementation-defined.
The class text_encoding
is a TriviallyCopyable type. The array object representing the corresponding name of the character encoding scheme is nested within the text_encoding
object itself. The stored name is limited to a maximum of max_name_length characters excluding the null character '\0'.
The class supports both registered and non-registered character encodings. Registered encodings are those found in the IANA Character Sets Registry excluding the following character encodings:
In addition, the class provides access for registered character encodings to:
Non-registered encodings can be represented with an enumerator id::other or id::unknown and a custom name.
A text_encoding
object e whose MIBenum value is neither id::other nor id::unknown maintains the following invariants:
view
over aliases of the character encoding
mib_
(private) a MIBenum value with id::unknown as the default value
name_
(private) a stored primary name
text_encoding
object
text_encoding
representing the ordinary character literal encoding
text_encoding
representing the implementation-defined character encoding scheme of the execution environment
view
over aliases of the current character encoding
When working with character encodings, it is important to note that the primary names or aliases of two distinct registered character encodings are not equivalent when compared using Charset Alias Matching as described by the Unicode Technical Standard.
For convenience, the enumerators of text_encoding::id
are introduced as members of text_encoding
and can be accessed directly. This means that text_encoding::ASCII and text_encoding::id::ASCII refer to the same entity.
It is recommended that the implementation should treat registered encodings as not interchangeable. Additionally, the primary name of a registered encoding should not be used to describe a similar but different non-registered encoding, unless there is a clear precedent for doing so.
[edit] Example#include <locale> #include <print> #include <text_encoding> int main() { // literal encoding is known at compile-time constexpr std::text_encoding literal_encoding = std::text_encoding::literal(); // check for literal encoding static_assert(literal_encoding.mib() != std::text_encoding::other && literal_encoding.mib() != std::text_encoding::unknown); // environment encoding is only known at runtime std::text_encoding env_encoding = std::text_encoding::environment(); // associated encoding of the default locale std::text_encoding locale_encoding = std::locale("").encoding(); std::println("The literal encoding is {}", literal_encoding.name()); std::println("The aliases of literal encoding:"); for (const char* alias_name : literal_encoding.aliases()) std::println(" -> {}", alias_name); if (env_encoding == locale_encoding) std::println("Both environment and locale encodings are the same"); std::println("The environment encoding is {}", env_encoding.name()); std::println("The aliases of environment encoding:"); for (const char* alias_name : env_encoding.aliases()) std::println(" -> {}", alias_name); }
Possible output:
The literal encoding is UTF-8 The aliases of literal encoding: -> UTF-8 -> csUTF8 Both environment and locale encodings are the same The environment encoding is ANSI_X3.4-1968 The aliases of environment encoding: -> US-ASCII -> iso-ir-6 -> ANSI_X3.4-1968 -> ANSI_X3.4-1986 -> ISO_646.irv:1991 -> ISO646-US -> us -> IBM367 -> cp367 -> csASCII -> ASCII[edit] See also set of polymorphic facets that encapsulate cultural differences
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4