Unicode 15.0.0

Unicode® 15.0.0 2022 September 13 (Announcement)

Version 15.0.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 15.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration

A. Summary

Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters. These additions include 2 new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.

The new scripts and characters in Version 15.0 add support for lesser-used languages and unique written requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:

Popular symbol additions:

Other symbol and notational additions include:

Support for other languages and scholarly work worldwide includes:

Updates to the CJK blocks add:

Support for CJK unified ideographs was enhanced in Version 15.0 by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated source lists, regular expressions, and new and updated fields. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.

Important chart font updates, including:

Synchronization

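Several other important Unicode specifications have been updated for Version 15.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 15.0:

  - UTS #10, Unicode Collation Algorithm
  - UTS #39, Unicode Security Mechanisms
  - UTS #46, Unicode IDNA Compatibility Processing
  - UTS #51, Unicode Emoji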

Some of the changes in Version 15.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

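Version 15.0 of the Unicode Standard consists of:

  - the core specification
  - the code charts
  - the Unicode Standard Annexes
  - the Unicode Character Database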

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies the normative and informative data that implementers need to implement the Unicode Standard.

Core Specification

The core specification is available as a single PDF (14 MB) for viewing. Links in the navigation bar on the left of this page also give access to the individual chapters and appendices of the core specification.

Code Charts

Several sets of code charts are available. They serve different purposes:

For Unicode 15.0.0 in particular, two additional sets of code chart pages are provided: the delta code charts and the archival code charts.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Unicode Standard Annexes

Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 15.0 can be found in Section G below.

Unicode Character Database

Data files for Version 15.0 of the Unicode Character Database are available online. The ReadMe.txt in the UCD data directory explains the purpose of the various subdirectories. Zipped versions of the UCD are also available for bulk download.

Version References

Version 15.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 15.0.0, (Mountain View, CA: The Unicode Consortium, 2022. ISBN 978-1-936213-32-0)
https://www.unicode.org/versions/Unicode15.0.0/

The terms “Version 15.0” or “Unicode 15.0” are abbreviations for the full version reference, Version 15.0.0.

The citation and permalink for the latest published version of the Unicode Standard are:

The Unicode Consortium. The Unicode Standard.
https://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 15.0 is found on the page Components for 15.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 15.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 15.0, see the list of current Updates and Errata.

C. Stability Policy Update

The Alias Stability policy of the Unicode Character Encoding Stability Policies was updated between Versions 14.0 and 15.0. In addition to guaranteeing that no property alias or property value alias will ever be removed from the standard, it now also guarantees that the exact spelling of a property alias or property value alias will never change. This has long been UTC practice for maintaining these aliases, but the additional guarantee is intended to help keep regular expressions that refer to Unicode property values valid and stable.
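A regular-expression engine may resolve a reference such as \p{sc=Nag_Mundari} by looking the value up in the UCD alias tables, which is why a frozen alias spelling keeps such references working. The following is a minimal sketch, assuming the semicolon-separated layout of PropertyValueAliases.txt and a simplified version of the UAX44-LM3 loose-matching rule. SAMPLE_ALIASES, loose_key, build_alias_table, and resolve are hypothetical names, and the sample lines are illustrative rather than copied from the real file.

```python
# Sketch: resolve property value aliases roughly the way a regex engine might.
# SAMPLE_ALIASES is an illustrative excerpt in the PropertyValueAliases.txt
# layout ("prop ; short ; long"), not the full file.
SAMPLE_ALIASES = """\
# Illustrative lines only
gc ; Lu   ; Uppercase_Letter
sc ; Latn ; Latin
sc ; Kawi ; Kawi
sc ; Nagm ; Nag_Mundari
"""


def loose_key(name: str) -> str:
    """Simplified UAX44-LM3 loose matching: ignore case, spaces, '_' and '-'.
    (LM3 also ignores a leading "is"; that rule is omitted here.)"""
    return "".join(c for c in name.lower() if c not in " \t_-")


def build_alias_table(text: str) -> dict:
    """Map (property alias, value alias) to the long value name.

    Assumes the common 'prop ; short ; long [; other]*' layout; the special
    four-field ccc lines of the real file are not handled by this sketch."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        prop, long_name = fields[0], fields[2]
        for alias in fields[1:]:  # short name, long name, and any extras
            table[(loose_key(prop), loose_key(alias))] = long_name
    return table


ALIASES = build_alias_table(SAMPLE_ALIASES)


def resolve(prop: str, value: str):
    """Return the long value name for a short property alias, or None."""
    return ALIASES.get((loose_key(prop), loose_key(value)))


print(resolve("sc", "Nag-Mundari"))  # Nag_Mundari
print(resolve("sc", "nagm"))         # Nag_Mundari
```

Because the policy now freezes alias spellings, a table built this way from one version's data keeps resolving every alias that earlier versions accepted. Mapping long property names such as "Script" to "sc" would need PropertyAliases.txt as well, which this sketch omits.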

A new Property Domain Stability policy has been added to the Unicode Character Encoding Stability Policies as of Version 15.0. That stability policy guarantees that any existing property of characters can never be turned into a property of strings and that any existing property of strings can never be turned into a property of characters.

D. Textual Changes and Character Additions

Two new scripts were added with accompanying new block descriptions:

Script         Number of Characters
Kawi           86
Nag Mundari    42

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

4,489 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.
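A table-driven lookup can report which of the blocks newly defined in 15.0, if any, contains a given code point. This is a sketch using the ranges listed under New Blocks in this section; NEW_BLOCKS_15_0 and new_block_name are hypothetical names. Block membership alone does not say whether a code point is assigned; that information comes from the UCD data files.

```python
import bisect

# Blocks newly defined in Unicode 15.0: (first, last, name), sorted by first.
NEW_BLOCKS_15_0 = [
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x11B00, 0x11B5F, "Devanagari Extended-A"),
    (0x11F00, 0x11F5F, "Kawi"),
    (0x1D2C0, 0x1D2DF, "Kaktovik Numerals"),
    (0x1E030, 0x1E08F, "Cyrillic Extended-D"),
    (0x1E4D0, 0x1E4FF, "Nag Mundari"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
]
_STARTS = [first for first, _, _ in NEW_BLOCKS_15_0]


def new_block_name(cp: int):
    """Return the name of the new 15.0 block containing cp, or None."""
    i = bisect.bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        first, last, name = NEW_BLOCKS_15_0[i]
        if cp <= last:
            return name
    return None


print(new_block_name(0x11F00))  # Kawi
print(new_block_name(0x0041))   # None
```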

New Blocks

The newly defined blocks in Version 15.0 are:

Range           Block Name
10EC0..10EFF    Arabic Extended-C
11B00..11B5F    Devanagari Extended-A
11F00..11F5F    Kawi
1D2C0..1D2DF    Kaktovik Numerals
1E030..1E08F    Cyrillic Extended-D
1E4D0..1E4FF    Nag Mundari
31350..323AF    CJK Unified Ideographs Extension H

E. Conformance Changes

There are no significant new conformance requirements in Unicode 15.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 15.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 15.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.
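Two entries in the list that follows (UAX #9 and UAX #31) concern misleading bidirectional ordering in program text. As a rough illustration of the kind of check a higher-level protocol might start from, the sketch below flags bidirectional control characters in source text. It is a simple lint, not an implementation of HL4 or of the Bidirectional Algorithm; find_bidi_hazards and the sample string are hypothetical.

```python
import unicodedata

# Bidi_Class values of the explicit directional formatting characters (UAX #9).
EXPLICIT_FORMATTING = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

# Implicit directional marks. LRM and RLM are also Pattern_White_Space (UAX #31),
# so a lexer may treat them as whitespace even though they affect display order.
IMPLICIT_MARKS = {"\u200e", "\u200f", "\u061c"}  # LRM, RLM, ALM


def find_bidi_hazards(source: str):
    """Return (line, column, code point, name) for each bidi control in source."""
    hazards = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.bidirectional(ch) in EXPLICIT_FORMATTING or ch in IMPLICIT_MARKS:
                hazards.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
                )
    return hazards


sample = 'access = "user\u202e # admin"\nok = 1\n'
for hazard in find_bidi_hazards(sample):
    print(hazard)
```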

UAX #9, Unicode Bidirectional Algorithm: The text under UAX9-C2 was amended to emphasize that higher-level protocols should be used to mitigate misleading bidirectional ordering of source code, including potential spoofing attacks. An extended example of the use of higher-level protocol HL4 for program text was added in Section 4.3.2, HL Example 2 for Program Text.

UAX #11, East Asian Width: No significant changes in this version.

UAX #14, Unicode Line Breaking Algorithm: An outdated note regarding special behavior of U+23B6 was removed from Section 5.1, Description of Line Breaking Properties (Quotation).

UAX #15, Unicode Normalization Forms: The text in Section 5.1, Composition Exclusion Types, was updated.

UAX #24, Unicode Script Property: No significant changes in this version.

UAX #29, Unicode Text Segmentation: No significant changes in this version.

UAX #31, Unicode Identifier and Pattern Syntax: The text now clarifies that contextual restrictions on ZWJ and ZWNJ apply only if the default identifier syntax is customized to add those characters. Important guidance on profiles for default identifiers is presented in UAX31-R1. The text also clarifies that requirement UAX31-R3, Pattern_White_Space and Pattern_Syntax Characters, applies not only to pattern syntaxes but also to programming languages. In particular, some Pattern_White_Space characters are relevant to bidirectional ordering and potential spoofing attacks. The two new scripts for Unicode 15.0 were added to the Excluded Scripts table.

UAX #34, Unicode Named Character Sequences: A further clarification about the medial hyphen was added in UAX34-R3. The explanation of the Unicode namespace for character names was extended in UAX34-D3.

UAX #38, Unicode Han Database (Unihan): Information about CJK Extension H and the single-character extension to CJK Extension C was added. The sources and syntax were updated for kIRG_GSource and kIRG_TSource. The syntax was updated for several fields dealing with variants. A new field, kAlternateTotalStrokes, was added. Several new sections dealing with details of sources were added to the text.

UAX #41, Common References for Unicode Standard Annexes: All references were updated for Unicode 15.0.

UAX #42, Unicode Character Database in XML: New code point attributes, values, and patterns were added for Unicode 15.0.

UAX #44, Unicode Character Database: The documentation was updated to describe the changes to the UCD for Version 15.0.

UAX #45, U-Source Ideographs: The status "ExtH" was added for the new CJK Extension H block, and the status values for the existing CJK ideograph blocks were improved. A new section was added to the text describing the Ideographic Description Sequence field in USourceData.txt.

UAX #50, Unicode Vertical Text Layout: A short section was added discussing the limits of the applicability of the Vertical_Orientation property when dealing with right-to-left scripts.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.
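The UTS #39 entry in the list that follows moves ZWJ and ZWNJ to Identifier_Status=Restricted. A minimal sketch of the resulting policy, under loudly stated simplifications: by default any identifier containing a joiner is rejected, and a declared modification of the profile may re-admit a joiner only when it directly follows a virama (Canonical_Combining_Class=9). That is a rough approximation of only part of the joining-control restrictions referenced in Section 3.1.1, Joining Controls; the full rules also involve Joining_Type data and further context conditions. passes_joiner_policy and joiners_allowed_in_context are hypothetical names, and the check covers joiner handling only, not the rest of the General Security Profile.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value for viramas


def joiners_allowed_in_context(identifier: str) -> bool:
    """Loose approximation: each ZWJ/ZWNJ must directly follow a virama.
    Joining_Type-based rules and other context conditions are not modeled."""
    for i, ch in enumerate(identifier):
        if ch in (ZWJ, ZWNJ):
            if i == 0 or unicodedata.combining(identifier[i - 1]) != VIRAMA_CCC:
                return False
    return True


def passes_joiner_policy(identifier: str, modified_profile: bool = False) -> bool:
    """Joiner handling only. Default profile: ZWJ/ZWNJ are Restricted, so reject.
    A declared profile modification may re-admit them in limited contexts."""
    if ZWJ not in identifier and ZWNJ not in identifier:
        return True
    return modified_profile and joiners_allowed_in_context(identifier)


kssa = "\u0915\u094d\u200d\u0937"  # KA, VIRAMA, ZWJ, SSA
print(passes_joiner_policy(kssa))                         # False
print(passes_joiner_policy(kssa, modified_profile=True))  # True
```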

UTS #10, Unicode Collation Algorithm: No significant changes in this version.

UTS #39, Unicode Security Mechanisms: The zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters are changed from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default. Implementations of the General Security Profile for Identifiers that need to retain ZWJ and ZWNJ should declare that they use a modification of the profile per Section 2, Conformance, and should ensure that they implement the restrictions described in Section 3.1.1, Joining Controls.

UTS #46, Unicode IDNA Compatibility Processing: A note regarding the empty label for the DNS root was added to Section 4.2, ToASCII. New data files were added to define the IDNA Derived Property for this version and all earlier versions back to Unicode 6.1.

UTS #51, Unicode Emoji: The definition of emoji_zwj_element was updated. The emoji flag sequence definition was updated to better align with the discussion in Annex B, Valid Emoji Flag Sequences. The rules in Section 1.4.9, EBNF and Regex, were updated. The text in Section 2.7.1, Emoji and Text Presentation Selectors, was updated to clarify the behavior of the text presentation selector on emoji ZWJ sequences.

M. Implications for Migration

There are a significant number of changes in Unicode 15.0 which may impact implementations upgrading to Version 15.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Script-related Changes

Two new scripts have been added in Unicode 15.0.0. Some of these scripts have particular attributes which may cause issues for implementations. The more important of these attributes are summarized here.

Numeric Property Issues

Multiple @missing Lines in UCD Property Files

Starting with Version 15.0, some data files in the UCD may contain multiple @missing lines defined for the same property. This is currently the case for DerivedBidiClass.txt, DerivedEastAsianWidth.txt, and DerivedLineBreak.txt.

The effect of this change on implementations that parse the UCD data files is subtle. Three categories of parsers need to be considered when assessing migration issues:

  1. UCD file parsers which completely ignore the @missing lines and which have been depending on hard-coded ranges for all default values will not be impacted by this change. However, such parsers may be in the minority, because they are always impacted whenever a default property assignment range is changed for a release. (See below for the change in default Bidi_Class values for unassigned characters in the newly defined Arabic Extended-C block in Unicode 15.0.)
  2. UCD file parsers which completely ignore the @missing lines but which have been depending on the derived extracted UCD data files such as DerivedBidiClass.txt to parse the correct default property values for all unassigned code points will be impacted by this change. Such parsers will either have to be updated to use hard-coded ranges or to interpret the multiple @missing lines correctly, as the unassigned code point values are no longer listed explicitly in DerivedBidiClass.txt (and similar data files).
  3. UCD file parsers which do interpret the @missing lines may be impacted by this change. If they have been treating @missing lines exactly like the data lines in the file, overriding defined ranges as they process each line, they should be unaffected. Such a parsing strategy will simply end up processing more @missing line ranges than before, but will produce identical results. However, parsers which special-case the @missing lines and/or expect only a single @missing line to occur may need to be updated to get correct results.

See UAX #44 Section 4.2.10, @missing Conventions for more details.
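The category 3 strategy can be sketched as a parser that treats every @missing line as an ordinary assignment and lets later lines override earlier ones. This is a minimal sketch assuming a single-property file (such as DerivedBidiClass.txt) whose entries have the form "range ; value". The SAMPLE text is an abbreviated, illustrative excerpt with assumed default ranges, not the real file, and parse_single_property_file and lookup are hypothetical names.

```python
import re

# "# @missing: <range>; <value>" lines carry default assignments inside comments.
MISSING = re.compile(r"#\s*@missing:\s*(.*)")

# Illustrative excerpt only; the ranges and values are assumptions for the demo.
SAMPLE = """\
# @missing: 0000..10FFFF; L
# @missing: 0590..05FF; R
# @missing: 0600..07BF; AL
# @missing: 10EC0..10EFF; AL

0041..005A    ; L  # Lu  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0660..0669    ; AN # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
"""


def _parse_range(field: str):
    first, _, last = field.partition("..")
    start = int(first, 16)
    return start, (int(last, 16) if last else start)


def parse_single_property_file(text: str):
    """Collect (start, end, value) in file order, treating each @missing line
    exactly like a data line, as in category 3 above."""
    assignments = []
    for raw in text.splitlines():
        line = raw.strip()
        m = MISSING.match(line)
        line = m.group(1) if m else line.split("#", 1)[0]  # strip ordinary comments
        if not line.strip():
            continue
        cp_field, value = [f.strip() for f in line.split(";")[:2]]
        assignments.append((*_parse_range(cp_field), value))
    return assignments


def lookup(assignments, cp: int):
    """Later lines override earlier ones, so search from the end."""
    for start, end, value in reversed(assignments):
        if start <= cp <= end:
            return value
    return None


table = parse_single_property_file(SAMPLE)
print(lookup(table, 0x10EC5))  # AL (default from the second-level @missing line)
print(lookup(table, 0x0661))   # AN (explicit data line)
```

Scanning the list from the end costs linear time per lookup; a production parser might instead fill a per-code-point array or range map once, in file order, which gives the same override behavior.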

Other Property Issues

CJK/Unihan Changes

See UAX #38, Unicode Han Database (Unihan) for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip. UAX #38 also has updated regex values for numerous Unihan properties.

IDNA Changes

Emoji Changes
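Section H notes that UTS #51 updated the emoji flag sequence definition to align with Annex B, Valid Emoji Flag Sequences. An emoji flag sequence is a pair of regional indicator symbols (U+1F1E6..U+1F1FF) whose corresponding letters form a valid region code. The sketch below decodes such a pair into its letters; VALID_REGIONS is a hypothetical, tiny stand-in for the real validity data, and flag_region and is_valid_flag are hypothetical names, so treat this as illustration only.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
REGIONAL_INDICATOR_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z

# Hypothetical stand-in for the valid-region data described in Annex B.
VALID_REGIONS = {"US", "DE", "JP"}


def is_regional_indicator(ch: str) -> bool:
    return REGIONAL_INDICATOR_A <= ord(ch) <= REGIONAL_INDICATOR_Z


def flag_region(seq: str):
    """Return the two ASCII letters encoded by a regional-indicator pair, else None."""
    if len(seq) != 2 or not all(is_regional_indicator(c) for c in seq):
        return None
    return "".join(chr(ord("A") + ord(c) - REGIONAL_INDICATOR_A) for c in seq)


def is_valid_flag(seq: str) -> bool:
    """A pair of regional indicators whose letters form a (sample) valid region."""
    region = flag_region(seq)
    return region is not None and region in VALID_REGIONS


us_flag = "\U0001F1FA\U0001F1F8"  # U+1F1FA U+1F1F8
print(flag_region(us_flag))    # US
print(is_valid_flag(us_flag))  # True
```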
