Version 8.0.0 has been superseded by the latest version of the Unicode Standard.
This page summarizes the important changes for the Unicode Standard, Version 8.0.0. This version supersedes all previous versions of the Unicode Standard.
A. Summary
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration
A. Summary
Unicode 8.0 adds a total of 7,716 characters, encompassing six new scripts and many new symbols, as well as character additions to several existing scripts. Notable character additions include the following:
Other important updates in Unicode Version 8.0 include:
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and include updates for the repertoire additions made in Version 8.0, as well as other modifications:
This version of the Unicode Standard is synchronized with ISO/IEC 10646:2014, plus Amendment 1. Additionally, it includes the accelerated publication of U+20BE LARI SIGN, nine CJK unified ideographs (U+9FCD..U+9FD5), and 41 emoji characters.
See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.
B. Technical Overview
Version 8.0 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).
The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.
A complete specification of the contributory files for Unicode 8.0 is found on the page Components for 8.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.
The navigation bar on the left of this page provides links to the core specification as a single file and to its individual chapters and appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 8.0 of the Unicode Character Database.
Version Specification
Version 8.0.0 of the Unicode Standard should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 8.0.0, (Mountain View, CA: The Unicode Consortium, 2015. ISBN 978-1-936213-10-8)
http://www.unicode.org/versions/Unicode8.0.0/
The terms “Version 8.0” or “Unicode 8.0” are abbreviations for the full version reference, Version 8.0.0.
The citation and permalink for the latest published version of the Unicode Standard are:
The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/
Code Charts
Several sets of code charts are available. They serve different purposes:
For Unicode 8.0.0 in particular, two additional sets of code chart pages are provided:
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
Errata
Errata incorporated into Unicode 8.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 8.0, see the list of current Updates and Errata.
C. Stability Policy Update
D. Textual Changes and Character Additions
Six new scripts were added with accompanying new block descriptions:
Ahom
Anatolian Hieroglyphs
Hatran
Multani
Old Hungarian
Sutton SignWriting
Letters used in Arabic and in a number of modern and historic writing systems of South Asia were added. Version 8.0 also has a new notational system, Sutton SignWriting, used for transcription of various sign languages.
A number of popular emoji and other pictographic symbols are now included, as well as a mechanism for supporting diversity in emoji representing faces or people. More user interface symbols were also added to the standard.
Changes in the Unicode Standard Annexes are listed in Section G.
Character Assignment Overview
7,716 characters have been added, including 5,771 CJK unified ideographs. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see Delta Code Charts.
New Blocks
The newly defined blocks in Version 8.0 are:
Range           Block Name
AB70..ABBF      Cherokee Supplement
108E0..108FF    Hatran
10C80..10CFF    Old Hungarian
11280..112AF    Multani
11700..1173F    Ahom
12480..1254F    Early Dynastic Cuneiform
14400..1467F    Anatolian Hieroglyphs
1D800..1DAAF    Sutton SignWriting
1F900..1F9FF    Supplemental Symbols and Pictographs
2B820..2CEAF    CJK Unified Ideographs Extension E
E. Conformance Changes
There were no significant changes to the conformance clauses of the core specification for Unicode 8.0. However, there were minor changes to the rules in the algorithms specified in UAX #9, UAX #14, and UAX #29. Those rule changes will affect conformant implementations of the respective algorithms. See Section G. Changes in the Unicode Standard Annexes.
F. Changes in the Unicode Character Database
The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 8.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.
G. Changes in the Unicode Standard Annexes
In Version 8.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.
Unicode Standard Annex    Changes
UAX #9
H. Changes in Synchronized Unicode Technical Standards
There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.
Unicode Technical Standard    Changes
UTS #10
UTS #39, Security Mechanisms, has also been updated for Version 8.0.
M. Implications for Migration
A significant number of changes in Unicode 8.0 may affect implementations upgrading to Version 8.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.
Casing and Case Folding of Cherokee
The character encoding model for the Cherokee script changed from unicameral to bicameral. The conversion was done by reclassifying all existing syllables as uppercase and adding a corresponding set of lowercase syllables. In terms of properties, the General_Category of the existing characters changed from Other_Letter to Uppercase_Letter, and the new characters were given the value Lowercase_Letter. A new case pair for the archaic syllable mv was also added.
The casing was chosen in order to reduce the migration cost for implementations, allowing them to preserve the font metrics for the existing characters and reduce the impact on layout. However, the formation of case pairs by adding lowercase characters is unusual. As a result, case folding of Cherokee maps to uppercase instead of lowercase. This mapping also has consequences for identifiers, as described in the changes to UAX #31, Unicode Identifier and Pattern Syntax.
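The casing behavior described above can be checked against a runtime's own Unicode data. The following is a minimal sketch in Python, assuming an interpreter whose bundled Unicode Character Database is at Version 8.0 or later; the helper name cherokee_case_report is hypothetical and exists only for illustration.

```python
import unicodedata

# U+13A0 CHEROKEE LETTER A existed before Version 8.0 and is now uppercase.
# U+AB70 CHEROKEE SMALL LETTER A was added in Version 8.0 as its lowercase pair.
UPPER_A = "\u13a0"
LOWER_A = "\uab70"


def cherokee_case_report(ch: str) -> dict:
    """Collect the case-related properties of one character (illustrative helper)."""
    return {
        "category": unicodedata.category(ch),
        "lower": ch.lower(),
        "upper": ch.upper(),
        "casefold": ch.casefold(),
    }


if __name__ == "__main__":
    # The results depend on the UCD version shipped with the interpreter.
    print("UCD version:", unicodedata.unidata_version)
    for ch in (UPPER_A, LOWER_A):
        print(f"U+{ord(ch):04X}", cherokee_case_report(ch))
```

An implementation that normalizes text for caseless matching by lowercasing, rather than by case folding, would disagree with the folded form for Cherokee, which is one reason to review such code during migration.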
Change in Encoding Model for New Tai Lue to Visual Order
The character encoding model for New Tai Lue changed from logical order, in which pre-base vowels are stored after an initial consonant, to visual order, in which the pre-base vowels are stored before the initial consonant, as for Thai, Lao, and Tai Viet. The model was changed to better serve the primary user community in the Xishuangbanna region of China, whose members have been entering and storing data in visual order and rendering it with fonts that use a visual order encoding.
The encoding model change entailed a uniform General_Category reclassification of all New Tai Lue vowel signs and tone marks from Spacing_Mark to Other_Letter, the assignment of the property value Logical_Order_Exception=Yes to the pre-base vowels U+19B5..U+19B7 and U+19BA, and the addition of 176 pre-base vowel + initial consonant contractions to the Default Unicode Collation Element Table.
A visual order model complicates syllable identification and the processes for searching and sorting. Implementations switching to the visual order model can take advantage of techniques developed for processing Thai script data to address the issues associated with visual order encoding, and data stored in logical order should be carefully migrated.
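As an illustration of what migrating logical-order data involves, the sketch below swaps each initial consonant with a pre-base vowel that immediately follows it. It is a simplified sketch, not a complete migration tool: the function name logical_to_visual is hypothetical, the consonant range U+1980..U+19AB is an assumption about the New Tai Lue block layout, and the code assumes the input is entirely in the old logical order.

```python
import re

# Pre-base vowels named in the text: U+19B5..U+19B7 and U+19BA.
PRE_BASE_VOWELS = "\u19b5-\u19b7\u19ba"
# Assumption: New Tai Lue initial consonants occupy U+1980..U+19AB.
CONSONANTS = "\u1980-\u19ab"

# Logical order stores the pre-base vowel directly after the initial consonant.
_LOGICAL_PAIR = re.compile(f"([{CONSONANTS}])([{PRE_BASE_VOWELS}])")


def logical_to_visual(text: str) -> str:
    """Move each pre-base vowel in front of the consonant it follows.

    Not idempotent: running this on data that is already in visual order
    can swap a vowel with the consonant of the preceding syllable.
    """
    return _LOGICAL_PAIR.sub(r"\2\1", text)


if __name__ == "__main__":
    logical = "\u1982\u19b5"  # consonant followed by a pre-base vowel
    visual = logical_to_visual(logical)
    print([f"U+{ord(c):04X}" for c in visual])
```

Because the transformation cannot tell logical-order input from visual-order input, a real migration would need to know the ordering of each data source before converting it.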
Other Script-related Changes
Version 8.0 adds six new scripts, so implementations which process script data should be carefully checked.
Additionally, there was a significant Script property value change affecting the common Arabic-Indic digits (U+0660..U+0669). These were changed from having the value "Common" to the value "Arabic". Their use with scripts other than Arabic is now handled more consistently by the Script_Extensions property instead. Implementations which may have had special treatment for the Script property value of the Arabic-Indic digits should be checked to ensure that the change in Script property value does not cause unexpected behavior.
Changes for Deprecation of Language Tags
The range of tag characters (U+E0020..U+E007E) was changed from Deprecated=True to Deprecated=False in Version 8.0. This change was done to clear the way for the potential future use of tag characters for a purpose other than to represent language tags.
Note that two characters, U+E0001 LANGUAGE TAG and U+E007F CANCEL TAG, remain deprecated. Furthermore, the use of tag characters to represent language tags in a plain text stream is still a deprecated mechanism for conveying language information about text.
Implementations which deliberately remove or refuse to interpret deprecated characters may need updates to prepare them for the potential use of U+E0020..U+E007E in Unicode 8.0 data in the future.
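A filter that strips deprecated characters could be adjusted along the following lines. This is a minimal sketch covering only the tag characters discussed above, not the full set of characters with Deprecated=True; the function names and the follow_v8 flag are hypothetical and introduced purely for illustration.

```python
# Tag characters whose Deprecated value changed to False in Version 8.0.
TAG_RANGE = range(0xE0020, 0xE007F)  # U+E0020..U+E007E inclusive
# Per the text, these two characters remain deprecated in Version 8.0.
STILL_DEPRECATED_TAGS = {0xE0001, 0xE007F}


def is_deprecated_tag(cp: int, follow_v8: bool = True) -> bool:
    """Return True if the code point is a deprecated tag character.

    follow_v8=False models the earlier data, in which U+E0020..U+E007E
    also carried Deprecated=True.
    """
    if cp in STILL_DEPRECATED_TAGS:
        return True
    if cp in TAG_RANGE:
        return not follow_v8
    return False


def strip_deprecated_tags(text: str, follow_v8: bool = True) -> str:
    """Remove deprecated tag characters, leaving all other text intact."""
    return "".join(
        ch for ch in text if not is_deprecated_tag(ord(ch), follow_v8)
    )


if __name__ == "__main__":
    sample = "a\U000E0001\U000E0065\U000E006Eb"
    print(len(strip_deprecated_tags(sample, follow_v8=False)))
    print(len(strip_deprecated_tags(sample, follow_v8=True)))
```

Keeping the version-dependent decision in one predicate makes it easier to update the filter when the underlying property data changes again.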
Glyph Changes
The representative glyph for U+301C WAVE DASH was updated, so that it now shows a tilde shape instead of a reversed tilde shape. The updated glyph now aligns with majority practice in fonts for this character.
The representative glyph for U+3127 BOPOMOFO LETTER I was changed from a vertical orientation to a horizontal orientation. The updated glyph now aligns with majority practice for both horizontal and vertical layout of Bopomofo text, but implementations should be checked to verify correct behavior for rendering of this character.
Segmentation-related Changes
Version 8.0 made small adjustments to line break and other segmentation rules. In particular: