Unicode® 7.0.0

Released: 2014 June 16 (Announcement)

Version 7.0.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 7.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration

A. Summary

Unicode 7.0 adds a total of 2,834 characters, encompassing 23 new scripts and many new symbols, as well as character additions to many existing scripts. Notable character additions include the following:

Other important updates in Unicode Version 7.0 include:

Synchronization

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and include updates for the repertoire additions made in Version 7.0, as well as other modifications:

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus Amendments 1 and 2. Additionally, it includes the accelerated publication of U+20BD RUBLE SIGN.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

Version 7.0 of the Unicode Standard consists of the core specification (download), the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

A complete specification of the contributory files for Unicode 7.0 is found on the page Components for 7.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

The navigation bar on the left of this page provides links to the core specification as a single file, to individual chapters, and to the appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 7.0 of the Unicode Character Database.

Version Specification

Version 7.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 7.0.0, (Mountain View, CA: The Unicode Consortium, 2014. ISBN 978-1-936213-09-2)
http://www.unicode.org/versions/Unicode7.0.0/

The terms “Version 7.0” or “Unicode 7.0” are abbreviations for the full version reference, Version 7.0.0.

The citation and permalink for the latest published version of the Unicode Standard are:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

Code Charts

Several sets of code charts are available. They serve different purposes:

For Unicode 7.0.0 in particular, two additional sets of code chart pages are provided:

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 7.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 7.0, see the list of current Updates and Errata.

C. Stability Policy Update

D. Textual Changes and Character Additions

The block descriptions in the core spec were reorganized significantly. Twenty-three new scripts were added with accompanying new block descriptions:

Bassa Vah            Mahajani             Pahawh Hmong
Caucasian Albanian   Manichaean           Palmyrene
Duployan             Mende Kikakui        Pau Cin Hau
Elbasan              Modi                 Psalter Pahlavi
Grantha              Mro                  Siddham
Khojki               Nabataean            Tirhuta
Khudawadi            Old North Arabian    Warang Citi
Linear A             Old Permic

With Version 7.0, support for lesser-used languages was extended worldwide, including:

Letters used in Teuthonista and other transcriptional systems were added, along with a new notational set, Duployan, used for writing certain shorthands and Native American languages. Many symbols originating from the Wingdings and Webdings sets were also added, as well as more emoji and other pictographic symbols.

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

327 characters have been added to the BMP, while 2,507 characters have been added to Plane 1. Most character additions are in new blocks, but there are also character additions to a number of existing blocks.
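The split between the BMP and Plane 1 follows from simple code point arithmetic: the plane number is the code point shifted right by 16 bits. The following is a minimal Python sketch, not part of the standard; the helper name `plane` and the constant names are hypothetical, and the sample code points are taken from elsewhere on this page.

```python
def plane(code_point: int) -> int:
    """Return the plane number of a code point (0 = BMP, 1 = Plane 1)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


# Tallies quoted in the paragraph above.
BMP_ADDED = 327
PLANE1_ADDED = 2507
TOTAL_ADDED = BMP_ADDED + PLANE1_ADDED

# A few code points named on this page, labelled for display only.
SAMPLES = {
    0x20BD: "RUBLE SIGN",
    0xAB30: "start of Latin Extended-E",
    0x1F650: "start of Ornamental Dingbats",
}

if __name__ == "__main__":
    print(f"total added: {TOTAL_ADDED}")
    for cp, label in SAMPLES.items():
        print(f"U+{cp:04X} ({label}): plane {plane(cp)}")
```

The two tallies sum to the 2,834 characters given in the Summary.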

New Blocks

The newly-defined blocks in Version 7.0 are:

Range           Block Name
1AB0..1AFF      Combining Diacritical Marks Extended
A9E0..A9FF      Myanmar Extended-B
AB30..AB6F      Latin Extended-E
102E0..102FF    Coptic Epact Numbers
10350..1037F    Old Permic
10500..1052F    Elbasan
10530..1056F    Caucasian Albanian
10600..1077F    Linear A
10860..1087F    Palmyrene
10880..108AF    Nabataean
10A80..10A9F    Old North Arabian
10AC0..10AFF    Manichaean
10B80..10BAF    Psalter Pahlavi
11150..1117F    Mahajani
111E0..111FF    Sinhala Archaic Numbers
11200..1124F    Khojki
112B0..112FF    Khudawadi
11300..1137F    Grantha
11480..114DF    Tirhuta
11580..115FF    Siddham
11600..1165F    Modi
118A0..118FF    Warang Citi
11AC0..11AFF    Pau Cin Hau
16A40..16A6F    Mro
16AD0..16AFF    Bassa Vah
16B00..16B8F    Pahawh Hmong
1BC00..1BC9F    Duployan
1BCA0..1BCAF    Shorthand Format Controls
1E800..1E8DF    Mende Kikakui
1F650..1F67F    Ornamental Dingbats
1F780..1F7FF    Geometric Shapes Extended
1F800..1F8FF    Supplemental Arrows-C

E. Conformance Changes

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 7.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

There were several changes to Unihan data, including the addition of nearly 3,000 new Cantonese pronunciation entries, significant modification to the syntax for kIICore, and the relocation of kRSUnicode and kCompatibilityVariant to Unihan_IRGSources.txt.
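Because fields can move between Unihan files, a loader that keys on the field name rather than on the file name is not affected by such relocations. The following is a minimal Python sketch under stated assumptions: it assumes the tab-separated line form `U+XXXX<TAB>kField<TAB>value` with `#` comment lines, the function name `parse_unihan_lines` is hypothetical, and the sample lines are illustrative rather than copied from a 7.0 data file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def parse_unihan_lines(
    lines: Iterable[str],
    table: Optional[Dict[int, Dict[str, str]]] = None,
) -> Dict[int, Dict[str, str]]:
    """Merge Unihan-style lines into {code point: {field: value}}.

    Passing the same table for several files merges them, so it does not
    matter which file a given field (e.g. kRSUnicode) lives in.
    """
    if table is None:
        table = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split("\t", 2)
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue  # not a data line in the assumed format
        code_point = int(parts[0][2:], 16)
        table[code_point][parts[1]] = parts[2]
    return table


if __name__ == "__main__":
    # Illustrative sample lines standing in for two different files.
    irg_sources = ["# sample", "U+4E00\tkRSUnicode\t1.0"]
    readings = ["U+4E00\tkCantonese\tjat1"]
    merged = parse_unihan_lines(irg_sources)
    parse_unihan_lines(readings, merged)
    print(dict(merged))
```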

Major enhancements were made to the Indic script properties. New property values were added to enable a more algorithmic approach to rendering Indic scripts. These include values for joining behavior, new classes for numbers, and a further division of the syllabic categories of viramas and rephas. With these enhancements, the default rendering for newly added Indic scripts can be significantly improved.

Other updates include changes to the derivations of the Alphabetic and Case_Ignorable properties, and a number of updates to the Script and Script_Extensions property assignments. Also, the conventions for defining default property values for ranges of code points using “@missing” directives were regularized.
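For parsers that read default values from the data files, an “@missing” directive can be picked out of the comment lines with a small pattern. The following is a minimal Python sketch, assuming the comment form `# @missing: START..END; value` (optionally with a property name before the value); the function name `parse_missing` is hypothetical, and UAX #44 remains the authority on the exact conventions.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape: "# @missing: 0000..10FFFF; <field>[; <field>...]"
_MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(.*)$"
)


def parse_missing(line: str) -> Optional[Tuple[int, int, List[str]]]:
    """Return (start, end, fields) for an @missing line, else None."""
    match = _MISSING_RE.match(line.strip())
    if match is None:
        return None
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    fields = [field.strip() for field in match.group(3).split(";")]
    return start, end, fields


if __name__ == "__main__":
    print(parse_missing("# @missing: 0000..10FFFF; Unknown"))
    print(parse_missing("0041; L  # an ordinary data line"))
```

Returning the trailing fields as a list keeps the sketch neutral about whether a file names the property explicitly before the default value.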

G. Changes in the Unicode Standard Annexes

In Version 7.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex: Changes

UAX #9, Unicode Bidirectional Algorithm: No significant changes in this version.

UAX #11, East Asian Width: No significant changes in this version.

UAX #14, Unicode Line Breaking Algorithm: No significant changes in this version.

UAX #15, Unicode Normalization Forms: Corrected note for Table 3, Notational Conventions.

UAX #24, Unicode Script Property: No significant changes in this version.

UAX #29, Unicode Text Segmentation: Added U+AA7D MYANMAR SIGN TAI LAING TONE-5 to the exception list for SpacingMark in Table 2, Grapheme_Cluster_Break Property Values. Added a note to clarify that Format and Extend characters are not joined to separators like LF, as well as a note about the fact that words can span a sentence break in Section 5.1 Default Sentence Boundary Specification.

UAX #31, Unicode Identifier and Pattern Syntax: Added many new scripts to Table 4, Candidate Characters for Exclusion from Identifiers. The text on natural-language identifiers was changed to make a stronger recommendation for including the exception characters, and to include the Catalan MIDDLE DOT.

UAX #34, Unicode Named Character Sequences: Added definitions for Unicode namespace and the Unicode namespace for character names. Major rewrite of Section 4, Names.

UAX #38, Unicode Han Database (Unihan): The syntax for the kIICore field has been changed. The kCompatibilityVariant and kRSUnicode fields have been moved to Unihan_IRGSources.txt.

UAX #41, Common References for Unicode Standard Annexes: No significant changes in this version.

UAX #42, Unicode Character Database in XML: Added the value 7.0 for the age attribute, and new values for the attributes blk, jg, sc, kIICore, kIRG_GSource, and InSC.

UAX #44, Unicode Character Database: Updated the derivation of the Alphabetic property and of the Case_Ignorable property. Simplified the discussion of @missing in Section 4.2.10 @missing Conventions, to reflect the revised conventions in the UCD data files, which eliminated special edge cases. Corrected statement about aliases for provisional properties in Section 5.8 Property and Property Value Aliases.

UAX #45, U-Source Ideographs: Clarified meaning of status field.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard: Changes

UTS #10, Unicode Collation Algorithm: Changed the text to discuss collation weights more generically, with fewer references to the 16-bit weights used in the DUCET. Section 6.3.2, Large Values for Secondary or Tertiary Weights, was merged into Section 6.2, Large Weight Values.

UTS #46, Unicode IDNA Compatibility Processing: Updated statistics for 7.0.0 in Table 4, IDNA Comparisons. Section 4 has been modified to clarify the input and results for each major step in the algorithm. In Section 5 IDNA Mapping Table, added a new value for field 3, XV8, with example. In Section 8.1 Format, made the definition of NV8 consistent with Section 5 IDNA Mapping Table.

M. Implications for Migration

Unicode 7.0 contains a significant number of changes that may affect implementations upgrading to Version 7.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.
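Before working through the items below, it can help to confirm which version of the Unicode Character Database a runtime actually ships. The following is a minimal Python sketch using the standard library attribute `unicodedata.unidata_version`; the helper names `parse_version` and `runtime_supports` are hypothetical.

```python
import unicodedata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '7.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def runtime_supports(
    required: Tuple[int, ...], reported: Optional[str] = None
) -> bool:
    """Return True if the reported UCD version is at least `required`.

    When `reported` is omitted, the version of the UCD bundled with the
    running Python interpreter is used.
    """
    if reported is None:
        reported = unicodedata.unidata_version
    return parse_version(reported) >= required


if __name__ == "__main__":
    print("bundled UCD version:", unicodedata.unidata_version)
    print("at least 7.0.0:", runtime_supports((7, 0, 0)))
```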

Script-related Changes

Version 7.0 adds many new scripts, so implementations which process script data should be carefully checked. In particular:

Rendering Issues

A number of the newly added scripts, and in particular, Manichaean and Psalter Pahlavi, have complex shaping behavior. For those two scripts, additional values related to joining behavior appear in ArabicShaping.txt, which may not be expected. In particular:

Casing-related Changes

In addition to the usual scattering of new case pairs added for the Latin and Cyrillic scripts, there are noteworthy changes which impact casing behavior:

Segmentation-related Changes

Segmentation-related changes to existing property values were deliberately kept to a minimum for Version 7.0, and for the most part reflect just minor corrections to relatively rare characters. However, there was one significant set of changes impacting two fairly salient punctuation marks used in Arabic:

CJK Changes

UCD File Format Changes

In general, the format of UCD data files is unchanged for Version 7.0. However, there were some minor updates which may impact some parsers.
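One defensive approach is a parser that tolerates lines it does not understand instead of failing on them. The following is a minimal Python sketch, assuming the common UCD layout of semicolon-separated fields, `#` comments, and a first field that is either a single code point or a `START..END` range; the function name `parse_ucd_lines` is hypothetical, and the sample lines are illustrative (the block range is taken from the New Blocks table on this page).

```python
from typing import Iterable, Iterator, List, Tuple


def parse_ucd_lines(
    lines: Iterable[str],
) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (start, end, remaining fields) for each parsable data line."""
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        first = fields[0]
        if ".." in first:
            start_text, end_text = first.split("..", 1)
        else:
            start_text = end_text = first
        try:
            start, end = int(start_text, 16), int(end_text, 16)
        except ValueError:
            # First field is not a code point or range (for example a
            # code point sequence); skip it rather than abort the parse.
            continue
        yield start, end, fields[1:]


if __name__ == "__main__":
    sample = [
        "# illustrative lines only",
        "1AB0..1AFF; Combining Diacritical Marks Extended",
        "20BD; Sc # RUBLE SIGN",
    ]
    for start, end, rest in parse_ucd_lines(sample):
        print(f"{start:04X}..{end:04X}", rest)
```

Skipping unparsable lines silently is a design choice for the sketch; a production parser would more likely log them so that format changes are noticed.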

