Unicode 4.0.1

Version 4.0.1 has been superseded by the latest version of the Unicode Standard.

Version 4.0.1 of the Unicode Standard consists of the core specification, The Unicode Standard, Version 4.0, the additional specifications on this page, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 4.0.1 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 4.0.1, defined by: The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), as amended by Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1/).

A complete specification of the contributory files for Unicode 4.0.1 is found on the page Components for Version 4.0.1.

Online Edition

The text of The Unicode Standard, Version 4.0, as well as the delta and archival code charts, is available online via the navigation links on this page. These files may not be printed. The Unicode 4.0 Web Bookmarks page has links to all sections of the online text.

Overview

Unicode 4.0.1 is an update version of the Unicode Standard. It adds no new characters.

The main new features in Unicode 4.0.1 are the following:

  1. The first significant update of the Unihan Database (Unihan.txt) since Unicode 3.2.0, including a large number of fixes and additional data items.
  2. Significant clarifications in four definitions used in conformance.
  3. Unicode Character Database:
    • New character properties: STerm and Variation_Selector
    • Updated significantly: Terminal_Punctuation, Math, Script, and Line_Break
    • Changed: general category of U+200B ZERO WIDTH SPACE
    • Changed: bidi class of some characters including: +, -, / and FRACTION SLASH
    • Added: property value aliases
    • Revised: formats in some of the data files
  4. Changes in the recommended loose comparison of character name values. See Property and Property Value Matching.
  5. Clearer definition of the encoding of Bengali Reph and Ya-phalaa.

Changes to Definitions D13, D14, and D17

Unicode 4.0, Chapter 3 section 6 [page 70] contains the following definitions:

D13 Base Character: A character that does not graphically combine with preceding characters, and that is neither a control nor a format character.

D14 Combining character: A character that graphically combines with a preceding base character. The combining character is said to apply to that base character.

D17 Combining character sequence: A character sequence consisting of either a base character followed by a sequence of one or more combining characters, or a sequence of one or more combining characters.

These definitions are modified as follows in Unicode 4.0.1 for greater clarity and to allow U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER to be used in combining character sequences. (Definition D13 has been split into two parts, D13a and D13b. The bullet items, not formally parts of the definitions, are also modified for clarity. See the above-cited reference for details.)

D13a Graphic character: A character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).

D13b Base character: Any graphic character except for those with the General Category of Combining Mark (M).

D14 Combining character: A character with the General Category of Combining Mark (M).

D17 Combining character sequence: A maximal character sequence consisting of either a base character followed by a sequence of one or more characters where each is a combining character, ZERO WIDTH JOINER, or ZERO WIDTH NON-JOINER; or a sequence of one or more characters where each is a combining character, ZERO WIDTH JOINER, or ZERO WIDTH NON-JOINER.

(The changes to D14 and D17 do not imply that any particular sequence is automatically meaningful or interoperable; sequences must still be documented and used in conventional ways to convey specific meanings.)
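The revised definitions are expressed in terms of General Category values, so they can be tested mechanically. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the function names are hypothetical, and `unicodedata` reflects whatever Unicode version ships with the interpreter rather than 4.0.1, so individual characters may be classified differently than they were in this version.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def is_graphic(ch):
    """D13a: General Category L, M, N, P, S, or Zs."""
    cat = unicodedata.category(ch)
    return cat[0] in "LMNPS" or cat == "Zs"

def is_combining(ch):
    """D14: General Category Combining Mark (M)."""
    return unicodedata.category(ch)[0] == "M"

def is_base(ch):
    """D13b: any graphic character that is not a Combining Mark."""
    return is_graphic(ch) and not is_combining(ch)

def is_extender(ch):
    """Characters allowed after the base in D17 (hypothetical helper name)."""
    return is_combining(ch) or ch in (ZWJ, ZWNJ)

def combining_sequences(text):
    """Return the maximal combining character sequences (D17) found in text."""
    sequences = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_base(text[i]):
            i += 1                      # optional leading base character
        j = i
        while j < n and is_extender(text[j]):
            j += 1                      # extend as far as possible (maximal)
        if j > i:
            sequences.append(text[start:j])
            i = j
        elif i == start:
            i += 1                      # neither base nor extender: skip it
        # otherwise: a lone base character, already consumed above
    return sequences
```

For instance, `combining_sequences("ae\u0301b")` yields only the sequence made of "e" and U+0301 COMBINING ACUTE ACCENT, and a base character followed by ZWJ and a combining mark is returned as a single sequence. As the note above says, membership in a sequence under D17 says nothing about whether that sequence is meaningful or interoperable.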

Change to Definition D9b

Unicode 4.0.1 explicitly acknowledges that provisional properties are not maintained. Unicode 4.0 contains this definition in Chapter 3, section 5 [page 67]:

D9b Provisional property: A Unicode character property whose values are unapproved and tentative, and which may be incomplete or otherwise not in a usable state.

This has been modified by addition of a bullet item, as follows:

D9b Provisional property: A Unicode character property whose values are unapproved and tentative, and which may be incomplete or otherwise not in a usable state.

Clarification of Bengali Reph and Ya-phalaa

The formation of the Reph form is defined in the Unicode 4.0 Book, Section 9.1, Rules for Rendering, R2. Basically, the Reph is formed when a Ra whose inherent vowel has been killed by the virama/halant begins a syllable. This is shown in the following example.

The Ya-phalaa is a post-base form of Ya and is formed when the Ya is the final consonant of a syllable cluster. In this case, the previous consonant retains its base shape and the virama/halant is combined with the following Ya. This is shown in the following example.

An ambiguity arises with the combination Ra + virama/halant + Ya, which could form either a Reph or a Ya-phalaa.

To resolve the ambiguity with this combination and to have consistent behavior, the processing order of the Bengali script is taken into account. When parsing the text, the ability to form the Reph is identified first and therefore the Reph form should have priority in processing. Thus, it is necessary to insert a U+200C ZERO WIDTH NON-JOINER character into the stream between the Ra and virama/halant to allow the virama/halant and Ya to be grouped together during processing.

In the example above, the ZWNJ is used because two characters that would join by default are intended to remain as separate entities. When the Ra is not the first character in the cluster, the ZWNJ is not required for the formation of the Ya-phalaa. However, so that the Ya-phalaa can be entered with a single keystroke, it should be permissible for the Ya-phalaa to be consistently formed by “ZWNJ + VIRAMA + YA” (U+200C + U+09CD + U+09AF).
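The two encodings described above can be written out as code point sequences. The sketch below is illustrative only: the helper names are hypothetical, it uses Python's standard `unicodedata` module solely to print character names, and it builds the sequences without rendering them, since the resulting glyphs depend on the font and shaping engine.

```python
import unicodedata

RA = "\u09B0"      # BENGALI LETTER RA
YA = "\u09AF"      # BENGALI LETTER YA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def with_reph(consonant):
    """Ra + virama at the start of the cluster: Reph takes priority."""
    return RA + VIRAMA + consonant

def with_ya_phalaa(consonant):
    """Consonant followed by the consistent ZWNJ + VIRAMA + YA form."""
    return consonant + ZWNJ + VIRAMA + YA

def describe(text):
    """List each code point with its character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Ra + virama + Ya: parsed as Reph over Ya.
for line in describe(with_reph(YA)):
    print(line)

print("---")

# Ra + ZWNJ + virama + Ya: the ZWNJ blocks the Reph, so virama and Ya
# are grouped together as a Ya-phalaa on Ra.
for line in describe(with_ya_phalaa(RA)):
    print(line)
```

The only difference between the two sequences is the U+200C inserted between the Ra and the virama, which is what disambiguates the intended form.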

Unicode Character Database

The updated Unicode Character Database files for this version are available in the 4.0.1 Update directory. For the unchanged files, see the Components for Version 4.0.1. For more detailed information about the changes in the Unicode Character Database, see the file UCD.html in the Unicode Character Database.

