Showing content from http://www.unicode.org/reports/tr50/tr50-5.html below:

Unicode Properties for Horizontal and Vertical Text Layout

Proposed Draft Unicode Technical Report #50
Unicode Properties for Horizontal and Vertical Text Layout

Summary

The Unicode code charts generally show characters in the orientation in which they appear in horizontal lines. However, there are a few exceptions, mostly for characters or scripts which are normally written in vertical lines. This report describes a property that documents that situation.

When text is presented in vertical lines, there are various conventions for the orientation of the characters with respect to the line. In many parts of the world, most characters are upright. In East Asia, Kanji and Kana characters are upright, Latin letters of acronyms are upright, while words and sentences in the Latin script are typically sideways. This report describes two Unicode character properties which can be used to determine a default orientation of characters in those two scenarios.

Status

This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.

A Unicode Technical Report (UTR) contains informative material. Conformance to the Unicode Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR.

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].

Contents

1 Editorial Warnings
2 Introduction
3 Conformance
4 Property values
5 Properties
- 5.1 Grapheme Clusters
- 5.2 Resulting orientation
6 Tailorings
- 6.1 The brackets
- 6.2 The arrows
7 Glyphs Changes for Vertical Orientation
8 Data File
Acknowledgments
References
Modifications

1 Editorial Warnings

The draft is currently structured around two properties, with values in the set V={R, U, T}. This is entirely equivalent to a single property with values in the set VxV. Going one step further, we can also give names to the values in VxV and use those names as the property values. Those are syntactic details which are easy to change as this draft progresses.

Another change that can be introduced is to have a level of indirection between the property values and the actual classes, and to bridge that indirection either via a simple mapping, or via rules (e.g. in the style of linebreak) or by some other machinery.

The motivation for the current choice is mostly to make the resulting orientation as clear as possible, and to delay the introduction of more complex machinery until a rationale is provided for doing so.

2 Introduction

The Unicode code charts generally show characters in the same orientation as in horizontal lines. However, there are a few exceptions, mostly for characters or scripts which are normally written in vertical lines; in those cases, the code charts show the characters in the same orientation as in vertical lines. Furthermore, those vertical characters and scripts are often rotated when displayed in horizontal lines; figure 1 shows an example of Mongolian text in horizontal lines, where the Mongolian characters are rotated 90 degrees counter-clockwise with respect to the code charts. The property Horizontal Orientation documents that situation.

Figure 1. Mongolian text on horizontal lines

When text is displayed in vertical lines, there are various conventions for the orientation of the characters with respect to the line. In many parts of the world, most characters are upright, that is, they appear with the same orientation as in the code charts, as can be seen in figure 2. The property Stacked Vertical Orientation documents the default orientation of characters in this scenario.

Figure 2. Western vertical text

In East Asia, Kanji and Kana characters are upright, Latin letters of acronyms are upright, while words and sentences in the Latin script are typically sideways, as can be seen in figure 3. The property Mixed Vertical Orientation documents the default orientation of characters in this scenario.

Figure 3. Japanese vertical text

3 Conformance

The properties and algorithms presented in this report are informative. The intent is to provide a reasonable determination of the orientation of characters which can be used in the absence of other information, but can be overridden by the context, such as markup in a document or preferences in a layout application. This default determination is based on the most common use of a character, but in no way implies that that character is used only in that way.

For more information on the conformance implications, see [Unicode], section 3.5, Properties, in particular the definition (D35) of an informative property.

4 Property values

The properties share the same set of values, which are given in table 1.

Table 1. Property Values

U: characters which are displayed upright, with the same orientation as they appear in the code charts.
R: characters which are displayed sideways, rotated 90 degrees clockwise compared to the code charts.
L: characters which are displayed sideways, rotated 90 degrees counter-clockwise compared to the code charts.
T, Tu, Tr: characters which are not just upright or sideways, but require a different glyph than in the code charts when used in vertical texts. In addition, Tu indicates that, as a fallback, the character can be displayed with the code chart glyph upright; similarly, Tr indicates a possible fallback using the code chart glyph rotated 90 degrees clockwise.

Note that the orientation is described with respect to the appearance in the code charts.

Currently, there are no code points with Horizontal Orientation property value L, T, Tu or Tr; there are no code points with Stacked Vertical Orientation or Mixed Vertical Orientation property value L.
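As an illustration of Table 1, the values can be modeled as a small enumeration together with the rotation each one implies for the code chart glyph. This is only a sketch: the names `Orientation`, `ROTATION`, `FALLBACK` and `fallback_rotation` are hypothetical and not part of this report, and rotations are expressed as clockwise degrees (so L becomes 270).

```python
from enum import Enum


class Orientation(Enum):
    U = "U"    # upright, same orientation as in the code charts
    R = "R"    # sideways, rotated 90 degrees clockwise
    L = "L"    # sideways, rotated 90 degrees counter-clockwise
    T = "T"    # needs a different glyph; no fallback stated
    TU = "Tu"  # needs a different glyph; fallback is the chart glyph upright
    TR = "Tr"  # needs a different glyph; fallback is the chart glyph rotated


# Clockwise rotation, in degrees, of the code chart glyph.
ROTATION = {Orientation.U: 0, Orientation.R: 90, Orientation.L: 270}

# Fallback value used when no dedicated vertical glyph is available.
FALLBACK = {Orientation.TU: Orientation.U, Orientation.TR: Orientation.R}


def fallback_rotation(value):
    """Return the clockwise rotation of the code chart glyph, or None
    when the value (plain T) has no rotation-only rendering."""
    value = FALLBACK.get(value, value)
    return ROTATION.get(value)
```

A renderer with a dedicated vertical glyph for a Tu or Tr character would use that glyph and consult `fallback_rotation` only when the glyph is missing.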

5 Properties

The Horizontal Orientation (short name ho) property is intended to be used for horizontal lines.

The Stacked Vertical Orientation (short name svo) property is intended to be used for vertical lines in those parts of the world where characters are mostly upright.

The Mixed Vertical Orientation (short name mvo) property is intended to be used for vertical lines in East Asia, and more specifically in Japan, China and Korea.

The scope of these properties is limited by the scope of Unicode itself. For example, Unicode does not directly support the representation of texts and inscriptions using Egyptian Hieroglyphs. Instead, Unicode provides characters intended for use when writing about such texts or inscriptions, or for use in conjunction with a markup system such as the Manuel de Codage. While the properties are defined for Egyptian Hieroglyphs, they are meaningful only for occurrences of these characters in discursive texts; when the characters are used with markup, the markup controls the orientation. See [Unicode], section 14.8 for a more complete discussion of the scope of Egyptian Hieroglyph characters.

5.1 Grapheme Clusters

As in all matters of typography, the interesting unit of text is not the character, but a grapheme cluster: it does not make sense to use a base character upright and a combining mark attached to it sideways.

It is expected that the client of the properties defined here will select a notion of grapheme cluster, and is interested in obtaining an orientation for the cluster as a whole.

A possible choice for the notion of grapheme cluster is either that of legacy grapheme cluster or that of extended grapheme cluster, as defined in [UAX29].

The orientation for a grapheme cluster as a whole is then determined by taking the orientation of the first character in the cluster, with the following exceptions:

if the cluster contains an enclosing combining mark (general category Me), then the whole cluster has ho, svo and mvo orientation U.
to handle combining marks displayed in isolation:
- if the cluster is made of U+00A0 NO-BREAK SPACE and some combining mark(s), then the whole cluster has ho and svo orientation U and mvo orientation R.
- if the cluster is made of U+3000 IDEOGRAPHIC SPACE and some combining mark(s), then the whole cluster has ho, svo and mvo orientation U.
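
The rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: `SAMPLE_PROPS`, `DEFAULT`, `char_orientation` and `cluster_orientation` are hypothetical names, the per-character values are illustrative stand-ins for the data file, and the caller is assumed to have already segmented the text into grapheme clusters (for example per [UAX29]). Each result is a tuple (ho, svo, mvo).

```python
import unicodedata

# Illustrative per-character values (ho, svo, mvo); a real client would
# load these from the data file rather than hard-code them.
SAMPLE_PROPS = {
    "A": ("U", "U", "R"),       # Latin letter: sideways in mixed vertical text
    "\u3042": ("U", "U", "U"),  # HIRAGANA LETTER A: upright everywhere
}
DEFAULT = ("U", "U", "R")  # placeholder default, an assumption of this sketch


def char_orientation(ch):
    """Look up (ho, svo, mvo) for one character."""
    return SAMPLE_PROPS.get(ch, DEFAULT)


def cluster_orientation(cluster):
    """Return (ho, svo, mvo) for a whole grapheme cluster."""
    if not cluster:
        raise ValueError("empty grapheme cluster")
    # An enclosing combining mark (general category Me) forces U for all.
    if any(unicodedata.category(c) == "Me" for c in cluster):
        return ("U", "U", "U")
    # Combining marks displayed in isolation on a space character.
    marks = cluster[1:]
    if marks and all(unicodedata.category(c).startswith("M") for c in marks):
        if cluster[0] == "\u00A0":  # NO-BREAK SPACE
            return ("U", "U", "R")
        if cluster[0] == "\u3000":  # IDEOGRAPHIC SPACE
            return ("U", "U", "U")
    # Otherwise the first character decides.
    return char_orientation(cluster[0])
```

For example, `cluster_orientation("A\u20DD")` (a Latin letter inside a combining enclosing circle) yields U for all three properties, even though the letter alone would be R in mixed vertical text under the sample values.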

5.2 Resulting orientation

The properties are intended to provide only a default orientation, rather than to handle correctly all situations. It is expected that, when used in the context of a markup system, the user will be able to 1) have some control over which property is used and 2) specify an explicit orientation. For example, one could have an attribute orientation with possible values auto, 0, 90, 180 and 270; when the value of the attribute is not auto, the explicit orientation is used; when the value is auto, the property values are used.
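
A sketch of how such an attribute could be resolved is shown here. The function name `resolve_rotation` and the representation of rotations as clockwise degrees are assumptions made for the example; the attribute values are the ones suggested in the paragraph above.

```python
# Explicit attribute values and the clockwise rotation they request.
EXPLICIT = {"0": 0, "90": 90, "180": 180, "270": 270}

# Rotation implied by the simple property values. The T-family values
# need a glyph substitution step that this sketch does not model.
PROPERTY_ROTATION = {"U": 0, "R": 90, "L": 270}


def resolve_rotation(attribute, property_value):
    """Return the clockwise rotation in degrees for one grapheme cluster.

    attribute      -- "auto", "0", "90", "180" or "270"
    property_value -- value of the selected property (ho, svo or mvo)
    """
    if attribute != "auto":
        # An explicit orientation overrides the property value.
        return EXPLICIT[attribute]  # KeyError on unsupported values
    # Returns None for T, Tu and Tr, which are not handled here.
    return PROPERTY_ROTATION.get(property_value)
```

With this shape, `resolve_rotation("180", "R")` returns the explicit 180, while `resolve_rotation("auto", "R")` falls back to the property and returns 90.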

The property values, if used, are intended to be used directly.

There is actually one character for which a contextual determination would be useful and reliable: U+00AE ® REGISTERED SIGN, which can occur both following terms in kanji/kana and following terms in Latin. An occurrence of ® should be assigned the same class as the character it follows. Others? Enough to warrant the complexity of contextual rules?

There are other cases where the character is used routinely in both Japanese and Western contexts: quotation marks are a good example. While contextual determination would be useful, it's probably the case that it's not going to be reliable.

6 Tailorings

To facilitate tailorings, this report identifies sets of characters which behave similarly, and for which it can be useful to tailor the orientation as a group.
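
The sets in this section are written as space-separated code points and ranges (for example `201C..201F`). The following sketch expands that notation and overrides the orientation for a whole set at once. The names `parse_set`, `BRACKETS_SAMPLE` and `tailor` are hypothetical, the sample contains only a few entries copied from Table 2, and the base lookup is a stand-in for whatever property lookup the client uses.

```python
def parse_set(spec):
    """Expand 'XXXX' and 'XXXX..YYYY' tokens into a set of code points."""
    result = set()
    for token in spec.split():
        first, _, last = token.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        result.update(range(start, end + 1))
    return result


# A few entries from the brackets set, for illustration only.
BRACKETS_SAMPLE = parse_set("00AB 00BB 201C..201F 3008..3011")


def tailor(base_lookup, char_set, value):
    """Wrap a code point lookup so every member of char_set gets value."""
    def lookup(code_point):
        if code_point in char_set:
            return value
        return base_lookup(code_point)
    return lookup
```

A client that prefers rotated brackets could then write `mvo = tailor(default_mvo, BRACKETS_SAMPLE, "R")`, where `default_mvo` is its untailored lookup function.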

6.1 The brackets

This set contains brackets, which, while they appear rotated, are commonly implemented as if they were transformed.

Table 2. The brackets set

00AB 00BB 201C..201F 2039..203A 2045..2046 3008..3011 3014..301B FE59..FE5E FF08..FF09 FF3B FF3D FF5B FF5D FF5F..FF60 FF62..FF63

6.2 The arrows

This set contains arrows.

Table 3. The arrows set

2190..21FF 261A..261F 2794 2798..27AF 27B1..27BE 27F0..27FF 2900..297F 2B00..2B11 2B30..2B4C FFE9..FFEC

7 Glyphs Changes for Vertical Orientation

Table 4 provides representative glyphs for the horizontal and vertical appearance of characters with the property value T.

Add glyphs for all the entries: 301F, 332C, FF61, FF64, 1F200, 1F201, halfwidth small kanas. Some glyphs (2018, 2019) may not be correct.

Recently, the brackets have been made T, because they have a slightly different position in their box between horizontal and vertical. It is arguable whether characters for which the difference is only a slight position adjustment should be included in T.

Table 4. Glyph Changes for Vertical Orientation

8 Data File

The data file, in UCD syntax.

To help during the review, a slightly more readable version is available.
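
A generic reader for files in UCD syntax might look like the sketch below. It relies only on the general UCD conventions (semicolon-separated fields, `#` comments, a code point or `XXXX..YYYY` range in the first field). Which field holds which property is not stated in this section, so the remaining fields are returned uninterpreted; the function name `parse_ucd_lines` is hypothetical.

```python
def parse_ucd_lines(lines):
    """Yield (start, end, fields) for each data line in UCD syntax.

    start and end are the inclusive code point bounds from the first
    field; fields is the list of remaining fields, left uninterpreted.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        yield start, end, fields[1:]
```

Reading from disk is then `with open(path, encoding="utf-8") as f: rows = list(parse_ucd_lines(f))`, where `path` is wherever the client stored the data file.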

U+2016 ‖ DOUBLE VERTICAL LINE; JRLEQ classifies this character as cl-19 ideographic; typically, this is a clue that it is upright; also, JIS 0213:2000 does not give a vertical variant. On the other hand, it seems that 'vert' often presents it sideways. Which is right? Could it be that font vendors have been influenced by U+30A0 ゠ KATAKANA-HIRAGANA DOUBLE HYPHEN?

Acknowledgments

Please let me know if I forgot your name or you prefer a different spelling/etc.

Thanks to the reviewers: Julie Allen, Ken Lunde, Nat McCully, Ken Whistler, Taro Yamamoto, htakenaka, John Cowan, Fantasai, Asmus Freytag, Van Anderson, Ishi Koji, sikeda, Shinyu Murakami, Tokushige Kobayashi, Addison Phillips, Martin Dürst, the W3C Internationalization Core Working Group, the W3C I18N Interest group, the W3C CSS Working group, Michael Everson, John Daggett, Laurentiu Iancu, Dwayne Robinson.

References

Modifications

This section indicates the changes introduced by each revision.

Revision 5

TR renamed to include “Horizontal”.
New property for horizontal text. The current assignment is L for Mongolian and Phags-pa, U for all the other characters.
Proposal B has been accepted; removed proposal A.
Characters moved from U or R to T: 3008..3011 3014..301B 301D..301F 309B..309E 20A0 FF01 FF08..FF09 FF0C..FF0E FF1A..FF1E FF3B FF3F FF5B..FF60 FF62..FF63 FF70 FFE3, on the basis of a small shift in the box, similar to small kana.
T moved to Tr or Tu, following MS proposal. The only T characters remaining are 2018 and 2019, which are R/R in MS proposal.
Arrow set introduced, as in MS proposal.
Yi blocks changed from svo/mvo R to U.
UCAS changed from mvo R to U, except for U+1400 ᐀ CANADIAN SYLLABICS HYPHEN.

Revision 4

Properties renamed to Stacked Vertical Orientation (previously Default Vertical Orientation) and Mixed Vertical Orientation (previously East Asian Vertical Orientation)
Introduced sets of characters for tailoring.
Property value S renamed to R.
Property value Sb merged with R; set created for brackets.

Revision 3

Mongolian and Egyptian Hieroglyphs changed to U.
Implementation of the UTC decisions made during meeting #130, February 2012.
- Removal of the East Asian Class property
- East Asian Orientation renamed East Asian Vertical Orientation
- New property, Default Vertical Orientation. The initial assignment is: T if EAVO=T, SB if EAVO=SB and the bracket is specific to CJK, S to align with CSS Sv value except for vertical presentation forms, Tibetan, Mongolian, sup/sub parens, sup punctuation, FD3E, FD3F, which remain U.

Revision 2

Clarification of the status of the properties (end of section 1)
Clarification of the handling of grapheme clusters
Removed the "comments" column in table 3.
Hangul characters: new class cl-19.4, hangul, orientation U
Yijing Hexagram symbols are now cl-19.3, symbols, orientation U.
Small forms variants are treated like their fullwidth counterparts.
Superscript and subscript characters are now cl-27, western, orientation S.
Small kana: orientation U; class split in cl-11.1, smallHiragana and cl-11.2, smallKatakana
U+3030 〰 WAVY DASH has orientation T.
The two alternatives for math, etc. are described.

Revision 1

First working draft.

Copyright © 2011-2012 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.
