Showing content from https://www.unicode.org/reports/tr50/tr50-33.html below:

UAX #50: Unicode Vertical Text Layout

Unicode® Standard Annex #50

Unicode Vertical Text Layout

Summary

The Unicode code charts generally show characters oriented for horizontal presentation. However, some of the glyphs are actually oriented for vertical presentation. A few characters change shape or orientation when the text is rotated from horizontal to vertical.

When text is presented, there are various conventions for the orientation of the characters with respect to the line. In most cases, characters are oriented in an upright manner similar to their presentation in the Unicode code charts. In a few cases, when presented in vertical lines, the characters will appear rotated or transformed in various ways. For example, in East Asia, Han ideographs, Kana syllables, Hangul syllables, and Latin letters in acronyms are upright, while words and sentences in the Latin script are typically sideways. This report describes a Unicode character property which can serve as a stable default orientation of characters for reliable document interchange.

Status

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds to the version of the Unicode Standard of which it forms a part.

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this annex is found in Unicode Standard Annex #41, “Common References for Unicode Standard Annexes.” For the latest version of the Unicode Standard, see [Unicode]. For a list of current Unicode Technical Reports, see [Reports]. For more information about versions of the Unicode Standard, see [Versions]. For any errata which may apply to this annex, see [Errata].

Contents

1 Overview and Scope
2 Conformance
3 The Vertical_Orientation Property (vo)
- 3.1 Property Values
- 3.2 Scope of the Property
  - 3.2.1 Grapheme Clusters
  - 3.2.2 Squared Katakana and Ideographic Words
  - 3.2.3 Right-to-Left Scripts
  - 3.2.4 Quotation Marks
- 3.3 Vertical Glyphs in the Code Charts
4 Glyph Changes for Vertical Orientation
5 Data File
Acknowledgments
References
Modifications

1 Overview and Scope

When text is displayed in vertical lines, there are various conventions for the orientation of the characters with respect to the line. In East Asia, Han ideographs, Kana syllables, and Hangul syllables, along with Latin letters of acronyms, remain upright, meaning that they appear with the same orientation as in the code charts, but words and sentences that are composed of characters of the Latin script are typically oriented sideways, as can be seen in Figure 1.

Figure 1. Japanese Vertical Text

In many parts of the world, most characters are upright, as can be seen in Figure 2.

Figure 2. Western Vertical Text

Most languages and scripts are written horizontally and vertical presentation is a special case, usually used for short runs of text (as in Figure 2). Some languages, however, have publishing traditions that provide for long-format vertical text presentation, notably East Asian languages such as Japanese. In those languages, the orientation in which characters are laid out can vary, depending on the scripts, the style, and sometimes the context. The preferred or desired orientation may also change over time.

While the choice of orientation for a character can vary across documents, it is important that the choice made by an author for a specific document be clearly established, so that a rendering system can display what the author intended. It is also important that this choice be established independently of the font resources, as the rendering systems may have to use other fonts than those intended or specified in the document. Finally, the expression of the author’s choice should be relatively concise, to facilitate document authoring and minimize document size.

This report describes a Unicode character property which can serve as a stable default character orientation for the purpose of reliable document interchange.

For the purpose of reliable document interchange, this property defines an unambiguous default value, so that implementations could reliably render a character stream based solely on the property values, without depending on other information such as provided in the tables of the selected font.

The intent is that document formats should offer to the author the possibility of specifying the desired orientation of a given character (either all occurrences or a particular occurrence), and that in the absence of an explicit specification, the orientation is implicitly that defined by the property presented in this report.

In plain text, which by definition does not allow the recording of any data beyond the characters, the orientations are by necessity those specified by the property.
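
The fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resolve_orientation`, the two override maps, and the rule that a per-occurrence override wins over a per-character override are all hypothetical, and `default_vo` stands in for a lookup of the property value.

```python
from typing import Callable, Dict, Optional


def resolve_orientation(
    text: str,
    index: int,
    default_vo: Callable[[str], str],
    per_character: Optional[Dict[str, str]] = None,
    per_occurrence: Optional[Dict[int, str]] = None,
) -> str:
    """Return the orientation to use for text[index].

    per_occurrence maps a position in the text to an orientation
    (hypothetical: an author override for one particular occurrence).
    per_character maps a character to an orientation
    (hypothetical: an author override for all occurrences).
    default_vo returns the Vertical_Orientation value of a character.
    """
    ch = text[index]
    # Assumed precedence: the most specific author choice wins.
    if per_occurrence and index in per_occurrence:
        return per_occurrence[index]
    if per_character and ch in per_character:
        return per_character[ch]
    # No explicit specification: fall back to the property default.
    return default_vo(ch)
```

In plain text both override maps are empty, so the result is always the value returned by `default_vo`.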

The actual choice for the property values should result in a reasonable or legible default, but it may not be publishing-material quality, and it may not be a good choice if used in a specific style or context.

The property values are chosen first to match existing practice in Japanese contexts in Japan, and then existing practice in other East Asian contexts in their respective environments. For characters that are not generally used in such environments, similarity to existing characters has been taken into consideration. Commonly used characters of Latin and other scripts that appear in Japanese and other East Asian environments are also taken into account, but with lower priority.

2 Conformance

The property defined in this report is informative. The intent of this report is to provide, in the absence of other information, a reasonable way to determine the correct orientation of characters, but this behavior can be overridden by a higher-level protocol, such as through markup or by the preferences of a layout application. This default determination is defined in the data file [Data50] in the Unicode Character Database [UCD], but in no way implies that the character is used only in that orientation.

For more information on the conformance implications, see [Unicode], Section 3.5, Properties, in particular the definition (D35) of an informative property.

3 The Vertical_Orientation Property (vo)

3.1 Property Values

The possible Vertical_Orientation property values are given in Table 1.

Table 1. Property Values

- U: Characters which are displayed upright, with the same orientation that appears in the code charts.
- R: Characters which are displayed sideways, rotated 90 degrees clockwise compared to the code charts.
- Tu: Characters which are not just upright or sideways, but generally require a different glyph than in the code charts when used in vertical texts. In addition, as a fallback, the character can be displayed with the code chart glyph upright.
- Tr: Same as Tu except that, as a fallback, the character can be displayed with the code chart glyph rotated 90 degrees clockwise.

Note that the orientation is described with respect to the appearance in the code charts.
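
As a rough illustration of how the four values and their fallbacks might drive glyph selection, consider the following Python sketch. The names `VO`, `glyph_plan`, and the flag `has_vertical_glyph` are hypothetical; how a renderer actually learns whether a font offers a vertical glyph is outside the scope of this property.

```python
from enum import Enum


class VO(Enum):
    """The four Vertical_Orientation property values."""
    U = "U"
    R = "R"
    Tu = "Tu"
    Tr = "Tr"


def glyph_plan(vo: VO, has_vertical_glyph: bool) -> str:
    """Return 'vertical_glyph', 'upright', or 'rotate_cw_90'.

    has_vertical_glyph is an assumed input saying whether the selected
    font supplies a distinct glyph for vertical text.
    """
    # Tu and Tr prefer a different glyph when one is available.
    if vo in (VO.Tu, VO.Tr) and has_vertical_glyph:
        return "vertical_glyph"
    # U, and the Tu fallback: code chart glyph, upright.
    if vo in (VO.U, VO.Tu):
        return "upright"
    # R, and the Tr fallback: code chart glyph, rotated 90 degrees clockwise.
    return "rotate_cw_90"
```

For example, `glyph_plan(VO.Tr, False)` yields `"rotate_cw_90"`, while `glyph_plan(VO.Tu, False)` yields `"upright"`.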

3.2 Scope of the Property

3.2.1 Grapheme Clusters

As in all matters of typography, the interesting unit of text is not the character, but a grapheme cluster: it does not make sense to use a base character upright and a combining mark attached to it sideways. Implementations should apply the orientation to each grapheme cluster.

A possible choice for the notion of grapheme cluster is either that of legacy grapheme cluster or that of extended grapheme cluster, as defined in [UAX29].

The orientation for a grapheme cluster as a whole is then determined by taking the orientation of the first character in the cluster, with the following exception:

If the cluster contains an enclosing combining mark (general category Me), then the whole cluster has the Vertical_Orientation property value U.
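
The rule above can be sketched as follows. This is a minimal illustration, assuming the text has already been segmented into grapheme clusters by some other component (the Python standard library does not provide [UAX29] segmentation). The function name `cluster_orientation` and the tiny `SAMPLE_VO` table are hypothetical; the katakana values are those quoted in Section 3.2.2, and the value R for U+0041 is an assumption made for the example.

```python
import unicodedata
from typing import Callable

# Hypothetical excerpt of property values, for illustration only.
SAMPLE_VO = {
    "\u0041": "R",   # LATIN CAPITAL LETTER A (assumed value)
    "\u30AD": "U",   # KATAKANA LETTER KI
    "\u30E5": "Tu",  # KATAKANA LETTER SMALL YU
    "\u30FC": "Tr",  # KATAKANA-HIRAGANA PROLONGED SOUND MARK
}


def sample_lookup(ch: str) -> str:
    """Look up a character in the sample table (KeyError if absent)."""
    return SAMPLE_VO[ch]


def cluster_orientation(cluster: str, vo_of: Callable[[str], str]) -> str:
    """Orientation for one grapheme cluster, given as a non-empty string."""
    # Exception: an enclosing combining mark (general category Me)
    # makes the whole cluster upright.
    if any(unicodedata.category(ch) == "Me" for ch in cluster):
        return "U"
    # Otherwise the first character of the cluster decides.
    return vo_of(cluster[0])
```

With this sketch, "A" followed by U+0301 COMBINING ACUTE ACCENT takes the value of "A", while "A" followed by U+20DD COMBINING ENCLOSING CIRCLE is treated as U.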

3.2.2 Squared Katakana and Ideographic Words

There are special typographic conventions to consider, for Japanese text layout in particular. It is common practice to represent particular katakana words and ideographic sequences as a single precomposed glyph whose components are arranged within the confines of the em-box, and which is therefore the same size as a conventional ideograph. Such a character is referred to as a squared word. Furthermore, the arrangement of the components in the em-box differs depending on whether the layout is horizontal or vertical.

There are a significant number of such compatibility characters encoded in the Unicode Standard that were inherited from legacy Japanese character encoding standards. As a result of the required layout rules, these characters must be supported in East Asian fonts using separate glyphs for horizontal and vertical layout. Accordingly, one of the primary motivations for the Vertical_Orientation property value Tu is to identify the compatibility characters that exhibit such behavior.

The same layout rules apply in cases where a katakana word or ideographic sequence is displayed as a squared word, but no single, encoded compatibility character exists for that sequence in the Unicode Standard. For example, 有限会社 (yūgen gaisha “limited liability company”) and 財団法人 (zaidan hōjin “foundation”) are commonly displayed the same way in text as U+337F ㍿ SQUARE CORPORATION, whose horizontal and vertical glyphs are shown in Table 3.

The individual katakana characters or CJK unified ideographs that comprise these squared words are assigned the Vertical_Orientation property value U, Tu, or Tr, and therefore remain upright in vertical layout or have their own vertical form. For example, U+3312 ㌒ SQUARE KYURII, whose horizontal and vertical glyphs are shown in Table 3, is composed of U+30AD キ KATAKANA LETTER KI, U+30E5 ュ KATAKANA LETTER SMALL YU, U+30EA リ KATAKANA LETTER RI, and U+30FC ー KATAKANA-HIRAGANA PROLONGED SOUND MARK. U+30AD and U+30EA are assigned the Vertical_Orientation property value U, U+30E5 is assigned Tu, and U+30FC is assigned Tr.

3.2.3 Right-to-Left Scripts

This property has a current limitation in that the handling of right-to-left scripts is not specified. This includes scripts that are predominantly written right to left, such as Arabic, along with right-to-left scripts that are meant to be written vertically, such as Chorasmian.

3.2.4 Quotation Marks

When certain quotation marks are displayed using fullwidth glyphs, a different glyph is normally used in vertical layout. This is the case whether the implementation always uses a fullwidth glyph or does so in response to a Standardized Variation Sequence. It is ultimately up to the selected font’s tables, such as the presence of substitutions in the 'vert' (Vertical Alternates) layout feature, and the layout software to determine whether their glyphs should be simply rotated or substituted with a different glyph. The affected characters are the four quotation marks: U+2018 ‘ LEFT SINGLE QUOTATION MARK, U+2019 ’ RIGHT SINGLE QUOTATION MARK, U+201C “ LEFT DOUBLE QUOTATION MARK, and U+201D ” RIGHT DOUBLE QUOTATION MARK. The behavior of their fullwidth glyphs in vertical layout is shown in Table 2.

Table 2. Fullwidth Quotation Mark Glyph Changes for Vertical Orientation

Note that the vertical forms of U+201C and U+201D are the same as the vertical forms of U+301D 〝 REVERSED DOUBLE PRIME QUOTATION MARK and U+301F 〟 LOW DOUBLE PRIME QUOTATION MARK, respectively, as shown in Table 3, and that there are no horizontal equivalents of the vertical forms of U+2018 and U+2019. In addition, according to conventions in China, as shown in the Vertical—Hans column of Table 2, the vertical forms of all four quotation marks are the same as the vertical forms of U+300C 「 LEFT CORNER BRACKET, U+300D 」 RIGHT CORNER BRACKET, U+300E 『 LEFT WHITE CORNER BRACKET, and U+300F 』 RIGHT WHITE CORNER BRACKET, respectively, which are also shown in Table 3.
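
The shared vertical forms described above can be recorded as a small lookup table. The sketch below only captures which characters are said to share a vertical form; it does not perform glyph substitution, which remains the job of the font's tables and the layout software. The table name, the function name, and the convention labels `"general"` and `"hans"` are hypothetical.

```python
from typing import Optional

# For each convention: quotation mark -> character whose vertical form
# is the same as the vertical form of that quotation mark.
SHARED_VERTICAL_FORM = {
    "general": {
        "\u201C": "\u301D",  # LEFT DOUBLE QUOTATION MARK -> REVERSED DOUBLE PRIME QUOTATION MARK
        "\u201D": "\u301F",  # RIGHT DOUBLE QUOTATION MARK -> LOW DOUBLE PRIME QUOTATION MARK
        # U+2018 and U+2019 have no horizontal equivalent of their vertical forms.
    },
    "hans": {
        "\u2018": "\u300C",  # LEFT SINGLE QUOTATION MARK -> LEFT CORNER BRACKET
        "\u2019": "\u300D",  # RIGHT SINGLE QUOTATION MARK -> RIGHT CORNER BRACKET
        "\u201C": "\u300E",  # LEFT DOUBLE QUOTATION MARK -> LEFT WHITE CORNER BRACKET
        "\u201D": "\u300F",  # RIGHT DOUBLE QUOTATION MARK -> RIGHT WHITE CORNER BRACKET
    },
}


def shares_vertical_form_with(ch: str, convention: str = "general") -> Optional[str]:
    """Return the character with the same vertical form, or None if none is listed."""
    return SHARED_VERTICAL_FORM.get(convention, {}).get(ch)
```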

3.3 Vertical Glyphs in the Code Charts

The Unicode code charts generally show characters in the orientation they take when used in horizontal lines. However, prior to Unicode 7.0, there were a few exceptions, mostly for characters or scripts which are normally written in vertical lines; in those cases, the code charts used to show the characters in the same orientation as in vertical lines. Furthermore, such characters are often rotated when displayed in horizontal lines; Figure 3 shows an example of Mongolian text in horizontal lines in which the Mongolian characters are rotated 90 degrees counterclockwise with respect to the code charts prior to Unicode 7.0.

Figure 3. Mongolian Text on Horizontal Lines

The Unicode 7.0 code charts changed the orientation of characters for Mongolian and Phags-pa by rotating counterclockwise so that they match the orientation in horizontal lines. This change makes the code charts more consistent with other scripts in terms of the orientation of characters. It also aligns the code charts with many recent rendering systems such as OpenType, and therefore it is expected to make implementations of the property easier. However, implementations should be aware that underlying rendering systems may not have exactly the same orientation of characters as the code charts.

While this property defines only default orientations compared to the code charts, high-level protocols or applications could combine information provided in a font’s tables with the property values to more reliably calculate in which orientation they should render such glyphs, in order to achieve the desired visual result.

4 Glyph Changes for Vertical Orientation

Table 3 provides representative glyphs for the horizontal and vertical appearance of characters with the Vertical_Orientation property values Tu and Tr.

The vertical glyphs that are shown in the table are exemplary, and their presence does not imply that font implementations should necessarily support them. Font developers should instead research the vertical glyph conventions for the intended regions to determine whether a vertical glyph is necessary for a particular character, and what the appropriate vertical glyph should be.

The Horizontal column may also specify more than one glyph when regional or other differences exist. Font developers should adhere to regional conventions when determining the appearance of horizontal glyphs.

Table 3. Glyph Changes for Vertical Orientation

5 Data File

Starting with Version 10.0.0 of the Unicode Standard, the data file listing the Vertical_Orientation property value assignments [Data50] is formally included in the Unicode Character Database [UCD]. (In Revisions 17 and prior of this specification, the data file was provided in versioned directories under the following stable URL: https://www.unicode.org/Public/vertical/)
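
A data file of this kind could be read with a short parser such as the Python sketch below. It assumes the usual UCD line layout, in which each data line holds a code point or a `start..end` range, a semicolon, the property value, and an optional `#` comment; check the header of the actual file before relying on this. The function names, the sample lines (hand-written for the example, not copied from the file), and the default value R for code points not listed are all assumptions.

```python
import bisect
import re
from typing import Iterable, List, Tuple

# One data line: "XXXX ; V" or "XXXX..YYYY ; V" (assumed UCD-style layout).
LINE_RE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")

Range = Tuple[int, int, str]


def parse_vo(lines: Iterable[str]) -> List[Range]:
    """Parse data lines into a sorted list of (start, end, value)."""
    ranges: List[Range] = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        match = LINE_RE.match(line)
        if not match:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        ranges.append((start, end, match.group(3)))
    ranges.sort()
    return ranges


def lookup(ranges: List[Range], cp: int, default: str = "R") -> str:
    """Return the value for code point cp, or the assumed default."""
    # Find the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


# Hand-written sample in the assumed format (values quoted in this annex).
SAMPLE = """\
# sample only
2018..2019 ; Tr # quotation marks
30AD       ; U
30E5       ; Tu
30FC       ; Tr
"""
```

A typical use would be `ranges = parse_vo(SAMPLE.splitlines())` followed by `lookup(ranges, 0x30E5)`.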

Acknowledgments

Thanks to the original editor Eric Muller, to the subsequent co-editor Laurențiu Iancu who drove the status change of the specification and the incorporation of its data into the UCD for Version 10.0, and reviewers: Julie Allen, Van Anderson, John Cowan, John Daggett, Mark Davis, Martin Dürst, Elika J. Etemad, Michael Everson, Asmus Freytag, Soji Ikeda, Norikazu Ishizu, Nozomu Katō, Yasuo Kida, Nat McCully, Shinyu Murakami, Addison Phillips, Roozbeh Pournader, Dwayne Robinson, Kyoko Sato, Hiroshi Takenaka, Bobby Tung, Philippe Verdy, Ken Whistler, Taro Yamamoto, the W3C CSS Working Group, the W3C I18N Interest Group, and the W3C Internationalization Working Group.

References

For references for this annex, see Unicode Standard Annex #41, “Common References for Unicode Standard Annexes.”

Modifications

The following summarizes modifications from the previous published version of this annex.

Revision 33

Reissued for Unicode 17.0.0.
Section 3.2.4 was updated to remove references to tailoring (see UTC Action Item 182-A94).
Table 3 was updated to add characters that are now assigned the property value vo=Tr (U+2018, U+2019, U+201C, U+201D) or vo=Tu (U+31B4..U+31B7, U+31BB, U+1B132, U+1B150..U+1B152, U+1B155, U+1B164..U+1B167) (see UTC Action Items 182-A94 and 182-A107).
Changes made in the data file for existing characters:
- Characters assigned the property value vo=Tr that were previously assigned the property value vo=R: U+2018, U+2019, U+201C, U+201D (see UTC Consensus 182-C35)
- Characters assigned the property value vo=Tu that were previously assigned the property value vo=U: U+31B4..U+31B7, U+31BB, U+1B132, U+1B150..U+1B152, U+1B155, U+1B164..U+1B167 (see UTC Consensus 182-C40)

Revision 32 being a proposed update, only changes between revisions 31 and 33 are noted here.

Previous revisions can be accessed with the “Previous Version” link in the header.

© 2013–2025 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.

Use of all Unicode Products, including this publication, is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.
