Showing content from http://www.w3.org/TR/2008/NOTE-xml-i18n-bp-20080213/ below:

Best Practices for XML Internationalization

1 Introduction

This document is a complement to the W3C Recommendation Internationalization Tag Set (ITS) Version 1.0 [ITS]. However, not all internationalization-related issues can be resolved by the special markup described in ITS. The best practices in this document therefore go beyond application of ITS markup to address a number of problems that can be avoided by correctly designing the XML format, and by applying a few additional guidelines when developing content.

This document and Internationalization Tag Set (ITS) Version 1.0 [ITS] implement requirements formulated in Internationalization and Localization Markup Requirements [ITS REQ].

This set of best practices does not cover all topics about internationalization for XML. Other useful reference material includes: Character Model for the World Wide Web 1.0: Fundamentals [CharMod], and Unicode in XML and other Markup Languages [Unicode in XML].

1.1 Who should use this document

This document is divided into two main sections:

The first one is intended for the designers and developers of XML applications (also referred to here as 'schemas' or 'formats').
The second is intended for the XML content authors. This includes users modifying the original content, such as translators.

1.2 How to use this document

1.2.1 Designers and developers of XML applications

Section 2: When Designing an XML Application provides a list of some of the important design choices you should make in order to ensure the internationalization of your format.

Section 4: Generic Techniques provides additional generic techniques such as writing ITS rules or adding an attribute to a schema. Such techniques apply to many of the best practices.

Section 5: ITS Applied to Existing Formats provides a set of concrete examples of how to apply ITS to existing XML-based formats. This section illustrates many of the guidelines in this document.

1.2.2 Users and authors of XML content

Section 3: When Authoring XML Content provides a number of guidelines on how to create content with internationalization in mind. Many of these best practices are relevant regardless of whether or not your XML format was developed especially for internationalization.

Section 4.1: Writing ITS Rules provides practical guidelines on how to write ITS rules. Such techniques may be useful when applying some of the more advanced authoring best practices.

2 When Designing an XML Application

Designers and developers of XML applications should take into account the following best practices:

Best Practice 1: Defining markup for natural language labelling
- Implementing as a new feature: Make sure the xml:lang attribute is defined for the root element of your document, and for any element where a change of language may occur.
- Handling legacy markup: Provide an ITS Rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang.

Best Practice 2: Defining markup to specify text direction
- Implementing as a new feature: Make sure the its:dir attribute is defined for the root element of your document, and for any element that has text content.
- Handling legacy markup: Provide an ITS Rules document where you use the its:dirRule element to associate the different directionality indicators with their equivalents in ITS.

Best Practice 3: Avoiding translatable attribute values
- Implementing as a new feature: Make sure you store all translatable text as element content, not as attribute values.
- Handling legacy markup: Provide an ITS Rules document where you use the its:translateRule element to specify which attributes are translatable.

Best Practice 4: Indicating which elements and attributes should be translated
Provide an ITS Rules document where you use its:translateRule elements to indicate which elements have non-translatable content.

Best Practice 5: Defining markup to override translate information
- Implementing as a new feature: Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content. It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:translateRule element. Content authors can then use these elements to globally change the default translate rules for specific elements and attributes.
- Handling legacy markup: Provide an ITS Rules document where you use the its:translateRule element to associate this mechanism with the ITS Translate data category.

Best Practice 6: Providing information related to text segmentation
Provide an ITS Rules document where you use its:withinTextRule elements to indicate which elements should be treated either as part of their parents or as a nested but independent run of text. By default, element boundaries are assumed to correspond to segmentation boundaries.

Best Practice 7: Defining markup for ruby text
- Implementing as a new feature: Make sure the its:ruby element and its children are defined for all elements where there is text content.
- Handling legacy markup: Provide an ITS Rules document where you use the its:rubyRule element to associate your ruby markup with its equivalent in ITS.

Best Practice 8: Defining markup for notes to localizers
- Implementing as a new feature: Make sure the attributes its:locNote, its:locNoteType and its:locNoteRef are defined in your schema. This markup allows content authors to provide localization-related notes as its:locNote attribute values, or to point to the location of the relevant note text using its:locNoteRef. It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:locNoteRule element and its related markup. Content authors can use this markup to specify localization-related notes. Within the its:locNoteRule element, notes can be stored in the its:locNote element.
- Handling legacy markup: Provide an ITS Rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS.

Best Practice 9: Defining markup for unique identifiers
Make sure that elements with translatable content can be associated with a unique identifier.

Best Practice 10: Identifying terminology-related elements
Provide an ITS Rules document where you use its:termRule elements to indicate which elements are terms and which hold information related to them (e.g. definitions).

Best Practice 11: Defining markup for specifying or overriding terminology-related information
- Implementing as a new feature: Make sure the its:term and its:termInfoRef attributes are defined for any element that has text content. It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:termRule element, which can be used to override terminology-related information globally.

Best Practice 12: Working with multilingual documents
For documents that need to go through localization tasks, always store the localized version of the text in a separate document.

Best Practice 13: Naming elements and attributes
- Implementing as a new feature: Make sure the names of the elements and attributes of your schema reflect their functions, rather than one possible way of rendering their content. Also, if possible, avoid element names that do not follow a fixed naming scheme (for example, element names that also serve as identifiers).
- Handling legacy markup: Not applicable.

Best Practice 14: Defining a span-like element
Make sure you define a span-like element in your schema that will allow authors to associate arbitrary content with properties such as directionality, language information, etc. If no span-like element already exists in your schema, you may be able to use its:span.

Best Practice 15: Documenting internationalization and localization features of your schema
Make sure you document the internationalization and localization aspects of your schema by providing a set of relevant ITS rules in a single standalone ITS Rules document.

In the best practices that follow, the parts headed "How to implement this as a new feature" describe how to create new schemas or add new features to existing schemas. When doing this, you may need to take the following into account:

Think twice before creating your own schema. Seriously consider using existing formats such as DITA, DocBook, Open Document Format, Office Open XML, XML User Interface Language, Universal Business Language, etc. Those formats have many useful insights already built in.
Check carefully whether an existing format comes with a built-in capability for modification. DocBook and DITA, for example, come with their own set of features for adapting their format to special needs.
The modification mechanisms available will depend on the schema language (DTD, XML Schema, RELAX NG, etc.). For example, namespace-based modularization of schemas is difficult to achieve with DTDs.

NVDL is an example of a meta-schema language that was designed especially to allow integration of several existing vocabularies into a single XML vocabulary without needing to know the details of the source schemas. This means that with NVDL you can usually create a schema for compound documents more easily than with other schema technologies.
Each schema language provides different ways of extending or modifying existing schemas. Some examples are the include, import or redefine mechanisms in XML Schema.
Some processors do not implement support for all schema language constructs, due to erroneous implementations or differences in conformance profiles (e.g. see the conformance requirements to XML Schema part 1). Therefore a schema which works in one environment may not work in a different one.
What is possible also depends on the features of the schema which the modification is targeting. For example:
- An XML Schema redefine is only possible if the modified schema has been created with named types.
- If you are working with XML Schema, you can only apply the technique of 'chameleon' or 'proxy' schemas (see http://www.xfront.com/ZeroOneOrManyNamespaces.html) if the 'chameleon' schemas have no namespace. For example, the XML Schema document for ITS has a target namespace and therefore cannot be used as a 'chameleon' schema.

Note: The considerations above are only a portion of what you need to take into account. You need to know a lot more when diving into schema modularization.

Provide a way for authors to specify the natural language of content using ITS markup, or document equivalent legacy markup in an ITS Rules document.

The XML namespace provides the xml:lang attribute and the ITS Language Information data category provides the its:langRule element to address this requirement.

How to implement this as a new feature

Make sure the xml:lang attribute is defined for the root element of your document, and for any element where a change of language may occur.

For examples of how to add attributes to an existing schema, see Section 4.2: Example of adding an attribute to an existing schema.

Some XML documents may be designed to store data without natural language content. In these cases, there is no need for the xml:lang attribute.

The scope of the xml:lang attribute covers both the attributes and the content of the element where it appears; therefore, you cannot specify one language for an attribute and a different one for the element content. ITS does not provide a remedy for this. Instead, it is recommended that you avoid translatable attributes.
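To make the inheritance and scope rules concrete, here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree; the function name effective_languages is hypothetical. Each xml:lang declaration overrides the inherited value, an empty xml:lang cancels it (XML 1.0 treats that as "no language information available"), and the resulting value applies to the element's attributes as well as its content, which is why the French title attribute below cannot be marked as English.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root: ET.Element) -> Dict[ET.Element, Optional[str]]:
    """Map every element to the language in scope for it (hypothetical helper).

    The value covers the element's content *and* its attributes.
    xml:lang="" cancels an inherited value, giving None (no language info).
    """
    result: Dict[ET.Element, Optional[str]] = {}

    def walk(elem: ET.Element, inherited: Optional[str]) -> None:
        declared = elem.get(XML_LANG)
        if declared is not None:          # an explicit declaration overrides
            inherited = declared or None  # the empty value resets to "unknown"
        result[elem] = inherited
        for child in elem:
            walk(child, inherited)

    walk(root, None)
    return result


doc = ET.fromstring(
    '<doc xml:lang="en">'
    "<p>Hello</p>"
    '<p xml:lang="fr" title="Salut">Bonjour</p>'  # title is French as well
    '<code xml:lang="">x = 1</code>'
    "</doc>"
)
langs = effective_languages(doc)
for elem in doc.iter():
    print(elem.tag, langs[elem])  # doc en / p en / p fr / code None
```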

Make sure that the definition of the xml:lang attribute allows for empty values. That is:

In a DTD, do not use NMTOKEN as the data type; use CDATA instead.
In XML Schema, the built-in data type language does not allow empty values. However, the declaration for xml:lang in the XML Schema document for the XML namespace at http://www.w3.org/2001/xml.xsd does allow empty values, and can therefore be used.
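The difference between the two XML Schema options can be sketched in plain Python. The regular expression below is the lexical pattern of the XML Schema 1.0 type xs:language, and valid_xml_lang mirrors the xml.xsd declaration, which is a union of xs:language and the empty string. The function names are hypothetical, and whitespace normalization is ignored for brevity.

```python
import re

# Lexical pattern of the XML Schema 1.0 built-in type xs:language.
LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def valid_xs_language(value: str) -> bool:
    """Accepts what xs:language accepts; the empty string is rejected."""
    return LANGUAGE_PATTERN.fullmatch(value) is not None


def valid_xml_lang(value: str) -> bool:
    """Sketch of the xml.xsd declaration: xs:language or the empty string."""
    return value == "" or valid_xs_language(value)


for candidate in ["en", "zh-Hant-TW", "", "en_US"]:
    # Columns: value, accepted by xs:language, accepted by xml.xsd's xml:lang
    print(repr(candidate), valid_xs_language(candidate), valid_xml_lang(candidate))
```

Only the empty string separates the two functions, and that is exactly the value needed to cancel an inherited language.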

It is not recommended to use your own attribute or element to specify the language of the content. The xml:lang attribute is supported by various XML technologies such as XPath and XSLT (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your ability to take advantage of some XML applications.
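As an illustration of what relying on xml:lang buys you, the sketch below emulates the XPath 1.0 lang() function: it finds the nearest xml:lang on the context node or its ancestors and matches the argument case-insensitively, also accepting sublanguages such as "en-GB" for "en". It is needed only because Python's built-in ElementTree supports a limited XPath subset without lang(); a full XPath 1.0 engine evaluates lang() natively. The helper name xpath_lang is hypothetical. A custom language attribute would be invisible to lang(), whichever engine is used.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xpath_lang(root: ET.Element, node: ET.Element, wanted: str) -> bool:
    """Emulate XPath 1.0 lang(wanted) with `node` as the context node."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    current: Optional[ET.Element] = node
    # Find the nearest element (self or ancestor) that declares xml:lang.
    while current is not None and current.get(XML_LANG) is None:
        current = parents.get(current)
    if current is None:
        return False  # no xml:lang anywhere in scope
    declared = current.get(XML_LANG, "").lower()
    wanted = wanted.lower()
    # Equal ignoring case, or a sublanguage such as "en-gb" for "en".
    return declared == wanted or declared.startswith(wanted + "-")


doc = ET.fromstring(
    '<doc xml:lang="en-GB"><p>colour</p><p xml:lang="de">Farbe</p></doc>'
)
p_en, p_de = list(doc)
print(xpath_lang(doc, p_en, "en"))     # True
print(xpath_lang(doc, p_en, "en-us"))  # False
print(xpath_lang(doc, p_de, "en"))     # False
```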

Note: If you need to specify language as data or meta-data about something external to the document, do it with an attribute different from xml:lang. For more information see the article xml:lang in XML document schemas.

In XHTML, the language of a resource linked with the a element is indicated with the hreflang attribute, because that language does not apply to the content of the a element.

<a xml:lang="en" href="german.html" hreflang="de">Click here for German</a>
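A processor reading this link should keep the two values apart. The short Python sketch below, assuming xml.etree.ElementTree and the element exactly as shown above (without the XHTML namespace), reads xml:lang as the language of the link text and hreflang as metadata about the external target.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

link = ET.fromstring(
    '<a xml:lang="en" href="german.html" hreflang="de">Click here for German</a>'
)

# xml:lang describes this element's own content: the English link text.
content_language = link.get(XML_LANG)
# hreflang is data about the external resource the link points to.
target_language = link.get("hreflang")

print(f"{link.text!r} is in {content_language}; "
      f"{link.get('href')} is in {target_language}")
```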

If you have different languages in the attribute values and content of an element, consider nesting elements, if possible. See Handling attribute values and element content in different languages.

Handling markup not in the ITS namespace

If you are working with an existing schema where there is a way to specify content language that uses something other than the xml:lang attribute (but still uses the same values as xml:lang), you should provide an ITS Rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang.

In this document, the langcode element is used to specify the language of the text element. The langcode element has no inheritance behavior equivalent to that of xml:lang.

Note: This example is a multilingual document, which has its own set of issues (see Best Practice 12: Working with multilingual documents).

<myRes>
 <messages>
  <msg id="1">
   <langcode>en</langcode>
   <text>Cannot find file.</text>
  </msg>
  <msg id="2">
   <langcode>fr</langcode>
   <text>Fichier non trouvé.</text>
  </msg>
 </messages>
</myRes>
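The mapping that an ITS Rules document would declare here, for example with a rule such as <its:langRule selector="//text" langPointer="../langcode"/>, can be emulated in Python as a sketch. Because ElementTree has no parent axis, text_languages visits each msg and reads both children instead of following "../langcode". The second helper, split_by_language, is a hypothetical illustration of Best Practice 12: it stores each language in its own document and declares the language once with xml:lang on the root.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SOURCE = """<myRes>
 <messages>
  <msg id="1">
   <langcode>en</langcode>
   <text>Cannot find file.</text>
  </msg>
  <msg id="2">
   <langcode>fr</langcode>
   <text>Fichier non trouvé.</text>
  </msg>
 </messages>
</myRes>"""

Triple = Tuple[Optional[str], Optional[str], Optional[str]]


def text_languages(root: ET.Element) -> List[Triple]:
    """Pair each <text> with the language held in its sibling <langcode>."""
    return [
        (msg.get("id"), msg.findtext("langcode"), msg.findtext("text"))
        for msg in root.iter("msg")
    ]


def split_by_language(root: ET.Element) -> Dict[str, ET.Element]:
    """Hypothetical helper: one monolingual document per language."""
    docs: Dict[str, ET.Element] = {}
    for msg_id, lang, text in text_languages(root):
        lang = lang or ""
        if lang not in docs:
            new_root = ET.Element("myRes", {XML_LANG: lang})
            ET.SubElement(new_root, "messages")
            docs[lang] = new_root
        messages = docs[lang].find("messages")
        msg = ET.SubElement(messages, "msg", {"id": msg_id or ""})
        ET.SubElement(msg, "text").text = text
    return docs


root = ET.fromstring(SOURCE)
print(text_languages(root))
for lang, doc in split_by_language(root).items():
    print(lang, ET.tostring(doc, encoding="unicode"))
```

Once each output document carries xml:lang, the per-message langcode is no longer needed, and standard tools (including the lang() function) can see the language.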