Showing content from https://www.w3.org/International/techniques/authoring-html below:
Authoring web pages
Characters
Getting started
Background reading
Choosing and applying a character encoding
- Choose UTF-8 for all content.
- If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification.
- Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022, encodings based on EBCDIC, CESU-8, UTF-7, BOCU-1, and SCSU.
Useful reference links
- Encoding, 4.2 Names and labels
  If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Background reading
- Who uses Unicode?
  Are corporate Web sites using Unicode right now? This article is somewhat outdated, now that Unicode accounts for around 97% of pages on the Web.
- Document character set
  What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Changing to UTF-8
- Save the data as UTF-8; don't just change the encoding declaration.
- Declare the encoding in your page.
- Ensure that your server does the right thing.
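The first item above is the one most often skipped: the bytes themselves have to be converted. A minimal Python sketch, assuming you already know the legacy encoding of the data (the function name and the sample string are made up for illustration):

```python
def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # A wrong guess for source_encoding either raises UnicodeDecodeError
    # or silently produces mojibake, so verify the result by eye as well.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical sample: "café" as it would be stored in windows-1252.
legacy = "café".encode("windows-1252")
converted = reencode_to_utf8(legacy, "windows-1252")
```

After converting, the in-document declaration and any server configuration still need to be updated to say UTF-8.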
Declaring the character encoding for HTML
- Use the HTTP header if it is available.
- Always use an in-document encoding declaration, even if you are also using the HTTP header.
- Ensure that the encoding declaration fits within the first 1024 bytes of the page.
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
- Do not use the charset attribute on the a or link element.
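A rough way to check the 1024-byte rule is to look for a meta declaration in only the first 1024 bytes of the file. The Python sketch below uses a simplified regular expression; it is not the encoding-sniffing algorithm that browsers implement, and the function name is an assumption made for this example.

```python
import re

# Simplified pattern: matches both <meta charset="..."> and the
# http-equiv form with "charset=" inside the content attribute.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_encoding(page: bytes, limit: int = 1024):
    """Return the declared encoding label found in the first `limit` bytes, or None."""
    match = META_CHARSET.search(page[:limit])
    return match.group(1).decode("ascii").lower() if match else None


early = b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>t</title>'
late = b"<!DOCTYPE html><html><head><!--" + b"x" * 1100 + b'--><meta charset="utf-8">'
```

Here `declared_encoding(early)` finds the label, while `declared_encoding(late)` returns None because a long comment pushes the declaration past the limit.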
Useful reference links
- Encoding, 4.2 Names and labels
  If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Background reading
- Serving HTML & XHTML
  Introduces doctypes, mime-types, and the influence of standards- vs. quirks-mode on character encoding declarations.
- Handling character encodings in HTML and CSS
  Tutorial-style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
Declaring the character encoding for a CSS style sheet
- If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet.
- If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax.
- If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
- Do not use the charset attribute on the a or link element.
Useful reference links
- Encoding, 4.2 Names and labels
  If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Using escapes to represent characters
- Avoid using escapes whenever possible. UTF-8 supports all the characters you need.
- Use escapes for invisible or ambiguous characters.
- Use CSS escapes for CSS embedded in HTML, rather than HTML escapes.
- Always use Unicode codepoints for the numeric part of a character escape. Do not use codepoint values of non-Unicode encodings.
- Use a single escape (representing the Unicode codepoint value) for supplementary characters. Do not escape surrogate character pairs.
- Ensure that all href attribute values have escaped ampersands in query parameters, ie. &amp; rather than just &.
- Avoid named character entities in XHTML.
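Several of the rules above can be illustrated in a few lines of Python. The helper names are hypothetical; `html.escape` is the standard-library function. Note that a supplementary character is a single code point in a Python string, so `ord()` yields the one value to escape rather than a surrogate pair.

```python
import html


def html_ncr(char: str) -> str:
    """Hexadecimal HTML numeric character reference for a single code point."""
    return f"&#x{ord(char):X};"


def css_escape(char: str) -> str:
    """CSS escape for a single code point; the trailing space ends the escape."""
    return f"\\{ord(char):X} "


emoji = "\U0001F600"          # supplementary character U+1F600
rlm = "\u200F"                # invisible RIGHT-TO-LEFT MARK

emoji_html = html_ncr(emoji)  # one escape, not two surrogate escapes
rlm_html = html_ncr(rlm)
emoji_css = css_escape(emoji)

# Ampersands in query strings must be escaped inside href values.
href = html.escape("search?q=tea&lang=en", quote=True)
```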
Checking the encoding of a document
Handling the byte-order mark (BOM)
- If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
- If you ignored the advice above and encoded your page as UTF-16, always ensure that it starts with a BOM.
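As an example of what "handling the BOM" can mean for a back-end process, the Python sketch below detects and removes a UTF-8 BOM before further processing. The function name is an assumption; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"<!DOCTYPE html>"
cleaned = strip_utf8_bom(with_bom)

# Alternatively, the utf-8-sig codec drops the BOM while decoding.
decoded = with_bom.decode("utf-8-sig")
```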
Handling character normalization
- Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
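The reason this matters is that two strings can look identical on screen and still fail to match. A small Python illustration using the standard `unicodedata` module (the class-name strings are invented for the example):

```python
import unicodedata

# The same visible class name, typed two different ways.
decomposed = "re\u0301sume\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "r\u00e9sum\u00e9"    # precomposed 'é'


def to_nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


matches_raw = decomposed == precomposed                  # a selector would not match
matches_nfc = to_nfc(decomposed) == to_nfc(precomposed)  # matches once both are NFC
```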
Handling encoding issues in forms
- Use UTF-8 for the character encoding of your page.
- Consider checking on the server that form data is arriving in UTF-8.
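One simple server-side check is to attempt a strict UTF-8 decode of the raw bytes that were submitted. This is only a sketch, with a hypothetical function name; how you obtain the raw bytes depends on your framework.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw form bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


good = "日本語".encode("utf-8")
bad = "café".encode("latin-1")   # a lone 0xE9 byte is not valid UTF-8
```

A clean decode does not prove the sender intended UTF-8, but a failed decode is a reliable sign that something else arrived.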
Using Unicode control codes
- Don't use Unicode characters if there is markup to do the same job.
- Use character escapes to represent control codes, so that they are visible.
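To make invisible formatting characters visible in source code, they can be rewritten as numeric character references. The Python sketch below treats every character in Unicode general category Cf (format) as a candidate; that choice, and the function name, are assumptions for illustration rather than a rule from the guidance above.

```python
import unicodedata


def escape_format_chars(text: str) -> str:
    """Replace Unicode format characters (category Cf) with HTML numeric escapes."""
    return "".join(
        f"&#x{ord(ch):X};" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


source = "abc\u200Fdef"              # contains an invisible RIGHT-TO-LEFT MARK
visible = escape_format_chars(source)
```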
Working around unavailable characters/glyphs
Using non-ASCII web addresses
Language
Getting started
Declaring the overall language of a page
- Always declare the default language for text in the page using attributes on the html tag.
- Do NOT use the meta element with the http-equiv attribute set to Content-Language.
- Use language attributes rather than HTTP to declare the default language for 'text processing' (ie. when language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
- Do not declare the default language of a document in the body element; use the html element.
- Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the languages undefined until later.
- Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
- For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
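The serialization-dependent rule in the last item can be sketched as a small Python helper, for instance inside a templating step. The function name and the three serialization keys are made up for this illustration.

```python
def lang_attributes(tag: str, serialization: str) -> str:
    """Return the language attribute(s) to emit for a given serialization.

    serialization is one of (hypothetical keys):
      "html"                - HTML
      "xhtml-as-text-html"  - XHTML 1.0 served as text/html
      "xml"                 - XHTML served as XML
    """
    if serialization == "html":
        names = ["lang"]
    elif serialization == "xhtml-as-text-html":
        names = ["lang", "xml:lang"]
    elif serialization == "xml":
        names = ["xml:lang"]
    else:
        raise ValueError(f"unknown serialization: {serialization}")
    return " ".join(f'{name}="{tag}"' for name in names)


html_root = f"<html {lang_attributes('fr', 'html')}>"
```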
Identifying in-document language changes
- When the page contains content in another language, add a language attribute to an element surrounding that content.
- For HTML use the lang attribute only, for XHTML 1.0 served as text/html use the lang and xml:lang attributes, and for XHTML served as XML use the xml:lang attribute only.
- If the text in attribute values and element content is in different languages, consider using a nested approach.
Choosing language tags
- Use subtags as defined by BCP 47 for language attribute values.
- Use the shortest possible language tag values.
- Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
- Use the subtag zxx when the text is known to be not in any language.
- When the language is undetermined and you have to label it, use lang="".
- If you are serving XML, and the format you are using supports it, use xml:lang=""; otherwise use xml:lang="und" when the language is undetermined and you have to label it.
Declaring metadata about the language(s) of the intended audience
See also
This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.
For detailed advice about how to select the right language tags, see Choosing language values.
- Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
- Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
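As a small illustration of the comma-separated form, the following Python sketch builds the header as a name/value pair. The function name is hypothetical, and how the pair is attached to a response depends on your server or framework.

```python
from typing import List, Tuple


def content_language_header(tags: List[str]) -> Tuple[str, str]:
    """Build a Content-Language header from the audience's language tags."""
    return ("Content-Language", ", ".join(tags))


header = content_language_header(["en", "fr"])
```

This header describes the intended audience only; the language of the text itself is still declared with language attributes in the markup.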
Indicating the language of a link destination
- When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
- If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using hreflang with CSS.
- Do not use flag icons to indicate languages.
Setting & changing browser language preferences
Using Accept-Language for locale setting
Markup & text
Getting started
Using b and i tags
- Use the class attribute on a b or i element to identify why the element is being used.
- Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
Using ruby markup
See also
This section is specifically about how to use markup for ruby annotations. For information about styling ruby see Styling ruby text.
Background reading
- What is ruby?
  What are 'ruby' annotations?
- Bopomofo on the Web
  A summary of how bopomofo is used and the implications for support on the Web.
- Use Cases & Exploratory Approaches for Ruby Markup
  Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.
- CJKV Information Processing
  Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7).
Working with form controls
Working with strings in JavaScript & databases
- Use a topic-comment approach whenever possible.
- Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
- Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (ie. text created at runtime).
- Where the parts of a composite message appear in separate locations, provide the translator with contextual information to show how the various parts of a composite message relate to each other.
- Provide information to the translator, where needed, to clarify what a substring represents.
- When requested by the localization group, be prepared to provide information about the size of each substring.
- Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
- Reused strings must not refer to more than one text, graphic or conceptual context.
- If in doubt as to whether a string is a good candidate for re-use, don't.
- If re-used strings will be displayed in fixed-size display boxes of varying sizes, ensure that every translation will fit in the smallest display box.
How to's
- Working with Composite Messages
  Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.
- Re-using Strings in Scripted Content
  Things to be aware of if you plan to use the same text string in different places on your site or user interface.
Indicating what should and should not be translated
- Use the translate attribute on an element to prevent its content being translated by online translation services or by computer-assisted translation tools.
Styling & layout
Getting started
Preparing for text expansion during translation
- Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
Background reading
- Text size in translation
  Overview of text expansion issues.
- Display capabilities
  Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?
- Sliding Doors of CSS
  Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)
Styling by language
Using logical property styles
- Use CSS logical properties wherever possible, so as to facilitate localization into right-to-left and vertically-set scripts.
Styling counters for lists, etc.
- Use the CSS @counter-style rule to define or modify counters used for list markers, figure numbering, chapter headings, etc.
- Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the default for some scripts/languages.
How to's
- MDN: @counter-style
  How to define your own counter styles when the pre-defined styles don't fit your needs.
- Ready-made Counter Styles
  Cut-and-paste code snippets for a large number of international counter styles that can be used for ordered lists and other such counters.
Managing line breaks
- Since default line-breaking rules vary by language, always correctly label your content for language.
How to's
- MDN: word-break
  Specifies whether or not the browser should insert line breaks wherever the text would otherwise overflow its content box due to a lack of spaces. Particularly useful for Chinese and Japanese. Values include break-all and keep-all.
- MDN: line-break
  For Chinese, Japanese, or Korean (CJK), specifies how (or if) to break lines when working with punctuation and symbols. Values include strict, normal, loose, and anywhere.
Background reading
- Approaches to line breaking
  High level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.
Hyphenation
See also
This section is specifically about hyphenation. For more general information about line breaking see Managing line breaks.
- Since CSS hyphenation only works if content is labelled for language, always do that. Since hyphenation rules are language-specific, ensure that the language is labelled correctly.
Justifying and aligning text
See also
Justification behaviour is closely associated with line-breaking and hyphenation. For more information on those topics, see Managing line breaks.
- Wherever possible use start and end values for the CSS text-align property, rather than left and right. Only use left and right on the rare occasions when the alignment has to remain as is, regardless of language.
- Only use text-align when you really need to override the alignment produced by the current base direction. Don't litter your markup or stylesheet with unnecessary alignment calls.
- Avoid using HTML attributes with values of left and right. Instead add selectors to your CSS stylesheet. This allows you to use logical properties, but also makes it much easier to change things during localisation.
- Use CSS property names that include the words 'start' and 'end', rather than 'left', 'right', 'top', and 'bottom'. Eg. margin-inline-start and margin-block-start.
- Don't assume that all writing systems prefer a ragged edge at the line end. Fully-justified text is the preferred default for some scripts/languages.
- Since justification rules vary by language, always correctly label your content for language.
How to's
- MDN: text-align
  Specifies the horizontal alignment of an inline or table-cell box, including the value justify, which is used to turn on justification.
- MDN: text-justify
  Defines what type of justification should be applied to text when it is justified (ie. when text-align:justify is set). Values include inter-word and inter-character.
Background reading
- Approaches to full justification
  High level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.
Creating vertical text
Other links
Styling ruby text
See also
This section is specifically about styling ruby text. For more information about markup for ruby see Using ruby markup.
Background reading
- Ruby
  What is 'ruby'?
- CJKV Information Processing
  Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7).
Applying various script-specific typographic conventions
Other links
- Document grids
  Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
- Kumimoji and warichu
  Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
- Emphasis
  Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Using fonts & webfonts
Working with date formats
Working with personal names
- Ask yourself whether you really need to have separate fields for given name and family name.
- Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
- Avoid limiting the field size for names in your database.
- Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
- Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
- Ask separately, when setting up a profile for example, how that person would like you to address them.
- If you have separate fields for parts of a person's name, ensure that you label clearly which parts you want where.
- Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
- Be as clear as possible about telling people how to specify their name.
- Don't assume that a single letter name is an initial.
- Don't require that people supply a family name.
- Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
- Don't require names to be entered all in upper case.
- Allow the user to enter a name with spaces.
- Don't assume that members of the same family will share the same family name.
- It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
- If you hope to get Latin- or ASCII-only, you need to tell the user.
- You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
- If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back end databases and in all the software code in between.
How to's
- Personal names around the world
  How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?
Bidirectional text
Getting started
Setting up a right-to-left page
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
- Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
- Don't add dir="rtl" to the body tag.
- If you need to avoid the scroll bar moving on some browsers, put dir on the head element and a div just inside the body element.
- Use logical order, not visual ordering, for Hebrew, and choose an appropriate encoding.
- If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
- Do not use CSS styling to control directionality in HTML. Use markup.
Setting direction on block elements
- Add the dir attribute to a block element to change base direction.
- Do not use CSS styling to control directionality in HTML. Use markup.
- Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
Managing text direction in form controls
- Add dir="auto" to input tags to automatically align text to the correct side of an input field.
- Add dir="auto" to textarea and pre tags to make paragraphs align to the left or right according to the initial strong character.
- Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
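To show what "according to the initial strong character" means, here is a simplified Python approximation of first-strong direction detection, which may also be useful on the server when no direction was submitted with the text. It uses the standard `unicodedata.bidirectional` lookup and ignores details a browser handles, such as skipping the contents of directional isolates; the function name is an assumption.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Guess base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (for example digits only): fall back to ltr.
    return "ltr"


hebrew_after_digits = first_strong_direction("123 שלום")   # digits are not strong
latin_first = first_strong_direction("Hello שלום")
arabic = first_strong_direction("مرحبا")
```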
Mixing text direction inline
- Tightly wrap every opposite-direction phrase in markup that sets its base direction.
- If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
- If you don't know the phrase's direction, ie. unknown text that will be injected at run time, then either wrap the phrase in bdi (no dir attribute needed), or if the phrase is tightly wrapped by an element already, just add dir="auto" to that element.
- To bulletproof the code for Edge or legacy browsers, if the tightly-wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, then add a directional mark (RLM or LRM) immediately after the markup of that phrase.
- Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
- Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.
- Do not leave white space at the end of inline elements that mark a directional boundary.
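For text injected at run time, the wrapping can be done wherever the markup is generated. The Python sketch below shows two hypothetical helpers: one wraps unknown-direction text tightly in bdi, and one isolates text for plain-text contexts such as a title. Using FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069) for the second case is an assumption of this example; the guidance above does not name specific control characters.

```python
import html

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def wrap_unknown_direction(text: str) -> str:
    """Wrap run-time text in bdi, with no white space inside the element."""
    return f"<bdi>{html.escape(text.strip())}</bdi>"


def isolate_plain_text(text: str) -> str:
    """Isolate text with control characters where markup is not allowed."""
    return f"{FSI}{text}{PDI}"


item = wrap_unknown_direction(" A & B ")
title_part = isolate_plain_text("שלום")
```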
Handling parentheses and other mirrored characters
- Treat mirrored characters as if the word 'left' in a character name meant 'opening', and 'right' meant 'closing'.
Overriding the Unicode bidirectional algorithm
Navigation
Getting started
Linking to localized content
- Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer.
- Consider how to indicate to the user where the in-page language links are, and if the page is available in a long list of languages, consider whether or not to use something like a select control (and if so, how to make it obvious what its function is).
- Locate pull-down menus or selection lists at or near the top of the page.
- Use a recognizable image alongside a pull-down menu to indicate that it is a control which will take the user to localized pages. Do not use text.
- Consider using the size attribute to display the first set of options in a select control.
- Translate the links or options into the target language.
- Encode your page as UTF-8, so that it supports the necessary characters.
- Decide whether it is a problem that a user won't have fonts for all the list items or menu options. If it is, use JavaScript menus or some other graphic-based approach.
- Decide whether to add a description alongside each option, using the language of the current page, so that users can tell what the native word means.
- Find the most appropriate way of ordering the list of options.
How to's
- Guiding users to translated pages
  If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?
- Using <select> to Link to Localized Content
  What are the best practices for using pull-down menus based on the select element to direct visitors to localized content?
- About languages and flags
  On some Web pages you’ll find country flags as symbols for languages. This article explains why this approach is problematic, and what you should do instead.
Using content negotiation
- Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer.
- If the user switches to a different language, offer them the opportunity to remember that choice and serve up subsequent pages in that language, overriding their browser settings.
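The two rules combine naturally: a remembered choice wins, and the Accept-Language header is the fallback. The Python sketch below is a deliberately simple illustration, not a full implementation of HTTP language negotiation. The function names, the lowercase tag convention, and the idea of passing the saved choice as a parameter (for example from a cookie) are all assumptions.

```python
from typing import List, Optional, Set


def parse_accept_language(header: str) -> List[str]:
    """Return lowercase language ranges ordered by q-value, then by position."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            ranked.append((-q, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_language(available: Set[str], header: str,
                    saved_choice: Optional[str] = None,
                    default: str = "en") -> str:
    """Prefer the user's saved choice, then the browser preferences."""
    if saved_choice in available:
        return saved_choice
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]      # fall back from fr-ch to fr
        if primary in available:
            return primary
    return default


from_browser = choose_language({"en", "fr"}, "fr-CH, fr;q=0.9, en;q=0.8")
from_saved = choose_language({"en", "de"}, "fr-CH, fr;q=0.9, en;q=0.8", saved_choice="de")
```

Whatever the negotiation picks, the in-page links to the other language versions still need to be present so the user can override it.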
You can link to this page and open specific items by using the open parameter in the URL. For example, authoring-html.en?open=language&open=langvalues will automatically open the sections Language and Choosing language tags. The necessary parameter values are shown to the right of each heading. These are links, to help you create a URL for sharing. The query ?open=all expands all sections.