Showing content from https://unicode-org.github.io/icu/userguide/collation below:

Collation | ICU Documentation

Collation Overview

Information is displayed in sorted order to enable users to easily find the items they are looking for. However, users of different languages might have very different expectations of what a “sorted” list should look like. Not only does the alphabetical order vary from one language to another, but it can also vary from document to document within the same language. For example, phonebook ordering might be different from dictionary ordering. String comparison is one of the basic functions most applications require, and yet implementations often do not match local conventions. The ICU Collation Service provides string comparison capability with support for appropriate sort orderings for each of the locales you need. If you have an unusual requirement, the service also provides facilities for customizing orderings.
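To make the difference between binary and locale-sensitive ordering concrete, here is a minimal sketch. It assumes the JDK's built-in `java.text.Collator` as a stand-in, since that class ships with every Java runtime; ICU4J's own `com.ibm.icu.text.Collator` is a separate class with a similar shape and is not used here. The class and method names (`LocaleSortSketch`, `sortByCodeUnits`, `sortForLocale`) are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative only: uses the JDK collator, not the ICU Collation Service.
class LocaleSortSketch {

    // Sorts by raw UTF-16 code unit values, so every uppercase ASCII
    // letter comes before every lowercase one.
    static List<String> sortByCodeUnits(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Sorts with a collator for the given locale, which orders by
    // letter first and treats case as a lower-priority difference.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("zebra", "Banana", "apple");
        System.out.println(sortByCodeUnits(words));               // [Banana, apple, zebra]
        System.out.println(sortForLocale(words, Locale.ENGLISH)); // [apple, Banana, zebra]
    }
}
```

The binary sort puts "Banana" first only because "B" has a smaller code value than "a", which is rarely what a reader of a sorted list expects.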

Starting in release 1.8, the ICU Collation Service is compliant with the Unicode Collation Algorithm (UCA) (Unicode Technical Standard #10) and is based on the Default Unicode Collation Element Table (DUCET), which defines the same sort order as ISO 14651.

The ICU Collation Service also contains several enhancements that are not available in UCA. These have been adopted into the CLDR Collation Algorithm. For example:

In other words, ICU implements the CLDR Collation Algorithm, which is an extension of the Unicode Collation Algorithm (UCA), which is in turn an extension of ISO 14651.

There are several benefits to using the collation algorithms defined in these standards, including:

In addition, Unicode contains a large set of characters. This can make it difficult for collation to be fast, or can require collation to use significant memory or disk resources. The ICU collation implementation is designed to be fast, to have a small memory footprint, and to be highly customizable.

There are many challenges when accommodating the world’s languages and writing systems and the different orderings that are used. However, the ICU Collation Service provides an excellent means for comparing strings in a locale-sensitive fashion.

For example, here are some of the ways languages vary in ordering strings:

To accommodate the many languages and differing requirements, ICU collation supports customizing sort orderings, a process also known as tailoring. Tailoring is discussed in more detail in the Customization chapter.
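As a rough illustration of what tailoring means, the sketch below appends one rule to an existing collator's rule set so that "ch" sorts as a single letter after "c", as in traditional Spanish ordering. It assumes the JDK's `java.text.RuleBasedCollator` rather than ICU's own rule-based collator, and the rule string follows the JDK's rule syntax; the class and method names are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Illustrative only: tailoring with the JDK collator, not ICU.
class TailoringSketch {

    // The untailored English collator supplied by the JDK.
    static RuleBasedCollator base() {
        return (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
    }

    // The same rules plus one tailoring: "ch" becomes its own letter,
    // ordered directly after "c" and before "d".
    static RuleBasedCollator tailored() {
        String extraRules = "& C < ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(base().getRules() + extraRules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        // Untailored: "chico" precedes "cosa" because "h" precedes "o".
        System.out.println(base().compare("chico", "cosa") < 0);     // true
        // Tailored: "ch" follows plain "c", so "chico" follows "cosa".
        System.out.println(tailored().compare("chico", "cosa") > 0); // true
    }
}
```

The tailoring changes only the relative position of "ch"; every other comparison keeps the behavior of the base rules.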

The basic ICU Collation Service is provided by two main categories of APIs:

ICU provides an AlphabeticIndex API for generating language-appropriate labels for sorted sections, such as those found in dictionaries and phone books.

ICU also provides a higher-level string search API which can be used, for example, for case-insensitive or accent-insensitive search in an editor or in a web page. ICU string search is based on the low-level collation element iteration.
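The idea behind case-insensitive and accent-insensitive matching can be sketched by comparing strings at primary strength, where only base letters count. This is not the ICU string search API: it assumes the JDK's `java.text.Collator` and checks whole-string equivalence only, without locating matches inside a longer text. The names `LooseMatchSketch` and `looselyEquals` are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Illustrative only: primary-strength comparison with the JDK collator.
class LooseMatchSketch {

    // Returns true when the two strings differ at most by case or accents.
    static boolean looselyEquals(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength ignores accent and case differences.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters such as U+00E9 before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(looselyEquals("resume", "R\u00e9sum\u00e9")); // true
        System.out.println(looselyEquals("resume", "resumes"));          // false
    }
}
```

A real search feature would also need to report where in the text each match begins and ends, which is the part a dedicated string search API handles.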

Programming Examples

Here are some API usage conventions for the ICU Collation Service APIs.
