A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-collation below:

Collation | Databricks Documentation

Collation

Applies to: Databricks SQL Databricks Runtime 16.1 and above

A collation is a set of rules that determines how string comparisons are performed. Collations are used to compare strings in a case-insensitive, accent-insensitive, or trailing space insensitive manner, or to sort strings in a specific language-aware order.

Strings in Databricks are represented as UTF-8 encoded Unicode characters. By default Databricks compares strings by their binary UTF8 representation. This is known as UTF8_BINARY collation. UTF8_BINARY comparisons are fast and appropriate in many cases, but may not be suitable for all applications, especially those that require language-aware sorting or comparisons.

Aside from language-aware comparisons, a common use case is enforcing case-insensitivity. Databricks has the UTF8_LCASE collation specifically for this purpose. It converts strings to lowercase before comparing them using the fast UTF8_BINARY collation.

For language-aware comparisons, Databricks employs the following technologies:

These technologies are encapsulated in a set of named collations that can be used in SQL statements.

Collation names​

Because identifying collations by their LDML specification can be complex and challenging to read, Databricks has a set of easier-to-use named system collations.

Syntax​
{ UTF8_BINARY |
UTF8_LCASE |
{ UNICODE | locale } [ _ modifier [...] ] }

locale
language_code [ _ script_code ] [ _ country_code ]

modifier
{ CS | CI | AS | AI | RTRIM }

When processing a collation, Databricks normalizes collation names by removing defaults. For example, SR_CYR_SRN_CS_AS is normalized to SR.

For a list of supported collations, see Supported collations.

Examples​

SQL


system.builtin.unicode


unicode


DE




fr_CAN


zh_Hant_MAC



de_CI_AI


`UTF8_BINARY`
Default collation​

The default collation applies when using STRING literals, parameter markers, functions without STRING parameters producing strings, and when defining column, field or variable types without a COLLATE clause.

The default collation is derived in one of the following ways:

Collation precedence​

To decide which collation to use for a given string Databricks defines collation precedence rules.

The rules assign 4 levels of precedence to collations:

  1. Explicit

    The collation has been explicitly assigned to a string using COLLATE expression.

    Examples

    SQL


    vin COLLATE UTF8_BINARY LIKE 'ZFF%'


    ORDER BY vorname COLLATE DE
  2. Implicit

    The collation is implicitly assigned by the column, field, column-alias, variable, or routine parameter reference. This includes the result of a subquery as long as the collation is not None.

    Examples

    SQL


    employee.name LIKE 'Mc%'


    translate(session.tempvar, 'Z', ',')
  3. Default

    A STRING literal, named or unnamed parameter marker, or a STRING produced by a function from another type.

    Examples

    SQL


    'Hello'


    EXECUTE IMMEDIATE 'SELECT :parm1' USING 'Hello' AS parm1;


    EXECUTE IMMEDIATE 'SELECT ?' USING 'Hello';


    CAST(5 AS STRING)


    to_char(DATE'2016-04-08', 'y')


    session_user()

    The assigned collation is the Default Collation.

  4. None

    A STRING result of a function, operator or set operation (e.g. UNION) that takes more than one STRING argument which have different implicit collations.

    Examples

    SQL


    SELECT fr || de AS freutsch FROM VALUES('Voulez vous ' COLLATE FR), 'Kartoffelsupp...' COLLATE DE) AS T(fr, de)


    SELECT 'Voulez vous ' COLLATE FR UNION ALL SELECT 'Kartoffelsupp...' COLLATE DE
Collation derivation​

When deriving the collation for a STRING result, the collation precedence rules are applied in the following ways:

If the expression:

  1. matches the definitions above

    The collation and precedence is as defined.

  2. is a function or operator with a single STRING parameter, returning a STRING

    The collation and precedence is that of the STRING parameter.

  3. is a function or operator with two or more STRING parameters

    1. with the same collations and precedence

      The collation and precedence is that of the STRING parameters.

    2. with different collations or precedence

      Let C1 and C2 be distinct collations and let D be the default collation. The precedence and the collation is determined by the following table:

Examples​

SQL

> SELECT 'hello' = 'hello   ' COLLATE UNICODE_RTRIM;
true

> CREATE TABLE words(fr STRING COLLATE FR, de STRING COLLATE DE, en STRING COLLATE EN);
> INSERT INTO words VALUES ('Salut', 'Hallo', 'Hello');


> SELECT collation('Ciao');
UTF8_BINARY


> SELECT collation(user());
UTF8_BINARY


> SELECT collation(upper('Ciao'));
UTF8_BINARY


> SELECT collation(fr || 'Ciao') FROM words;
FR


> SELECT collation('Salut' COLLATE FR || de) FROM words;
FR



> SELECT collation(de || fr) FROM words;
null


> SELECT collation('Salut' COLLATE FR || 'Ciao');
FR


> SELECT collation('Salut' COLLATE FR || 'Hallo' COLLATE DE);
COLLATION_MISMATCH.EXPLICIT


> SELECT collation('Ciao' COLLATE IT || (fr || de)) FROM words;
IT


> SELECT collation(en || (fr || de)) FROM words;
null


> SELECT collation((fr || ltrim('H' COLLATE EN, fr)) || fr) FROM words;
EN

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4