Stay organized with collections Save and categorize content based on your preferences.
GoogleSQL for BigQuery supports collation. You can learn more about collation in this topic.
About collationCollation determines how strings are sorted and compared in collation-supported operations. If you would like to customize collation for a collation-supported operation, you must assign a collation specification to at least one string in the operation. Some operations can't use collation, but can pass collation through them.
Operations affected by collationWhen an operation is affected by collation, this means that the operation takes into consideration collation during the operation. These query operations are affected by collation when sorting and comparing strings:
Operations that propagate collationCollation can pass through some query operations to other parts of a query. When collation passes through an operation in a query, this is known as propagation. During propagation:
GoogleSQL has several functions, operators, and expressions that can propagate collation.
In the following example, the 'und:ci'
collation specification is propagated from the character
column to the ORDER BY
operation.
-- With collation
SELECT *
FROM UNNEST([
COLLATE('B', 'und:ci'),
'b',
'a'
]) AS character
ORDER BY character
/*-----------*
| character |
+-----------+
| a |
| B |
| b |
*-----------*/
-- Without collation
SELECT *
FROM UNNEST([
'B',
'b',
'a'
]) AS character
ORDER BY character
/*-----------*
| character |
+-----------+
| B |
| a |
| b |
*-----------*/
Functions
These functions let collation propagate through them:
OperatorsThese operators let collation propagate through them:
ExpressionsThese expressions let collation propagate through them:
Expression NotesARRAY
When you construct an ARRAY
, collation on input arguments is propagated on the elements in the ARRAY
. CASE
CASE
expr COALESCE
IF
IFNULL
NULLIF
STRUCT
When you construct a STRUCT
, collation on input arguments is propagated on the fields in the STRUCT
. Additional features that support collation
These features in BigQuery generally support collation:
Where you can assign a collation specificationA collation specification can be assigned to these collation-supported types:
STRING
STRING
field in a STRUCT
STRING
element in an ARRAY
In addition:
COLLATE
function. This overrides any collation specifications set previously.In summary:
You can define a default collation specification for a dataset. For example:
CREATE SCHEMA (...)
DEFAULT COLLATE 'und:ci'
You can define a default collation specification for a table. For example:
CREATE TABLE (...)
DEFAULT COLLATE 'und:ci'
You can define a collation specification for a collation-supported column. For example:
CREATE TABLE (
case_insensitive_column STRING COLLATE 'und:ci'
)
You can specify a collation specification for a collation-supported expression with the COLLATE
function. For example:
SELECT COLLATE('a', 'und:ci') AS character
DDL statements Location Support Notes Dataset CREATE SCHEMA
Create a dataset and optionally add a default collation specification to the dataset. Dataset ALTER SCHEMA
Updates the default collation specification for a dataset. Table CREATE TABLE
Create a table and optionally add a default collation specification to a table or a collation specification to a collation-supported type in a column.
You can't have collation on a column used with CLUSTERING
.
ALTER TABLE
Update the default collation specification for collation-supported type in a table. Column ADD COLUMN
Add a collation specification to a collation-supported type in a new column in an existing table. Data types Type Notes STRING
You can apply a collation specification directly to this data type. STRUCT
You can apply a collation specification to a STRING
field in a STRUCT
. A STRUCT
can have STRING
fields with different collation specifications. A STRUCT
can only be used in comparisons with the following operators and conditional expressions: =
, !=
, IN
, NULLIF
, and CASE
. ARRAY
You can apply a collation specification to a STRING
element in an ARRAY
. An ARRAY
can have STRING
elements with different collation specifications. Note: Use the COLLATE
function to apply a collation specification to collation-supported expressions. Functions, operators, and conditional expressions Functions Operators Conditional expressions
The preceding collation-supported operations (functions, operators, and conditional expressions) can include input with explicitly defined collation specifications for collation-supported types. In a collation-supported operation:
For example:
-- Assume there's a table with this column declaration:
CREATE TABLE table_a
(
col_a STRING COLLATE 'und:ci',
col_b STRING COLLATE '',
col_c STRING,
col_d STRING COLLATE 'und:ci'
);
-- This runs. Column 'b' has a collation specification and the
-- column 'c' doesn't.
SELECT STARTS_WITH(col_b_expression, col_c_expression)
FROM table_a;
-- This runs. Column 'a' and 'd' have the same collation specification.
SELECT STARTS_WITH(col_a_expression, col_d_expression)
FROM table_a;
-- This runs. Even though column 'a' and 'b' have different
-- collation specifications, column 'b' is considered the default collation
-- because it's assigned to an empty collation specification.
SELECT STARTS_WITH(col_a_expression, col_b_expression)
FROM table_a;
-- This works. Even though column 'a' and 'b' have different
-- collation specifications, column 'b' is updated to use the same
-- collation specification as column 'a'.
SELECT STARTS_WITH(col_a_expression, COLLATE(col_b_expression, 'und:ci'))
FROM table_a;
-- This runs. Column 'c' doesn't have a collation specification, so it uses the
-- collation specification of column 'd'.
SELECT STARTS_WITH(col_c_expression, col_d_expression)
FROM table_a;
Collation specification details
A collation specification determines how strings are sorted and compared in collation-supported operations. You can define the Unicode collation specification, und:ci
, for collation-supported types.
If a collation specification isn't defined, the default collation specification is used. To learn more, see the next section.
Default collation specificationWhen a collation specification isn't assigned or is empty, 'binary'
collation is used. Binary collation indicates that the operation should return data in Unicode code point order. You can't set binary collation explicitly.
In general, the following behavior occurs when an empty string is included in collation:
und:ci
collation, the string comparison is case-insensitive.collation_specification:
'language_tag:collation_attribute'
A unicode collation specification indicates that the operation should use the Unicode Collation Algorithm to sort and compare strings. The collation specification can be a STRING
literal or a query parameter.
The language tag determines how strings are generally sorted and compared. Allowed values for language_tag
are:
und
: A locale string representing the undetermined locale. und
is a special language tag defined in the IANA language subtag registry and used to indicate an undetermined locale. This is also known as the root locale and can be considered the default Unicode collation. It defines a reasonable, locale agnostic collation.In addition to the language tag, the unicode collation specification must have a collation_attribute
, which enables additional rules for sorting and comparing strings. Allowed values are:
ci
: Collation is case-insensitive.This is what the ci
collation attribute looks like when used with the und
language tag in the COLLATE
function:
COLLATE('orange1', 'und:ci')
Caveats
Differing strings can be considered equal. For instance, ẞ
(LATIN CAPITAL LETTER SHARP S) is considered equal to 'SS'
in some contexts. The following expressions both evaluate to TRUE
:
COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')
COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')
This is similar to how case insensitivity works.
In search operations, strings with different lengths could be considered equal. To ensure consistency, collation should be used without search tailoring.
There are a wide range of unicode code points (punctuation, symbols, etc), that are treated as if they aren't there. So strings with and without them are sorted identically. For example, the format control code point U+2060
is ignored when the following strings are sorted:
SELECT *
FROM UNNEST([
COLLATE('oran\u2060ge1', 'und:ci'),
COLLATE('\u2060orange2', 'und:ci'),
COLLATE('orange3', 'und:ci')
]) AS fruit
ORDER BY fruit
/*---------*
| fruit |
+---------+
| orange1 |
| orange2 |
| orange3 |
*---------*/
Ordering may change. The Unicode specification of the und
collation can change occasionally, which can affect sorting order.
Limitations for supported features are captured in the previous sections, but here are a few general limitations to keep in mind:
und:ci
and empty collation are supported, but not other collation specifications.You can't set non-empty collation on a clustering field.
CREATE TABLE my_dataset.my_table
(
word STRING COLLATE 'und:ci',
number INT64
)
CLUSTER BY word;
-- User error:
-- "CLUSTER BY STRING column word with
-- collation und:ci isn't supported"
You can't create a materialized view with collated sort keys in an aggregate function.
CREATE MATERIALIZED VIEW my_dataset.my_view
AS SELECT
-- Assume collated_table.col_ci is a string column with 'und:ci' collation.
ARRAY_AGG(col_int64 ORDER BY col_ci) AS col_int64_arr
FROM my_dataset.collated_table;
-- User error:
-- "Sort key with collation in aggregate function array_agg isn't
-- supported in materialized view"
If a materialized view has joined on collated columns and not all of the collated columns were produced by the materialized view, it's possible that a query with the materialized view will use data from base tables rather than the materialized view.
CREATE MATERIALIZED VIEW my_dataset.my_mv
AS SELECT
t1.col_ci AS t1_col_ci,
t2.col_int64 AS t2_col_int64
FROM my_dataset.collated_table1 AS t1
JOIN my_dataset.collated_table2 AS t2
ON t1.col_ci = t2.col_ci
SELECT * FROM my_dataset.my_mv
WHERE t1_col_ci = 'abc'
-- Assuming collated_table1.col_ci and collated_table2.col_ci are columns
-- with 'und:ci' collation, the query to my_mv may use data from
-- collated_table1 and collated_table2, rather than data from my_mv.
Table functions can't take collated arguments.
CREATE TABLE FUNCTION my_dataset.my_tvf(x STRING) AS (
SELECT x
);
SELECT * FROM my_dataset.my_tvf(COLLATE('abc', 'und:ci'));
-- User error:
-- "Collation 'und:ci' on argument of TVF call isn't allowed"
A table function with collated output columns isn't supported if an explicit result schema is present.
CREATE TABLE FUNCTION my_dataset.my_tvf(x STRING)
RETURNS TABLE<output_str STRING>
AS (SELECT COLLATE(x, 'und:ci') AS output_str);
-- User error:
-- "Collation 'und:ci' on output column output_str isn't allowed when an
-- explicit result schema is present"
User-defined functions (UDFs) can't take collated arguments.
CREATE FUNCTION tmp_dataset.my_udf(x STRING) AS (x);
SELECT tmp_dataset.my_udf(col_ci)
FROM shared_dataset.table_collation_simple;
-- User error:
-- "Collation isn't allowed on argument x ('und:ci').
-- Use COLLATE(arg, '') to remove collation at [1:8]"
Collation in the return type of a user-defined function body isn't allowed.
CREATE FUNCTION my_dataset.my_udf(x STRING) AS (COLLATE(x, 'und:ci'));
-- User error:
-- "Collation ['und:ci'] in return type of user-defined function body is
-- not allowed"
External tables don't support collation.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["GoogleSQL for BigQuery supports collation, which dictates how strings are sorted and compared in various operations."],["Collation can be customized by assigning a collation specification to a string in a collation-supported operation, with some operations also capable of propagating collation through them."],["Collation affects operations like comparison operations, joins, `ORDER BY`, `GROUP BY`, `WINDOW`, and several functions and set operations when sorting and comparing strings."],["Collation can propagate through certain functions, operators, and expressions, allowing the collation specification to be passed to other parts of a query."],["You can define default collation specifications at the dataset or table level, and also assign specific collation to string types in columns, or apply it directly to an expression using the `COLLATE` function; however there are limitations such as not allowing collation on clustering fields, or collated arguments in table or user-defined functions."]]],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4