Character values are compared on the basis of two measures:
Binary or linguistic collation
Blank-padded or nonpadded comparison semantics
The following subsections describe the two measures.
Binary and Linguistic Collation
In binary collation, which is the default, Oracle compares character values like binary values. Two sequences of bytes that form the encodings of two character values in their storage character set are treated as binary values and compared as described in Binary Values. The result of this comparison is returned as the result of the binary comparison of the source character values.
For many languages, the binary collation can yield a linguistically incorrect ordering of character values. For example, in most common character sets, all the uppercase Latin letters have character codes with lower values than all the lowercase Latin letters. Hence, the binary collation yields the following order:
MacDonald
MacIntosh
Macdonald
Macintosh
However, most users expect these four values to be presented in the order:
MacDonald
Macdonald
MacIntosh
Macintosh
This shows that binary collation may not be suitable even for English character values.
Oracle Database supports linguistic collations that order strings according to rules of various spoken languages. It also supports collation variants that match character values case- and accent-insensitively. Linguistic collations are more expensive but they provide superior user experience.
Restrictions for Linguistic Collations
Comparison conditions, ORDER
BY
, GROUP
BY
and MATCH_RECOGNIZE
query clauses, COUNT(DISTINCT)
and statistical aggregate functions, LIKE
conditions, and ORDER
BY
and PARTITION
BY
analytic clauses generate collation keys when using linguistic collations. The collation keys are the same values that are returned by the function NLSSORT
and are subject to the same restrictions that are described in NLSSORT.
Blank-Padded and Nonpadded Comparison Semantics
With blank-padded semantics, if the two values have different lengths, then Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type CHAR
, NCHAR
, text literals, or values returned by the USER
function.
With nonpadded semantics, Oracle compares two values character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different length are identical up to the end of the shorter one, then the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. Oracle uses nonpadded comparison semantics whenever one or both values in the comparison have the data type VARCHAR2
or NVARCHAR2
.
The results of comparing two character values using different comparison semantics may vary. The table that follows shows the results of comparing five pairs of character values using each comparison semantic. Usually, the results of blank-padded and nonpadded comparisons are the same. The last comparison in the table illustrates the differences between the blank-padded and nonpadded comparison semantics.
Blank-Padded Nonpadded'ac' > 'ab'
'ac' > 'ab'
'ab' > 'a '
'ab' > 'a '
'ab' > 'a'
'ab' > 'a'
'ab' = 'ab'
'ab' = 'ab'
'a ' = 'a'
'a ' > 'a'
Data-Bound Collation
Starting with Oracle Database 12c Release 2 (12.2), the collation to use when comparing or matching a given character value is associated with the value itself. It is called the data-bound collation. The data-bound collation can be viewed as an attribute of the data type of the value.
In previous Oracle Database releases, the session parameters NLS_COMP
and NLS_SORT
coarsely determined the collation for all collation-sensitive SQL operations in a database session. The data-bound collation architecture enables applications to consistently apply language-specific comparison rules to exactly the data that needs these rules.
Oracle Database 12c Release 2 (12.2) allows you to declare a collation for a table column. When a column is passed as an argument to a collation-sensitive SQL operation, the SQL operation uses the column's declared collation to process the column's values. If the SQL operation has multiple character arguments that are compared to each other, the collation determination rules determine the collation to use.
There are two types of data-bound collations:
Named Collation: This collation is a particular set of collating rules specified by a collation name. Named collations are the same collations that are specified as values for the NLS_SORT
parameter. A named collation can be either a binary collation or a linguistic collation.
Pseudo-collation: This collation does not directly specify the collating rules for a SQL operation. Instead, it instructs the operation to check the values of the session parameters NLS_SORT
and NLS_COMP
for the actual named collation to use. Pseudo-collations are the bridge between the new declarative method of specifying collations and the old method that uses session parameters. In particular, the pseudo-collation USING_NLS_COMP
directs a SQL operation to behave exactly as it used to behave before Oracle Database 12c Release 2.
When you declare a named collation for a column, you statically determine how the column values are compared. When you declare a pseudo-collation, you can dynamically control comparison behavior with the session parameter NLS_COMP
and NLS_SORT
. However, static objects, such as indexes and constraints, defined on a column declared with a pseudo-collation, fall back to using a binary collation. Dynamically settable collating rules cannot be used to compare values for a static object.
The collation for a character literal or bind variable that is used in an expression is derived from the default collation of the database object containing the expression, such as a view or materialized view query, a PL/SQL stored unit code, a user-defined type method code, or a standalone DML or query statement. In Oracle Database 12c Release 2, the default collation of PL/SQL stored units, user-defined type methods, and standalone SQL statements is always the pseudo-collation USING_NLS_COMP
. The default collation of views and materialized views can be specified in the DEFAULT
COLLATION
clause of the CREATE
VIEW
and CREATE
MATERIALIZED
VIEW
statements.
If a SQL operation returns character values, the collation derivation rules determine the derived collation for the result, so that its collation is known, when the result is passed as an argument to another collation-sensitive SQL operation in the expression tree or to a top-level consumer, such as an SQL statement clause in a SELECT
statement. If a SQL operation operates on character argument values, then the derived collation of its character result is based on the collations of the arguments. Otherwise, the derivation rules are the same as for a character literal.
You can override the derived collation of an expression node, such as a simple expression or an operator result, by using the COLLATE
operator.
Oracle Database allows you to declare a case-insensitive collation for a column, table or schema, so that the column or all character columns in a table or a schema can be always compared in a case-insensitive way.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4