Search functions in GoogleSQL | Spanner


GoogleSQL for Spanner supports the following search functions.

Categories

The search functions are grouped into the following categories, based on their behavior:

Function list

DEBUG_TOKENLIST: Displays a human-readable representation of tokens present in the TOKENLIST value for debugging purposes.
SCORE: Calculates a relevance score of a TOKENLIST for a full-text search query. The higher the score, the stronger the match.
SCORE_NGRAMS: Calculates a relevance score of a TOKENLIST for a fuzzy search. The higher the score, the stronger the match.
SEARCH: Returns TRUE if a full-text search query matches tokens.
SEARCH_NGRAMS: Checks whether enough n-grams match the tokens in a fuzzy search.
SEARCH_SUBSTRING: Returns TRUE if a substring query matches tokens.
SNIPPET: Gets a list of snippets that match a full-text search query.
TOKEN: Constructs an exact match TOKENLIST value by tokenizing a BYTE or STRING value verbatim to accelerate exact match expressions in SQL.
TOKENIZE_BOOL: Constructs a boolean TOKENLIST value by tokenizing a BOOL value to accelerate boolean match expressions in SQL.
TOKENIZE_FULLTEXT: Constructs a full-text TOKENLIST value by tokenizing text for full-text matching.
TOKENIZE_JSON: Constructs a JSON TOKENLIST value by tokenizing a JSON value to accelerate JSON predicate expressions in SQL.
TOKENIZE_NGRAMS: Constructs an n-gram TOKENLIST value by tokenizing a STRING value for matching n-grams.
TOKENIZE_NUMBER: Constructs a numeric TOKENLIST value by tokenizing numeric values to accelerate numeric comparison expressions in SQL.
TOKENIZE_SUBSTRING: Constructs a substring TOKENLIST value by tokenizing text for substring matching.
TOKENLIST_CONCAT: Constructs a TOKENLIST value by concatenating one or more TOKENLIST values.

DEBUG_TOKENLIST
DEBUG_TOKENLIST(tokenlist)

Description

Displays a human-readable representation of tokens present in a TOKENLIST value for debugging purposes.

Definitions

Details

Note: The returned STRING value is intended solely for debugging purposes and its format is subject to change without notice.

The output of this function is dependent on the source of the TOKENLIST value provided as input.

Return type

STRING

Examples

The following query illustrates how attributes and positions are represented:

SELECT DEBUG_TOKENLIST(TOKENIZE_FULLTEXT('Hello DB #World')) AS Result;

/*------------------------------------------------+
 | Result                                         |
 +------------------------------------------------+
 | hello(boundary), db, [#world, world](boundary) |
 +------------------------------------------------*/

The following query illustrates how equality and range are represented:

SELECT DEBUG_TOKENLIST(TOKENIZE_NUMBER([1, 10], min=> 1, max=>10)) AS Result;

/*--------------------------------------------------------------------------------+
 | Result                                                                         |
 +--------------------------------------------------------------------------------+
 | ==1, ==10, [1, 1], [1, 2], [1, 4], [1, 8], [9, 10], [9, 12], [9, 16], [10, 10] |
 +--------------------------------------------------------------------------------*/
SCORE
SCORE(
  tokens,
  search_query
  [, dialect => { "rquery" | "words" | "words_phrase" } ]
  [, language_tag => value ]
  [, enhance_query => { TRUE | FALSE } ]
  [, options => value ]
)

Description

Calculates a relevance score of a TOKENLIST for a full-text search query. The higher the score, the stronger the match.

Definitions

Details

Versions

The SCORE algorithm is periodically updated. After a short evaluation period, the default behavior updates to the newest version. You are encouraged to leave the version unspecified so that your database can benefit from improvements to the SCORE algorithm. However, you can set the version number in the options argument to retain old behavior.

Return type

FLOAT64

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionTokens, which tokenizes the input added to the Description column, and then saves those tokens in the DescriptionTokens column. Finally, AlbumsIndex indexes DescriptionTokens. Once DescriptionTokens is indexed, it can be used with the SCORE function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionTokens);

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'classical album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical and rock album');

The following query searches the Description column for the query classical album and returns each matching Description with its corresponding score. Both classical album and classical and rock album contain the terms classical and album, but the first one has a higher score because the terms are adjacent.

SELECT
  a.Description, SCORE(a.DescriptionTokens, 'classical album') AS Score
FROM
  Albums a
WHERE
  SEARCH(a.DescriptionTokens, 'classical album');

/*--------------------------+---------------------*
 | Description              | Score               |
 +--------------------------+---------------------+
 | classical album          | 1.2818930149078369  |
 | classical and rock album | 0.50003194808959961 |
 *--------------------------+---------------------*/

The following query is like the previous one. However, the bigram_weight option increases the boost applied when matching terms appear in adjacent positions, so the adjacent match receives an even higher score.

SELECT
  a.Description,
  SCORE(
    a.DescriptionTokens,
    'classical album',
    options=>JSON '{"bigram_weight": 3.0}'
  ) AS Score
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album');

/*--------------------------+---------------------*
 | Description              | Score               |
 +--------------------------+---------------------+
 | classical album          | 1.7417128086090088  |
 | classical and rock album | 0.50003194808959961 |
 *--------------------------+---------------------*/

The following query uses SCORE in the ORDER BY clause to get the row with the highest score.

SELECT a.Description
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album')
ORDER BY SCORE(a.DescriptionTokens, 'classical album') DESC
LIMIT 1;

/*--------------------------*
 | Description              |
 +--------------------------+
 | classical album          |
 *--------------------------*/
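
The Versions section notes that the version number can be pinned through the options argument. The following sketch assumes the option key is named scoring_algorithm_version; confirm the exact key against the options definition before relying on it.

SELECT
  a.Description,
  SCORE(
    a.DescriptionTokens,
    'classical album',
    -- Assumption: "scoring_algorithm_version" is the options key that pins
    -- the SCORE algorithm to a specific version.
    options=>JSON '{"scoring_algorithm_version": 1}'
  ) AS Score
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album');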
SCORE_NGRAMS
SCORE_NGRAMS(
  tokens,
  ngrams_query
  [, language_tag => value ]
  [, algorithm => value ]
  [, array_aggregator => value ]
)

Description

Calculates a relevance score of a TOKENLIST for a fuzzy search. The higher the score, the stronger the match.

Definitions

Details

Return type

FLOAT64

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionSubstrTokens, which tokenizes the Description column using TOKENIZE_SUBSTRING. Finally, AlbumsIndex stores Description so that the queries below don't have to join with the base table.

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS
    (TOKENIZE_SUBSTRING(Description, ngram_size_max=>3)) HIDDEN
) PRIMARY KEY (AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens)
  STORING(Description);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'rock album');
INSERT INTO Albums (AlbumId, Description) VALUES (2, 'classical album');

The following query scores each Description against the misspelled query clasic albun.

SELECT
  a.Description, SCORE_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun') AS Score
FROM
  Albums a;

/*-----------------+---------------------*
 | Description     | Score               |
 +-----------------+---------------------+
 | rock album      | 0.14285714285714285 |
 | classical album | 0.38095238095238093 |
 *-----------------+---------------------*/

The following query uses SCORE_NGRAMS in the ORDER BY clause to produce the row with the highest score.

SELECT a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun')
ORDER BY SCORE_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun') DESC
LIMIT 1;

/*-----------------*
 | Description     |
 +-----------------+
 | classical album |
 *-----------------*/
SEARCH
SEARCH(
  tokens,
  search_query
  [, dialect => { "rquery" | "words" | "words_phrase" } ]
  [, language_tag => value]
  [, enhance_query => { TRUE | FALSE }]
)

Description

Returns TRUE if a full-text search query matches tokens.

Definitions

Details

Search query syntax dialects

The search query uses the rquery syntax by default. You can specify other supported syntax dialects using the dialect argument.

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionTokens, which tokenizes the Description column using TOKENIZE_FULLTEXT, and then saves those tokens in the DescriptionTokens column. Finally, AlbumsIndex indexes DescriptionTokens. Once DescriptionTokens is indexed, it can be used with the SEARCH function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionTokens)
PARTITION BY SingerId;

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'rock album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical album');

The following query searches the column called Description for a token called classical. If this token is found for singer ID 1, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classical');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 *---------------------------*/

The following query is like the previous one. However, if Description contains the classical or rock token, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classical OR rock');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 *---------------------------*/

The following query is like the previous ones. However, the search query is classic albums. Because enhance_query is enabled, the search also includes similar matches such as classical and album, so the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classic albums', enhance_query => TRUE);

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 *---------------------------*/
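
The following sketch revisits the dialect argument described in the Details section. It only illustrates the argument syntax; the matching semantics of the words and words_phrase dialects are defined by the dialects themselves, so no result is shown here.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE
  a.SingerId = 1
  -- Pass a non-default dialect by name; for example, a dialect that treats
  -- the query as plain words or as a single phrase rather than rquery syntax.
  AND SEARCH(a.DescriptionTokens, 'classical album', dialect=>'words_phrase');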
SEARCH_NGRAMS
SEARCH_NGRAMS(
  tokens,
  ngrams_query
  [, language_tag => value ]
  [, min_ngrams => value ]
  [, min_ngrams_percent => value ]
)

Description

Checks whether enough n-grams match the tokens in a fuzzy search.

Definitions

Details

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains the columns DescriptionSubstrTokens and DescriptionNgramsTokens, which tokenize the Description column using TOKENIZE_SUBSTRING and TOKENIZE_NGRAMS, respectively. Finally, AlbumsIndex indexes DescriptionSubstrTokens and DescriptionNgramsTokens.

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS
    (TOKENIZE_SUBSTRING(Description, ngram_size_min=>3, ngram_size_max=>3)) HIDDEN,
  DescriptionNgramsTokens TOKENLIST AS
    (TOKENIZE_NGRAMS(Description, ngram_size_min=>3, ngram_size_max=>3)) HIDDEN
) PRIMARY KEY (AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens, DescriptionNgramsTokens);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'rock album');
INSERT INTO Albums (AlbumId, Description) VALUES (2, 'classical album');
INSERT INTO Albums (AlbumId, Description) VALUES (3, 'last note');

The following query searches the column Description for clasic. The query is misspelled, so querying with SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'clasic') doesn't return a row, but the n-grams search is able to find similar matches.

SEARCH_NGRAMS first transforms the query clasic into n-grams of size 3 (the value of DescriptionSubstrTokens's ngram_size_max), producing ['asi', 'cla', 'las', 'sic']. Then it finds rows that have at least two of these n-grams (the default value for min_ngrams) in the DescriptionSubstrTokens column.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 *---------------------------*/

If we change min_ngrams to 1, the query also returns the row containing last, which has a single matching n-gram (las). This example illustrates the decreased relevance of the returned results when this parameter is set low.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic', min_ngrams=>1);

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 3       | last note       |
 *---------------------------*/
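
The min_ngrams_percent argument is a relative alternative to min_ngrams. The following sketch assumes it is interpreted as the minimum percentage of the query's n-grams that must match; no result is shown because the outcome depends on that interpretation.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  -- Assumption: min_ngrams_percent is the minimum share, in percent, of the
  -- query's n-grams that must be present in the tokenlist.
  SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic', min_ngrams_percent=>50);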

The following query searches the column Description for clasic albun. Because DescriptionSubstrTokens is tokenized by TOKENIZE_SUBSTRING, the query is first segmented into ['clasic', 'albun'], and n-gram tokens are then generated from those words, producing the following: ['alb', 'asi', 'bun', 'cla', 'las', 'lbu', 'sic'].

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 *---------------------------*/

The following query searches the column Description for l al, this time using DescriptionNgramsTokens. Because DescriptionNgramsTokens is generated by TOKENIZE_NGRAMS, there is no splitting into words before the n-gram tokens are made, so the query n-grams are generated as the following: [' al', 'l a'].

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  SEARCH_NGRAMS(a.DescriptionNgramsTokens, 'l al');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 *---------------------------*/
SEARCH_SUBSTRING
SEARCH_SUBSTRING(
  tokens,
  substring_query
  [, language_tag => value ]
  [, relative_search_type => value ]
)

Description

Returns TRUE if a substring query matches tokens.

Definitions

Details

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionSubstrTokens, which tokenizes the input added to the Description column using TOKENIZE_SUBSTRING, and then saves those substring tokens in the DescriptionSubstrTokens column. Finally, AlbumsIndex indexes DescriptionSubstrTokens. Once DescriptionSubstrTokens is indexed, it can be used with the SEARCH_SUBSTRING function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS (TOKENIZE_SUBSTRING(Description, support_relative_search=>TRUE)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens)
PARTITION BY SingerId;

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'rock album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical album');

The following query searches the column called Description for the substring ssic. If a match is found for singer ID 1, the matching rows are returned.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'ssic');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 *---------------------------*/

The following query searches the column called Description for both of the substrings lbu and oc. If both are found for singer ID 1, the matching rows are returned.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'lbu oc');

/*-----------------------*
 | AlbumId | Description |
 +-----------------------+
 | 1       | rock album  |
 *-----------------------*/

The following query searches the column called Description for the substring al at the start of a word. If a match is found for singer ID 1, the matching rows are returned.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'al', relative_search_type=>'word_prefix');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 *---------------------------*/

The following query searches the column called Description for the substring al at the start of the whole value. Because no Description value for singer ID 1 starts with al, no rows are returned.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'al', relative_search_type=>'value_prefix');

/*---------------------------*
 | AlbumId | Description     |
 +---------------------------+
 |         |                 |
 *---------------------------*/
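
By analogy with the prefix types above, relative_search_type also has suffix-oriented values; the value name value_suffix used in the following sketch is an assumption, so verify it against the relative search type definitions. The sketch anchors the substring bum to the end of the whole Description value.

SELECT
  a.AlbumId, a.Description
FROM
  Albums a
WHERE
  -- Assumption: 'value_suffix' anchors the match to the end of the whole
  -- value, mirroring how 'value_prefix' anchors it to the start.
  a.SingerId = 1
  AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'bum', relative_search_type=>'value_suffix');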
SNIPPET
SNIPPET(
  data_to_search,
  raw_search_query
  [, language_tag => value ]
  [, enhance_query => { TRUE | FALSE } ]
  [, max_snippet_width => value ]
  [, max_snippets => value ]
  [, content_type => value ]
)

Description

Gets a list of snippets that match a full-text search query.

Definitions

Details

Each snippet contains a matching substring of the data_to_search, and a list of highlights for the location of matching terms.

This function returns NULL when data_to_search or raw_search_query is NULL.

Return type

JSON

The JSON value has this format and definitions:

{
  "snippets":[
    {
      "highlights":[
        {
          "begin": json_number,
          "end": json_number
        },
      ],
      "snippet": json_string,
      "source_begin": json_number,
      "source_end": json_number
    }
  ]
}

Examples

The following query produces a single snippet, Rock albums rock. with two highlighted positions for the matching raw search query term, rock:

SELECT SNIPPET('Rock albums rock.', 'rock') AS Snippet;

/*--------------------------------------------------------------------------------------------------------------------------------------------------*
 | Snippet                                                                                                                                          |
 +--------------------------------------------------------------------------------------------------------------------------------------------------+
 | {"snippets":[{"highlights":[{"begin":"1","end":"5"},{"begin":"13","end":"17"}],"snippet":"Rock albums rock.","source_begin":1,"source_end":18}]} |
 *--------------------------------------------------------------------------------------------------------------------------------------------------*/
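
The optional arguments can be combined to bound the snippet size and count. The following sketch is illustrative only; the exact snippet boundaries and highlight positions depend on the snippeting algorithm, so the output is omitted.

SELECT SNIPPET(
  'Rock albums rock. Classical albums do not.',
  'rock',
  -- Keep each snippet short and return at most two snippets.
  max_snippet_width=>20,
  max_snippets=>2
) AS Snippet;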

TOKEN
TOKEN(value_to_tokenize)

Description

Constructs an exact match TOKENLIST value by tokenizing a BYTE or STRING value verbatim to accelerate exact match expressions in SQL.

Definitions

Details

Return type

TOKENLIST

Examples

The Albums table contains columns called SingerNameToken and SongTitlesToken, which tokenize the SingerName and SongTitles columns, respectively, using the TOKEN function. Finally, AlbumsIndex indexes SingerNameToken and SongTitlesToken, which makes it possible for Spanner to use the index to accelerate exact-match expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  SingerName STRING(MAX),
  SingerNameToken TOKENLIST AS (TOKEN(SingerName)) HIDDEN,
  SongTitles ARRAY<STRING(MAX)>,
  SongTitlesToken TOKENLIST AS (TOKEN(SongTitles)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(SingerNameToken, SongTitlesToken);

-- For example, the INSERT statement below generates SingerNameToken of
-- 'Catalina Smith', and SongTitlesToken of
-- ['Starting Again', 'The Second Time'].
INSERT INTO Albums (SingerId, AlbumId, SingerName, SongTitles)
  VALUES (1, 1, 'Catalina Smith', ['Starting Again', 'The Second Time']);

The following query finds rows where the SingerName column is equal to Catalina Smith. The query optimizer could choose to accelerate the condition using AlbumsIndex with SingerNameToken. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums @{force_index = AlbumsIndex} a
WHERE a.SingerName = 'Catalina Smith';

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 *---------*/

The following query is like the previous one. However, this time the query searches for SongTitles values that contain the string Starting Again. Array conditions need to use the ARRAY_INCLUDES, ARRAY_INCLUDES_ANY, or ARRAY_INCLUDES_ALL functions to be eligible for acceleration with a search index.

Note: Array column acceleration is only supported on nullable ARRAY columns.

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES(a.SongTitles, 'Starting Again');

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 *---------*/
TOKENIZE_BOOL
TOKENIZE_BOOL(value_to_tokenize)

Description

Constructs a boolean TOKENLIST value by tokenizing a BOOL value to accelerate boolean match expressions in SQL.

Definitions

Details

Return type

TOKENLIST

Examples

The Albums table contains a column called IsAwardedToken, which tokenizes the IsAwarded column using the TOKENIZE_BOOL function. Finally, AlbumsIndex indexes IsAwardedToken, which makes it possible for Spanner to use the index to accelerate boolean-match expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  IsAwarded BOOL,
  IsAwardedToken TOKENLIST AS (TOKENIZE_BOOL(IsAwarded)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(IsAwardedToken);

-- IsAwarded with TRUE generates IsAwardedToken with value 'y'.
INSERT INTO Albums (SingerId, AlbumId, IsAwarded) VALUES (1, 1, TRUE);

-- IsAwarded with FALSE generates IsAwardedToken with value 'n'.
INSERT INTO Albums (SingerId, AlbumId, IsAwarded) VALUES (1, 2, FALSE);

-- NULL IsAwarded generates IsAwardedToken with value NULL.
INSERT INTO Albums (SingerId, AlbumId) VALUES (1, 3);

The following query finds rows where the IsAwarded column is equal to TRUE. The query optimizer could choose to accelerate the condition using AlbumsIndex with IsAwardedToken. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums @{force_index = AlbumsIndex} a
WHERE IsAwarded = TRUE;
TOKENIZE_FULLTEXT
TOKENIZE_FULLTEXT(
  value_to_tokenize
  [, language_tag => value ]
  [, content_type => { "text/plain" | "text/html" } ]
  [, token_category => { "small" | "medium" | "large" | "title" } ]
)

Description

Constructs a full-text TOKENLIST value by tokenizing text for full-text matching.

Definitions

Details

Return type

TOKENLIST

Examples

In the following example, a TOKENLIST column is created using the TOKENIZE_FULLTEXT function:

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Title STRING(MAX),
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN,
  TitleTokens TOKENLIST AS (
    TOKENIZE_FULLTEXT(Title, token_category=>"title")) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

-- DescriptionTokens is generated from the Description value, using the
-- TOKENIZE_FULLTEXT function. For example, the following INSERT statement
-- generates DescriptionTokens with the tokens ['rock', 'album']. TitleTokens
-- will contain ['abbey', 'road'] and these tokens will be assigned to the
-- "title" token category.
INSERT INTO Albums (SingerId, AlbumId, Title, Description) VALUES (1, 1, 'Abbey Road', 'rock album');

-- Capitalization and delimiters are removed during tokenization. For example,
-- the following INSERT statement generates DescriptionTokens with the tokens
-- ['classical', 'albums'].
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'Classical, Albums.');
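
The content_type argument indicates how the source text should be interpreted during tokenization, for example when the text is HTML. A minimal sketch, using a hypothetical Pages table and HtmlBody column that are not part of the schema above:

CREATE TABLE Pages (
  PageId INT64 NOT NULL,
  HtmlBody STRING(MAX),
  -- Tokenize HTML content; the markup is interpreted according to content_type.
  HtmlBodyTokens TOKENLIST AS (
    TOKENIZE_FULLTEXT(HtmlBody, content_type=>"text/html")) HIDDEN
) PRIMARY KEY (PageId);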

To query a full-text TOKENLIST column, see the SEARCH function.

TOKENIZE_JSON
TOKENIZE_JSON(value_to_tokenize)

Description

Constructs a JSON TOKENLIST value by tokenizing a JSON value to accelerate JSON predicate matching in SQL.

Definitions

Details

Return type

TOKENLIST

Examples

The Albums table contains a column called MetadataTokens, which tokenizes the Metadata column using the TOKENIZE_JSON function. AlbumsIndex indexes MetadataTokens, which makes it possible for Spanner to use the index to accelerate JSON predicate expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Metadata JSON,
  MetadataTokens TOKENLIST AS (TOKENIZE_JSON(Metadata)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(MetadataTokens);

-- Albums can be stored with varying metadata.
INSERT INTO Albums (SingerId, AlbumId, Metadata)
  VALUES (1, 1, JSON '{"AvailableFormats": ["vinyl", "cd"]}'),
         (1, 2, JSON '{"ReissueDate": "1999-07-13", "MultiDiscCount": 2}'),
         (1, 3, JSON '{"RegionalReleases": [{"Region": "Japan", "ReleaseDate": "2025-01-05"}]}');

The following queries perform containment and existence checks on the Metadata column. The query optimizer might choose to accelerate these conditions using AlbumsIndex and MetadataTokens.

-- Query for albums available on vinyl.
SELECT a.AlbumId
FROM Albums a
WHERE JSON_CONTAINS(a.Metadata, JSON '{"AvailableFormats": ["vinyl"]}');

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 *---------*/

-- Query for albums with a regional release in Japan.
SELECT a.AlbumId
FROM Albums a
WHERE JSON_CONTAINS(a.Metadata, JSON '{"RegionalReleases": [{"Region": "Japan"}]}');

/*---------*
 | AlbumId |
 +---------+
 | 3       |
 *---------*/

-- Query for reissued albums (those with a reissue date).
SELECT a.AlbumId
FROM Albums a
WHERE a.Metadata.ReissueDate IS NOT NULL;

/*---------*
 | AlbumId |
 +---------+
 | 2       |
 *---------*/
TOKENIZE_NGRAMS
TOKENIZE_NGRAMS(
  value_to_tokenize
  [, ngram_size_min => value ]
  [, ngram_size_max => value ]
  [, remove_diacritics => { TRUE | FALSE } ]
)

Description

Constructs an n-gram TOKENLIST value by tokenizing a STRING value for matching n-grams.

Definitions

Details

Return type

TOKENLIST

Examples

In the following example, a TOKENLIST column is created using the TOKENIZE_NGRAMS function. The INSERT generates a TOKENLIST which contains two sets of tokens. First, the whole string is broken up into n-grams with a length in the range [ngram_size_min, ngram_size_max-1]. Capitalization and whitespace are preserved in the n-grams. These n-grams are placed in the first position in the tokenlist.

[" ", " M", " Me", "vy ", "y ", "y M", H, He, Hea, Heav, ...], ...

Second, any n-grams with length equal to ngram_size_max are stored in sequence, with the first of these in the same position as the smaller n-grams. (In this example, the Heav token is in the first position.)

..., eavy, "avy ", "vy M", "y Me", " Met", Meta, etal

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionNgramTokens TOKENLIST AS (TOKENIZE_NGRAMS(Description)) HIDDEN
) PRIMARY KEY (AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionNgramTokens);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'Heavy Metal');
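
The remove_diacritics argument strips diacritical marks before the n-grams are generated, so accented and unaccented spellings produce the same tokens. A minimal sketch that adds such a column to the table above (assuming generated TOKENLIST columns can be added with ALTER TABLE, as with other generated columns):

-- Sketch: n-grams with diacritics removed, so a query for "uber" can match a
-- stored value such as "über".
ALTER TABLE Albums ADD COLUMN DescriptionNgramTokensFolded TOKENLIST AS (
  TOKENIZE_NGRAMS(Description, remove_diacritics=>TRUE)) HIDDEN;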

To query an n-gram TOKENLIST column, see the SEARCH_NGRAMS function.

TOKENIZE_NUMBER
TOKENIZE_NUMBER(
  value_to_tokenize
  [, comparison_type => { "all" | "equality" } ]
  [, algorithm => { "logtree" | "prefixtree" | "floatingpoint" } ]
  [, min => value ]
  [, max => value ]
  [, granularity => value ]
  [, tree_base => value ]
  [, precision => value ]
)

Description

Constructs a numeric TOKENLIST value by tokenizing numeric values to accelerate numeric comparison expressions in SQL.

Definitions

Details

Return type

TOKENLIST

Examples

The Albums table contains a column called RatingTokens, which tokenizes the Rating column using the TOKENIZE_NUMBER function, and a column called TrackRatingTokens, which tokenizes the TrackRating array column. Finally, AlbumsIndex indexes RatingTokens and TrackRatingTokens, which makes it possible for Spanner to use the index to accelerate numeric comparison expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Rating INT64,
  RatingTokens TOKENLIST AS (TOKENIZE_NUMBER(Rating)) HIDDEN,
  TrackRating ARRAY<INT64>,
  TrackRatingTokens TOKENLIST AS (TOKENIZE_NUMBER(TrackRating)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(RatingTokens, TrackRatingTokens);

-- RatingTokens and TrackRatingTokens are generated from Rating and TrackRating
-- values, respectively, using the TOKENIZE_NUMBER function.
INSERT INTO Albums (SingerId, AlbumId, Rating, TrackRating) VALUES (1, 1, 2, [2, 3]);
INSERT INTO Albums (SingerId, AlbumId, Rating, TrackRating) VALUES (1, 2, 5, [3, 5]);

The following query finds rows in which the column Rating is equal to 5. The query optimizer might choose to accelerate the condition using AlbumsIndex with RatingTokens. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums @{force_index = AlbumsIndex} a
WHERE a.Rating = 5;

/*---------*
 | AlbumId |
 +---------+
 | 2       |
 *---------*/

The following query is like the previous one. However, this time the condition is on the TrackRating array column. Array conditions need to use the ARRAY_INCLUDES, ARRAY_INCLUDES_ANY, or ARRAY_INCLUDES_ALL functions to be eligible for acceleration with a search index.

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES_ALL(a.TrackRating, [2, 3]);

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 *---------*/

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES_ANY(a.TrackRating, [3, 4, 5]);

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 | 2       |
 *---------*/

The following query is like the previous ones. However, this time the condition is a range comparison. This query can also be accelerated because the default comparison_type is all, which covers both equality and range comparisons.

SELECT a.AlbumId
FROM Albums a
WHERE a.Rating >= 2;

/*---------*
 | AlbumId |
 +---------+
 | 1       |
 | 2       |
 *---------*/
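
When a column is only ever used in equality conditions, comparison_type=>"equality" avoids generating range tokens and produces a smaller tokenlist. A minimal sketch of an alternative, hypothetical column definition; with this setting, a condition like Rating = 5 can still be accelerated, but Rating >= 2 can't.

-- Sketch: tokenize for equality comparisons only. Range predicates on this
-- tokenlist can't be accelerated through the search index.
ALTER TABLE Albums ADD COLUMN RatingEqTokens TOKENLIST AS (
  TOKENIZE_NUMBER(Rating, comparison_type=>"equality")) HIDDEN;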
TOKENIZE_SUBSTRING
TOKENIZE_SUBSTRING(
  value_to_tokenize
  [, language_tag => value ]
  [, ngram_size_min => value ]
  [, ngram_size_max => value ]
  [, relative_search_types => value ]
  [, content_type => { "text/plain" | "text/html" } ]
  [, remove_diacritics => { TRUE | FALSE } ]
  [, short_tokens_only_for_anchors => {TRUE | FALSE } ]
)

Description

Constructs a substring TOKENLIST value by tokenizing text for substring matching.

Definitions

Details

Return type

TOKENLIST

Example

In the following example, a TOKENLIST column is created using the TOKENIZE_SUBSTRING function. The INSERT generates a TOKENLIST that contains two sets of tokens. First, each word is broken up into lowercased n-grams with a length in the range [ngram_size_min, ngram_size_max-1], plus any whole words shorter than ngram_size_max. All of these tokens are placed in the first position in the tokenlist.

[a, al, av, avy, e, ea, eav, et, eta, h, he, hea, ...], ...

Second, any n-grams with length equal to ngram_size_max are stored in subsequent positions. These tokens are used when searching for words larger than the maximum n-gram size.

..., heav, eavy, <gap(1)>, meta, etal

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST
    AS (TOKENIZE_SUBSTRING(Description, ngram_size_min=>1, ngram_size_max=>4)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

INSERT INTO Albums (SingerId, AlbumId, Description)
  VALUES (1, 1, 'Heavy Metal');

To query a substring TOKENLIST column, see the SEARCH_SUBSTRING or SEARCH_NGRAMS function.

TOKENLIST_CONCAT
TOKENLIST_CONCAT(value1 [, ...])

Description

Constructs a TOKENLIST value by concatenating one or more TOKENLIST values.

Details

Return type

TOKENLIST

Examples

In the following example, full-text TOKENLIST columns are created using the TOKENIZE_FULLTEXT function, then another full-text TOKENLIST column is created using the TOKENLIST_CONCAT function:

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  SingerName STRING(MAX),
  SingerNameTokens TOKENLIST AS (TOKENIZE_FULLTEXT(SingerName)) HIDDEN,
  AlbumName STRING(MAX),
  AlbumNameTokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumName)) HIDDEN,
  SingerOrAlbumNameTokens TOKENLIST AS (TOKENLIST_CONCAT([SingerNameTokens, AlbumNameTokens])) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(SingerNameTokens, AlbumNameTokens, SingerOrAlbumNameTokens);

-- The INSERT statement below generates SingerOrAlbumNameTokens by concatenating
-- all the tokens in SingerNameTokens and AlbumNameTokens.
INSERT INTO Albums (SingerId, AlbumId, SingerName, AlbumName) VALUES (1, 1, 'Alice Trentor', 'Go Go Go');
INSERT INTO Albums (SingerId, AlbumId, SingerName, AlbumName) VALUES (2, 1, 'Catalina Smith', 'Alice Wonderland');

The following query searches for the token alice in the SingerOrAlbumNameTokens column. The rows that match alice in either SingerNameTokens or AlbumNameTokens are returned.

SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH(a.SingerOrAlbumNameTokens, 'alice');

/*--------------------*
 | SingerId | AlbumId |
 +--------------------+
 | 2        | 1       |
 | 1        | 1       |
 *--------------------*/

The following query is like the previous one. However, TOKENLIST_CONCAT is called directly inside of a SEARCH function this time.

SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH(TOKENLIST_CONCAT([a.SingerNameTokens, a.AlbumNameTokens]), 'alice');

/*--------------------*
 | SingerId | AlbumId |
 +--------------------+
 | 2        | 1       |
 | 1        | 1       |
 *--------------------*/

Last updated 2025-08-12 UTC.