Stay organized with collections Save and categorize content based on your preferences.
GoogleSQL for BigQuery supports the following search functions.
Function list Name SummarySEARCH
Checks to see whether a table or other search data contains a set of search terms. VECTOR_SEARCH
Performs a vector search on embeddings to find semantically similar entities. SEARCH
SEARCH(
data_to_search, search_query
[, json_scope => { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' } ]
[, analyzer => { 'LOG_ANALYZER' | 'NO_OP_ANALYZER' | 'PATTERN_ANALYZER'} ]
[, analyzer_options => analyzer_options_values ]
)
Description
The SEARCH
function checks to see whether a BigQuery table or other search data contains a set of search terms (tokens). It returns TRUE
if all search terms appear in the data, based on the rules for search_query and text analysis described in the text analyzer. Otherwise, this function returns FALSE
.
Definitions
data_to_search
: The data to search over. The value can be:
A table reference is evaluated as a STRUCT
whose fields are the columns of the table. data_to_search
can be any type, but SEARCH
will return FALSE
for all types except those listed here:
ARRAY<STRING>
ARRAY<STRUCT>
JSON
STRING
STRUCT
You can search for string literals in columns of the preceding types. For additional rules, see Search data rules.
search_query
: A STRING
literal, or a STRING
constant expression that represents the terms of the search query. If search_query
is NULL
, an error is returned. If search_query
produces no search tokens, and the text analyzer is LOG_ANALYZER
or PATTERN_ANALYZER
, an error is produced.json_scope
: A named argument with a STRING
value. Takes one of the following values to indicate the scope of JSON data to be searched. It has no effect if data_to_search
isn't a JSON value or doesn't contain a JSON field.
'JSON_VALUES'
(default): Only the JSON values are searched. If json_scope
isn't provided, this is used by default.
'JSON_KEYS'
: Only the JSON keys are searched.
'JSON_KEYS_AND_VALUES'
: The JSON keys and values are searched.
analyzer
: A named argument with a STRING
value. Takes one of the following values to indicate the text analyzer to use:
'LOG_ANALYZER'
(default): Breaks the input into tokens when delimiters are encountered and then normalizes the tokens. For more information, see LOG_ANALYZER
.
'NO_OP_ANALYZER'
: Extracts the text as a single token, but doesn't apply normalization. For more information about this analyzer, see NO_OP_ANALYZER
.
'PATTERN_ANALYZER'
: Breaks the input into tokens that match a regular expression. For more information, see PATTERN_ANALYZER
text analyzer.
analyzer_options
: A named argument with a JSON-formatted STRING
value. Takes a list of text analysis rules. For more information, see Text analyzer options.
Details
The SEARCH
function is designed to work with search indexes to optimize point lookups. Although the SEARCH
function works for tables that aren't indexed, its performance will be greatly improved with a search index. If both the analyzer and analyzer options match the one used to create the index, the search index will be used.
Rules for search_query
The 'NO_OP_ANALYZER'
extracts the search query as a single token without parsing it. The following rules apply only when using the 'LOG_ANALYZER'
or 'PATTERN_ANALYZER'
.
A search query is a set of one or more terms that are combined using the logical operators AND
and OR
along with parenthesis. Any whitespace in the search query that is not in a phrase or backtick term is considered an (implicit) AND
. First, a search query is broken down into terms using logical operators and parenthesis in the search query. Then, each term is evaluated based on whether or not it appears in the data to search. The final outcome of the SEARCH
function is the result of the logical expression represented by the search query.
The following grammar is used to transform the search query into a logical expression of terms. The grammar is defined using the ANTLR meta-language:
query_string : expression EOF;
expression : '(' expression ')'
| expression 'AND' expression
| expression '\s' expression
| expression 'OR' expression
| term
;
term : single_term
| phrase_term
| backtick_term
;
backtick_term : '`' ( '\`' | ~[`] )+ '`';
phrase_term : '"' ( '\"' | ~["] )+ '"';
single_term : ( '\' reserved_char | ~[reserved_char] )+;
To evaluate each term, it is further broken down into zero or more searchable tokens based on the text analyzer. The following section contains the rules for how different types of terms are analyzed and evaluated.
Rules for backtick_term
in search_query
:
If the LOG_ANALYZER
text analyzer is used, text enclosed in backticks forces an exact match.
For example, `Hello World` happy days
becomes Hello World
, happy
, and days
.
Search terms enclosed in backticks must match exactly in data_to_search
, subject to the following conditions:
It appears at the start of data_to_search
or is immediately preceded by a delimiter.
It appears at the end of data_to_search
or is immediately followed by a delimiter.
For example, SEARCH('foo.bar', '`foo.`')
returns FALSE
because the text enclosed in the backticks foo.
is immediately followed by the character b
in the search data foo.bar
, rather than by a delimiter or the end of the string. However, SEARCH('foo..bar', '`foo.`')
returns TRUE
because foo.
is immediately followed by the delimiter .
in the search data.
Search terms enclosed in backticks must match case exactly, regardless of any normalization settings in analyzer_options
.
For example:
-- FALSE because backticks require an exact match, including capitalization
SELECT
SEARCH( 'Hello-world', '`WORLD`',
analyzer=>'LOG_ANALYZER',
analyzer_options=>'''
{
"token_filters": [
{
"normalizer": {"mode": "LOWER"}
}
]
}'''
) AS results
The backtick itself can be escaped using a backslash, as in \`foobar\`
.
The following are reserved words and must be enclosed in backticks:
AND
, NOT
, OR
, IN
, and NEAR
Rules for reserved_char
in search_query
:
Text not enclosed in backticks requires the following reserved characters to be escaped by a double backslash \\
:
[ ] < > ( ) { } | ! ' " * & ? + / : = - \ ~ ^
If the quoted string is preceded by the character r
or R
, such as r"my\+string"
, then it's treated as a raw string and only a single backslash is required to escape the reserved characters. For more information about raw strings and escape sequences, see String and byte literals.
Rules for phrase_term
in search_query
:
analyzer
is LOG_ANALYZER
, PATTERN_ANALYZER
, or not set (LOG_ANALYZER
by default), the term represents a phrase."foo baz.bar"
, the analyzer called LOG_ANALYZER
generates the phrase-specific tokens foo
, baz
, and bar
.The order of terms in a phrase matters. A match is only returned if the tokens that were produced for the phrase are next to each other and in the same order as the tokens for data_to_search
.
For example:
-- FALSE because 'foo' and 'bar' aren't next to each other in
-- 'foo baz.bar'.
SEARCH('foo baz.bar', '"foo bar"')
-- TRUE because 'foo' and 'baz' are next to each other in
-- 'foo baz.bar'.
SEARCH('foo baz.bar', '"foo baz"')
A single quote inside of the phrase is analyzed as a special character.
An escaped double quote (double quote after a backslash) is analyzed as a double quote character.
How data_to_search
is broken into searchable tokens
The following table shows how data_to_search
is broken into searchable tokens by the LOG_ANALYZER
text analyzer. All entries are strings.
How search_query
is broken into query terms
The following table shows how search_query
is broken into query terms by the LOG_ANALYZER
text analyzer. All entries are strings.
Rules for data_to_search
General rules for data_to_search
:
data_to_search
must contain all tokens produced for search_query
for the function to return TRUE
.data_to_search
must be a STRUCT
, ARRAY
, or JSON
data type.STRING
field in a compound data type is individually searched for terms.If at least one field in data_to_search
includes all search terms produced by search_query
, SEARCH
returns TRUE
. Otherwise it has the following behavior:
If at least one STRING
field is NULL
, SEARCH
returns NULL
.
Otherwise, SEARCH
returns FALSE
.
Return type
BOOL
Examples
The following queries show how tokens in search_query
are analyzed by a SEARCH
function call using the default analyzer, LOG_ANALYZER
:
SELECT
-- ERROR: `search_query` is NULL.
SEARCH('foobarexample', NULL) AS a,
-- ERROR: `search_query` contains no tokens.
SEARCH('foobarexample', '') AS b,
SELECT
-- TRUE: '-' and ' ' are delimiters.
SEARCH('foobar-example', 'foobar example') AS a,
-- TRUE: The search query is a constant expression evaluated to 'foobar'.
SEARCH('foobar-example', CONCAT('foo', 'bar')) AS b,
-- FALSE: The search_query isn't split.
SEARCH('foobar-example', 'foobarexample') AS c,
-- TRUE: The double backslash escapes the ampersand which is a delimiter.
SEARCH('foobar-example', 'foobar\\&example') AS d,
-- TRUE: The single backslash escapes the ampersand in a raw string.
SEARCH('foobar-example', R'foobar\&example')AS e,
-- FALSE: The backticks indicate that there must be an exact match for
-- foobar&example.
SEARCH('foobar-example', '`foobar&example`') AS f,
-- TRUE: An exact match is found.
SEARCH('foobar&example', '`foobar&example`') AS g
/*-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g |
+-------+-------+-------+-------+-------+-------+-------+
| true | true | false | true | true | false | true |
*-------+-------+-------+-------+-------+-------+-------*/
SELECT
-- TRUE: The order of terms doesn't matter.
SEARCH('foobar-example', 'example foobar') AS a,
-- TRUE: Tokens are made lower-case.
SEARCH('foobar-example', 'Foobar Example') AS b,
-- TRUE: An exact match is found.
SEARCH('foobar-example', '`foobar-example`') AS c,
-- FALSE: Backticks preserve capitalization.
SEARCH('foobar-example', '`Foobar`') AS d,
-- FALSE: Backticks don't have special meaning for search_data and are
-- not delimiters in the default LOG_ANALYZER.
SEARCH('`foobar-example`', '`foobar-example`') AS e,
-- TRUE: An exact match is found after the delimiter in search_data.
SEARCH('foobar@example.com', '`example.com`') AS f,
-- TRUE: An exact match is found between the space delimiters.
SEARCH('a foobar-example b', '`foobar-example`') AS g;
/*-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g |
+-------+-------+-------+-------+-------+-------+-------+
| true | true | true | false | false | true | true |
*-------+-------+-------+-------+-------+-------+-------*/
SELECT
-- FALSE: No single array entry matches all search terms.
SEARCH(['foobar', 'example'], 'foobar example') AS a,
-- FALSE: The search query is equivalent to foobar\\=.
SEARCH('foobar=', '`foobar\\=`') AS b,
-- FALSE: This is equivalent to the previous example.
SEARCH('foobar=', R'`\foobar=`') AS c,
-- TRUE: The equals sign is a delimiter in the data and query.
SEARCH('foobar=', 'foobar\\=') AS d,
-- TRUE: This is equivalent to the previous example.
SEARCH('foobar=', R'foobar\=') AS e,
-- TRUE: An exact match is found.
SEARCH('foobar.example', '`foobar`') AS f,
-- FALSE: `foobar.\` isn't analyzed because of backticks; it isn't
-- followed by a delimiter in search_data 'foobar.example'.
SEARCH('foobar.example', '`foobar.\`') AS g,
-- TRUE: `foobar.` isn't analyzed because of backticks; it is
-- followed by the delimiter '.' in search_data 'foobar..example'.
SEARCH('foobar..example', '`foobar.`') AS h;
/*-------+-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g | h |
+-------+-------+-------+-------+-------+-------+-------+-------+
| false | false | false | true | true | true | false | true |
*-------+-------+-------+-------+-------+-------+-------+-------*/
The following queries show how logical expression can be used in search_query
to perform a SEARCH
function call:
SELECT
-- TRUE: A whitespace is an implicit AND.
-- Both `foo` and `bar` are in `foo bar baz`.
SEARCH(R'foo bar baz', R'foo bar') AS a,
-- TRUE: Similar to previous case
-- `foo` and `bar` are in `foo bar baz`.
SEARCH(R'foo bar baz', R'foo AND bar') AS b,
-- TRUE: Only one of `foo` or `bar` should be in `foo`.
SEARCH(R'foo', R'foo OR bar') AS c,
-- TRUE: `foo` and one of `bar` or `baz` should be in `foo bar`.
SEARCH(R'foo bar', R'"foo AND (bar OR baz)"') AS d,
-- FALSE: Neither `bar` or `baz` are in `foo`.
SEARCH(R'foo', R'foo AND (bar OR baz)') AS c,
/*-------+-------+-------+-------+-------+
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| true | true | true | true | false |
*-------+-------+-------+-------+-------+/
The following queries show how phrases in search_query
are analyzed by a SEARCH
function call:
SELECT
-- TRUE: The phrase `foo bar` is in `foo bar baz`.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function returns TRUE.
SEARCH(R'foo bar baz', R'"foo bar"') AS a,
-- TRUE: Case is ignored.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function return TRUE.
SEARCH(R'Foo bar baz', R'"foo Bar"') AS b,
-- TRUE: Both `-` and `&` are delimiters used during tokenization.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function returns TRUE.
SEARCH(R'foo-bar baz', R'"foo&bar"') AS c,
-- FALSE: Backticks in a phrase are treated as normal characters.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are:
-- `foo
-- bar`
-- Because these searchable tokens don't appear in `data_to_search`,
-- the function returns FALSE.
SEARCH(R'foo bar baz', R'"`foo bar`"') AS d,
-- FALSE: `foo bar` isn't in `foo else bar`.
-- The tokens in `data_to_search` are `foo`, `else`, and `bar`.
-- The searchable tokens in `query_string` are `foo` and `bar`.
-- Even though they appear in `data_to_search`, but because they
-- don't appear in that exact order (`foo` before `bar`),
-- the function returns FALSE.
SEARCH(R'foo else bar', R'"foo bar"') AS e,
-- FALSE: `foo baz` isn't in `foo bar baz`.
-- The `search_query` produces two terms. The first term is `bar`, which
-- matches with the similar token in `data_to_search`. However, the second
-- term is the phrase "foo&baz" with two tokens, `foo` and `baz`. Because
-- `foo` and `baz` don't appear next to each other in `data_to_search`
-- (`bar` is in between), the function returns FALSE.
SEARCH(R'foo-bar-baz', R'bar "foo&baz"') AS f;
/*-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f |
+-------+-------+-------+-------+-------+-------+
| true | true | false | false | false | false |
*-------+-------+-------+-------+-------+-------*/
SELECT
-- FALSE: Only double quotes need to be escaped in a phrase.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo\` and `bar` and they
-- must appear in that exact order in `data_to_search`, but don't.
SEARCH(
R'foo bar baz',
R'"foo\ bar"',
analyzer_options=>'{"delimiters": [" "]}') AS a,
-- TRUE: `foo bar` is in `foo bar baz` after tokenization with the given
-- delimiters.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
R'foo bar baz',
R'"foo? bar"',
analyzer_options=>'{"delimiters": [" ", "?"]}') AS b,
-- TRUE: `read book` is in `read book now` after `the` is ignored.
-- The tokens in `data_to_search` are `read`, `book`, and `now`.
-- The searchable tokens in `query_string` are `read` and `book` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
'read the book now',
R'"read the book"',
analyzer_options => '{ "token_filters": [{"stop_words": ["the"]}] }') AS c,
-- FALSE: `c d` isn't in `a`, `b`, `cd`, `e` or `f` after tokenization with
-- the given pattern.
-- The tokens in `data_to_search` are `a`, `b`, `cd`, `e` and `f`.
-- The searchable tokens in `query_string` are `c` and `d` and they
-- must appear in that exact order in `data_to_search`. `data_to_search`
-- contains a `cd` token, but not a `c` or `d` token.
SEARCH(
R'abcdef',
R'"c d"',
analyzer=>'PATTERN_ANALYZER',
analyzer_options=>'{"patterns": ["(?:cd)|[a-z]"]}') AS d,
-- TRUE: `ant apple` is in `ant apple avocado` after tokenization with
-- the given pattern.
-- The tokens in `data_to_search` are `ant`, `apple`, and `avocado`.
-- The searchable tokens in `query_string` are `ant` and `apple` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
R'ant orange apple avocado',
R'"ant apple"',
analyzer=>'PATTERN_ANALYZER',
analyzer_options=>'{"patterns": ["a[a-z]"]}') AS e;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| false | true | true | false | true |
*-------+-------+-------+-------+-------*/
The following query shows examples of calls to the SEARCH
function using the NO_OP_ANALYZER
text analyzer and reasons for various return values:
SELECT
-- TRUE: exact match
SEARCH('foobar', 'foobar', analyzer=>'NO_OP_ANALYZER') AS a,
-- FALSE: Backticks aren't special characters for `NO_OP_ANALYZER`.
SEARCH('foobar', '\`foobar\`', analyzer=>'NO_OP_ANALYZER') AS b,
-- FALSE: The capitalization doesn't match.
SEARCH('foobar', 'Foobar', analyzer=>'NO_OP_ANALYZER') AS c,
-- FALSE: There are no delimiters for `NO_OP_ANALYZER`.
SEARCH('foobar example', 'foobar', analyzer=>'NO_OP_ANALYZER') AS d,
-- TRUE: An exact match is found.
SEARCH('', '', analyzer=>'NO_OP_ANALYZER') AS e,
-- FALSE: 'foo bar' and "foo bar" aren't considered an exact match.
SEARCH( R'foo bar baz', R'"foo bar"', analyzer=>'NO_OP_ANALYZER') AS f,
-- TRUE: "foo bar" and "foo Bar" are considered an exact match because the
-- analysis is case-insensitive.
SEARCH( R'"foo bar"', R'"foo Bar"', analyzer=>'NO_OP_ANALYZER') AS g;
-- FALSE: With NO_OP_ANALYZER the query string is analyzed as "foo OR bar"
-- which is not an exact match with "foo".
SEARCH( R'foo', R'foo OR bar', analyzer=>'NO_OP_ANALYZER') AS h;
/*-------+-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g | h |
+-------+-------+-------+-------+-------+-------+-------+-------+
| true | false | false | false | true | false | true | false |
*-------+-------+-------+-------+-------+-------+-------+-------*/
Consider the following table called meals
with columns breakfast
, lunch
, and dinner
:
/*-------------------+-------------------------+------------------*
| breakfast | lunch | dinner |
+-------------------+-------------------------+------------------+
| Potato pancakes | Toasted cheese sandwich | Beef soup |
| Avocado toast | Tomato soup | Chicken soup |
*-------------------+-------------------------+------------------*/
The following query shows how to search single columns, multiple columns, and whole tables, using the default LOG_ANALYZER
text analyzer with the default analyzer options:
WITH
meals AS (
SELECT
'Potato pancakes' AS breakfast,
'Toasted cheese sandwich' AS lunch,
'Beef soup' AS dinner
UNION ALL
SELECT
'Avocado toast' AS breakfast,
'Tomato soup' AS lunch,
'Chicken soup' AS dinner
)
SELECT
SEARCH(lunch, 'soup') AS lunch_soup,
SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
SEARCH(meals, 'soup') AS anytime_soup
FROM meals;
/*------------+--------------------------+--------------*
| lunch_soup | breakfast_or_dinner_soup | anytime_soup |
+------------+--------------------------+--------------+
| false | true | true |
| true | true | true |
*------------+--------------------------+--------------*/
The following query shows additional ways to search, using the default LOG_ANALYZER
text analyzer with default analyzer options:
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'exam') AS a,
SEARCH(email, 'foobar') AS b,
SEARCH(email, 'example.com') AS c,
SEARCH(email, R'"please use"') AS d,
SEARCH(email, R'"as email"') AS e
FROM data;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| false | true | true | true | false |
*-------+-------+-------+-------+-------*/
The following query shows additional ways to search, using the default LOG_ANALYZER
text analyzer with custom analyzer options. Terms are only split when a space or @
symbol is encountered.
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'foobar', analyzer_options=>'{"delimiters": [" ", "@"]}') AS a,
SEARCH(email, 'example', analyzer_options=>'{"delimiters": [" ", "@"]}') AS b,
SEARCH(email, 'example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS c,
SEARCH(email, 'foobar@example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS d,
SEARCH(email, R'use "foobar example.com" "as your"', analyzer_options=>'{"delimiters": [" ", "@"]}') AS e
FROM data;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| true | false | true | true | true |
*-------+-------+-------+-------+-------*/
The following query shows how to search, using the NO_OP_ANALYZER
text analyzer:
WITH meals AS ( SELECT 'Tomato soup' AS lunch )
SELECT
SEARCH(lunch, 'Tomato soup', analyzer=>'NO_OP_ANALYZER') AS a,
SEARCH(lunch, 'soup', analyzer=>'NO_OP_ANALYZER') AS b,
SEARCH(lunch, 'tomato soup', analyzer=>'NO_OP_ANALYZER') AS c,
SEARCH(lunch, R'"Tomato soup"', analyzer=>'NO_OP_ANALYZER') AS d
FROM meals;
/*-------+-------+-------+-------*
| a | b | c | d |
+-------+-------+-------+-------+
| true | false | false | false |
*-------+-------+-------+-------*/
The following query shows how to use the PATTERN_ANALYZER
text analyzer with default analyzer options:
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'exam', analyzer=>'PATTERN_ANALYZER') AS a,
SEARCH(email, 'foobar', analyzer=>'PATTERN_ANALYZER') AS b,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER') AS c,
SEARCH(email, R'foobar "EXAMPLE.com as" email', analyzer=>'PATTERN_ANALYZER') AS d
FROM data;
/*-------+-------+-------+-------*
| a | b | c | d |
+-------+-------+-------+-------+
| false | true | true | true |
*-------+-------+-------+-------*/
The following query shows additional ways to search, using the PATTERN_ANALYZER
text analyzer with custom analyzer options:
WITH data AS ( SELECT 'Please use foobar@EXAMPLE.com as your email.' AS email )
SELECT
SEARCH(email, 'EXAMPLE', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[A-Z]*"]}') AS a,
SEARCH(email, 'example', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS b,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS c,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-zA-Z.]*"]}') AS d
FROM data;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| true | false | false | true | false |
*-------+-------+-------+-------+-------*/
For additional examples that include analyzer options, see the Text analysis reference guide.
For helpful analyzer recipes that you can use to enhance analyzer-supported queries, see the Search with text analyzers user guide.
VECTOR_SEARCH
VECTOR_SEARCH(
{ TABLE base_table | (base_table_query) },
column_to_search,
{ TABLE query_table | (query_table_query) },
[, query_column_to_search => query_column_to_search_value]
[, top_k => top_k_value ]
[, distance_type => distance_type_value ]
[, options => options_value ]
)
Description
The VECTOR_SEARCH
function lets you search embeddings to find semantically similar entities.
Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.
Definitions
base_table
: The table to search for nearest neighbor embeddings.base_table_query
: A query that you can use to pre-filter the base table. Only SELECT
, FROM
, and WHERE
clauses are allowed in this query. Don't apply any filters to the embedding column. You can't use logical views in this query. Using a subquery might interfere with index usage or cause your query to fail. If the base table is indexed and the WHERE
clause contains columns that are not stored in the index, then VECTOR_SEARCH
post-filters on those columns instead. To learn more, see Store columns and pre-filter.column_to_search
: The name of the base table column to search for nearest neighbor embeddings. The column must have a type of ARRAY<FLOAT64>
. All elements in the array must be non-NULL
, and all values in the column must have the same array dimensions. If the column has a vector index, BigQuery attempts to use it. To determine if an index was used in the vector search, see Vector index usage.query_table
: The table that provides the embeddings for which to find nearest neighbors. All columns are passed through as output columns.query_table_query
: A query that provides the embeddings for which to find nearest neighbors. All columns are passed through as output columns.query_column_to_search
: A named argument with a STRING
value. query_column_to_search_value
specifies the name of the column in the query table or statement that contains the embeddings for which to find nearest neighbors. The column must have a type of ARRAY<FLOAT64>
. All elements in the array must be non-NULL
and all values in the column must have the same array dimensions as the values in the column_to_search
column. If you don't specify query_column_to_search_value
, the function uses the column_to_search
value or picks the most appropriate column.top_k
: A named argument with an INT64
value. top_k_value
specifies the number of nearest neighbors to return. The default is 10
. A negative value is treated as infinity, meaning that all values are counted as neighbors and returned.distance_type
: A named argument with a STRING
value. distance_type_value
specifies the type of metric to use to compute the distance between two vectors. Supported distance types are EUCLIDEAN
, COSINE
, and DOT_PRODUCT
. The default is EUCLIDEAN
.
If you don't specify distance_type_value
and the column_to_search
column has a vector index that's used, VECTOR_SEARCH
uses the distance type specified in the distance_type
option of the CREATE VECTOR INDEX
statement.
options
: A named argument with a JSON-formatted STRING
value. options_value
is a literal that specifies the following vector search options:
fraction_lists_to_search
: A JSON number that specifies the percentage of lists to search. For example, options => '{"fraction_lists_to_search":0.15}'
. The fraction_lists_to_search
value must be in the range 0.0
to 1.0
, exclusive.
Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage.
fraction_lists_to_search
is only used when a vector index is also used. If you don't specify a fraction_lists_to_search
value but an index is matched, an appropriate value is picked.
The number of available lists to search is determined by the num_lists
option in the ivf_options
option or derived from the leaf_node_embedding_count
option in the tree_ah_options
option of the CREATE VECTOR INDEX
statement if specified. Otherwise, BigQuery calculates an appropriate number.
You can't specify fraction_lists_to_search
when use_brute_force
is set to true
.
use_brute_force
: A JSON boolean that determines whether to use brute force search by skipping the vector index if one is available. For example, options => '{"use_brute_force":true}'
. The default is false
. If you specify use_brute_force=false
and there is no useable vector index available, brute force is used anyway.
options
defaults to '{}'
to denote that all underlying options use their corresponding default values.
Details
You can optionally use VECTOR_SEARCH
with a vector index. When a vector index is used, VECTOR_SEARCH
uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reducing recall and so returning more approximate results. When a base table is large, the use of an index typically improves performance without significantly sacrificing recall. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.
Output
For each row in the query data, the output contains multiple rows from the base table that satisfy the search criteria. The number of results rows per query table row is either 10 or the top_k
value if it's specified. The order of the output isn't guaranteed.
The output includes the following columns:
query
: A STRUCT
value that contains all selected columns from the query data.base
: A STRUCT
value that contains all columns from base_table
or a subset of the columns from base_table
that you selected in the base_table_query
query.distance
: A FLOAT64
value that represents the distance between the base data and the query data.Limitations
BigQuery data security and governance rules apply to the use of VECTOR_SEARCH
, which results in the following behavior:
VECTOR_SEARCH
applies the row-level access policies to the query results.VECTOR_SEARCH
succeeds only if the user running the query has the Fine-Grained Reader
role on the policy tags that are used. Otherwise, VECTOR_SEARCH
fails with an invalid query error.If any base table column or any column in the query table or statement has column-level security policies and you don't have appropriate permissions to access the column, VECTOR_SEARCH
fails with a permission denied error.
The project that runs the query containing VECTOR_SEARCH
must match the project that contains the base table.
Examples
The following queries create test tables base_table
and query_table
to use in subsequent query examples :
CREATE OR REPLACE TABLE mydataset.base_table
(
id INT64,
my_embedding ARRAY<FLOAT64>
);
INSERT mydataset.base_table (id, my_embedding)
VALUES(1, [1.0, 2.0]),
(2, [2.0, 4.0]),
(3, [1.5, 7.0]),
(4, [1.0, 3.2]),
(5, [5.0, 5.4]),
(6, [3.7, 1.8]),
(7, [4.4, 2.9]);
CREATE OR REPLACE TABLE mydataset.query_table
(
query_id STRING,
embedding ARRAY<FLOAT64>
);
INSERT mydataset.query_table (query_id, embedding)
VALUES('dog', [1.0, 2.0]),
('cat', [3.0, 5.2]);
The following example searches the my_embedding
column of base_table
for the top two embeddings that match each row of data in the embedding
column of query_table
:
SELECT *
FROM
VECTOR_SEARCH(
TABLE mydataset.base_table,
'my_embedding',
(SELECT query_id, embedding FROM mydataset.query_table),
'embedding',
top_k => 2);
/*----------------+-----------------+---------+----------------------------------------*
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 4 | 1.0 | 1.2000000000000002 |
| | 2.0 | | 3.2 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 2 | 2.0 | 1.5620499351813311 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 5 | 5.0 | 2.0099751242241779 |
| | 5.2 | | 5.4 | |
*----------------+-----------------+---------+-------------------+--------------------*/
The following example pre-filters base_table
to rows where id
isn't equal to 4 and then searches the my_embedding
column of base_table
for the top two embeddings that match each row of data in the embedding
column of query_table
.
SELECT *
FROM
VECTOR_SEARCH(
(SELECT * FROM mydataset.base_table WHERE id != 4),
'my_embedding',
(SELECT query_id, embedding FROM mydataset.query_table),
'embedding',
top_k => 2,
options => '{"use_brute_force":true}');
/*----------------+-----------------+---------+----------------------------------------*
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 2 | 2.0 | 2.23606797749979 |
| | 2.0 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 2 | 2.0 | 1.5620499351813311 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 5 | 5.0 | 2.0099751242241779 |
| | 5.2 | | 5.4 | |
*----------------+-----------------+---------+-------------------+--------------------*/
The following example searches the my_embedding
column of base_table
for the top two embeddings that match each row of data in the embedding
column of query_table
, and uses the COSINE
distance type to measure the distance between the embeddings:
SELECT *
FROM
VECTOR_SEARCH(
TABLE mydataset.base_table,
'my_embedding',
TABLE mydataset.query_table,
'embedding',
top_k => 2,
distance_type => 'COSINE');
/*----------------+-----------------+---------+-------------------------------------------+
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+-----------------------+
| dog | 1.0 | 2 | 2.0 | 0 |
| | 2.0 | | 4.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| cat | 3.0 | 2 | 2.0 | 0.0017773842088002478 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| cat | 3.0 | 1 | 1.0 | 0.0017773842088002478 |
| | 5.2 | | 2.0 | |
*----------------+-----------------+---------+-------------------+-----------------------*/
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["GoogleSQL for BigQuery offers two primary search functions: `SEARCH` for term-based searches and `VECTOR_SEARCH` for semantic similarity searches using embeddings."],["The `SEARCH` function determines if a BigQuery table or data contains specified search terms, returning `TRUE` if all terms are present and `FALSE` otherwise."],["`SEARCH` supports various analyzers like `LOG_ANALYZER`, `NO_OP_ANALYZER`, and `PATTERN_ANALYZER`, along with options to define the scope of JSON data, or customize tokenization rules and special characters."],["The `VECTOR_SEARCH` function identifies semantically similar entities by comparing high-dimensional numerical vectors, or embeddings, using distance metrics like `EUCLIDEAN`, `COSINE`, or `DOT_PRODUCT`."],["`VECTOR_SEARCH` allows optional use of vector indexes to enhance performance via Approximate Nearest Neighbor (ANN) search, or alternatively, brute force search for precise results."]]],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4