A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions below:

Search functions | BigQuery | Google Cloud

Skip to main content Search functions

Stay organized with collections Save and categorize content based on your preferences.

GoogleSQL for BigQuery supports the following search functions.

Function list Name Summary SEARCH Checks to see whether a table or other search data contains a set of search terms. VECTOR_SEARCH Performs a vector search on embeddings to find semantically similar entities. SEARCH
SEARCH(
  data_to_search, search_query
  [, json_scope => { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' } ]
  [, analyzer => { 'LOG_ANALYZER' | 'NO_OP_ANALYZER' | 'PATTERN_ANALYZER'} ]
  [, analyzer_options => analyzer_options_values ]
)

Description

The SEARCH function checks to see whether a BigQuery table or other search data contains a set of search terms (tokens). It returns TRUE if all search terms appear in the data, based on the rules for search_query and text analysis described in the text analyzer. Otherwise, this function returns FALSE.

Definitions

Details

The SEARCH function is designed to work with search indexes to optimize point lookups. Although the SEARCH function works for tables that aren't indexed, its performance will be greatly improved with a search index. If both the analyzer and analyzer options match the one used to create the index, the search index will be used.

Rules for search_query

The 'NO_OP_ANALYZER' extracts the search query as a single token without parsing it. The following rules apply only when using the 'LOG_ANALYZER' or 'PATTERN_ANALYZER'.

A search query is a set of one or more terms that are combined using the logical operators AND and OR along with parenthesis. Any whitespace in the search query that is not in a phrase or backtick term is considered an (implicit) AND. First, a search query is broken down into terms using logical operators and parenthesis in the search query. Then, each term is evaluated based on whether or not it appears in the data to search. The final outcome of the SEARCH function is the result of the logical expression represented by the search query.

The following grammar is used to transform the search query into a logical expression of terms. The grammar is defined using the ANTLR meta-language:

query_string : expression EOF;

expression  : '(' expression  ')'
            | expression 'AND' expression
            | expression '\s' expression
            | expression 'OR' expression
            | term
            ;

term : single_term
     | phrase_term
     | backtick_term
     ;

backtick_term : '`' ( '\`' | ~[`] )+ '`';

phrase_term : '"' ( '\"' | ~["] )+ '"';

single_term : ( '\' reserved_char | ~[reserved_char] )+;

To evaluate each term, it is further broken down into zero or more searchable tokens based on the text analyzer. The following section contains the rules for how different types of terms are analyzed and evaluated.

Rules for backtick_term in search_query:

Rules for reserved_char in search_query:

Rules for phrase_term in search_query:

How data_to_search is broken into searchable tokens

The following table shows how data_to_search is broken into searchable tokens by the LOG_ANALYZER text analyzer. All entries are strings.

data_to_search searchable tokens 127.0.0.1 127
0
1
127.0.0.1
. 127.0.0
127.0
0.0
0.0.1
0.1 foobar@example.com foobar
example
com
foobar@example
example.com
foobar@example.com The fox. the
fox
The
The fox
The fox.
fox
fox.

How search_query is broken into query terms

The following table shows how search_query is broken into query terms by the LOG_ANALYZER text analyzer. All entries are strings.

search_query query terms 127.0.0.1 127
0
1
`127.0.0.1` 127.0.0.1 foobar@example.com foobar
example
com `foobar@example.com` foobar@example.com

Rules for data_to_search

General rules for data_to_search:

Return type

BOOL

Examples

The following queries show how tokens in search_query are analyzed by a SEARCH function call using the default analyzer, LOG_ANALYZER:

SELECT
  -- ERROR: `search_query` is NULL.
  SEARCH('foobarexample', NULL) AS a,

  -- ERROR: `search_query` contains no tokens.
  SEARCH('foobarexample', '') AS b,
SELECT
  -- TRUE: '-' and ' ' are delimiters.
  SEARCH('foobar-example', 'foobar example') AS a,

  -- TRUE: The search query is a constant expression evaluated to 'foobar'.
  SEARCH('foobar-example', CONCAT('foo', 'bar')) AS b,

  -- FALSE: The search_query isn't split.
  SEARCH('foobar-example', 'foobarexample') AS c,

  -- TRUE: The double backslash escapes the ampersand which is a delimiter.
  SEARCH('foobar-example', 'foobar\\&example') AS d,

  -- TRUE: The single backslash escapes the ampersand in a raw string.
  SEARCH('foobar-example', R'foobar\&example')AS e,

  -- FALSE: The backticks indicate that there must be an exact match for
  -- foobar&example.
  SEARCH('foobar-example', '`foobar&example`') AS f,

  -- TRUE: An exact match is found.
  SEARCH('foobar&example', '`foobar&example`') AS g

/*-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     |
 +-------+-------+-------+-------+-------+-------+-------+
 | true  | true  | false | true  | true  | false | true  |
 *-------+-------+-------+-------+-------+-------+-------*/
SELECT
  -- TRUE: The order of terms doesn't matter.
  SEARCH('foobar-example', 'example foobar') AS a,

  -- TRUE: Tokens are made lower-case.
  SEARCH('foobar-example', 'Foobar Example') AS b,

  -- TRUE: An exact match is found.
  SEARCH('foobar-example', '`foobar-example`') AS c,

  -- FALSE: Backticks preserve capitalization.
  SEARCH('foobar-example', '`Foobar`') AS d,

  -- FALSE: Backticks don't have special meaning for search_data and are
  -- not delimiters in the default LOG_ANALYZER.
  SEARCH('`foobar-example`', '`foobar-example`') AS e,

  -- TRUE: An exact match is found after the delimiter in search_data.
  SEARCH('foobar@example.com', '`example.com`') AS f,

  -- TRUE: An exact match is found between the space delimiters.
  SEARCH('a foobar-example b', '`foobar-example`') AS g;

/*-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     |
 +-------+-------+-------+-------+-------+-------+-------+
 | true  | true  | true  | false | false | true  | true  |
 *-------+-------+-------+-------+-------+-------+-------*/
SELECT
  -- FALSE: No single array entry matches all search terms.
  SEARCH(['foobar', 'example'], 'foobar example') AS a,

  -- FALSE: The search query is equivalent to foobar\\=.
  SEARCH('foobar=', '`foobar\\=`') AS b,

  -- FALSE: This is equivalent to the previous example.
  SEARCH('foobar=', R'`\foobar=`') AS c,

  -- TRUE: The equals sign is a delimiter in the data and query.
  SEARCH('foobar=', 'foobar\\=') AS d,

  -- TRUE: This is equivalent to the previous example.
  SEARCH('foobar=', R'foobar\=') AS e,

  -- TRUE: An exact match is found.
  SEARCH('foobar.example', '`foobar`') AS f,

  -- FALSE: `foobar.\` isn't analyzed because of backticks; it isn't
  -- followed by a delimiter in search_data 'foobar.example'.
  SEARCH('foobar.example', '`foobar.\`') AS g,

  -- TRUE: `foobar.` isn't analyzed because of backticks; it is
  -- followed by the delimiter '.' in search_data 'foobar..example'.
  SEARCH('foobar..example', '`foobar.`') AS h;

/*-------+-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     | h     |
 +-------+-------+-------+-------+-------+-------+-------+-------+
 | false | false | false | true  | true  | true  | false | true  |
 *-------+-------+-------+-------+-------+-------+-------+-------*/

The following queries show how logical expression can be used in search_query to perform a SEARCH function call:

SELECT
  -- TRUE: A whitespace is an implicit AND.
  -- Both `foo` and `bar` are in `foo bar baz`.
  SEARCH(R'foo bar baz', R'foo bar') AS a,

  -- TRUE: Similar to previous case
  -- `foo` and `bar` are in `foo bar baz`.
  SEARCH(R'foo bar baz', R'foo AND bar') AS b,

  -- TRUE: Only one of `foo` or `bar` should be in `foo`.
  SEARCH(R'foo', R'foo OR bar') AS c,

  -- TRUE: `foo` and one of `bar` or `baz` should be in `foo bar`.
  SEARCH(R'foo bar', R'"foo AND (bar OR baz)"') AS d,

  -- FALSE: Neither `bar` or `baz` are in `foo`.
  SEARCH(R'foo', R'foo AND (bar OR baz)') AS c,

/*-------+-------+-------+-------+-------+
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | true  | true  | true  | true  | false |
 *-------+-------+-------+-------+-------+/

The following queries show how phrases in search_query are analyzed by a SEARCH function call:

SELECT
  -- TRUE: The phrase `foo bar` is in `foo bar baz`.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are `foo` and `bar`
  -- and because they appear in that exact order in `data_to_search`,
  -- the function returns TRUE.
  SEARCH(R'foo bar baz', R'"foo bar"') AS a,

  -- TRUE: Case is ignored.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are `foo` and `bar`
  -- and because they appear in that exact order in `data_to_search`,
  -- the function return TRUE.
  SEARCH(R'Foo bar baz', R'"foo Bar"') AS b,

  -- TRUE: Both `-` and `&` are delimiters used during tokenization.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are `foo` and `bar`
  -- and because they appear in that exact order in `data_to_search`,
  -- the function returns TRUE.
  SEARCH(R'foo-bar baz', R'"foo&bar"') AS c,

  -- FALSE: Backticks in a phrase are treated as normal characters.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are:
  -- `foo
  -- bar`
  -- Because these searchable tokens don't appear in `data_to_search`,
  -- the function returns FALSE.
  SEARCH(R'foo bar baz', R'"`foo bar`"') AS d,

  -- FALSE: `foo bar` isn't in `foo else bar`.
  -- The tokens in `data_to_search` are `foo`, `else`, and `bar`.
  -- The searchable tokens in `query_string` are `foo` and `bar`.
  -- Even though they appear in `data_to_search`, but because they
  -- don't appear in that exact order (`foo` before `bar`),
  -- the function returns FALSE.
  SEARCH(R'foo else bar', R'"foo bar"') AS e,

  -- FALSE: `foo baz` isn't in `foo bar baz`.
  -- The `search_query` produces two terms. The first term is `bar`, which
  -- matches with the similar token in `data_to_search`. However, the second
  -- term is the phrase "foo&baz" with two tokens, `foo` and `baz`. Because
  -- `foo` and `baz` don't appear next to each other in `data_to_search`
  -- (`bar` is in between), the function returns FALSE.
  SEARCH(R'foo-bar-baz', R'bar "foo&baz"') AS f;

/*-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     |
 +-------+-------+-------+-------+-------+-------+
 | true  | true  | false | false | false | false |
 *-------+-------+-------+-------+-------+-------*/
SELECT
  -- FALSE: Only double quotes need to be escaped in a phrase.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are `foo\` and `bar` and they
  -- must appear in that exact order in `data_to_search`, but don't.
  SEARCH(
    R'foo bar baz',
    R'"foo\ bar"',
    analyzer_options=>'{"delimiters": [" "]}') AS a,

  -- TRUE: `foo bar` is in `foo bar baz` after tokenization with the given
  -- delimiters.
  -- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
  -- The searchable tokens in `query_string` are `foo` and `bar` and they
  -- must appear in that exact order in `data_to_search`.
  SEARCH(
    R'foo bar baz',
    R'"foo? bar"',
    analyzer_options=>'{"delimiters": [" ", "?"]}') AS b,

  -- TRUE: `read book` is in `read book now` after `the` is ignored.
  -- The tokens in `data_to_search` are `read`, `book`, and `now`.
  -- The searchable tokens in `query_string` are `read` and `book` and they
  -- must appear in that exact order in `data_to_search`.
  SEARCH(
    'read the book now',
    R'"read the book"',
    analyzer_options => '{ "token_filters": [{"stop_words": ["the"]}] }') AS c,

  -- FALSE: `c d` isn't in `a`, `b`, `cd`, `e` or `f` after tokenization with
  -- the given pattern.
  -- The tokens in `data_to_search` are `a`, `b`, `cd`, `e` and `f`.
  -- The searchable tokens in `query_string` are `c` and `d` and they
  -- must appear in that exact order in `data_to_search`. `data_to_search`
  -- contains a `cd` token, but not a `c` or `d` token.
  SEARCH(
    R'abcdef',
    R'"c d"',
    analyzer=>'PATTERN_ANALYZER',
    analyzer_options=>'{"patterns": ["(?:cd)|[a-z]"]}') AS d,

  -- TRUE: `ant apple` is in `ant apple avocado` after tokenization with
  -- the given pattern.
  -- The tokens in `data_to_search` are `ant`, `apple`, and `avocado`.
  -- The searchable tokens in `query_string` are `ant` and `apple` and they
  -- must appear in that exact order in `data_to_search`.
  SEARCH(
    R'ant orange apple avocado',
    R'"ant apple"',
    analyzer=>'PATTERN_ANALYZER',
    analyzer_options=>'{"patterns": ["a[a-z]"]}') AS e;

/*-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | false | true  | true  | false | true  |
 *-------+-------+-------+-------+-------*/

The following query shows examples of calls to the SEARCH function using the NO_OP_ANALYZER text analyzer and reasons for various return values:

SELECT
  -- TRUE: exact match
  SEARCH('foobar', 'foobar', analyzer=>'NO_OP_ANALYZER') AS a,

  -- FALSE: Backticks aren't special characters for `NO_OP_ANALYZER`.
  SEARCH('foobar', '\`foobar\`', analyzer=>'NO_OP_ANALYZER') AS b,

  -- FALSE: The capitalization doesn't match.
  SEARCH('foobar', 'Foobar', analyzer=>'NO_OP_ANALYZER') AS c,

  -- FALSE: There are no delimiters for `NO_OP_ANALYZER`.
  SEARCH('foobar example', 'foobar', analyzer=>'NO_OP_ANALYZER') AS d,

  -- TRUE: An exact match is found.
  SEARCH('', '', analyzer=>'NO_OP_ANALYZER') AS e,

  -- FALSE: 'foo bar' and "foo bar" aren't considered an exact match.
  SEARCH( R'foo bar baz', R'"foo bar"', analyzer=>'NO_OP_ANALYZER') AS f,

  -- TRUE: "foo bar" and "foo Bar" are considered an exact match because the
  -- analysis is case-insensitive.
  SEARCH( R'"foo bar"', R'"foo Bar"', analyzer=>'NO_OP_ANALYZER') AS g;

  -- FALSE: With NO_OP_ANALYZER the query string is analyzed as "foo OR bar"
  -- which is not an exact match with "foo".
  SEARCH( R'foo', R'foo OR bar', analyzer=>'NO_OP_ANALYZER') AS h;

/*-------+-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     | h     |
 +-------+-------+-------+-------+-------+-------+-------+-------+
 | true  | false | false | false | true  | false | true  | false |
 *-------+-------+-------+-------+-------+-------+-------+-------*/

Consider the following table called meals with columns breakfast, lunch, and dinner:

/*-------------------+-------------------------+------------------*
 | breakfast         | lunch                   | dinner           |
 +-------------------+-------------------------+------------------+
 | Potato pancakes   | Toasted cheese sandwich | Beef soup        |
 | Avocado toast     | Tomato soup             | Chicken soup     |
 *-------------------+-------------------------+------------------*/

The following query shows how to search single columns, multiple columns, and whole tables, using the default LOG_ANALYZER text analyzer with the default analyzer options:

WITH
  meals AS (
    SELECT
      'Potato pancakes' AS breakfast,
      'Toasted cheese sandwich' AS lunch,
      'Beef soup' AS dinner
    UNION ALL
    SELECT
      'Avocado toast' AS breakfast,
      'Tomato soup' AS lunch,
      'Chicken soup' AS dinner
  )
SELECT
  SEARCH(lunch, 'soup') AS lunch_soup,
  SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
  SEARCH(meals, 'soup') AS anytime_soup
FROM meals;

/*------------+--------------------------+--------------*
 | lunch_soup | breakfast_or_dinner_soup | anytime_soup |
 +------------+--------------------------+--------------+
 | false      | true                     | true         |
 | true       | true                     | true         |
 *------------+--------------------------+--------------*/

The following query shows additional ways to search, using the default LOG_ANALYZER text analyzer with default analyzer options:

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'exam') AS a,
  SEARCH(email, 'foobar') AS b,
  SEARCH(email, 'example.com') AS c,
  SEARCH(email, R'"please use"') AS d,
  SEARCH(email, R'"as email"') AS e
FROM data;

/*-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | false | true  | true  | true  | false |
 *-------+-------+-------+-------+-------*/

The following query shows additional ways to search, using the default LOG_ANALYZER text analyzer with custom analyzer options. Terms are only split when a space or @ symbol is encountered.

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'foobar', analyzer_options=>'{"delimiters": [" ", "@"]}') AS a,
  SEARCH(email, 'example', analyzer_options=>'{"delimiters": [" ", "@"]}') AS b,
  SEARCH(email, 'example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS c,
  SEARCH(email, 'foobar@example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS d,
  SEARCH(email, R'use "foobar example.com" "as your"', analyzer_options=>'{"delimiters": [" ", "@"]}') AS e
FROM data;

/*-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | true  | false | true  | true  | true  |
 *-------+-------+-------+-------+-------*/

The following query shows how to search, using the NO_OP_ANALYZER text analyzer:

WITH meals AS ( SELECT 'Tomato soup' AS lunch )
SELECT
  SEARCH(lunch, 'Tomato soup', analyzer=>'NO_OP_ANALYZER') AS a,
  SEARCH(lunch, 'soup', analyzer=>'NO_OP_ANALYZER') AS b,
  SEARCH(lunch, 'tomato soup', analyzer=>'NO_OP_ANALYZER') AS c,
  SEARCH(lunch, R'"Tomato soup"', analyzer=>'NO_OP_ANALYZER') AS d
FROM meals;

/*-------+-------+-------+-------*
 | a     | b     | c     | d     |
 +-------+-------+-------+-------+
 | true  | false | false | false |
 *-------+-------+-------+-------*/

The following query shows how to use the PATTERN_ANALYZER text analyzer with default analyzer options:

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'exam', analyzer=>'PATTERN_ANALYZER') AS a,
  SEARCH(email, 'foobar', analyzer=>'PATTERN_ANALYZER') AS b,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER') AS c,
  SEARCH(email, R'foobar "EXAMPLE.com as" email', analyzer=>'PATTERN_ANALYZER') AS d
FROM data;

/*-------+-------+-------+-------*
 | a     | b     | c     | d     |
 +-------+-------+-------+-------+
 | false | true  | true  | true  |
 *-------+-------+-------+-------*/

The following query shows additional ways to search, using the PATTERN_ANALYZER text analyzer with custom analyzer options:

WITH data AS ( SELECT 'Please use foobar@EXAMPLE.com as your email.' AS email )
SELECT
  SEARCH(email, 'EXAMPLE', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[A-Z]*"]}') AS a,
  SEARCH(email, 'example', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS b,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS c,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-zA-Z.]*"]}') AS d
FROM data;

/*-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | true  | false | false | true  | false |
 *-------+-------+-------+-------+-------*/

For additional examples that include analyzer options, see the Text analysis reference guide.

For helpful analyzer recipes that you can use to enhance analyzer-supported queries, see the Search with text analyzers user guide.

VECTOR_SEARCH
VECTOR_SEARCH(
  { TABLE base_table | (base_table_query) },
  column_to_search,
  { TABLE query_table | (query_table_query) },
  [, query_column_to_search => query_column_to_search_value]
  [, top_k => top_k_value ]
  [, distance_type => distance_type_value ]
  [, options => options_value ]
)

Description

The VECTOR_SEARCH function lets you search embeddings to find semantically similar entities.

Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.

Definitions

Details

You can optionally use VECTOR_SEARCH with a vector index. When a vector index is used, VECTOR_SEARCH uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reducing recall and so returning more approximate results. When a base table is large, the use of an index typically improves performance without significantly sacrificing recall. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.

Output

For each row in the query data, the output contains multiple rows from the base table that satisfy the search criteria. The number of results rows per query table row is either 10 or the top_k value if it's specified. The order of the output isn't guaranteed.

The output includes the following columns:

Limitations

BigQuery data security and governance rules apply to the use of VECTOR_SEARCH, which results in the following behavior:

Examples

The following queries create test tables base_table and query_table to use in subsequent query examples :

CREATE OR REPLACE TABLE mydataset.base_table
(
  id INT64,
  my_embedding ARRAY<FLOAT64>
);

INSERT mydataset.base_table (id, my_embedding)
VALUES(1, [1.0, 2.0]),
(2, [2.0, 4.0]),
(3, [1.5, 7.0]),
(4, [1.0, 3.2]),
(5, [5.0, 5.4]),
(6, [3.7, 1.8]),
(7, [4.4, 2.9]);
CREATE OR REPLACE TABLE mydataset.query_table
(
  query_id STRING,
  embedding ARRAY<FLOAT64>
);

INSERT mydataset.query_table (query_id, embedding)
VALUES('dog', [1.0, 2.0]),
('cat', [3.0, 5.2]);

The following example searches the my_embedding column of base_table for the top two embeddings that match each row of data in the embedding column of query_table:

SELECT *
FROM
  VECTOR_SEARCH(
    TABLE mydataset.base_table,
    'my_embedding',
    (SELECT query_id, embedding FROM mydataset.query_table),
    'embedding',
    top_k => 2);

/*----------------+-----------------+---------+----------------------------------------*
 | query.query_id | query.embedding | base.id | base.my_embedding | distance           |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                  |
 |                | 2.0             |         | 2.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 4       | 1.0               | 1.2000000000000002 |
 |                | 2.0             |         | 3.2               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 2       | 2.0               | 1.5620499351813311 |
 |                | 5.2             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 5       | 5.0               | 2.0099751242241779 |
 |                | 5.2             |         | 5.4               |                    |
 *----------------+-----------------+---------+-------------------+--------------------*/

The following example pre-filters base_table to rows where id isn't equal to 4 and then searches the my_embedding column of base_table for the top two embeddings that match each row of data in the embedding column of query_table.

SELECT *
FROM
  VECTOR_SEARCH(
    (SELECT * FROM mydataset.base_table WHERE id != 4),
    'my_embedding',
    (SELECT query_id, embedding FROM mydataset.query_table),
    'embedding',
    top_k => 2,
    options => '{"use_brute_force":true}');

/*----------------+-----------------+---------+----------------------------------------*
 | query.query_id | query.embedding | base.id | base.my_embedding | distance           |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                  |
 |                | 2.0             |         | 2.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 2       | 2.0               | 2.23606797749979   |
 |                | 2.0             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 2       | 2.0               | 1.5620499351813311 |
 |                | 5.2             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 5       | 5.0               | 2.0099751242241779 |
 |                | 5.2             |         | 5.4               |                    |
 *----------------+-----------------+---------+-------------------+--------------------*/

The following example searches the my_embedding column of base_table for the top two embeddings that match each row of data in the embedding column of query_table, and uses the COSINE distance type to measure the distance between the embeddings:

SELECT *
FROM
  VECTOR_SEARCH(
    TABLE mydataset.base_table,
    'my_embedding',
    TABLE mydataset.query_table,
    'embedding',
    top_k => 2,
    distance_type => 'COSINE');

/*----------------+-----------------+---------+-------------------------------------------+
 | query.query_id | query.embedding | base.id | base.my_embedding | distance              |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | dog            | 1.0             | 2       | 2.0               | 0                     |
 |                | 2.0             |         | 4.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                     |
 |                | 2.0             |         | 2.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | cat            | 3.0             | 2       | 2.0               | 0.0017773842088002478 |
 |                | 5.2             |         | 4.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | cat            | 3.0             | 1       | 1.0               | 0.0017773842088002478 |
 |                | 5.2             |         | 2.0               |                       |
 *----------------+-----------------+---------+-------------------+-----------------------*/

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.

[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["GoogleSQL for BigQuery offers two primary search functions: `SEARCH` for term-based searches and `VECTOR_SEARCH` for semantic similarity searches using embeddings."],["The `SEARCH` function determines if a BigQuery table or data contains specified search terms, returning `TRUE` if all terms are present and `FALSE` otherwise."],["`SEARCH` supports various analyzers like `LOG_ANALYZER`, `NO_OP_ANALYZER`, and `PATTERN_ANALYZER`, along with options to define the scope of JSON data, or customize tokenization rules and special characters."],["The `VECTOR_SEARCH` function identifies semantically similar entities by comparing high-dimensional numerical vectors, or embeddings, using distance metrics like `EUCLIDEAN`, `COSINE`, or `DOT_PRODUCT`."],["`VECTOR_SEARCH` allows optional use of vector indexes to enhance performance via Approximate Nearest Neighbor (ANN) search, or alternatively, brute force search for precise results."]]],[]]


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4