Introduction to vector search
This document provides an overview of vector search in BigQuery. Vector search is a technique to compare similar objects using embeddings, and it is used to power Google products, including Google Search, YouTube, and Google Play. You can use vector search to perform searches at scale. When you use vector indexes with vector search, you can take advantage of foundational technologies like inverted file indexing (IVF) and the ScaNN algorithm.
Vector search is built on embeddings. Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.
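The following minimal Python sketch shows what "measuring the distance between vectors" means. Euclidean distance compares positions, and cosine distance compares directions. Both are common choices for embeddings. The function names and the four-dimensional vectors are illustrative assumptions and do not describe how BigQuery computes distances internally.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 when the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings; real models use far more dimensions.
a = [1.0, 0.0, 1.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0]
print(euclidean_distance(a, b))  # 1.0
print(cosine_distance(a, b))     # about 0.293
```

Cosine distance ignores vector length. For example, [2.0, 0.0] and [5.0, 0.0] have a cosine distance of 0 even though their Euclidean distance is 3.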
This concept of semantic similarity and distance in an embedding space becomes clear when you plot items in that space. For example, terms like cat, dog, and lion, which all represent types of animals, are grouped close together because of their shared semantic characteristics. Similarly, terms like car, truck, and the more generic term vehicle form another cluster. This is shown in the following image:
You can see that the animal and vehicle clusters are positioned far apart from each other. The separation between the groups illustrates the principle that the closer objects are in the embedding space, the more semantically similar they are, and greater distances indicate greater semantic dissimilarity.
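You can also check the clustering idea numerically with a toy example. The two-dimensional coordinates below are hypothetical values, chosen by hand to mimic the animal and vehicle clusters. A real embedding model would produce high-dimensional vectors. The sketch compares the average distance between terms in the same cluster with the average distance between terms in different clusters, and it finds each term's nearest neighbor.

```python
import math
from itertools import combinations

# Hypothetical 2-D coordinates chosen by hand for illustration; they are
# not the output of any real embedding model.
points = {
    "cat": (1.0, 1.2),
    "dog": (1.3, 0.9),
    "lion": (0.8, 1.0),
    "car": (8.0, 7.5),
    "truck": (8.4, 8.1),
    "vehicle": (7.6, 7.9),
}
clusters = {
    "animals": ["cat", "dog", "lion"],
    "vehicles": ["car", "truck", "vehicle"],
}


def dist(a, b):
    """Euclidean distance between two terms in the toy space."""
    return math.dist(points[a], points[b])


def nearest(term):
    """Return the other term that lies closest to `term`."""
    return min((t for t in points if t != term), key=lambda t: dist(term, t))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Average distance over all pairs that share a cluster.
within = mean(
    dist(a, b)
    for members in clusters.values()
    for a, b in combinations(members, 2)
)
# Average distance over all animal/vehicle pairs.
between = mean(
    dist(a, b) for a in clusters["animals"] for b in clusters["vehicles"]
)

print(f"mean distance within clusters:  {within:.2f}")
print(f"mean distance between clusters: {between:.2f}")
print("nearest to dog:", nearest("dog"))      # cat
print("nearest to truck:", nearest("truck"))  # car
```

With these made-up points, every term's nearest neighbor belongs to its own cluster. This is the property that a similarity search relies on.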
BigQuery provides an end-to-end experience for generating embeddings, indexing content, and performing vector searches. You can complete each of these tasks independently, or in a single journey. For a tutorial that shows how to complete all of these tasks, see Perform semantic search and retrieval-augmented generation.
To perform a vector search by using SQL, you use the VECTOR_SEARCH function. You can optionally create a vector index by using the CREATE VECTOR INDEX statement. When a vector index is used, VECTOR_SEARCH uses the Approximate Nearest Neighbor (ANN) search technique to improve search performance. The trade-off is lower recall, so the results are approximate. Without a vector index, VECTOR_SEARCH uses brute force search, which measures the distance to every record. You can also choose brute force search to get exact results even when a vector index is available.
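You can sketch the difference between brute force and index-based search in plain Python. Brute force compares the query with every record and returns the exact top-k results. The approximate path below uses a simplified inverted file (IVF) index. K-means groups the vectors into lists, and each query scans only the few lists whose centroids are closest to it. Recall is the fraction of the exact top-k results that the approximate search also returns. This is an illustrative sketch under simplifying assumptions, not BigQuery's implementation. The names num_lists and num_probes are hypothetical and are not BigQuery options.

```python
import math
import random


def exact_top_k(data, query, k):
    """Brute force: measure the distance from the query to every record."""
    return sorted(range(len(data)), key=lambda i: math.dist(data[i], query))[:k]


def kmeans(data, num_lists, iterations=8, seed=0):
    """A tiny k-means that returns num_lists centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, num_lists)]
    for _ in range(iterations):
        buckets = [[] for _ in range(num_lists)]
        for p in data:
            j = min(range(num_lists), key=lambda c: math.dist(p, centroids[c]))
            buckets[j].append(p)
        for j, members in enumerate(buckets):
            if members:  # an empty list keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(data, num_lists):
    """Assign every record ID to the inverted list of its nearest centroid."""
    centroids = kmeans(data, num_lists)
    lists = {j: [] for j in range(num_lists)}
    for i, p in enumerate(data):
        j = min(range(num_lists), key=lambda c: math.dist(p, centroids[c]))
        lists[j].append(i)
    return centroids, lists


def ivf_top_k(data, centroids, lists, query, k, num_probes):
    """Approximate search: rank only records in the num_probes closest lists."""
    probed = sorted(range(len(centroids)),
                    key=lambda j: math.dist(query, centroids[j]))[:num_probes]
    candidates = [i for j in probed for i in lists[j]]
    candidates.sort(key=lambda i: math.dist(data[i], query))
    return candidates[:k]


def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)


# Random 8-dimensional vectors stand in for real embeddings.
rng = random.Random(42)
data = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(1000)]
query = tuple(rng.gauss(0, 1) for _ in range(8))

centroids, lists = build_ivf(data, num_lists=16)
exact = exact_top_k(data, query, k=10)
for num_probes in (1, 4, 16):
    approx = ivf_top_k(data, centroids, lists, query, k=10, num_probes=num_probes)
    print(f"num_probes={num_probes:2d} recall={recall(approx, exact):.2f}")
```

When num_probes equals num_lists, the search scans every list. The approximate result then matches the brute-force result, and recall is 1.00. Probing fewer lists does less work, but some true neighbors can be missed, which lowers recall.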
This document focuses on the SQL approach, but you can also perform vector searches by using BigQuery DataFrames in Python. For a notebook that illustrates the Python approach, see Build a Vector Search application using BigQuery DataFrames.
Use cases
The combination of embedding generation and vector search enables many use cases, such as retrieval-augmented generation (RAG), resolving similar support cases, patient profile matching, and analyzing sensor data.
Pricing
The VECTOR_SEARCH function and the CREATE VECTOR INDEX statement use BigQuery compute pricing.
VECTOR_SEARCH function: You are charged for similarity search, using on-demand or editions pricing.
Editions pricing: You are charged for the slots required to complete the job within your reservation edition. Larger, more complex similarity calculations consume more slots and so incur more charges.
Note: Using an index isn't supported in the Standard edition.
CREATE VECTOR INDEX statement: There is no charge for the processing required to build and refresh your vector indexes as long as the total size of the indexed table data is below your per-organization limit. To support indexing beyond this limit, you must provide your own reservation for handling the index management jobs.
Storage is also a consideration for embeddings and indexes. The bytes stored as embeddings and indexes are subject to active storage costs.
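As a rough sizing aid, you can estimate the logical size of an embedding column from the row count, the number of dimensions, and the size of each value. This sketch assumes that the embeddings are stored as ARRAY<FLOAT64>, where each FLOAT64 value counts as 8 logical bytes and an array counts as the sum of its elements. The row count and the 768 dimensions are hypothetical. The estimate ignores other columns and the separate storage that a vector index uses, and billing based on physical (compressed) storage can differ.

```python
FLOAT64_BYTES = 8  # logical size of one FLOAT64 value


def embedding_logical_bytes(num_rows, dimensions, bytes_per_value=FLOAT64_BYTES):
    """Estimate the logical bytes of an ARRAY<FLOAT64> embedding column.

    Ignores other columns and any vector index storage.
    """
    return num_rows * dimensions * bytes_per_value


# Hypothetical table: one million rows of 768-dimensional embeddings.
size = embedding_logical_bytes(num_rows=1_000_000, dimensions=768)
print(size)                       # 6144000000
print(f"{size / 2**30:.2f} GiB")  # 5.72 GiB
```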
You can check the storage size and coverage of your vector indexes by using the INFORMATION_SCHEMA.VECTOR_INDEXES view. If a vector index is not yet at 100% coverage, you are still charged for whatever has been indexed.
For more information, see Vector index limits.
Limitations
Queries that contain the VECTOR_SEARCH function aren't accelerated by BigQuery BI Engine.
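BigQuery data security and governance rules apply to the use of the VECTOR_SEARCH function.
What's next
Try the Perform semantic search and retrieval-augmented generation tutorial to learn how to generate embeddings, create a vector index, and perform vector searches.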
Try the Parse PDFs in a retrieval-augmented generation pipeline tutorial to learn how to create a RAG pipeline based on parsed PDF content.
Last updated 2025-08-07 UTC.