MYSQL_8_0_version.R20241208.01_00
or newer, where version is the minor version number. For information about how to upgrade your instance to a newer version that supports GA vector embeddings, see Self-service maintenance.
For information about the syntax and behavior of vector embeddings for the Public Preview version of this feature, see Work with vector embeddings (Preview).
This page describes how vector searches are implemented on Cloud SQL for MySQL instances. Cloud SQL lets you store vector embeddings, create vector indexes, and perform vector searches in conjunction with your other stored data.
Vector embedding storageYou store vector embeddings in a table that's compliant with atomicity, consistency, isolation, and durability (ACID) properties. Like other relational data in the table, you can access the vector embeddings in the table with existing transactional semantics.
To establish mapping between table rows and vector representations, you need to create a column in your table to store your vector embeddings. The column must use the Cloud SQL VECTOR
data type and must indicate the number of dimensions that the embedding requires. The vector embedding column can only store vector embeddings that use exactly the same dimensions that you specify when you define the column.
A table can only have one vector embedding column. There are no restrictions for the number of rows in the table.
To distinguish the vector embedding column from other columns, Cloud SQL adds a special COMMENT
and CONSTRAINT
to the column. The constraint is required for input validation, and the vector embedding column annotation is visible as a COMMENT
. You can't modify or delete the comment or constraint.
If you have enough storage and memory available on your Cloud SQL instance, you can have multiple tables with their own vector embedding columns.
Data replication works the same way for the vector embedding column as it does for other MySQL InnoDB columns.
For a list of limitations and restrictions for vector embedding tables, columns, and DML statements, see Limitations.
Vector indexesYou must use a vector index to perform ANN similarity searches on your vector embeddings. Cloud SQL creates vector indexes using the Scalable Nearest Neighbors (ScANN) algorithm.
Vector indexes have the following requirements:
For better search quality, create a vector index only after loading the bulk of your data in the base table. If you have less than 1000 embeddings in the base table, the index creation fails.
When deciding whether to create a vector index, if you have a small number of rows, consider whether you can perform a KNN search instead. The decision to use a KNN versus an ANN search also depends on the number of dimensions on the vector embedding. A larger number of embeddings might require a vector index.
For a list of limitations and restrictions for vector indexes, see Limitations. For information on creating a vector index, see Create and manage vector indexes.
Vector index updatesCloud SQL updates vector indexes in real time. Any transaction that performs data manipulation language (DML) operations on the base table also propagates changes to the associated vector indexes. Vector indexes behave in the same way as any other secondary index on the table. The vector indexes are fully transactionally consistent and ACID compliant. If you roll back a transaction, then the corresponding rollback changes also occur in the vector index.
Replication of vector indexesCloud SQL replicates vector indexes to all read replicas, including for cascading replication. When you create a new read replica from a primary instance that has vector embedding, the read replica inherits the vector embedding settings from the primary instance. For existing read replicas, you must enable vector embedding support on each one.
In terms of impact to replication lag, creating and maintaining vector indexes operate in the same way as regular MySQL indexes.
Persistence, shutdown, and impact on maintenanceVector indexes are persisted the same way as base tables, with full ACID support. Vector indexes are always in-sync with their base table data, and have the same visibility, isolation, and crash safety. There is no impact to the vector index when the instance is shut down or receives maintenance.
Index maintenanceAfter extensive DML operations are performed on the base table, the vector index that you trained on the initial data (at the time of index creation) might not reflect the new state. This can impact search quality.
There are two parts to the index:
The index tree might become less efficient after a large number of DML statements are run because rows move from one leaf to another. To refresh the index tree, you need to rebuild the index.
Unsupported DDL operations on tables with vector indexesCloud SQL provides vector distance functions that you use to perform approximate nearest neighbor (ANN) and K-nearest neighbors (KNN) vector similarity searches on your instance. When you run a query, the query vector is compared to vectors in your dataset. Distance functions calculate the distance between the vectors using a similarity metric such as cosine. The vectors with the shortest distance between them are the most similar and are returned in search results.
Cloud SQL uses the following functions to measure distance between vectors in vector searches when you perform ANN and KNN vector searches:
A KNN vector search is the preferred search method when you need exact results or want to add selective filtering. KNN search performs a distance computation of the query vector with every embedding in the dataset to find the nearest neighbor. KNN searches in Cloud SQL provide perfect recall. KNN searches don't use a vector index so they're a good option when working with smaller datasets.
To perform a KNN search, you use the vector_distance
function that takes two vectors as input: the query vector (what you're searching for) and a candidate vector from your dataset. It calculates the distance between these two vectors. You use vector_distance in a SELECT
statement. For more information, See Search K-nearest neighbors (KNN).
If you find that KNN isn't performing well, you can build a vector index later and continue to use approx_distance
in your application for ANN searches.
An ANN vector search is the preferred search type when query efficiency is a concern. It speeds up similarity searches by calculating the distance between your query vector and only a portion of the vectors in your dataset. To do this, Cloud SQL organizes the data into clusters or partitions and then focuses the search on the clusters closest to the query. ANN searches require vector indexes. These indexes prioritize search speed over perfect recall. In Cloud SQL, the TREE_SQ index type is used for ANN searches.
To perform an ANN search, you use the approx_distance
function with a distance measurement option. You use approx_distance
in an ORDER BY
or SELECT
list and a LIMIT
clause is permitted to limit search results. You can also add a WHERE
clause to perform post-filtering of your search results. For more information, see Search approximate nearest neighbors (ANN).
There are some cases when an ANN search falls back to a KNN search. For more information, see Check the fallback status for ANN searches.
RequirementsCloud SQL requires you to enable vector embeddings on the instance using the cloudsql_vector
flag before you add vector embeddings. For more information, see Enable and disable vector embeddings on your instance.
The following are limitations on tables that have a vector embedding column:
BIT
, BINARY
, VARBINARY
, JSON
, BLOB
, TEXT
data types, or spatial data aren't supported for vector indexes. Composite primary keys also can't include any of these types.The following are restrictions for vector search queries:
approx_distance
function can only be used in an ORDER BY
or SELECT
list.WHERE
condition in combination with approx_distance
expressions in the ORDER BY
or SELECT
list. The WHERE
condition predicates are evaluated after the approx_distance
vector functions are evaluated.This section provides best practices for working with vector indexes. Every workload is different, and you might need to adjust accordingly.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4