The DOT_PRODUCT
function returns the scalar product, or dot product, of two vectors. DOT_PRODUCT
takes as input two vectors and returns a numeric value.
A common use of DOT_PRODUCT
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications. A typical query using DOT_PRODUCT
is to find a set of vectors that most closely match a query vector.
If the input vectors to DOT_PRODUCT
are normalized to length 1, then the result of DOT_PRODUCT
is the cosine of the angle between the vectors. In this case, DOT_PRODUCT
produces what is known as the cosine similarity metric.
SingleStore supports a native vector data type and indexed approximate-nearest-neighbor (ANN) search that together provide high-performance vector search and easier building of vector-based applications.
See Vector Type, Vector Indexing, and Working with Vector Data for more information about using vectors in SingleStore.
Syntaxvector_expression <*> vector_expression
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
vector_expression
: An expression that evaluates to a vector. Vectors can be stored in SingleStore using the native VECTOR
type (Vector Type) or the BLOB
type (Binary String Types). SingleStore recommends using the VECTOR
type when possible.
JSON strings are allowed as vector_expression
s when the other argument is of type VECTOR
.
If both arguments are of type VECTOR
, those arguments must have the same element types and the same number of elements.
If one argument is a VECTOR
, the other argument (which may be a JSON string or a BLOB
) will be converted to the type of the VECTOR
argument.
It will cause an error if the JSON string has a different number of elements than the VECTOR
argument.
It will cause an error if the length of the BLOB
is such that the BLOB
cannot be converted to the type of the VECTOR
. Note that there is no type checking in this conversion, so ensure that the BLOB
s were encoded with the same type as the VECTOR
argument.
If both arguments are BLOB
s, both arguments will be treated as vectors with 32-bit floating-point numbers. It will cause an error if the arguments are different lengths.
If the result is infinity, negative infinity, or not a number (NaN), NULL
will be returned instead.
The default format for vector element storage and processing is a 32-bit floating-point number (F32
). The DOT_PRODUCT
function assumes the vector inputs are encoded as 32-bit floating-point numbers and returns a DOUBLE
.
See Using Suffixes for Other Element Types with BLOBs for information on using DOT_PRODUCT
with vectors with element types other than 32-bit floating-point numbers.
Vectors can be output in JSON or binary format. Use JSON format for examples and for output readability. For production, use the default binary for efficiency.
Use the following command to output vectors in JSON format.
SET vector_type_project_format = JSON;
Use the following command to set the output format back to binary.
Using DOT_PRODUCT with the VECTOR Data TypeSET vector_type_project_format = BINARY;
The following example shows the use of DOT_PRODUCT()
to calculate the similarity between a query vector and a set of vectors in a table.
Create a table with a column of type VECTOR
, insert data into the table, and then verify the contents of the table.
CREATE TABLE vectors (id int, vec VECTOR(4) not null);
INSERT INTO vectors VALUES (1, '[0.45, 0.55, 0.495, 0.5]');
INSERT INTO vectors VALUES (2, '[0.1, 0.8, 0.2, 0.555]');
INSERT INTO vectors VALUES (3, '[-0.5, -0.03, -0.1, 0.86]');
INSERT INTO vectors VALUES (4, '[0.5, 0.3, 0.807, 0.1]');
SET vector_type_project_format = JSON;
SELECT id, vec
FROM vectors
ORDER BY id;
+------+---------------------------------------------------+
| id | vec |
+------+---------------------------------------------------+
| 1 | [0.449999988,0.550000012,0.495000005,0.5] |
| 2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
| 3 | [-0.5,-0.0299999993,-0.100000001,0.860000014] |
| 4 | [0.5,0.300000012,0.806999981,0.100000001] |
+------+---------------------------------------------------+
You will notice that the results of the SELECT
do not exactly match the values in the INSERT
. This is because elements in the VECTOR
data type are stored as floating-point numbers and the values in the INSERT
statement are not perfectly representable in floating-point representation. See Finite Precision of Floating-Point Arithmetic below for an example of the impact of floating-point arithmetic on DOT_PRODUCT
.
After you've created a table and inserted data, set up a query vector. The queries below will evaluate vector similarity between the vectors in the vectors
table and @query_vec
.
The SQL below finds the similarities between vectors and @query_vec
using using the DOT_PRODUCT
infix operator <*>
. The ORDER BY
clause is included so the order of your results matches the results shown below.
Higher values of DOT_PRODUCT
indicate the vectors are more similar. On normalized vectors (Vector Normalization), the value of DOT_PRODUCT
ranges between -1
and 1
.
The @query_vec
variable is cast to a VECTOR
to ensure that @query_vec
is a valid VECTOR
and to improve performance.
SET vector_type_project_format = JSON;
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);
SELECT vec, vec <*> @query_vec AS score
FROM vectors
ORDER BY score DESC;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7225800156593323 |
| [-0.5,-0.0299999993,-0.100000001,0.860000014] | 0.2625800371170044 |
+---------------------------------------------------|---------------------+
Using DOT_PRODUCT (<*>) in Filters, Joins, and Ordering
DOT_PRODUCT
(<*>
)can appear wherever a floating-point expression can be used in a query.
This query uses DOT_PRODUCT
(<*>
) as a filter in a WHERE
clause. The query uses the vectors
table and the @query_vec
created above.
SET vector_type_project_format = JSON;
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);
SELECT vec
FROM vectors
WHERE vec <*> @query_vec > 0.7;
+---------------------------------------------------+
| vec |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.5,0.300000012,0.806999981,0.100000001] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
The following query uses DOT_PRODUCT
(<*>
) in the SELECT
clause, names it "score," and then filters on the score in the WHERE
clause. Note that in contrast to the previous query, the result of this query includes both the vector and the score, or DOT_PRODUCT
, of that vector with respect to @query_vec
.
SET vector_type_project_format = JSON;
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);
SELECT vec, vec <*> @query_vec AS score
FROM vectors
WHERE score > 0.7
ORDER BY score DESC;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7225800156593323 |
+---------------------------------------------------|---------------------+
You can find the vectors in the vectors
table that are the most similar to @query_vec
using ORDER BY … LIMIT
queries, as is shown in the query below. This type of query is known as a Top-N query. See Working with Vector Data for additional information on vector similarity search and approximate nearest neighbor (ANN) search in SingleStore.
SET vector_type_project_format = JSON;
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);
SELECT vec, vec <*> @query_vec AS score
FROM vectors
ORDER BY score DESC
LIMIT 2;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
+---------------------------------------------------|---------------------+
Finally, DOT_PRODUCT
(<*>
) can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields. The query below creates a second table of vectors (vectors_2
) and joins with the original table (vectors
).
CREATE TABLE vectors_2 (id_2 int, vec_2 VECTOR(4) not null);
INSERT INTO vectors_2 VALUES (5, '[0.4, 0.49, 0.16, 0.555]');
INSERT INTO vectors_2 VALUES (6, '[-0.01, -0.1, -0.2, 0.975]');
SELECT v1.id, v2.id_2, v1.vec <*> v2.vec_2 AS score
FROM vectors v1, vectors_2 v2
WHERE v1.vec <*> v2.vec_2 > 0.7
ORDER BY score DESC;
+------+------+--------------------+
| id | id_2 | score |
+------+------+--------------------+
| 3 | 6 | 0.8665000200271606 |
| 1 | 5 | 0.8062000274658203 |
| 2 | 5 | 0.7720249891281128 |
+------+------+--------------------+
Automatic Type Conversions
As described in Remarks, in some cases, SingleStore will do automatic type conversions between JSON strings, BLOB
s, and VECTOR
s. Three examples of those automatic type conversions are provided below.
Note
The examples below use the dot product infix operator <*>
; all the functionality shown also works with the DOT_PRODUCT
function.
Use the vectors
table created above. Recall this table has an attribute of type VECTOR
that holds vectors of length 4 with element type of 32-bit floating-point (F32
).
The following SQL searches for vectors that are similar to the vector '[0.44, 0.554, 0.34, 0.62]'
. The database will detect that the left-hand argument to <*>
is a vector of length 4 with element-type F32
and will convert the JSON string (the right-hand argument) to that vector type.
SET vector_type_project_format = JSON;
SELECT vec, vec <*> '[0.44, 0.554, 0.34, 0.62]' AS score
FROM vectors
ORDER BY score DESC;
+---------------------------------------------------+--------------------+
| vec | score |
+---------------------------------------------------+--------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7225800156593323 |
| [-0.5,-0.0299999993,-0.100000001,0.860000014] | 0.2625800371170044 |
+---------------------------------------------------+--------------------+
Note: In the above example, the JSON string is directly included in the SELECT
clause as a constant. Including the JSON directly in the SELECT
clause is fine for queries that are run once. However, if this is a query that will be run multiple times, SingleStore recommends casting the JSON string to a VECTOR
and using that VECTOR
in the query as shown in Example 3 below.
Create a table of vectors of length 3 and element type 16-bit integer (I16
) and insert data into that table.
CREATE TABLE vectors_i16(id INT, vec VECTOR(3, I16));
INSERT INTO vectors_i16 VALUES(1, '[1, 2, 3]');
INSERT INTO vectors_i16 VALUES(2, '[4, 5, 6]');
INSERT INTO vectors_i16 VALUES(3, '[1, 4, 8]');
The following SQL calculates the dot product between the @query_vec
and the vectors in the vectors_i16
table. The database will detect that the left-hand argument to <*>
in the SELECT
clause is a vector of length 3 with element-type 16-bit integer and will convert the JSON string (the right-hand argument) to that vector type.
SELECT id, '[3, 2, 1]' <*> vectors_i16.vec AS score
FROM vectors_i16
ORDER BY score DESC;
+------+-------+
| id | score |
+------+-------+
| 2 | 28 |
| 3 | 19 |
| 1 | 10 |
+------+-------+
Example 3 - VECTOR(4, F32) <*> BLOB
Create a table with a BLOB
column type to store the vectors and use the JSON_ARRAY_PACK()
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);
INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));
INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
The following SQL calculates the dot product between the @query_vec
and the vectors stored as BLOB
s in the vectors_b
table. The database will detect that the left-hand argument to <*>
in the SELECT
clause is a vector of length 4 with element-type F32
and will convert the BLOB
(the right-hand argument) to that vector type.
SET @query_vec = '[0.44, 0.554, 0.34, 0.62]':>VECTOR(4);
SELECT id, @query_vec <*> vectors_b.vec AS score
FROM vectors_b
ORDER BY score DESC;
+------+--------------------+
| id | score |
+------+--------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+------+--------------------+
Important
When using vectors stored as BLOB
s it is important to ensure that the BLOB
s store vectors of the same length and element type as the query vector with which you are calculating the dot product. The system verifies that the length of the BLOB
matches the length expected for the VECTOR
based on the number of elements and element type of the VECTOR
; however, the system cannot check that the element types used in the BLOB
and the VECTOR
are the same.
Elements in the VECTOR
data type are stored as floating-point numbers when the F32
(default) or F64
element types are used. As such, some vector element inputs are not perfectly representable in floating-point which may cause inexact results.
Note
It is not advisable to directly compare floating-point values for equality.
Consider the example below which selects the DOT_PRODUCT
of @query_vec
with itself. The result under exact arithmetic would be 1.0, as the DOT_PRODUCT
of a vector with itself is 1.0.
SET @query_vec = '[0.44, 0.554, 0.34, 0.62]':>VECTOR(4);
SELECT @query_vec <*> @query_vec AS DotProduct;
+---------------------+
| DotProduct |
+---------------------+
| 1.0005160570144653 |
+---------------------+
The output is not 1.0 due to the finite precision in floating-point arithmetic. The input vector values are not perfectly representable in floating-point, thus the output is close to, but not exactly, 1.0.
Using DOT_PRODUCT with Vectors as BLOBsThe following example shows the use of DOT_PRODUCT()
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOB
s.
Create a table with a column of type BLOB
to store the vectors. The second column in this table, with column name vec
and type BLOB
, will store the vectors. This example demonstrates storing vector data using BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_ARRAY_PACK()
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);
INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));
INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
To demonstrate the contents of the table, use the JSON_ARRAY_UNPACK()
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec) |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
You can also use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using a DOT_PRODUCT()
in a SELECT
statement.
The SQL below sets up query vector (@query_vec
) and then calculates the EUCLIDEAN_DISTANCE
of the query vector and the vectors in the vectors_b
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');
SELECT DOT_PRODUCT(vec, @query_vec) AS score
FROM vectors_b
ORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_PRODUCT() with JSON_ARRAY_PACK()
The JSON_ARRAY_PACK()
function makes it easier to input properly-formatted vectors. JSON_ARRAY_PACK()
should be used when loading vectors into tables as is shown in the example below. That is, vectors should be formatted with JSON_ARRAY_PACK()
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format. SingleStore does not recommend storing vectors as JSON strings in tables, doing so will have a negative performance impact.
JSON_ARRAY_PACK()
should not normally be used as an argument to the EUCLIDEAN_DISTANCE
function except when JSON_ARRAY_PACK()
is being used to build a constant vector value as is shown in the query below.
SELECT id, DOT_PRODUCT(
JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS score
FROM vectors_b
ORDER BY score DESC;
+-----|---------------------+
| id | score |
+-----|---------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+-----|---------------------+
Note
It is not recommended to use the DOT_PRODUCT
infix operator (<*>
) with vectors stored as BLOB
s. When used with BLOB
data, the infix operator (<*>
) will interpret vector elements as 32-bit floating-point numbers.
The default element type for vector storage and processing is 32-bit floating point (F32
). However, other element types are supported.
You can specify the datatype of the vector elements to be used in the operation by adding a suffix to the function. All operations are done using the specified datatype. Omitting the suffix from the function is equivalent to suffixing it with _F32
.
When using a suffix, the return type will be the type specified by the suffix.
Note
The functions with suffixes do not work with the VECTOR
type.
The following table lists the suffixes and their data type.
Suffix
Data Type
_I8
8-bit signed integer
_I16
16-bit signed integer
_I32
32-bit signed integer
_I64
64-bit signed integer
_F32
32-bit floating-point number (IEEE standard format)
_F64
64-bit floating-point number (IEEE standard format)
DOT_PRODUCT on BLOBs with 16-bit IntegersBelow is an example of using JSON_ARRAY_PACK
and DOT_PRODUCT
with 16-bit signed integers.
CREATE TABLE vectors_b_i (id int, vec BLOB not null);
INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]'));
INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @query_vec = JSON_ARRAY_PACK_I16('[4, 5, 4, 2]');
SELECT
DOT_PRODUCT_I16(@query_vec, vec) as DotProduct
FROM vectors_b_i
ORDER BY DotProduct DESC;
+------|-------------+
| id | DotProduct |
+------|-------------+
| 2 | 132 |
| 1 | 37 |
+------|-------------+
The result is an integer as indicated by the _I16
suffix.
When using suffixed versions of DOT_PRODUCT
, the return type will be the type of the suffix.
Note
Be sure that the suffixes you use to pack the vector data match the suffixes you use to unpack the data and the suffixes you use on functions to process that data.
Formatting Binary Vector Data for BLOBsWhen using the BLOB
type for vector operations, vector data can be formatted using JSON_ARRAY_PACK
. If your vector data is already in a packed binary format, you can load that data into the BLOB
s. The data must be encoded as a BLOB
containing packed numbers in little-endian byte order. Vectors stored as BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 , or 8 bytes, depending on the vector element).
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4