The Upstash Vector Python client
Note
This project is in GA Stage.
The Upstash Professional Support fully covers this project. It receives regular updates, and bug fixes. The Upstash team is committed to maintaining and improving its functionality.
Install a released version from pip:
pip3 install upstash-vector
In order to use this client, head out to Upstash Console and create a vector database. There, get the UPSTASH_VECTOR_REST_URL
and the UPSTASH_VECTOR_REST_TOKEN
from the dashboard.
from upstash_vector import Index index = Index(url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN)
or alternatively, initialize from the environment variables
export UPSTASH_VECTOR_REST_URL [URL] export UPSTASH_VECTOR_REST_TOKEN [TOKEN]
from upstash_vector import Index index = Index.from_env()
Vectors can be upserted(inserted or updated) into a namespace of an index to be later queried or fetched.
There are a couple of ways of doing upserts:
# - dense indexes # - (id, vector, metadata, data) # - (id, vector, metadata) # - (id, vector) index.upsert( vectors=[ ("id1", [0.1, 0.2], {"metadata_field": "metadata_value"}, "data-value"), ("id2", [0.2, 0.2], {"metadata_field": "metadata_value"}), ("id3", [0.3, 0.4]), ] ) # - sparse indexes # - (id, sparse_vector, metadata, data) # - (id, sparse_vector, metadata) # - (id, sparse_vector) index.upsert( vectors=[ ("id1", ([0, 1], [0.1, 0.2]), {"metadata_field": "metadata_value"}, "data-value"), ("id2", ([1, 2], [0.2, 0.2]), {"metadata_field": "metadata_value"}), ("id3", ([2, 3, 4], [0.3, 0.4, 0.5])), ] ) # - hybrid indexes # - (id, vector, sparse_vector, metadata, data) # - (id, vector, sparse_vector, metadata) # - (id, vector, sparse_vector) index.upsert( vectors=[ ("id1", [0.1, 0.2], ([0, 1], [0.1, 0.2]), {"metadata_field": "metadata_value"}, "data-value"), ("id2", [0.2, 0.2], ([1, 2], [0.2, 0.2]), {"metadata_field": "metadata_value"}), ("id3", [0.3, 0.4], ([2, 3, 4], [0.3, 0.4, 0.5])), ] )
# - dense indexes # - {"id": id, "vector": vector, "metadata": metadata, "data": data) # - {"id": id, "vector": vector, "metadata": metadata) # - {"id": id, "vector": vector, "data": data) # - {"id": id, "vector": vector} index.upsert( vectors=[ {"id": "id4", "vector": [0.1, 0.2], "metadata": {"field": "value"}, "data": "value"}, {"id": "id5", "vector": [0.1, 0.2], "metadata": {"field": "value"}}, {"id": "id6", "vector": [0.1, 0.2], "data": "value"}, {"id": "id7", "vector": [0.5, 0.6]}, ] ) # - sparse indexes # - {"id": id, "sparse_vector": sparse_vector, "metadata": metadata, "data": data) # - {"id": id, "sparse_vector": sparse_vector, "metadata": metadata) # - {"id": id, "sparse_vector": sparse_vector, "data": data) # - {"id": id, "sparse_vector": sparse_vector} index.upsert( vectors=[ {"id": "id4", "sparse_vector": ([0, 1], [0.1, 0.2]), "metadata": {"field": "value"}, "data": "value"}, {"id": "id5", "sparse_vector": ([1, 2], [0.2, 0.2]), "metadata": {"field": "value"}}, {"id": "id6", "sparse_vector": ([2, 3, 4], [0.3, 0.4, 0.5]), "data": "value"}, {"id": "id7", "sparse_vector": ([4], [0.3])}, ] ) # - hybrid indexes # - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "metadata": metadata, "data": data) # - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "metadata": metadata) # - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "data": data) # - {"id": id, "vector": vector, "sparse_vector": sparse_vector} index.upsert( vectors=[ {"id": "id4", "vector": [0.1, 0.2], "sparse_vector": ([0], [0.1]), "metadata": {"field": "value"}, "data": "value"}, {"id": "id5", "vector": [0.1, 0.2], "sparse_vector": ([1, 2], [0.2, 0.2]), "metadata": {"field": "value"}}, {"id": "id6", "vector": [0.1, 0.2], "sparse_vector": ([2, 3, 4], [0.3, 0.4, 0.5]), "data": "value"}, {"id": "id7", "vector": [0.5, 0.6], "sparse_vector": ([4], [0.3])}, ] )
from upstash_vector import Vector from upstash_vector.types import SparseVector # dense indexes index.upsert( vectors=[ Vector(id="id5", vector=[1, 2], metadata={"field": "value"}, data="value"), Vector(id="id6", vector=[1, 2], metadata={"field": "value"}), Vector(id="id7", vector=[1, 2], data="value"), Vector(id="id8", vector=[6, 7]), ] ) # sparse indexes index.upsert( vectors=[ Vector(id="id5", sparse_vector=SparseVector([1], [0.1]), metadata={"field": "value"}, data="value"), Vector(id="id6", sparse_vector=SparseVector([1, 2], [0.1, 0.2]), metadata={"field": "value"}), Vector(id="id7", sparse_vector=SparseVector([3, 5], [0.3, 0.3]), data="value"), Vector(id="id8", sparse_vector=SparseVector([4], [0.2])), ] ) # hybrid indexes index.upsert( vectors=[ Vector(id="id5", vector=[1, 2], sparse_vector=SparseVector([1], [0.1]), metadata={"field": "value"}, data="value"), Vector(id="id6", vector=[1, 2], sparse_vector=SparseVector([1, 2], [0.1, 0.2]), metadata={"field": "value"}), Vector(id="id7", vector=[1, 2], sparse_vector=SparseVector([3, 5], [0.3, 0.3]), data="value"), Vector(id="id8", vector=[6, 7], sparse_vector=SparseVector([4], [0.2])), ] )
If the index is created with an embedding model, raw string data can be upserted. In this case, the data
field of the vector will also be set to the data
passed below, so that it can be accessed later.
from upstash_vector import Data res = index.upsert( vectors=[ Data(id="id5", data="Goodbye World", metadata={"field": "value"}), Data(id="id6", data="Hello World"), ] )
Also, a namespace can be specified to upsert vectors into it. When no namespace is provided, the default namespace is used.
index.upsert( vectors=[ ("id1", [0.1, 0.2]), ("id2", [0.3, 0.4]), ], namespace="ns", )
Some number of vectors that are approximately most similar to a given query vector can be requested from a namespace of an index.
res = index.query( vector=[0.6, 0.9], # for dense and hybrid indexes sparse_vector=([0, 1], [0.1, 0.1]), # for sparse and hybrid indexes top_k=5, include_vectors=False, include_metadata=True, include_data=True, filter="metadata_f = 'metadata_v'" ) # List of query results, sorted in the descending order of similarity for r in res: print( r.id, # The id used while upserting the vector r.score, # The similarity score of this vector to the query vector. Higher is more similar. r.vector, # The value of the vector, if requested (for dense and hybrid indexes). r.sparse, # The value of the sparse vector, if requested (for sparse and hybrid indexes). r.metadata, # The metadata of the vector, if requested and present. r.data, # The data of the vector, if requested and present. )
If the index is created with an embedding model, raw string data can be queried.
res = index.query( data="hello", top_k=5, include_vectors=False, include_metadata=True, include_data=True, )
When a filter is provided, query results are further narrowed down based on the vectors whose metadata matches with it.
See Metadata Filtering documentation for more information regarding the filter syntax.
Also, a namespace can be specified to query from. When no namespace is provided, the default namespace is used.
res = index.query( vector=[0.6, 0.9], top_k=5, namespace="ns", )
A set of vectors can be fetched from a namespace of an index.
res = index.fetch( ids=["id3", "id4"], include_vectors=False, include_metadata=True, include_data=True, ) # List of fetch results, one for each id passed for r in res: if not r: # Can be None, if there is no such vector with the given id continue print( r.id, # The id used while upserting the vector r.vector, # The value of the vector, if requested (for dense and hybrid indexes). r.sparse_vector, # The value of the sparse vector, if requested (for sparse and hybrid indexes). r.metadata, # The metadata of the vector, if requested and present. r.data, # The metadata of the vector, if requested and present. )
or, for singular fetch:
res = index.fetch( "id1", include_vectors=True, include_metadata=True, include_data=False, ) r = res[0] if r: # Can be None, if there is no such vector with the given id print( r.id, # The id used while upserting the vector r.vector, # The value of the vector, if requested (for dense and hybrid indexes). r.sparse_vector, # The value of the sparse vector, if requested (for sparse and hybrid indexes). r.metadata, # The metadata of the vector, if requested and present. r.data, # The metadata of the vector, if requested and present. )
Apart from the vector ids, vectors can also be fetched with an id prefix.
# Fetch all the vectors whose id starts with `id-1` res = index.fetch( prefix="id-1", include_vectors=False, include_metadata=True, include_data=True, )
Also, a namespace can be specified to fetch from. When no namespace is provided, the default namespace is used.
res = index.fetch( ids=["id3", "id4"], namespace="ns", )
The vectors upserted into a namespace of an index can be scanned in a page by page fashion.
# Scans the vectors 100 vector at a time, res = index.range( cursor="", # Start the scan from the beginning limit=100, include_vectors=False, include_metadata=True, include_data=True, ) while res.next_cursor != "": res = index.range( cursor=res.next_cursor, limit=100, include_vectors=False, include_metadata=True, include_data=True, ) for v in res.vectors: print( v.id, # The id used while upserting the vector v.vector, # The value of the vector, if requested (for dense and hybrid indexes). v.sparse_vector, # The value of the sparse vector, if requested (for sparse and hybrid indexes). v.metadata, # The metadata of the vector, if requested and present. v.data, # The data of the vector, if requested and present. )
Apart from that, vectors can also be ranged with an id prefix.
# Range over all the vectors whose id starts with `id-1` res = index.range( cursor="", prefix="id-1", limit=100, include_vectors=False, include_metadata=True, include_data=True, ) while res.next_cursor != "": res = index.range( cursor=res.next_cursor, prefix="id-1", limit=100, include_vectors=False, include_metadata=True, include_data=True, ) for v in res.vectors: print(v)
Also, a namespace can be specified to range from. When no namespace is provided, the default namespace is used.
res = index.range( cursor="", limit=100, namespace="ns", )
A list of vectors can be deleted from a namespace of index. If no such vectors with the given ids exist, this is no-op.
res = index.delete( ids=["id1", "id2"], ) # How many vectors are deleted out of the given ids. print(res.deleted)
or, for singular deletion:
res = index.delete( "id1", ) # 1 if the vector is deleted, 0 otherwise. print(res.deleted)
Apart from the vector ids, vectors can also be deleted with an id prefix or metadata filter.
# Delete all the vectors whose id starts with `id-0` index.delete( prefix="id-0", ) # Delete all the vectors whose metadata matches with the filter index.delete( filter="salary < 3000", )
Also, a namespace can be specified to delete from. When no namespace is provided, the default namespace is used.
res = index.delete( ids=["id1", "id2"], namespace="ns", )
Any combination of vector value, sparse vector value, data, or metadata can be updated.
res = index.update( "id1", metadata={"new_field": "new_value"}, ) print(res) # A boolean indicating whether the vector is updated or not.
Also, a namespace can be specified to update from. When no namespace is provided, the default namespace is used.
res = index.update( "id1", metadata={"new_field": "new_value"}, namespace="ns", )
All vectors can be removed from a namespace of an index.
Also, a namespace can be specified to reset. When no namespace is provided, the default namespace is used.
index.reset( namespace="ns", )
All namespaces under the index can be reset with a single call as well.
Some information regarding the status and type of the index can be requested. This information also contains per-namespace status.
info = index.info() print( info.vector_count, # Total number of vectors across all namespaces info.pending_vector_count, # Total number of vectors waiting to be indexed across all namespaces info.index_size, # Total size of the index on disk in bytes info.dimension, # Vector dimension info.similarity_function, # Similarity function used ) for ns, ns_info in info.namespaces.items(): print( ns, # Name of the namespace ns_info.vector_count, # Total number of vectors in this namespaces ns_info.pending_vector_count, # Total number of vectors waiting to be indexed in this namespaces )
All the names of active namespaces can be listed.
namespaces = index.list_namespaces() for ns in namespaces: print(ns) # name of the namespace
A namespace can be deleted entirely. If no such namespace exists, and exception is raised. The default namespaces cannot be deleted.
index.delete_namespace(namespace="ns")Preparing the environment
This project uses Poetry for packaging and dependency management. Make sure you are able to create the poetry shell with relevant dependencies.
You will also need a vector database on Upstash.
To run all the tests, make sure the poetry virtual environment activated with all the necessary dependencies.
Create four Vector Stores on Upstash. First one should have 2 dimensions. Second one should use an embedding model. Set the necessary environment variables:
URL=****
TOKEN=****
EMBEDDING_URL=****
EMBEDDING_TOKEN=****
SPARSE_URL=****
SPARSE_TOKEN=****
HYBRID_URL=****
HYBRID_TOKEN=****
HYBRID_EMBEDDING_URL=****
HYBRID_EMBEDDING_TOKEN=****
Then, run the following command to run tests:
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4