Showing content from https://github.com/upstash/vector-py below:

Upstash Vector Python SDK

The Upstash Vector Python client

Note

This project is in GA Stage.

Upstash Professional Support fully covers this project. It receives regular updates and bug fixes. The Upstash team is committed to maintaining and improving its functionality.

Install a released version from pip:

pip3 install upstash-vector

To use this client, go to the Upstash Console and create a vector database. Then copy the UPSTASH_VECTOR_REST_URL and the UPSTASH_VECTOR_REST_TOKEN from the dashboard.

from upstash_vector import Index

index = Index(url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN)

or alternatively, initialize from the environment variables

export UPSTASH_VECTOR_REST_URL=[URL]
export UPSTASH_VECTOR_REST_TOKEN=[TOKEN]

from upstash_vector import Index

index = Index.from_env()
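Initializing from the environment depends on both variables being set. As a minimal sketch, a pre-flight check like the one below can report which of them is missing before the client is created. The helper name `missing_upstash_vars` is hypothetical and not part of the SDK; only the two variable names come from this README.

```python
import os

# The two variable names are the ones listed in this README.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def missing_upstash_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: `environ` defaults to the process environment,
    but any mapping can be passed in, which keeps the check easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_upstash_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment looks ready for Index.from_env()")
```

Running the check first gives a readable message instead of a failure deeper inside the client setup.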

Vectors can be upserted (inserted or updated) into a namespace of an index to be queried or fetched later.

There are a couple of ways of doing upserts:

# - dense indexes
#   - (id, vector, metadata, data)
#   - (id, vector, metadata)
#   - (id, vector)
index.upsert(
    vectors=[
        ("id1", [0.1, 0.2], {"metadata_field": "metadata_value"}, "data-value"),
        ("id2", [0.2, 0.2], {"metadata_field": "metadata_value"}),
        ("id3", [0.3, 0.4]),
    ]
)

# - sparse indexes
#   - (id, sparse_vector, metadata, data)
#   - (id, sparse_vector, metadata)
#   - (id, sparse_vector)
index.upsert(
    vectors=[
        ("id1", ([0, 1], [0.1, 0.2]), {"metadata_field": "metadata_value"}, "data-value"),
        ("id2", ([1, 2], [0.2, 0.2]), {"metadata_field": "metadata_value"}),
        ("id3", ([2, 3, 4], [0.3, 0.4, 0.5])),
    ]
)

# - hybrid indexes
#   - (id, vector, sparse_vector, metadata, data)
#   - (id, vector, sparse_vector, metadata)
#   - (id, vector, sparse_vector)
index.upsert(
    vectors=[
        ("id1", [0.1, 0.2], ([0, 1], [0.1, 0.2]), {"metadata_field": "metadata_value"}, "data-value"),
        ("id2", [0.2, 0.2], ([1, 2], [0.2, 0.2]), {"metadata_field": "metadata_value"}),
        ("id3", [0.3, 0.4], ([2, 3, 4], [0.3, 0.4, 0.5])),
    ]
)

# - dense indexes
#   - {"id": id, "vector": vector, "metadata": metadata, "data": data}
#   - {"id": id, "vector": vector, "metadata": metadata}
#   - {"id": id, "vector": vector, "data": data}
#   - {"id": id, "vector": vector}
index.upsert(
    vectors=[
        {"id": "id4", "vector": [0.1, 0.2], "metadata": {"field": "value"}, "data": "value"},
        {"id": "id5", "vector": [0.1, 0.2], "metadata": {"field": "value"}},
        {"id": "id6", "vector": [0.1, 0.2], "data": "value"},
        {"id": "id7", "vector": [0.5, 0.6]},
    ]
)

# - sparse indexes
#   - {"id": id, "sparse_vector": sparse_vector, "metadata": metadata, "data": data}
#   - {"id": id, "sparse_vector": sparse_vector, "metadata": metadata}
#   - {"id": id, "sparse_vector": sparse_vector, "data": data}
#   - {"id": id, "sparse_vector": sparse_vector}
index.upsert(
    vectors=[
        {"id": "id4", "sparse_vector": ([0, 1], [0.1, 0.2]), "metadata": {"field": "value"}, "data": "value"},
        {"id": "id5", "sparse_vector": ([1, 2], [0.2, 0.2]), "metadata": {"field": "value"}},
        {"id": "id6", "sparse_vector": ([2, 3, 4], [0.3, 0.4, 0.5]), "data": "value"},
        {"id": "id7", "sparse_vector": ([4], [0.3])},
    ]
)

# - hybrid indexes
#   - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "metadata": metadata, "data": data}
#   - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "metadata": metadata}
#   - {"id": id, "vector": vector, "sparse_vector": sparse_vector, "data": data}
#   - {"id": id, "vector": vector, "sparse_vector": sparse_vector}
index.upsert(
    vectors=[
        {"id": "id4", "vector": [0.1, 0.2], "sparse_vector": ([0], [0.1]), "metadata": {"field": "value"},
         "data": "value"},
        {"id": "id5", "vector": [0.1, 0.2], "sparse_vector": ([1, 2], [0.2, 0.2]), "metadata": {"field": "value"}},
        {"id": "id6", "vector": [0.1, 0.2], "sparse_vector": ([2, 3, 4], [0.3, 0.4, 0.5]), "data": "value"},
        {"id": "id7", "vector": [0.5, 0.6], "sparse_vector": ([4], [0.3])},
    ]
)

from upstash_vector import Vector
from upstash_vector.types import SparseVector

# dense indexes
index.upsert(
    vectors=[
        Vector(id="id5", vector=[1, 2], metadata={"field": "value"}, data="value"),
        Vector(id="id6", vector=[1, 2], metadata={"field": "value"}),
        Vector(id="id7", vector=[1, 2], data="value"),
        Vector(id="id8", vector=[6, 7]),
    ]
)

# sparse indexes
index.upsert(
    vectors=[
        Vector(id="id5", sparse_vector=SparseVector([1], [0.1]), metadata={"field": "value"}, data="value"),
        Vector(id="id6", sparse_vector=SparseVector([1, 2], [0.1, 0.2]), metadata={"field": "value"}),
        Vector(id="id7", sparse_vector=SparseVector([3, 5], [0.3, 0.3]), data="value"),
        Vector(id="id8", sparse_vector=SparseVector([4], [0.2])),
    ]
)

# hybrid indexes
index.upsert(
    vectors=[
        Vector(id="id5", vector=[1, 2], sparse_vector=SparseVector([1], [0.1]), metadata={"field": "value"},
               data="value"),
        Vector(id="id6", vector=[1, 2], sparse_vector=SparseVector([1, 2], [0.1, 0.2]), metadata={"field": "value"}),
        Vector(id="id7", vector=[1, 2], sparse_vector=SparseVector([3, 5], [0.3, 0.3]), data="value"),
        Vector(id="id8", vector=[6, 7], sparse_vector=SparseVector([4], [0.2])),
    ]
)
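The sparse vectors in the examples above are written as a pair of lists. Assuming the first list holds the indices of the non-zero dimensions and the second holds the values at those indices, a dense list can be converted with a small helper. The name `to_sparse` and the `eps` cutoff are hypothetical and not part of the SDK.

```python
def to_sparse(dense, eps=0.0):
    """Split a dense list into (indices, values).

    Hypothetical helper: keeps only the entries whose absolute value is
    greater than `eps`, and assumes the (indices, values) reading of the
    sparse vector pairs used in the examples above.
    """
    indices = [i for i, x in enumerate(dense) if abs(x) > eps]
    values = [dense[i] for i in indices]
    return indices, values


indices, values = to_sparse([0.0, 0.1, 0.0, 0.2])
print(indices, values)  # [1, 3] [0.1, 0.2]
```

The resulting pair has the same shape as the tuples passed as sparse vectors in the upsert examples.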

If the index is created with an embedding model, raw string data can be upserted. In this case, the data field of the vector will also be set to the data passed below, so that it can be accessed later.

from upstash_vector import Data

res = index.upsert(
    vectors=[
        Data(id="id5", data="Goodbye World", metadata={"field": "value"}),
        Data(id="id6", data="Hello World"),
    ]
)

Also, a namespace can be specified to upsert vectors into it. When no namespace is provided, the default namespace is used.

index.upsert(
    vectors=[
        ("id1", [0.1, 0.2]),
        ("id2", [0.3, 0.4]),
    ],
    namespace="ns",
)

The vectors that are approximately most similar to a given query vector can be requested from a namespace of an index. The top_k parameter sets how many of them are returned.

res = index.query(
    vector=[0.6, 0.9],  # for dense and hybrid indexes
    sparse_vector=([0, 1], [0.1, 0.1]),  # for sparse and hybrid indexes 
    top_k=5,
    include_vectors=False,
    include_metadata=True,
    include_data=True,
    filter="metadata_f = 'metadata_v'"
)

# List of query results, sorted in the descending order of similarity
for r in res:
    print(
        r.id,  # The id used while upserting the vector
        r.score,  # The similarity score of this vector to the query vector. Higher is more similar.
        r.vector,  # The value of the vector, if requested (for dense and hybrid indexes).
        r.sparse_vector,  # The value of the sparse vector, if requested (for sparse and hybrid indexes).
        r.metadata,  # The metadata of the vector, if requested and present.
        r.data,  # The data of the vector, if requested and present.
    )
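The top_k parameter caps the number of results but does not by itself set a similarity floor. A minimal sketch of filtering results by score on the client side is shown below. The `Hit` class is a hypothetical stand-in that mirrors only the `id` and `score` fields read in the loop above, and the 0.7 threshold is an arbitrary value chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a query result (only `id` and `score`)."""
    id: str
    score: float


def above_threshold(results, min_score):
    """Keep the results whose score is at least `min_score`, preserving order."""
    return [r for r in results if r.score >= min_score]


hits = [Hit("id1", 0.92), Hit("id2", 0.75), Hit("id3", 0.40)]
print([h.id for h in above_threshold(hits, 0.7)])  # ['id1', 'id2']
```

Because the function only reads the `score` attribute, it should work the same way on the objects returned by a real query.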

If the index is created with an embedding model, raw string data can be queried.

res = index.query(
    data="hello",
    top_k=5,
    include_vectors=False,
    include_metadata=True,
    include_data=True,
)

When a filter is provided, query results are narrowed down to the vectors whose metadata matches it.

See Metadata Filtering documentation for more information regarding the filter syntax.

Also, a namespace can be specified to query from. When no namespace is provided, the default namespace is used.

res = index.query(
    vector=[0.6, 0.9],
    top_k=5,
    namespace="ns",
)

A set of vectors can be fetched from a namespace of an index.

res = index.fetch(
    ids=["id3", "id4"],
    include_vectors=False,
    include_metadata=True,
    include_data=True,
)

# List of fetch results, one for each id passed
for r in res:
    if not r:  # Can be None, if there is no such vector with the given id
        continue

    print(
        r.id,  # The id used while upserting the vector
        r.vector,  # The value of the vector, if requested (for dense and hybrid indexes).
        r.sparse_vector,  # The value of the sparse vector, if requested (for sparse and hybrid indexes).
        r.metadata,  # The metadata of the vector, if requested and present.
        r.data,  # The data of the vector, if requested and present.
    )

or, for singular fetch:

res = index.fetch(
    "id1",
    include_vectors=True,
    include_metadata=True,
    include_data=False,
)

r = res[0]
if r:  # Can be None, if there is no such vector with the given id
    print(
        r.id,  # The id used while upserting the vector
        r.vector,  # The value of the vector, if requested (for dense and hybrid indexes).
        r.sparse_vector,  # The value of the sparse vector, if requested (for sparse and hybrid indexes).
        r.metadata,  # The metadata of the vector, if requested and present.
        r.data,  # The data of the vector, if requested and present.
    )
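Since a fetch returns one entry per requested id, with None for ids that do not exist, the ids and results can be paired to find out which ones are missing. The sketch below assumes the results come back in the same order as the ids that were passed. The helper name `split_found_missing` is hypothetical, and plain strings stand in for real fetch results.

```python
def split_found_missing(ids, results):
    """Pair each id with its result and separate hits from misses.

    Hypothetical helper: assumes `results` has one entry per id, in the
    same order, with None marking an id that was not found.
    """
    found, missing = {}, []
    for vector_id, result in zip(ids, results):
        if result is None:
            missing.append(vector_id)
        else:
            found[vector_id] = result
    return found, missing


# Strings stand in for the result objects returned by a real fetch.
found, missing = split_found_missing(["id3", "id4"], ["result-for-id3", None])
print(sorted(found), missing)  # ['id3'] ['id4']
```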

Apart from the vector ids, vectors can also be fetched with an id prefix.

# Fetch all the vectors whose id starts with `id-1`
res = index.fetch(
    prefix="id-1",
    include_vectors=False,
    include_metadata=True,
    include_data=True,
)

Also, a namespace can be specified to fetch from. When no namespace is provided, the default namespace is used.

res = index.fetch(
    ids=["id3", "id4"],
    namespace="ns",
)

The vectors upserted into a namespace of an index can be scanned page by page.

# Scan the vectors 100 at a time.
cursor = ""  # An empty cursor starts the scan from the beginning
while True:
    res = index.range(
        cursor=cursor,
        limit=100,
        include_vectors=False,
        include_metadata=True,
        include_data=True,
    )

    for v in res.vectors:
        print(
            v.id,  # The id used while upserting the vector
            v.vector,  # The value of the vector, if requested (for dense and hybrid indexes).
            v.sparse_vector,  # The value of the sparse vector, if requested (for sparse and hybrid indexes).
            v.metadata,  # The metadata of the vector, if requested and present.
            v.data,  # The data of the vector, if requested and present.
        )

    # An empty next_cursor means there are no more pages to scan.
    cursor = res.next_cursor
    if cursor == "":
        break

Apart from that, vectors can also be ranged with an id prefix.

# Range over all the vectors whose id starts with `id-1`
cursor = ""  # An empty cursor starts the scan from the beginning
while True:
    res = index.range(
        cursor=cursor,
        prefix="id-1",
        limit=100,
        include_vectors=False,
        include_metadata=True,
        include_data=True,
    )

    for v in res.vectors:
        print(v)

    # An empty next_cursor means there are no more pages to scan.
    cursor = res.next_cursor
    if cursor == "":
        break

Also, a namespace can be specified to range from. When no namespace is provided, the default namespace is used.

res = index.range(
    cursor="",
    limit=100,
    namespace="ns",
)
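The cursor handling can be wrapped in a generator so that callers iterate over vectors without tracking pages themselves. The sketch below relies only on what the range examples show: a `range` call that accepts `cursor` and `limit`, and a result with `vectors` and `next_cursor`, where an empty `next_cursor` ends the scan. `iter_vectors` is a hypothetical helper, and `FakeIndex` is a hypothetical in-memory stand-in used so the example runs without a database.

```python
from types import SimpleNamespace


class FakeIndex:
    """Hypothetical in-memory stand-in that mimics only the `range` call."""

    def __init__(self, ids):
        self._ids = sorted(ids)

    def range(self, cursor="", limit=100, prefix=None, **kwargs):
        ids = [i for i in self._ids if prefix is None or i.startswith(prefix)]
        start = int(cursor) if cursor else 0
        end = start + limit
        # An empty next_cursor signals that this was the last page.
        next_cursor = str(end) if end < len(ids) else ""
        return SimpleNamespace(vectors=ids[start:end], next_cursor=next_cursor)


def iter_vectors(index, limit=100, **kwargs):
    """Yield every vector from `index.range`, one page at a time.

    Extra keyword arguments (for example `prefix` or `namespace`) are
    passed through to each `range` call unchanged.
    """
    cursor = ""  # An empty cursor starts the scan from the beginning
    while True:
        page = index.range(cursor=cursor, limit=limit, **kwargs)
        yield from page.vectors
        cursor = page.next_cursor
        if cursor == "":
            break


fake = FakeIndex([f"id-{n}" for n in range(5)])
print(list(iter_vectors(fake, limit=2)))
```

With a real index, the same generator would be called as `iter_vectors(index, limit=100, include_metadata=True)`, assuming the client's `range` behaves as in the examples above.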

A list of vectors can be deleted from a namespace of an index. If no vectors with the given ids exist, this is a no-op.

res = index.delete(
    ids=["id1", "id2"],
)

# How many vectors are deleted out of the given ids.
print(res.deleted)

or, for singular deletion:

res = index.delete(
    "id1",
)

# 1 if the vector is deleted, 0 otherwise.
print(res.deleted)

Apart from the vector ids, vectors can also be deleted with an id prefix or metadata filter.

# Delete all the vectors whose id starts with `id-0`
index.delete(
    prefix="id-0",
)

# Delete all the vectors whose metadata matches with the filter
index.delete(
    filter="salary < 3000",
)

Also, a namespace can be specified to delete from. When no namespace is provided, the default namespace is used.

res = index.delete(
    ids=["id1", "id2"],
    namespace="ns",
)

Any combination of vector value, sparse vector value, data, or metadata can be updated.

res = index.update(
    "id1",
    metadata={"new_field": "new_value"},
)

print(res)  # A boolean indicating whether the vector is updated or not.

Also, a namespace can be specified to update from. When no namespace is provided, the default namespace is used.

res = index.update(
    "id1",
    metadata={"new_field": "new_value"},
    namespace="ns",
)

All vectors can be removed from a namespace of an index with reset. A namespace can be specified to reset; when no namespace is provided, the default namespace is used.

index.reset(
    namespace="ns",
)

All namespaces under the index can be reset with a single call as well, by passing all=True to index.reset().

Some information regarding the status and type of the index can be requested. This information also contains per-namespace status.

info = index.info()
print(
    info.vector_count,  # Total number of vectors across all namespaces
    info.pending_vector_count,  # Total number of vectors waiting to be indexed across all namespaces
    info.index_size,  # Total size of the index on disk in bytes
    info.dimension,  # Vector dimension
    info.similarity_function,  # Similarity function used
)

for ns, ns_info in info.namespaces.items():
    print(
        ns,  # Name of the namespace
        ns_info.vector_count,  # Total number of vectors in this namespace
        ns_info.pending_vector_count,  # Total number of vectors waiting to be indexed in this namespace
    )

All the names of active namespaces can be listed.

namespaces = index.list_namespaces()
for ns in namespaces:
    print(ns)  # name of the namespace

A namespace can be deleted entirely. If no such namespace exists, an exception is raised. The default namespace cannot be deleted.

index.delete_namespace(namespace="ns")

Preparing the environment

This project uses Poetry for packaging and dependency management. Make sure you are able to create the poetry shell with relevant dependencies.

You will also need a vector database on Upstash.

To run all the tests, make sure the Poetry virtual environment is activated with all the necessary dependencies.

Create five vector indexes on Upstash:

A dense index with 2 dimensions, with cosine similarity
A dense index with an embedding model
A sparse index
A hybrid index with 2 dimensions, with cosine similarity for the dense component
A hybrid index with embedding models

Then set the necessary environment variables, one URL and token pair per index, in the same order:

URL=****
TOKEN=****
EMBEDDING_URL=****
EMBEDDING_TOKEN=****
SPARSE_URL=****
SPARSE_TOKEN=****
HYBRID_URL=****
HYBRID_TOKEN=****
HYBRID_EMBEDDING_URL=****
HYBRID_EMBEDDING_TOKEN=****

Then, run the following command to run tests:
