This notebook covers how to get started with the CloudflareVectorize vector store.
Setup
This Python package is a wrapper around Cloudflare's REST API. To interact with the API, you need to provide an API token with the appropriate privileges.
You can create and manage API tokens here:
https://dash.cloudflare.com/YOUR-ACCT-NUMBER/api-tokens
Credentials
CloudflareVectorize depends on WorkersAI (if you want to use it for embeddings) and D1 (if you are using it to store and retrieve raw values).
While you can create a single api_token with Edit privileges to all needed resources (WorkersAI, Vectorize & D1), you may want to follow the principle of "least privilege access" and create separate API tokens for each service.
Note: These service-specific tokens (if provided) take precedence over a global token; you can provide these instead of a global token.
You can also set these parameters as environment variables.
import os
from dotenv import load_dotenv
load_dotenv(".env")
cf_acct_id = os.getenv("CF_ACCOUNT_ID")
api_token = os.getenv("CF_API_TOKEN")
cf_vectorize_token = os.getenv("CF_VECTORIZE_API_TOKEN")
cf_d1_token = os.getenv("CF_D1_API_TOKEN")
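For reference, a minimal .env file for the variables loaded above (and used later in this notebook) might look like the following. The values are placeholders only; use whichever tokens apply to your setup.
# .env (placeholder values -- never commit real tokens)
CF_ACCOUNT_ID=your-cloudflare-account-id
CF_API_TOKEN=your-global-api-token
CF_AI_API_TOKEN=your-workers-ai-api-token
CF_VECTORIZE_API_TOKEN=your-vectorize-api-token
CF_D1_API_TOKEN=your-d1-api-token
CF_D1_DATABASE_ID=your-d1-database-id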
Initialization
import asyncio
import json
import uuid
import warnings
from langchain_cloudflare.embeddings import (
    CloudflareWorkersAIEmbeddings,
)
from langchain_cloudflare.vectorstores import (
    CloudflareVectorize,
)
from langchain_community.document_loaders import WikipediaLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
warnings.filterwarnings("ignore")
vectorize_index_name = f"test-langchain-{uuid.uuid4().hex}"
Embeddings
For storage, semantic search, and retrieval, you must embed your raw values as embeddings. Specify an embedding model available on WorkersAI:
https://developers.cloudflare.com/workers-ai/models/
MODEL_WORKERSAI = "@cf/baai/bge-large-en-v1.5"
cf_ai_token = os.getenv("CF_AI_API_TOKEN")
embedder = CloudflareWorkersAIEmbeddings(
    account_id=cf_acct_id, api_token=cf_ai_token, model_name=MODEL_WORKERSAI
)
Raw Values with D1
Vectorize only stores embeddings, metadata, and namespaces. If you want to store and retrieve raw values, you must leverage Cloudflare's SQL database, D1.
You can create a database here and retrieve its id:
https://dash.cloudflare.com/YOUR-ACCT-NUMBER/workers/d1
d1_database_id = os.getenv("CF_D1_DATABASE_ID")
CloudflareVectorize Class
Now we can create the CloudflareVectorize instance, passing in the embedding instance from earlier along with our account id, service-specific API tokens, and D1 database id.
cfVect = CloudflareVectorize(
    embedding=embedder,
    account_id=cf_acct_id,
    d1_api_token=cf_d1_token,
    vectorize_api_token=cf_vectorize_token,
    d1_database_id=d1_database_id,
)
Cleanup
Before we get started, let's delete any test-langchain* indexes we have for this walkthrough.
arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name")) for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);
Gotchas
D1 database id provided, but no "global" api_token and no d1_api_token:
try:
    cfVect = CloudflareVectorize(
        embedding=embedder,
        account_id=cf_acct_id,
        ai_api_token=cf_ai_token,
        vectorize_api_token=cf_vectorize_token,
        d1_database_id=d1_database_id,
    )
except Exception as e:
    print(str(e))
`d1_database_id` provided, but no global `api_token` provided and no `d1_api_token` provided.
Manage Vector Store
Creating an Index
Let's start off this example by creating an index (first deleting it if it exists). If the index doesn't exist, we will get an error from Cloudflare telling us so.
%%capture
try:
    cfVect.delete_index(index_name=vectorize_index_name, wait=True)
except Exception as e:
    print(e)
r = cfVect.create_index(
    index_name=vectorize_index_name, description="A Test Vectorize Index", wait=True
)
print(r)
{'created_on': '2025-05-13T05:38:04.487284Z', 'modified_on': '2025-05-13T05:38:04.487284Z', 'name': 'test-langchain-5c177bb404f74d438c916462ad73d27a', 'description': 'A Test Vectorize Index', 'config': {'dimensions': 1024, 'metric': 'cosine'}}
Listing Indexes
Now we can list the indexes on our account.
indexes = cfVect.list_indexes()
indexes = [x for x in indexes if "test-langchain" in x.get("name")]
print(indexes)
[{'created_on': '2025-05-13T05:38:04.487284Z', 'modified_on': '2025-05-13T05:38:04.487284Z', 'name': 'test-langchain-5c177bb404f74d438c916462ad73d27a', 'description': 'A Test Vectorize Index', 'config': {'dimensions': 1024, 'metric': 'cosine'}}]
Get Index Info
We can also retrieve more granular information about a specific index.
This call returns a processedUpToMutation value, which can be used to track the status of operations such as creating indexes or adding and deleting records.
r = cfVect.get_index_info(index_name=vectorize_index_name)
print(r)
{'dimensions': 1024, 'vectorCount': 0}
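As a rough sketch, you could poll this endpoint until a given mutation (for example, the mutationId returned by create_metadata_index or add_documents below) has been processed. This assumes the info response exposes a processedUpToMutation field, which may not be present in every version of the wrapper; the wait_for_mutation helper is hypothetical.
import time

def wait_for_mutation(vect, index_name, mutation_id, timeout=60):
    # Hypothetical helper: poll index info until the given mutation id shows up
    # as processed. Most CloudflareVectorize methods accept wait=True, which
    # does this kind of waiting for you.
    start = time.time()
    while time.time() - start < timeout:
        info = vect.get_index_info(index_name=index_name)
        if info.get("processedUpToMutation") == mutation_id:
            return info
        time.sleep(2)
    raise TimeoutError(f"Mutation {mutation_id} not processed within {timeout}s")
In practice, the wait=True parameter used throughout this notebook handles this for you.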
Adding Metadata Indexes
It is common to assist retrieval by supplying metadata filters in queries. In Vectorize, this is accomplished by first creating a "metadata index" on your Vectorize index. For this example, we will create one on the section field in our documents.
Reference: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/
r = cfVect.create_metadata_index(
    property_name="section",
    index_type="string",
    index_name=vectorize_index_name,
    wait=True,
)
print(r)
{'mutationId': '7fc5f849-4d35-420c-bb3f-b950a79e48b6'}
Listing Metadata Indexes
r = cfVect.list_metadata_indexes(index_name=vectorize_index_name)
print(r)
[{'propertyName': 'section', 'indexType': 'String'}]
Adding Documents
For this example, we will use LangChain's Wikipedia loader to pull an article about Cloudflare. We will store this in Vectorize and query its contents later.
docs = WikipediaLoader(query="Cloudflare", load_max_docs=2).load()
We will then create some simple chunks with metadata based on the chunk sections.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents([docs[0].page_content])
running_section = ""
for idx, text in enumerate(texts):
if text.page_content.startswith("="):
running_section = text.page_content
running_section = running_section.replace("=", "").strip()
else:
if running_section == "":
text.metadata = {"section": "Introduction"}
else:
text.metadata = {"section": running_section}
print(len(texts))
print(texts[0], "\n\n", texts[-1])
55
page_content='Cloudflare, Inc., is an American company that provides content delivery network services,' metadata={'section': 'Introduction'}
page_content='attacks, Cloudflare ended up being attacked as well; Google and other companies eventually' metadata={'section': 'DDoS mitigation'}
Now we will add documents to our Vectorize index.
Note: Adding embeddings to Vectorize happens asynchronously, meaning there will be a small delay between adding the embeddings and being able to query them. By default, add_documents has a wait=True parameter which waits for this operation to complete before returning a response. If you do not want the program to wait for embeddings availability, you can set wait=False.
r = cfVect.add_documents(index_name=vectorize_index_name, documents=texts, wait=True)
print(json.dumps(r)[:300])
["433a614a-2253-4c54-951f-0e40379a52c4", "608a9cb6-ab71-4e5c-8831-ebedeb9749e8", "40a0eead-a781-46a7-a6a3-1940ec57c9ae", "64081e01-12d1-4760-9b3c-84ee1e4ba199", "af465fb9-9e3b-49a6-b00a-6a9eec4fc623", "2898e362-b667-46ab-ac20-651d8e13f2bf", "a2c0095b-2cbc-4724-bbcb-86cd702bfa84", "cc659763-37cb-42cb
Query vector store
We will do some searches on our embeddings. We can specify our search query and the top number of results we want with k.
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name, query="Workers AI", k=100, return_metadata="none"
)
print(f"{len(query_documents)} results:\n{query_documents[:3]}")
55 results:
[Document(id='24405ae0-c125-4177-a1c2-8b1849c13ab7', metadata={}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), Document(id='ca33b19e-4e28-4e1b-8ed7-94f133dbf8a7', metadata={}, page_content='based on queries by leveraging Workers AI.Cloudflare announced plans in September 2024 to launch a'), Document(id='14602058-73fe-4307-a1c2-95956d6392ad', metadata={}, page_content='=== Artificial intelligence ===')]
Output
If you want to return metadata, you can pass return_metadata="all" | "indexed". The default is "all".
If you want to return the embedding values, you can pass return_values=True. The default is False. Embeddings will be returned in the metadata field under the special _values field.
Note: return_metadata="none" and return_values=True will return only the _values field in metadata.
Note: If you return metadata or values, the results will be limited to the top 20.
https://developers.cloudflare.com/vectorize/platform/limits/
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Workers AI",
    return_values=True,
    return_metadata="all",
    k=100,
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within' metadata={'section': 'Artificial intelligence', '_values': [0.014350891, 0.0053482056, -0.022354126, 0.002948761, 0.010406494, -0.016067505, -0.002029419, -0.023513794, 0.020141602, 0.023742676, 0.01361084, 0.003019333, 0.02748108, -0.023162842, 0.008979797, -0.029373169, -0.03643799, -0.03842163, -0.004463196, 0.021255493, 0.02192688, -0.005947113, -0.060272217, -0.055389404, -0.031188965
If you'd like the similarity scores to be returned, you can use similarity_search_with_score.
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="Workers AI",
    k=100,
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
(Document(id='24405ae0-c125-4177-a1c2-8b1849c13ab7', metadata={'section': 'Artificial intelligence'}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), 0.7851709)
Including D1 for "Raw Values"β
All of the add
and search
methods on CloudflareVectorize support a include_d1
parameter (default=True).
This is to configure whether you want to store/retrieve raw values.
If you do not want to use D1 for this, you can set this to include=False
. This will return documents with an empty page_content
field.
Note: Your D1 table name MUST MATCH your vectorize index name! If you run 'create_index' and include_d1=True or cfVect(d1_database=...,) this D1 table will be created along with your Vectorize Index.
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="california",
    k=100,
    return_metadata="all",
    include_d1=False,
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
(Document(id='64081e01-12d1-4760-9b3c-84ee1e4ba199', metadata={'section': 'Introduction'}, page_content=''), 0.60426825)
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
retriever = cfVect.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1, "index_name": vectorize_index_name},
)
r = retriever.get_relevant_documents("california")
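Note: on recent versions of langchain-core, get_relevant_documents is deprecated in favor of the standard Runnable interface, so the equivalent call is:
r = retriever.invoke("california")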
Searching with Metadata Filtering
As mentioned before, Vectorize supports filtered search via filters on indexed metadata fields. Here is an example where we search for "Introduction" values within the indexed section metadata field.
More info on searching on metadata fields is here: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="California",
    k=100,
    md_filter={"section": "Introduction"},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n - {str(query_documents[:3])}")
6 results:
- [(Document(id='64081e01-12d1-4760-9b3c-84ee1e4ba199', metadata={'section': 'Introduction'}, page_content="and other services. Cloudflare's headquarters are in San Francisco, California. According to"), 0.60426825), (Document(id='608a9cb6-ab71-4e5c-8831-ebedeb9749e8', metadata={'section': 'Introduction'}, page_content='network services, cybersecurity, DDoS mitigation, wide area network services, reverse proxies,'), 0.52082914), (Document(id='433a614a-2253-4c54-951f-0e40379a52c4', metadata={'section': 'Introduction'}, page_content='Cloudflare, Inc., is an American company that provides content delivery network services,'), 0.50490546)]
You can do more sophisticated filtering as well:
https://developers.cloudflare.com/vectorize/reference/metadata-filtering/#valid-filter-examples
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="California",
    k=100,
    md_filter={"section": {"$ne": "Introduction"}},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n - {str(query_documents[:3])}")
20 results:
- [(Document(id='daeb7891-ec00-4c09-aa73-fc8e9a226ca8', metadata={}, page_content='== Products =='), 0.56540567), (Document(id='8c91ed93-d306-4cf9-ad1e-157e90a01ddf', metadata={'section': 'History'}, page_content='Since at least 2017, Cloudflare has been using a wall of lava lamps in their San Francisco'), 0.5604333), (Document(id='1400609f-0937-4700-acde-6e770d2dbd12', metadata={'section': 'History'}, page_content='their San Francisco headquarters as a source of randomness for encryption keys, alongside double'), 0.55573463)]
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="DNS",
    k=100,
    md_filter={"section": {"$in": ["Products", "History"]}},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n - {str(query_documents)}")
20 results:
- [(Document(id='253a0987-1118-4ab2-a444-b8a50f0b4a63', metadata={'section': 'Products'}, page_content='protocols such as DNS over HTTPS, SMTP, and HTTP/2 with support for HTTP/2 Server Push. As of 2023,'), 0.7205538), (Document(id='112b61d1-6c34-41d6-a22f-7871bf1cca7b', metadata={'section': 'Products'}, page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content'), 0.58178145), (Document(id='36929a30-32a9-482a-add7-6c109bbf8f82', metadata={'section': 'Products'}, page_content='and a content distribution network to serve content across its network of servers. It supports'), 0.5797795), (Document(id='485ac8dc-c2ad-443a-90fc-8be9e004eaee', metadata={'section': 'History'}, page_content='the New York Stock Exchange under the stock ticker NET. It opened for public trading on September'), 0.5678468), (Document(id='1c7581d5-0b06-45d6-874c-554907f4f7f7', metadata={'section': 'Products'}, page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge'), 0.55722594), (Document(id='f2fd02ac-3bab-4565-a6e2-14d9963e8fd9', metadata={'section': 'History'}, page_content='Cloudflare has acquired web-services and security companies, including StopTheHacker (February'), 0.5558441), (Document(id='1315a8ff-6509-4350-ae84-21e11da282b3', metadata={'section': 'Products'}, page_content='Push. As of 2023, Cloudflare handles an average of 45 million HTTP requests per second.'), 0.55429655), (Document(id='f5b0c9d0-89c2-43ec-a9b7-5a5b376a5a85', metadata={'section': 'Products'}, page_content='It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as'), 0.54969466), (Document(id='cc659763-37cb-42cb-bf09-465df1b5bc1a', metadata={'section': 'History'}, page_content='Cloudflare was founded in July 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. Prince'), 0.54691005), (Document(id='b467348b-9a9b-4bf1-9104-27570891c9e4', metadata={'section': 'History'}, page_content='2019, Cloudflare submitted its S-1 filing for an initial public offering on the New York Stock'), 0.533554), (Document(id='7966591b-ff56-4346-aca8-341daece01fc', metadata={'section': 'History'}, page_content='Networks (March 2024), BastionZero (May 2024), and Kivera (October 2024).'), 0.53296596), (Document(id='c7657276-c631-4331-98ec-af308387ea49', metadata={'section': 'Products'}, page_content="Verizon's October 2024 outage."), 0.53137076), (Document(id='9418e10c-426b-45fa-a1a4-672074310890', metadata={'section': 'Products'}, page_content="Cloudflare also provides analysis and reports on large-scale outages, including Verizon's October"), 0.53107977), (Document(id='db5507e2-0103-4275-a9f8-466f977255c0', metadata={'section': 'History'}, page_content='a product of Unspam Technologies that served as some inspiration for the basis of Cloudflare. From'), 0.528889), (Document(id='9d840318-be0e-4cf7-8a60-eaab50d45c9e', metadata={'section': 'History'}, page_content='of Cloudflare. From 2009, the company was venture-capital funded. 
On August 15, 2019, Cloudflare'), 0.52717584), (Document(id='db9137cc-051b-4b20-8d49-8a32bb2b99a7', metadata={'section': 'History'}, page_content='(December 2021), Vectrix (February 2022), Area 1 Security (February 2022), Nefeli Networks (March'), 0.52209044), (Document(id='dfaffd2f-4492-444d-accf-180b1f841463', metadata={'section': 'Products'}, page_content='As of 2024, Cloudflare servers are powered by AMD EPYC 9684X processors.'), 0.5169676), (Document(id='65bbd754-22d1-435a-860a-9259f6cf7dea', metadata={'section': 'History'}, page_content='(February 2014), CryptoSeal (June 2014), Eager Platform Co. (December 2016), Neumob (November'), 0.5132974), (Document(id='1400609f-0937-4700-acde-6e770d2dbd12', metadata={'section': 'History'}, page_content='their San Francisco headquarters as a source of randomness for encryption keys, alongside double'), 0.50999177), (Document(id='b77cef8b-1140-4d92-891b-0048ea70ae3a', metadata={'section': 'History'}, page_content='Neumob (November 2017), S2 Systems (January 2020), Linc (December 2020), Zaraz (December 2021),'), 0.5092492)]
Search by Namespace
We can also search for vectors by namespace. We just need to add it to the namespaces array when adding records to our vector database.
namespace_name = f"test-namespace-{uuid.uuid4().hex[:8]}"
new_documents = [
    Document(
        page_content="This is a new namespace specific document!",
        metadata={"section": "Namespace Test1"},
    ),
    Document(
        page_content="This is another namespace specific document!",
        metadata={"section": "Namespace Test2"},
    ),
]
r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=new_documents,
    namespaces=[namespace_name] * len(new_documents),
    wait=True,
)
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="California",
    namespace=namespace_name,
)
print(f"{len(query_documents)} results:\n - {str(query_documents)}")
2 results:
- [Document(id='65c4f7f4-aa4f-46b4-85ba-c90ea18dc7ed', metadata={'section': 'Namespace Test2', '_namespace': 'test-namespace-9cc13b96'}, page_content='This is another namespace specific document!'), Document(id='96350f98-7053-41c7-b6bb-5acdd3ab67bd', metadata={'section': 'Namespace Test1', '_namespace': 'test-namespace-9cc13b96'}, page_content='This is a new namespace specific document!')]
Search by IDs
We can also retrieve specific records by their IDs. To do so, we need to set the Vectorize index name on the index_name state param.
This will return both _namespace and _values as well as other metadata.
sample_ids = [x.id for x in query_documents]
cfVect.index_name = vectorize_index_name
query_documents = cfVect.get_by_ids(
    sample_ids,
)
print(str(query_documents[:3])[:500])
[Document(id='65c4f7f4-aa4f-46b4-85ba-c90ea18dc7ed', metadata={'section': 'Namespace Test2', '_namespace': 'test-namespace-9cc13b96', '_values': [-0.0005841255, 0.014480591, 0.040771484, 0.005218506, 0.015579224, 0.0007543564, -0.005138397, -0.022720337, 0.021835327, 0.038970947, 0.017456055, 0.022705078, 0.013450623, -0.015686035, -0.019119263, -0.01512146, -0.017471313, -0.007183075, -0.054382324, -0.01914978, 0.0005302429, 0.018600464, -0.083740234, -0.006462097, 0.0005598068, 0.024230957, -0
The namespace will be included in the _namespace field in metadata, along with your other metadata (if you requested it via return_metadata).
Note: You cannot set the _namespace or _values fields in metadata as they are reserved. They will be stripped out during the insert process.
Vectorize supports upserts, which you can perform by setting upsert=True.
query_documents[0].page_content = "Updated: " + query_documents[0].page_content
print(query_documents[0].page_content)
Updated: This is another namespace specific document!
new_document_id = "12345678910"
new_document = Document(
    id=new_document_id,
    page_content="This is a new document!",
    metadata={"section": "Introduction"},
)
r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=[new_document, query_documents[0]],
    upsert=True,
    wait=True,
)
query_documents_updated = cfVect.get_by_ids([new_document_id, query_documents[0].id])
print(str(query_documents_updated[0])[:500])
print(query_documents_updated[0].page_content)
print(query_documents_updated[1].page_content)
page_content='This is a new document!' metadata={'section': 'Introduction', '_namespace': None, '_values': [-0.007522583, 0.0023021698, 0.009963989, 0.031051636, -0.021316528, 0.0048103333, 0.026046753, 0.01348114, 0.026306152, 0.040374756, 0.03225708, 0.007423401, 0.031021118, -0.007347107, -0.034179688, 0.002111435, -0.027191162, -0.020950317, -0.021636963, -0.0030593872, -0.04977417, 0.018859863, -0.08062744, -0.027679443, 0.012512207, 0.0053634644, 0.008079529, -0.010528564, 0.07312012, 0.02
This is a new document!
Updated: This is another namespace specific document!
Deleting Records
We can delete records by their ids as well:
r = cfVect.delete(index_name=vectorize_index_name, ids=sample_ids, wait=True)
print(r)
And to confirm deletion:
query_documents = cfVect.get_by_ids(sample_ids)
assert len(query_documents) == 0
Creating from Documents
LangChain stipulates that all vector stores must have a from_documents method to instantiate a new vector store from documents. This is a more streamlined approach than the individual create and add steps shown above.
You can do that as shown here:
vectorize_index_name = "test-langchain-from-docs"
cfVect = CloudflareVectorize.from_documents(
    account_id=cf_acct_id,
    index_name=vectorize_index_name,
    documents=texts,
    embedding=embedder,
    d1_database_id=d1_database_id,
    d1_api_token=cf_d1_token,
    vectorize_api_token=cf_vectorize_token,
    wait=True,
)
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Edge Computing",
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:300]}")
20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content' metadata={'section': 'Products'}
Async Examples
This section shows some async examples.
Creating Indexes
vectorize_index_name1 = f"test-langchain-{uuid.uuid4().hex}"
vectorize_index_name2 = f"test-langchain-{uuid.uuid4().hex}"
async_requests = [
    cfVect.acreate_index(index_name=vectorize_index_name1),
    cfVect.acreate_index(index_name=vectorize_index_name2),
]
res = await asyncio.gather(*async_requests);
Creating Metadata Indexes
async_requests = [
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name1,
        wait=True,
    ),
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name2,
        wait=True,
    ),
]
await asyncio.gather(*async_requests);
Adding Documents
async_requests = [
    cfVect.aadd_documents(index_name=vectorize_index_name1, documents=texts, wait=True),
    cfVect.aadd_documents(index_name=vectorize_index_name2, documents=texts, wait=True),
]
await asyncio.gather(*async_requests);
Querying/Search
async_requests = [
    cfVect.asimilarity_search(index_name=vectorize_index_name1, query="Workers AI"),
    cfVect.asimilarity_search(index_name=vectorize_index_name2, query="Edge Computing"),
]
async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within'
20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content'
Returning Metadata/Values
async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="California",
        return_values=True,
        return_metadata="all",
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="California",
        return_values=True,
        return_metadata="all",
    ),
]
async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction', '_values': [-0.031219482, -0.018295288, -0.006000519, 0.017532349, 0.016403198, -0.029922485, -0.007133484, 0.004447937, 0.04559326, -0.011405945, 0.034820
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction', '_values': [-0.031219482, -0.018295288, -0.006000519, 0.017532349, 0.016403198, -0.029922485, -0.007133484, 0.004447937, 0.04559326, -0.011405945, 0.034820
Searching with Metadata Filtering
async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata="all",
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata="all",
    ),
]
async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][-1])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
9 results:
page_content='It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as' metadata={'section': 'Products'}
9 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products'}
Cleanup
Let's finish by deleting all of the indexes we created in this notebook.
arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name")) for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);
API Reference
https://developers.cloudflare.com/api/resources/vectorize/
https://developers.cloudflare.com/vectorize/