Showing content from https://python.langchain.com/docs/integrations/vectorstores/semadb below:

SemaDB

SemaDB from SemaFind is a no-fuss vector similarity database for building AI applications. The hosted SemaDB Cloud offers a simple developer experience for getting started.

The full documentation of the API along with examples and an interactive playground is available on RapidAPI.

This notebook demonstrates usage of the SemaDB Cloud vector store.

To use this integration, install langchain-community with pip install -qU langchain-community.

Load document embeddings

To compute embeddings locally, we use Sentence Transformers, which are commonly used for embedding sentences. You can use any embedding model LangChain offers.

%pip install --upgrade --quiet sentence_transformers langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
print(len(docs))
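
For intuition about what the splitting step produces, here is a minimal pure-Python sketch that greedily packs paragraphs into chunks under a size cap. It is a simplification written for illustration, not LangChain's CharacterTextSplitter implementation; the function name split_by_paragraph and the sample text are hypothetical.

```python
def split_by_paragraph(text: str, chunk_size: int = 400, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # still fits (or is the first piece of a new chunk)
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_by_paragraph(sample, chunk_size=40)
print(len(chunks))
```

With chunk_overlap=0, as in the notebook, consecutive chunks share no text, so each paragraph lands in exactly one chunk.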

Connect to SemaDB

SemaDB Cloud uses RapidAPI keys to authenticate. You can obtain yours by creating a free RapidAPI account.

import getpass
import os

if "SEMADB_API_KEY" not in os.environ:
    os.environ["SEMADB_API_KEY"] = getpass.getpass("SemaDB API Key:")

from langchain_community.vectorstores import SemaDB
from langchain_community.vectorstores.utils import DistanceStrategy

The parameters to the SemaDB vector store reflect the API directly:

"mycollection": the name of the collection in which we will store these vectors.
768: the dimensionality of the vectors. In our case, the sentence transformer embeddings are 768-dimensional.
API_KEY: your RapidAPI key. The constructor call below does not pass it explicitly; it is expected to be picked up from the SEMADB_API_KEY environment variable set above.
embeddings: determines how the embeddings of documents, texts, and queries are generated.
DistanceStrategy: the distance metric used. The wrapper automatically normalises vectors if COSINE is used.

db = SemaDB("mycollection", 768, embeddings, DistanceStrategy.COSINE)
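
To see why normalisation matters for the COSINE strategy, the following sketch shows that once vectors are scaled to unit length, their plain dot product equals their cosine similarity. This is a generic illustration in pure Python, not the wrapper's internal code; the helper names normalise, dot, and cosine_similarity are hypothetical.

```python
import math


def normalise(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
unit_a, unit_b = normalise(a), normalise(b)

# After normalisation the dot product and the cosine similarity agree.
print(math.isclose(dot(unit_a, unit_b), cosine_similarity(a, b)))
```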



db.create_collection()

The SemaDB vector store wrapper adds the document text as point metadata so that it can be retrieved later. Storing large chunks of text this way is not recommended. If you are indexing a large collection, we instead recommend storing references to the documents, such as external IDs.
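
One way to follow the reference-based recommendation is to keep the full text in your own store and attach only a short identifier to each point. The sketch below is storage-agnostic and makes several assumptions: the hash-based ID scheme, the "external_id" metadata key, and the helper names are all hypothetical, and how the metadata is attached to points in SemaDB is not shown.

```python
import hashlib


def external_id(text: str) -> str:
    """Derive a short, stable ID from the chunk text (hypothetical scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def build_reference_index(chunks):
    """Return (text_store, metadatas): full text kept locally, only IDs in metadata."""
    text_store = {}
    metadatas = []
    for chunk in chunks:
        ref = external_id(chunk)
        text_store[ref] = chunk  # full text stays outside the vector database
        metadatas.append({"external_id": ref})  # small payload stored with the point
    return text_store, metadatas


def resolve(metadata, text_store):
    """Look up the full text for a search hit from its metadata."""
    return text_store[metadata["external_id"]]


chunks = ["First chunk of a long document.", "Second chunk of a long document."]
text_store, metadatas = build_reference_index(chunks)
print(resolve(metadatas[1], text_store))
```

In a real deployment the local dictionary would typically be replaced by a database or object store keyed by the same identifier.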

db.add_documents(docs)[:2]

['813c7ef3-9797-466b-8afa-587115592c6c',
 'fc392f7f-082b-4932-bfcc-06800db5e017']

Similarity Search

We use the default LangChain similarity search interface to search for the most similar sentences.

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

docs = db.similarity_search_with_score(query)
docs[0]

(Document(page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt', 'text': 'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'}),
 0.42369342)
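
Scored results can be post-filtered before they are passed downstream. The sketch below assumes the score is a distance, so that lower values mean closer matches; verify this against the SemaDB documentation for the metric you chose. The function filter_by_distance, the threshold, and the sample pairs are hypothetical, with plain strings standing in for Document objects.

```python
def filter_by_distance(results, max_distance):
    """Keep (item, score) pairs whose score is at most max_distance, closest first."""
    kept = [(item, score) for item, score in results if score <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Stand-in for the list of (Document, score) tuples returned by a scored search.
results = [("chunk A", 0.42), ("chunk B", 0.91), ("chunk C", 0.17)]
close = filter_by_distance(results, max_distance=0.5)
print(close)
```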

Clean up

You can delete the collection to remove all data.
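
The cell that performs the deletion is not shown here. The following is a minimal sketch, assuming the wrapper exposes a delete_collection() method that returns a truthy value on success; check this against your installed langchain-community version. FakeStore is a hypothetical stand-in so the snippet runs without an API key, and cleanup is a hypothetical helper.

```python
def cleanup(store) -> bool:
    """Delete the store's collection and report whether the call succeeded."""
    # Assumption: delete_collection() returns a truthy value on success.
    deleted = bool(store.delete_collection())
    if not deleted:
        print("Collection was not deleted; check the collection name and API key.")
    return deleted


class FakeStore:
    """Hypothetical stand-in for the vector store, used only for this demo."""

    def __init__(self):
        self.exists = True

    def delete_collection(self) -> bool:
        was_present = self.exists
        self.exists = False
        return was_present


store = FakeStore()
print(cleanup(store))
```

With the real vector store, the same helper would be called as cleanup(db).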

