This notebook covers how to get started with the OceanBase vector store.
Setup
To access OceanBase vector stores you'll need to deploy a standalone OceanBase server:
%docker run --name=ob433 -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d quay.io/oceanbase/oceanbase-ce:4.3.3.1-101000012024102216
And install the langchain-oceanbase integration package:
%pip install -qU "langchain-oceanbase"
Check the connection to OceanBase and set the memory usage ratio for vector data:
from pyobvector import ObVecClient
tmp_client = ObVecClient()
tmp_client.perform_raw_text_sql("ALTER SYSTEM ob_vector_memory_limit_percentage = 30")
<sqlalchemy.engine.cursor.CursorResult at 0x12696f2a0>
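A freshly started container may need some time before it accepts connections, so the check above can fail if it runs too early. The sketch below polls the mapped port until a TCP connection succeeds. The helper name `wait_for_port` and its default timings are assumptions for illustration, not part of pyobvector or langchain-oceanbase.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only proves that something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the port is still closed.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("127.0.0.1", 2881)` before creating the client is one way to use it. An open port does not guarantee that the server has finished initializing, so a retry around the first SQL statement may still be needed.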
Initialization
Configure the API key of the embedding model. Here we use DashScopeEmbeddings as an example. When deploying OceanBase with a Docker image as described above, simply follow the script below to set the host, port, user, password, and database name. For other deployment methods, set these parameters to match your environment.
%pip install dashscope
import os
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_oceanbase.vectorstores import OceanbaseVectorStore

DASHSCOPE_API = os.environ.get("DASHSCOPE_API_KEY", "")
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1", dashscope_api_key=DASHSCOPE_API
)
vector_store = OceanbaseVectorStore(
    embedding_function=embeddings,
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
)
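For deployments other than the local Docker image, one option is to read the connection parameters from environment variables instead of hard-coding them. The following is a minimal sketch: the `OB_*` variable names and the helper `connection_args_from_env` are hypothetical, and the defaults are the Docker values used in this notebook.

```python
import os


def connection_args_from_env(env=None):
    """Build OceanBase connection arguments from environment variables.

    The OB_* names are assumptions made for this example; the fallback
    values match the local Docker deployment described in this notebook.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OB_HOST", "127.0.0.1"),
        "port": env.get("OB_PORT", "2881"),
        "user": env.get("OB_USER", "root@test"),
        "password": env.get("OB_PASSWORD", ""),
        "db_name": env.get("OB_DB_NAME", "test"),
    }


connection_args = connection_args_from_env()
print(connection_args["host"], connection_args["port"])
```

The resulting dictionary has the same keys as the `connection_args` shown earlier, so it can be passed to `OceanbaseVectorStore` in the same way.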
Manage vector store
Add items to vector store
from langchain_core.documents import Document
document_1 = Document(page_content="foo", metadata={"source": "https://foo.com"})
document_2 = Document(page_content="bar", metadata={"source": "https://bar.com"})
document_3 = Document(page_content="baz", metadata={"source": "https://baz.com"})
documents = [document_1, document_2, document_3]
vector_store.add_documents(documents=documents, ids=["1", "2", "3"])
Update items in vector store
updated_document = Document(
    page_content="qux", metadata={"source": "https://another-example.com"}
)
vector_store.add_documents(documents=[updated_document], ids=["1"])
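The update above relies on passing the same ID that was used when the document was first added. One way to keep IDs repeatable across runs is to derive them from a stable property of each document, such as its source URL. This is only a sketch: `stable_id` is a hypothetical helper, and you should confirm that your table's ID column accepts strings of this form.

```python
import hashlib


def stable_id(source: str) -> str:
    """Derive a repeatable ID from a document's source URL (hypothetical helper)."""
    # The same input always hashes to the same 16-character hex string.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]


sources = ["https://foo.com", "https://bar.com", "https://baz.com"]
ids = [stable_id(s) for s in sources]
print(ids)
```

A list built this way could be passed as the `ids` argument of `add_documents`, so that re-adding a document from the same source targets the same row.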
Delete items from vector store
vector_store.delete(ids=["3"])
Query vector store
Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it while running your chain or agent.
Query directly
A simple similarity search can be performed as follows:
results = vector_store.similarity_search(
    query="thud", k=1, filter={"source": "https://another-example.com"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* bar [{'source': 'https://bar.com'}]
If you want to execute a similarity search and receive the corresponding scores, you can run:
results = vector_store.similarity_search_with_score(
    query="thud", k=1, filter={"source": "https://example.com"}
)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=133.452299] bar [{'source': 'https://bar.com'}]
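The store in this notebook was created with `vidx_metric_type="l2"`. Assuming the reported score is based on the L2 (Euclidean) distance between the query embedding and each stored embedding, a smaller value means a closer match. Whether the store returns the distance itself or a variant such as its square is not confirmed here. The sketch below shows the ranking idea with toy vectors in pure Python.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
query = [1.0, 0.0, 0.0]
candidates = {"near": [0.9, 0.1, 0.0], "far": [-1.0, 0.0, 0.0]}

# Sort candidate names by ascending distance: the closest vector comes first.
ranked = sorted(candidates, key=lambda name: l2_distance(query, candidates[name]))
print(ranked)
```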
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retriever.invoke("thud")
[Document(metadata={'source': 'https://bar.com'}, page_content='bar')]
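Documents returned by a retriever are often joined into a single context string before being placed in a prompt. The sketch below shows one way to do that. `Doc` is a stand-in for `langchain_core.documents.Document` so the example runs on its own, and `format_context` is a hypothetical helper, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (same two attributes)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs):
    """Join retrieved documents into one context string, citing each source."""
    return "\n\n".join(
        f"[{doc.metadata.get('source', 'unknown')}] {doc.page_content}"
        for doc in docs
    )


retrieved = [Doc(page_content="bar", metadata={"source": "https://bar.com"})]
print(format_context(retrieved))
```

Because the helper only reads `page_content` and `metadata`, it should work the same way on the list returned by `retriever.invoke(...)`.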
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:
API reference
For detailed documentation of all OceanbaseVectorStore features and configurations, head to the API reference: https://python.langchain.com/docs/integrations/vectorstores/oceanbase