RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://python.langchain.com/docs/integrations/vectorstores/sap_hanavector below:

SAP HANA Cloud Vector Engine

SAP HANA Cloud Vector Engine is a vector store fully integrated into the SAP HANA Cloud database.

Setup

Install the langchain-hana external integration package, as well as the other packages used throughout this notebook.

%pip install -qU langchain-hana

Credentials

Ensure your SAP HANA instance is running. Load your credentials from environment variables and create a connection:

import os

from dotenv import load_dotenv
from hdbcli import dbapi

load_dotenv()

connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,
)

Learn more about SAP HANA in What is SAP HANA?.

Initialization

To initialize a HanaDB vector store, you need a database connection and an embedding instance. SAP HANA Cloud Vector Engine supports both external and internal embeddings.

Using External Embeddings

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Using Internal Embeddings

Alternatively, you can compute embeddings directly in SAP HANA using its native VECTOR_EMBEDDING() function. To enable this, create an instance of HanaInternalEmbeddings with your internal model ID and pass it to HanaDB. Note that the HanaInternalEmbeddings instance is specifically designed for use with HanaDB and is not intended for use with other vector store implementations. For more information about internal embedding, see the SAP HANA VECTOR_EMBEDDING Function.

Caution: Ensure NLP is enabled in your SAP HANA Cloud instance.

from langchain_hana import HanaInternalEmbeddings

embeddings = HanaInternalEmbeddings(internal_embedding_model_id="SAP_NEB.20240715")

Once you have your connection and embedding instance, create the vector store by passing them to HanaDB along with a table name for storing vectors:

from langchain_hana import HanaDB

db = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)

Example

Load the sample document "state_of_the_union.txt" and create chunks from it.

from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

text_documents = TextLoader(
    "../../how_to/state_of_the_union.txt", encoding="UTF-8"
).load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
text_chunks = text_splitter.split_documents(text_documents)
print(f"Number of document chunks: {len(text_chunks)}")

Number of document chunks: 88

Add the loaded document chunks to the table. For this example, we delete any previous content from the table which might exist from previous runs.


db.delete(filter={})


db.add_documents(text_chunks)

Perform a query to get the two best-matching document chunks from the ones that were added in the previous step. By default "Cosine Similarity" is used for the search.

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)

for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

Query the same content with "Euclidian Distance". The results shoud be the same as with "Cosine Similarity".

from langchain_hana.utils import DistanceStrategy

db = HanaDB(
    embedding=embeddings,
    connection=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    table_name="STATE_OF_THE_UNION",
)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. 

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

Maximal Marginal Relevance Search (MMR)

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents. The first 20 (fetch_k) items will be retrieved from the DB. The MMR algorithm will then find the best 2 (k) matches.

docs = db.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. 

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. 

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Creating an HNSW Vector Index

A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the create_hnsw_index function.

For more information about creating an index at the database level, please refer to the official documentation.


db_cosine = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)


db_cosine.create_hnsw_index()  





db_l2 = HanaDB(
    embedding=embeddings,
    connection=connection,
    table_name="STATE_OF_THE_UNION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,  
)


db_l2.create_hnsw_index(
    index_name="STATE_OF_THE_UNION_L2_index",
    m=100,  
    ef_construction=200,  
    ef_search=500,  
)


docs = db_l2.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. 

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. 

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Key Points:

Similarity Function: The similarity function for the index is cosine similarity by default. If you want to use a different similarity function (e.g., L2 distance), you need to specify it when initializing the HanaDB instance.
Default Parameters: In the create_hnsw_index function, if the user does not provide custom values for parameters like m, ef_construction, or ef_search, the default values (e.g., m=64, ef_construction=128, ef_search=200) will be used automatically. These values ensure the index is created with reasonable performance without requiring user intervention.

Basic Vectorstore Operations

db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_BASIC"
)


db.delete(filter={})

We can add simple text documents to the existing table.

docs = [Document(page_content="Some text"), Document(page_content="Other docs")]
db.add_documents(docs)

Add documents with metadata.

docs = [
    Document(
        page_content="foo",
        metadata={"start": 100, "end": 150, "doc_name": "foo.txt", "quality": "bad"},
    ),
    Document(
        page_content="bar",
        metadata={"start": 200, "end": 250, "doc_name": "bar.txt", "quality": "good"},
    ),
]
db.add_documents(docs)

Query documents with specific metadata.

docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})

for doc in docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)

--------------------------------------------------------------------------------
foo
{'start': 100, 'end': 150, 'doc_name': 'foo.txt', 'quality': 'bad'}

Delete documents with specific metadata.

db.delete(filter={"quality": "bad"})


docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
print(len(docs))

Advanced filtering

In addition to the basic value-based filtering capabilities, it is possible to use more advanced filtering. The table below shows the available filter operators.

Operator Semantic $eq Equality (==) $ne Inequality (!=) $lt Less than (<) $lte Less than or equal (<=) $gt Greater than (>) $gte Greater than or equal (>=) $in Contained in a set of given values (in) $nin Not contained in a set of given values (not in) $between Between the range of two boundary values $like Text equality based on the "LIKE" semantics in SQL (using "%" as wildcard) $contains Filters documents containing a specific keyword $and Logical "and", supporting 2 or more operands $or Logical "or", supporting 2 or more operands


docs = [
    Document(
        page_content="First",
        metadata={"name": "Adam Smith", "is_active": True, "id": 1, "height": 10.0},
    ),
    Document(
        page_content="Second",
        metadata={"name": "Bob Johnson", "is_active": False, "id": 2, "height": 5.7},
    ),
    Document(
        page_content="Third",
        metadata={"name": "Jane Doe", "is_active": True, "id": 3, "height": 2.4},
    ),
]

db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_ADVANCED_FILTER",
)


db.delete(filter={})
db.add_documents(docs)



def print_filter_result(result):
    if len(result) == 0:
        print("<empty result>")
    for doc in result:
        print(doc.metadata)

Filtering with $ne, $gt, $gte, $lt, $lte

advanced_filter = {"id": {"$ne": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

Filter: {'id': {'$ne': 1}}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'id': {'$gt': 1}}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'id': {'$gte': 1}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'id': {'$lt': 1}}
<empty result>
Filter: {'id': {'$lte': 1}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}

Filtering with $between, $in, $nin

advanced_filter = {"id": {"$between": (1, 2)}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$in": ["Adam Smith", "Bob Johnson"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$nin": ["Adam Smith", "Bob Johnson"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

Filter: {'id': {'$between': (1, 2)}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$in': ['Adam Smith', 'Bob Johnson']}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$nin': ['Adam Smith', 'Bob Johnson']}}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}

Text filtering with $like

advanced_filter = {"name": {"$like": "a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$like": "%a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

Filter: {'name': {'$like': 'a%'}}
<empty result>
Filter: {'name': {'$like': '%a%'}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}

Text filtering with $contains

advanced_filter = {"name": {"$contains": "bob"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$contains": "bo"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$contains": "Adam Johnson"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$contains": "Adam Smith"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

Filter: {'name': {'$contains': 'bob'}}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$contains': 'bo'}}
<empty result>
Filter: {'name': {'$contains': 'Adam Johnson'}}
<empty result>
Filter: {'name': {'$contains': 'Adam Smith'}}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}

Combined filtering with $and, $or

advanced_filter = {"$or": [{"id": 1}, {"name": "bob"}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$and": [{"id": 1}, {"id": 2}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$or": [{"id": 1}, {"id": 2}, {"id": 3}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {
    "$and": [{"name": {"$contains": "bob"}}, {"name": {"$contains": "johnson"}}]
}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

Filter: {'$or': [{'id': 1}, {'name': 'bob'}]}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
Filter: {'$and': [{'id': 1}, {'id': 2}]}
<empty result>
Filter: {'$or': [{'id': 1}, {'id': 2}, {'id': 3}]}
{'name': 'Adam Smith', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'Jane Doe', 'is_active': True, 'id': 3, 'height': 2.4}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'$and': [{'name': {'$contains': 'bob'}}, {'name': {'$contains': 'johnson'}}]}
{'name': 'Bob Johnson', 'is_active': False, 'id': 2, 'height': 5.7}

Using a VectorStore as a retriever in chains for retrieval augmented generation (RAG)


db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_RETRIEVAL_CHAIN",
)


db.delete(filter={})


db.add_documents(text_chunks)


retriever = db.as_retriever()

Define the prompt.

from langchain_core.prompts import PromptTemplate

prompt_template = """
You are an expert in state of the union topics. You are provided multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

'''
{context}
'''

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}

Create the ConversationalRetrievalChain, which handles the chat history and the retrieval of similar document chunks to be added to the prompt.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key="answer", return_messages=True
)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    memory=memory,
    verbose=False,
    combine_docs_chain_kwargs={"prompt": PROMPT},
)

Ask the first question (and verify how many text chunks have been used).

question = "What about Mexico and Guatemala?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])

source_docs = result["source_documents"]
print("================")
print(f"Number of used source document chunks: {len(source_docs)}")

Answer from LLM:
================
The United States has set up joint patrols with Mexico and Guatemala to catch more human traffickers at the border. This collaborative effort aims to improve border security and combat illegal activities such as human trafficking.
================
Number of used source document chunks: 5

Examine the used chunks of the chain in detail. Check if the best ranked chunk contains info about "Mexico and Guatemala" as mentioned in the question.

for doc in source_docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)

Ask another question on the same conversational chain. The answer should relate to the previous answer given.

question = "How many casualties were reported after that?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])

Answer from LLM:
================
Countries like Mexico and Guatemala are participating in joint patrols to catch human traffickers. The United States is also working with partners in South and Central America to host more refugees and secure their borders. Additionally, the U.S. is working with twenty-seven members of the European Union, as well as countries like France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and Switzerland.

Standard tables vs. "custom" tables with vector data

As default behaviour, the table for the embeddings is created with 3 columns:

A column VEC_TEXT, which contains the text of the Document
A column VEC_META, which contains the metadata of the Document
A column VEC_VECTOR, which contains the embeddings-vector of the Document's text


db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_NEW_TABLE"
)


db.delete(filter={})


docs = [
    Document(
        page_content="A simple document",
        metadata={"start": 100, "end": 150, "doc_name": "simple.txt"},
    )
]
db.add_documents(docs)

Show the columns in table "LANGCHAIN_DEMO_NEW_TABLE"

cur = connection.cursor()
cur.execute(
    "SELECT COLUMN_NAME, DATA_TYPE_NAME FROM SYS.TABLE_COLUMNS WHERE SCHEMA_NAME = CURRENT_SCHEMA AND TABLE_NAME = 'LANGCHAIN_DEMO_NEW_TABLE'"
)
rows = cur.fetchall()
for row in rows:
    print(row)
cur.close()

('VEC_META', 'NCLOB')
('VEC_TEXT', 'NCLOB')
('VEC_VECTOR', 'REAL_VECTOR')

Show the value of the inserted document in the three columns

cur = connection.cursor()
cur.execute(
    "SELECT VEC_TEXT, VEC_META, TO_NVARCHAR(VEC_VECTOR) FROM LANGCHAIN_DEMO_NEW_TABLE LIMIT 1"
)
rows = cur.fetchall()
print(rows[0][0])  
print(rows[0][1])  
print(rows[0][2])  
cur.close()

Custom tables must have at least three columns that match the semantics of a standard table

A column with type NCLOB or NVARCHAR for the text/context of the embeddings
A column with type NCLOB or NVARCHAR for the metadata
A column with type REAL_VECTOR for the embedding vector

The table can contain additional columns. When new Documents are inserted into the table, these additional columns must allow NULL values.


my_own_table_name = "MY_OWN_TABLE_ADD"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "SOME_OTHER_COLUMN NVARCHAR(42), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)


db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
)


docs = [
    Document(
        page_content="Some other text",
        metadata={"start": 400, "end": 450, "doc_name": "other.txt"},
    )
]
db.add_documents(docs)


cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(rows[0][0])  
print(rows[0][1])  
print(rows[0][2])  
print(rows[0][3])  

cur.close()

None
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt"}
<memory at 0x110f856c0>

Add another document and perform a similarity search on the custom table.

docs = [
    Document(
        page_content="Some more text",
        metadata={"start": 800, "end": 950, "doc_name": "more.txt"},
    )
]
db.add_documents(docs)

query = "What's up?"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
Some more text
--------------------------------------------------------------------------------
Some other text

Filter Performance Optimization with Custom Columns

To allow flexible metadata values, all metadata is stored as JSON in the metadata column by default. If some of the used metadata keys and value types are known, they can be stored in additional columns instead by creating the target table with the key names as column names and passing them to the HanaDB constructor via the specific_metadata_columns list. Metadata keys that match those values are copied into the special column during insert. Filters use the special columns instead of the metadata JSON column for keys in the specific_metadata_columns list.


my_own_table_name = "PERFORMANT_CUSTOMTEXT_FILTER"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "CUSTOMTEXT NVARCHAR(500), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)


db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
    specific_metadata_columns=["CUSTOMTEXT"],
)


docs = [
    Document(
        page_content="Some other text",
        metadata={
            "start": 400,
            "end": 450,
            "doc_name": "other.txt",
            "CUSTOMTEXT": "Filters on this value are very performant",
        },
    )
]
db.add_documents(docs)


cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(
    rows[0][0]
)  
print(rows[0][1])  
print(
    rows[0][2]
)  
print(rows[0][3])  

cur.close()

Filters on this value are very performant
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt", "CUSTOMTEXT": "Filters on this value are very performant"}
<memory at 0x110f859c0>

The special columns are completely transparent to the rest of the langchain interface. Everything works as it did before, just more performant.

docs = [
    Document(
        page_content="Some more text",
        metadata={
            "start": 800,
            "end": 950,
            "doc_name": "more.txt",
            "CUSTOMTEXT": "Another customtext value",
        },
    )
]
db.add_documents(docs)

advanced_filter = {"CUSTOMTEXT": {"$like": "%value%"}}
query = "What's up?"
docs = db.similarity_search(query, k=2, filter=advanced_filter)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

--------------------------------------------------------------------------------
Some more text
--------------------------------------------------------------------------------
Some other text

Vector store conceptual guide
Vector store how-to guides

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4