knowusuboaky/RAGFlowChain: A comprehensive toolkit for building Retrieval-Augmented Generation (RAG) pipelines, including data loading, vector database creation, retrieval, and chain management.

RAGFlowChain is a powerful and flexible toolkit designed for building Retrieval-Augmented Generation (RAG) pipelines. This library integrates data loading from various sources, vector database creation, and chain management, making it easier to develop advanced AI solutions that combine retrieval mechanisms with generative models.

To install RAGFlowChain, simply run:

pip install RAGFlowChain==0.5.1

1. Fetch Data from Multiple Sources

RAGFlowChain allows you to fetch and process data from various online and local sources, all integrated into a single DataFrame.

from ragflowchain import data_loader
import yaml
import os

# API Keys

PATH_CREDENTIALS = '../credentials.yml'

# Load the credentials file once instead of reopening it for each key
with open(PATH_CREDENTIALS) as f:
    credentials = yaml.safe_load(f)

BOOKS_API_KEY = credentials['book']
NEWS_API_KEY = credentials['news']
YOUTUBE_API_KEY = credentials['youtube']

# Define online and local data sources
# Define URLs for websites
urls = [
    "https://www.honda.ca/en",
    "https://www.honda.ca/en/vehicles",
    "https://www.honda.ca/en/odyssey"
]

# Define online sources
online_sources = {
    'youtube': {
        'topic': 'honda acura',
        'api_key': YOUTUBE_API_KEY,
        'max_results': 10
    },
    'websites': urls,
    'books': {
        'api_key': BOOKS_API_KEY,
        'query': 'automobile industry',
        'max_results': 10
    },
    'news_articles': {
        'api_key': NEWS_API_KEY,
        'query': 'automobile marketing',
        'page_size': 5,
        'max_pages': 1
    }
}

local_sources = ["../folder/irt.ppt", "../book/book.pdf", "../documents/sample.docx", "../notes/note.txt"]

# Fetch and process the data
final_data_df = data_loader(online_sources=online_sources, local_sources=local_sources, chunk_size=1000)

# Display the DataFrame
print(final_data_df)
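To illustrate the "single DataFrame" idea, the sketch below normalizes raw records from different source types into uniform rows. The field names used here (source, title, content) are assumptions for illustration, not RAGFlowChain's actual output schema.

```python
# Illustrative only: flatten {source_name: [raw_record, ...]} into uniform rows,
# roughly what a combined loader needs to do before building one table.

def normalize_records(raw_by_source):
    """Return one row dict per raw record, with a common set of fields."""
    rows = []
    for source, records in raw_by_source.items():
        for rec in records:
            rows.append({
                "source": source,
                "title": rec.get("title", ""),
                # Different APIs name the text field differently; fall back as needed
                "content": rec.get("content") or rec.get("description", ""),
            })
    return rows

raw = {
    "websites": [{"title": "Honda", "content": "Vehicle lineup overview..."}],
    "news_articles": [{"title": "EV marketing", "description": "Automakers shift spend..."}],
}
rows = normalize_records(raw)
print(len(rows))  # 2
```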

2. Create a Vector Database

Once you have the data, you can create a vector database using the create_database function.

from ragflowchain import create_database

# Create a vector store from the processed data
vectorstore, docs_recursive = create_database(
    df=final_data_df,
    page_content="content",
    embedding_function=None,  # Uses default SentenceTransformerEmbeddings
    vectorstore_method='Chroma',  # Options: 'Chroma', 'FAISS', 'Annoy'
    vectorstore_directory="data/chroma.db",  # Adjust according to vectorstore_method
    chunk_size=1000,
    chunk_overlap=100
)
Explanation of create_database Arguments:

- df: The DataFrame produced by data_loader, containing the processed documents.
- page_content: Name of the column holding the text to embed (here, "content").
- embedding_function: Embedding model to use; defaults to SentenceTransformerEmbeddings when None.
- vectorstore_method: Vector store backend ('Chroma', 'FAISS', or 'Annoy').
- vectorstore_directory: Directory where the vector store is persisted; adjust according to vectorstore_method.
- chunk_size: Maximum number of characters per document chunk.
- chunk_overlap: Number of characters shared between consecutive chunks to preserve context across splits.
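chunk_size and chunk_overlap control how each document is split before embedding. The sketch below shows the sliding-window idea with a simple character-based splitter; it is not RAGFlowChain's actual splitter, which may split recursively on separators instead.

```python
def split_text(text, chunk_size=1000, chunk_overlap=100):
    """Character-level sliding window: each chunk starts
    chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 2500 characters of varied text so the overlap is visible
text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text, chunk_size=1000, chunk_overlap=100)
print(len(chunks))  # 3
# The last 100 characters of one chunk repeat as the first 100 of the next
print(chunks[0][-100:] == chunks[1][:100])  # True
```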

3. Create a RAG Chain

Integrate the data and vector store into a Retrieval-Augmented Generation (RAG) chain.

from ragflowchain import create_rag_chain

# Create the RAG chain
rag_chain = create_rag_chain(
    llm=YourLanguageModel(),  # Replace with your LLM instance
    vector_database_directory="data/chroma.db",
    method='Chroma',  # Choose 'Chroma', 'FAISS', or 'Annoy'
    embedding_function=None,  # Optional, defaults to SentenceTransformerEmbeddings
    system_prompt="This is a system prompt.",  # Optional: Customize your system prompt
    chat_history_prompt="This is a chat history prompt.",  # Optional: Customize your chat history prompt
    tavily_search="YourTavilyAPIKey"  # Optional: Replace with your Tavily API key or TavilySearchResults instance
)
Explanation of create_rag_chain Arguments:

- llm: The language model instance that generates answers.
- vector_database_directory: Directory of the previously created vector store.
- method: Vector store backend ('Chroma', 'FAISS', or 'Annoy'); must match the one used in create_database.
- embedding_function: Optional embedding model; defaults to SentenceTransformerEmbeddings.
- system_prompt: Optional custom system prompt.
- chat_history_prompt: Optional custom chat history prompt.
- tavily_search: Optional Tavily API key or TavilySearchResults instance for supplementary web search.

4. Run the RAG Chain Using invoke
# Example usage with invoke method
result = rag_chain.invoke(
    {"input": "Your question here"}, 
    config={
        "configurable": {"session_id": "user123"}
    }
)

print(result["answer"])
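The session_id is what keeps conversations separate: the chain looks up the chat history stored under that ID and appends each new exchange to it. A toy in-memory version of the idea, not RAGFlowChain's actual history store:

```python
class SessionHistoryStore:
    """Toy in-memory store: each session_id maps to its own message list."""

    def __init__(self):
        self._histories = {}

    def get(self, session_id):
        # Create an empty history the first time a session is seen
        return self._histories.setdefault(session_id, [])

    def add(self, session_id, role, text):
        self.get(session_id).append((role, text))

store = SessionHistoryStore()
store.add("user123", "human", "Your question here")
store.add("user123", "ai", "An answer")
store.add("user456", "human", "Unrelated question")

# Each session sees only its own exchanges
print(len(store.get("user123")))  # 2
print(len(store.get("user456")))  # 1
```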
Explanation of invoke Usage:

- input: The question to ask the chain.
- config["configurable"]["session_id"]: Identifies the conversation; reuse the same session_id to continue a conversation with its accumulated chat history.

Detailed Explanation of Function Arguments
data_loader(online_sources=None, local_sources=None, chunk_size=1000)
create_database(df, page_content, embedding_function=None, vectorstore_method='Chroma', vectorstore_directory="data/vectorstore.db", chunk_size=1000, chunk_overlap=100)


create_rag_chain(llm, vector_database_directory, method='Chroma', embedding_function=None, system_prompt=None, chat_history_prompt=None, tavily_search=None)
rag_chain.invoke({"input": question}, config={"configurable": {"session_id": "any"}})
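At query time, the retrieval step embeds the question and ranks stored chunks by vector similarity. The toy cosine-similarity ranking below uses hand-made stand-in embeddings rather than a real embedding model, to show the ranking idea only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

chunks = ["Honda Odyssey specs", "Automobile marketing trends", "Cooking recipes"]
vecs = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.0, 0.1, 0.9]]  # stand-in embeddings
query = [1.0, 0.2, 0.0]

print([chunks[i] for i in top_k(query, vecs)])
# → ['Honda Odyssey specs', 'Automobile marketing trends']
```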

For more detailed documentation, including advanced usage and customization options, please visit the GitHub repository.

RAGFlowChain is licensed under the MIT License. See the LICENSE file for more information.

RAGFlowChain is built on top of powerful tools like LangChain and Chroma. We thank the open-source community for their contributions.

Made with ❤️ by Kwadwo Daddy Nyame Owusu - Boakye.

