Showing content from https://python.langchain.com/docs/integrations/document_loaders/powerscale/ below:

Dell PowerScale Document Loader | 🦜️🔗 LangChain

Dell PowerScale Document Loader

Dell PowerScale is an enterprise scale-out storage system that runs the industry-leading OneFS filesystem and can be hosted on-prem or deployed in the cloud.

This document loader uses PowerScale capabilities to determine which files have been modified since an application's last run, and returns only those files for processing. This eliminates the need to re-process (chunk and embed) files that have not changed, improving the overall data ingestion workflow.

This loader requires PowerScale's MetadataIQ feature to be enabled. Additional information can be found in our GitHub repo: https://github.com/dell/powerscale-rag-connector

Overview

Integration details

Loader features

Source | Document Lazy Loading | Native Async Support
PowerScaleDocumentLoader | ✅ | ✅
PowerScaleUnstructuredLoader | ✅ | ✅

Setup

This document loader requires a Dell PowerScale system with MetadataIQ enabled. Additional information can be found on our GitHub page: https://github.com/dell/powerscale-rag-connector

Installation

The document loader lives in an external pip package and can be installed using standard tooling:

%pip install --upgrade --quiet powerscale-rag-connector
Initialization

Now we can instantiate the document loader:

Generic Document Loader

Our generic document loader can be used to incrementally load all files from PowerScale in the following manner:

from powerscale_rag_connector import PowerScaleDocumentLoader

loader = PowerScaleDocumentLoader(
    es_host_url="http://elasticsearch:9200",
    es_index_name="metadataiq",
    es_api_key="your-api-key",
    folder_path="/ifs/data",
)
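
In practice the API key is usually kept out of source code. The following is a minimal sketch, assuming the connection settings are supplied through environment variables. The variable names (`PSCALE_ES_HOST_URL` and so on) and the helper `loader_kwargs_from_env` are hypothetical and not part of the connector; only the four keyword arguments shown above come from this page.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
ENV_TO_KWARG = {
    "PSCALE_ES_HOST_URL": "es_host_url",
    "PSCALE_ES_INDEX_NAME": "es_index_name",
    "PSCALE_ES_API_KEY": "es_api_key",
    "PSCALE_FOLDER_PATH": "folder_path",
}


def loader_kwargs_from_env(environ=None):
    """Build the loader keyword arguments from environment variables.

    Raises KeyError naming every missing variable, so a misconfigured
    deployment fails before any query is sent.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in ENV_TO_KWARG if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: environ[name] for name, kwarg in ENV_TO_KWARG.items()}
```

The resulting dictionary can then be unpacked into the constructor, for example `PowerScaleDocumentLoader(**loader_kwargs_from_env())`.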
UnstructuredLoader Loader

Optionally, the PowerScaleUnstructuredLoader can be used to locate the changed files and automatically process them, producing the elements of each source file. This is done using LangChain's UnstructuredLoader class.

from powerscale_rag_connector import PowerScaleUnstructuredLoader

loader = PowerScaleUnstructuredLoader(
    es_host_url="http://elasticsearch:9200",
    es_index_name="metadataiq",
    es_api_key="your-api-key",
    folder_path="/ifs/data",
    mode="elements",
)

The fields:

Load

Internally, all communication with PowerScale and MetadataIQ is asynchronous, and both the load and lazy_load methods return a Python generator. We recommend using lazy_load.

for doc in loader.load():
    print(doc)
[Document(page_content='' metadata={'source': '/ifs/pdfs/1994-Graph.Theoretic.Obstacles.to.Perfect.Hashing.TR0257.pdf', 'snapshot': 20834, 'change_types': ['ENTRY_ADDED']}),
Document(page_content='' metadata={'source': '/ifs/pdfs/New.sendfile-FreeBSD.20.Feb.2015.pdf', 'snapshot': 20920, 'change_types': ['ENTRY_MODIFIED']}),
Document(page_content='' metadata={'source': '/ifs/pdfs/FAST-Fast.Architecture.Sensitive.Tree.Search.on.Modern.CPUs.and.GPUs-Slides.pdf', 'snapshot': 20924, 'change_types': ['ENTRY_ADDED']})]
Returned Object

Both document loaders keep track of which files were previously returned to your application. When called again, the document loader returns only the files that are new or modified since your previous run.

Your RAG application can use the information in change_types to add, update or delete entries in your chunk and vector store.
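
How change_types might drive an index update can be sketched as follows. This is an illustration under stated assumptions, not the connector's own code: `Doc` is a stand-in for a LangChain Document so the example runs on its own, the "store" is a plain dictionary keyed by source path, and `ENTRY_REMOVED` is an assumed name for deletions (only `ENTRY_ADDED` and `ENTRY_MODIFIED` appear in the sample output on this page).

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document so the sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# "ENTRY_ADDED" and "ENTRY_MODIFIED" appear in the sample output on this page.
# "ENTRY_REMOVED" is an assumed name; check what your index actually reports.
UPSERT_TYPES = {"ENTRY_ADDED", "ENTRY_MODIFIED"}
DELETE_TYPES = {"ENTRY_REMOVED"}


def apply_changes(docs, store):
    """Apply loader results to `store`, a dict mapping source path to content."""
    for doc in docs:
        source = doc.metadata["source"]
        changes = set(doc.metadata.get("change_types", []))
        if changes & DELETE_TYPES:
            # The file is gone: drop whatever was indexed for it.
            store.pop(source, None)
        elif changes & UPSERT_TYPES:
            # New or changed file: (re)index it, replacing any old entry.
            store[source] = doc.page_content
    return store
```

In a real pipeline the upsert branch would chunk and embed the file before writing to the vector store; keying entries by source path is what makes the delete and replace steps straightforward.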

When using PowerScaleUnstructuredLoader, the page_content field is filled with data from the UnstructuredLoader.

Lazy Load

As with load, lazy_load returns a Python generator. It is the recommended way to iterate over the loader's results.

for doc in loader.lazy_load():
    print(doc)

lazy_load returns the same Document objects as load, with all the properties described above.
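
Because lazy_load returns a generator, results can be consumed in fixed-size batches, for example to embed several documents per call. Below is a minimal sketch of that pattern. The helper `batched` is defined here for illustration and is not part of the connector, and `fake_lazy_load` is a stand-in for `loader.lazy_load()` so the example runs without a PowerScale system.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_lazy_load():
    """Stand-in generator; in an application this would be loader.lazy_load()."""
    for i in range(5):
        yield f"/ifs/data/file{i}.pdf"


# Five items in batches of two: the last batch holds the single leftover item.
batches = list(batched(fake_lazy_load(), 2))
```

Only one batch is held in memory at a time, which keeps memory use bounded when a run returns many changed files.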

Additional Examples

Additional code and full working examples can be found in our public GitHub repository: https://github.com/dell/powerscale-rag-connector/tree/main/examples

API reference

For detailed documentation of all PowerScale Document Loader features and configurations, head to the GitHub page: https://github.com/dell/powerscale-rag-connector/

