Showing content from https://python.langchain.com/docs/integrations/document_loaders/box/ below:

BoxLoader and BoxBlobLoader | 🦜️🔗 LangChain

BoxLoader and BoxBlobLoader

The langchain-box package provides two ways to index your files from Box: BoxLoader and BoxBlobLoader. BoxLoader ingests the text representation of any file that has one in Box. BoxBlobLoader lets you download the blob for any document or image file so you can process it with the blob parser of your choice.

This notebook covers getting started with both. For detailed documentation of all features and configurations, head to the API reference pages for BoxLoader and BoxBlobLoader.

Overview

The BoxLoader class helps you get your unstructured content from Box in LangChain's Document format. You can do this with either a List[str] containing Box file IDs, or with a str containing a Box folder ID.

The BoxBlobLoader class helps you get your unstructured content from Box in LangChain's Blob format. You can do this with a List[str] containing Box file IDs, a str containing a Box folder ID, a search query, or a BoxMetadataQuery.

When loading files by folder ID, you can also set a bool (recursive) to tell the loader to include all sub-folders of that folder.

info

A Box instance can contain petabytes of files, and a single folder can contain millions of files. Be intentional about which folders you index. We recommend never loading all files from folder 0 recursively, because folder ID 0 is your root folder.

The BoxLoader will skip files without a text representation, while the BoxBlobLoader will return blobs for all document and image files.
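The warning about folder 0 can be enforced before a loader is ever built. The following is a minimal sketch, not part of langchain-box: check_folder_scope and ROOT_FOLDER_ID are hypothetical names introduced here for illustration.

```python
ROOT_FOLDER_ID = "0"  # Box uses folder ID 0 for the root folder


def check_folder_scope(box_folder_id: str, recursive: bool = False) -> None:
    """Hypothetical guard: refuse to crawl the whole account recursively."""
    if box_folder_id == ROOT_FOLDER_ID and recursive:
        raise ValueError(
            "Refusing to load folder 0 recursively; choose a narrower folder."
        )


# A specific folder may be crawled recursively; this call returns silently.
check_folder_scope("260932470532", recursive=True)
```

Calling the guard with the same box_folder_id and recursive values you intend to pass to the loader keeps an accidental full-account crawl from starting.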

Integration details

Loader features

Source          Document Lazy Loading    Async Support
BoxLoader       ✅                       ❌
BoxBlobLoader   ✅                       ❌

Setup

In order to use the Box package, you will need a few things:

Credentials

For these examples, we will use token authentication. A token works regardless of which Box authentication method you use to obtain it, so generate one however suits your setup. If you want to learn more about how to use other authentication types with langchain-box, visit the Box provider document.

import getpass
import os

box_developer_token = getpass.getpass("Enter your Box Developer Token: ")
Enter your Box Developer Token:  ········

To enable automated tracing of your model calls, set your LangSmith API key:
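As a sketch of what that step might look like, the helper below sets the two environment variables that current LangSmith releases read. The function name enable_langsmith_tracing is hypothetical, and the variable names LANGSMITH_API_KEY and LANGSMITH_TRACING are an assumption about your installed version; older setups may use LANGCHAIN_-prefixed names instead.

```python
import os


def enable_langsmith_tracing(api_key: str) -> None:
    """Set the environment variables that LangSmith tracing reads."""
    os.environ["LANGSMITH_API_KEY"] = api_key
    os.environ["LANGSMITH_TRACING"] = "true"


# In a notebook you might collect the key interactively, for example:
# enable_langsmith_tracing(getpass.getpass("Enter your LangSmith API key: "))
```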

Installation

Install langchain_box.

%pip install -qU langchain_box

Initialization

Load files

If you wish to load files, you must provide the list of file IDs at instantiation time.

This requires one piece of information: box_file_ids, a List[str] of Box file IDs.

BoxLoader

from langchain_box.document_loaders import BoxLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxLoader(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
    character_limit=10000,  # optional cap on the text returned per file
)
BoxBlobLoader

from langchain_box.blob_loaders import BoxBlobLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
)
Load from folder

If you wish to load files from a folder, you must provide a str with the Box folder ID at instantiation time.

This requires one piece of information: box_folder_id, a str containing the Box folder ID.

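BoxLoader

from langchain_box.document_loaders import BoxLoader

box_folder_id = "260932470532"

loader = BoxLoader(
    box_developer_token=box_developer_token,
    box_folder_id=box_folder_id,
    recursive=False,  # set to True to include sub-folders
    character_limit=10000,  # optional cap on the text returned per file
)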
BoxBlobLoader

from langchain_box.blob_loaders import BoxBlobLoader

box_folder_id = "260932470532"

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_folder_id=box_folder_id,
    recursive=False,  # set to True to include sub-folders
)
Search for files with BoxBlobLoader

If you need to search for files, the BoxBlobLoader offers two methods. The first is a full-text search, with optional search options to narrow the results.

This requires one piece of information: query, a str containing the text to search for.

You can also provide a BoxSearchOptions object to narrow down that search.

BoxBlobLoader search

from langchain_box.blob_loaders import BoxBlobLoader
from langchain_box.utilities import BoxSearchOptions, DocumentFiles, SearchTypeFilter

box_folder_id = "260932470532"

box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    file_extensions=[DocumentFiles.DOCX, DocumentFiles.PDF],
    k=200,
    size_range=[1, 1000000],
    updated_data_range=None,
)

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    query="Victor",
    box_search_options=box_search_options,
)

You can also search for content based on Box Metadata. If your Box instance uses Metadata, you can search for any documents that have a specific Metadata Template attached and that meet certain criteria, such as invoices with a total greater than or equal to $500 that were created last quarter.

This requires one piece of information: a BoxMetadataQuery naming the Metadata Template, the query, and its parameters.

You can also provide a BoxSearchOptions object to narrow down that search.

BoxBlobLoader Metadata query

from langchain_box.blob_loaders import BoxBlobLoader
from langchain_box.utilities import BoxMetadataQuery

query = BoxMetadataQuery(
    template_key="enterprise_1234.myTemplate",
    query="total >= :value",  # :value is bound from query_params
    query_params={"value": 100},
    ancestor_folder_id="260932470532",
)

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_metadata_query=query,
)
Load

BoxLoader

docs = loader.load()
docs[0]
Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n    - Gravitational Wave Detector Kit: $800\n    - Exoplanet Terrarium: $120\nTotal: $920')
print(docs[0].metadata)
{'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}
BoxBlobLoader

for blob in loader.yield_blobs():
    print(f"Blob({blob})")
Blob(id='1514555423624' metadata={'source': 'https://app.box.com/0/260935730128/260931903795/Invoice-A5555.txt', 'name': 'Invoice-A5555.txt', 'file_size': 150} data="b'Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n    - Gravitational Wave Detector Kit: $800\\n    - Exoplanet Terrarium: $120\\nTotal: $920'" mimetype='text/plain' path='https://app.box.com/0/260935730128/260931903795/Invoice-A5555.txt')
Blob(id='1514553902288' metadata={'source': 'https://app.box.com/0/260935730128/260931903795/Invoice-B1234.txt', 'name': 'Invoice-B1234.txt', 'file_size': 168} data="b'Vendor: Galactic Gizmos Inc.\\nInvoice Number: B1234\\nPurchase Order Number: 001\\nLine Items:\\n - Quantum Flux Capacitor: $500\\n - Anti-Gravity Pen Set: $75\\nTotal: $575'" mimetype='text/plain' path='https://app.box.com/0/260935730128/260931903795/Invoice-B1234.txt')
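Once you have the raw bytes of a blob, any parser can consume them. The sketch below is a hypothetical stand-in for "the blob parser of your choice": parse_invoice_total is an illustrative name, not a langchain-box function, and it assumes UTF-8 text shaped like the sample invoices above. With a real LangChain Blob you would typically obtain the bytes via blob.as_bytes().

```python
import re
from typing import Optional


def parse_invoice_total(data: bytes) -> Optional[int]:
    """Return the dollar amount on the 'Total:' line, or None if absent."""
    text = data.decode("utf-8")
    match = re.search(r"^Total:\s*\$(\d+)", text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Bytes copied from the first blob in the output above.
sample = (
    b"Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n"
    b"    - Gravitational Wave Detector Kit: $800\n"
    b"    - Exoplanet Terrarium: $120\nTotal: $920"
)
print(parse_invoice_total(sample))  # 920
```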
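Lazy Load

BoxLoader only

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # Process the current page of documents here
        # (for example, send it to your index), then start a new page.
        page = []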
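The loop above only acts on full pages of 10, so a final partial page is never processed. One way to handle this is a small generator that also flushes the remainder. This is a sketch with a hypothetical helper named paged; range(25) stands in for loader.lazy_load(), since any iterable works.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], page_size: int = 10) -> Iterator[List[T]]:
    """Yield lists of up to page_size items, including the final short page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    if page:  # flush whatever is left after the last full page
        yield page


# Stand-in for loader.lazy_load(); 25 items produce pages of 10, 10, and 5.
for page in paged(range(25), page_size=10):
    print(len(page))
```

Because the helper is a generator, documents are still pulled from the loader one at a time, which keeps the memory profile of lazy loading.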

Extra fields

All Box connectors offer the ability to select additional fields from the Box FileFull object to return as custom LangChain metadata. Each object accepts an optional List[str] called extra_fields containing the JSON keys from the returned object, like extra_fields=["shared_link"].

The connector adds each field to the list of fields the integration needs to function, then adds the results to the metadata returned in the Document or Blob, like "metadata" : { "source" : "source", "shared_link" : "shared_link" }. If a field is unavailable for a file, it is returned as an empty string, like "shared_link" : "".
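The empty-string fallback can be pictured with plain dictionaries. The sketch below only illustrates the described behavior and is not the connector's actual implementation: build_metadata and file_info are hypothetical names, and the source URL is a placeholder.

```python
from typing import Any, Dict, List


def build_metadata(
    file_info: Dict[str, Any], source: str, extra_fields: List[str]
) -> Dict[str, Any]:
    """Illustrative only: copy requested fields, defaulting to ''."""
    metadata: Dict[str, Any] = {"source": source}
    for field in extra_fields:
        # A field missing from the file object becomes an empty string.
        metadata[field] = file_info.get(field, "")
    return metadata


file_info = {"id": "1514555423624", "name": "Invoice-A5555.txt"}
print(build_metadata(file_info, "https://example.com/file", ["shared_link"]))
# {'source': 'https://example.com/file', 'shared_link': ''}
```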

API reference

For detailed documentation of all BoxLoader features and configurations, head to the API reference.

Help

If you have questions, you can check out our developer documentation or reach out to us in our developer community.

