A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://python.langchain.com/docs/integrations/document_loaders/json/ below:

JSONLoader | 🦜️🔗 LangChain

JSONLoader

This notebook provides a quick overview for getting started with JSON document loader. For detailed documentation of all JSONLoader features and configurations head to the API reference.

Overview Integration details Loader features Source Document Lazy Loading Native Async Support JSONLoader ✅ ❌ Setup

To access JSON document loader you'll need to install the langchain-community integration package as well as the jq python package.

Credentials

No credentials are required to use the JSONLoader class.

To enable automated tracing of your model calls, set your LangSmith API key:

Installation

Install langchain-community and jq:

%pip install -qU langchain-community jq 
Initialization

Now we can instantiate our model object and load documents:

from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
file_path="./example_data/facebook_chat.json",
jq_schema=".messages[].content",
text_content=False,
)
Load
docs = loader.load()
docs[0]
Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
Lazy Load
pages = []
for doc in loader.lazy_load():
pages.append(doc)
if len(pages) >= 10:



pages = []
Read from JSON Lines file

If you want to load documents from a JSON Lines file, you pass json_lines=True and specify jq_schema to extract page_content from a single JSON object.

loader = JSONLoader(
file_path="./example_data/facebook_chat_messages.jsonl",
jq_schema=".content",
text_content=False,
json_lines=True,
)

docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
Read specific content keys

Another option is to set jq_schema='.' and provide a content_key in order to only load specific content:

loader = JSONLoader(
file_path="./example_data/facebook_chat_messages.jsonl",
jq_schema=".",
content_key="sender_name",
json_lines=True,
)

docs = loader.load()
print(docs[0])
page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
JSON file with jq schema content_key

To load documents from a JSON file using the content_key within the jq schema, set is_content_key_jq_parsable=True. Ensure that content_key is compatible and can be parsed using the jq schema.

loader = JSONLoader(
file_path="./example_data/facebook_chat.json",
jq_schema=".messages[]",
content_key=".content",
is_content_key_jq_parsable=True,
)

docs = loader.load()
print(docs[0])
page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}

Generally, we want to include metadata available in the JSON file into the documents that we create from the content.

The following demonstrates how metadata can be extracted using the JSONLoader.

There are some key changes to be noted. In the previous example where we didn't collect the metadata, we managed to directly specify in the schema where the value for the page_content can be extracted from.

In this example, we have to tell the loader to iterate over the records in the messages field. The jq_schema then has to be .messages[]

This allows us to pass the records (dict) into the metadata_func that has to be implemented. The metadata_func is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final Document object.

Additionally, we now have to explicitly specify in the loader, via the content_key argument, the key from the record where the value for the page_content needs to be extracted from.


def metadata_func(record: dict, metadata: dict) -> dict:
metadata["sender_name"] = record.get("sender_name")
metadata["timestamp_ms"] = record.get("timestamp_ms")

return metadata


loader = JSONLoader(
file_path="./example_data/facebook_chat.json",
jq_schema=".messages[]",
content_key="content",
metadata_func=metadata_func,
)

docs = loader.load()
print(docs[0].metadata)
{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}
API reference

For detailed documentation of all JSONLoader features and configurations head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4