Showing content from https://python.langchain.com/docs/integrations/providers/unstructured/ below:

Unstructured | 🦜️🔗 LangChain

Unstructured

The unstructured package from Unstructured.IO extracts clean text from raw source documents like PDFs and Word documents. This page covers how to use the unstructured ecosystem within LangChain.

Installation and Setup

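If you are using a loader that runs locally, install the open-source unstructured Python package and its dependencies. Document-specific dependencies are available as pip extras; for example, pip install "unstructured[all-docs]" installs the dependencies for all supported document types. The UnstructuredLoader is provided by the langchain-unstructured package, and the other loaders on this page are imported from langchain-community.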

The Unstructured API requires an API key to make requests. You can request an API key here and start using it today. Check out the README here to get started making API calls. We'd love to hear your feedback, so let us know how it goes in our community Slack, and stay tuned for improvements to both quality and performance. See the instructions here if you'd like to self-host the Unstructured API or run it locally.

Data Loaders

Unstructured is primarily used through its data loaders.

UnstructuredLoader

See a usage example of how to use this loader both for partitioning locally and for partitioning remotely with the serverless Unstructured API.

from langchain_unstructured import UnstructuredLoader

UnstructuredCHMLoader

CHM stands for Microsoft Compiled HTML Help.

from langchain_community.document_loaders import UnstructuredCHMLoader

UnstructuredCSVLoader

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
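To make the record and field structure above concrete, here is a minimal sketch using Python's standard csv module rather than UnstructuredCSVLoader; the sample data is invented for illustration. It also shows why a real CSV parser is preferable to splitting on commas: a field wrapped in double quotes may itself contain a comma.

```python
import csv
import io

# Invented sample data: a header record followed by two data records.
SAMPLE_CSV = 'name,city,note\nAda,London,"likes tea, biscuits"\nLin,Oslo,none\n'


def parse_csv(text: str) -> list[list[str]]:
    """Return every record in the CSV text as a list of field strings."""
    return list(csv.reader(io.StringIO(text)))


records = parse_csv(SAMPLE_CSV)
for record in records:
    print(record)
# The quoted field stays whole: ['Ada', 'London', 'likes tea, biscuits']
```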

See a usage example.

from langchain_community.document_loaders import UnstructuredCSVLoader

UnstructuredEmailLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredEmailLoader

UnstructuredEPubLoader

EPUB is an e-book file format that uses the “.epub” file extension. The term is short for electronic publication and is sometimes styled ePub. EPUB is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.
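Physically, an EPUB file is a ZIP archive (a detail from the EPUB Open Container Format specification, not stated on this page) whose first entry is an uncompressed file named mimetype containing application/epub+zip. The sketch below builds a tiny, deliberately incomplete archive in memory and checks for that marker; looks_like_epub is a hypothetical helper, not part of unstructured or LangChain.

```python
import io
import zipfile


def looks_like_epub(data: bytes) -> bool:
    """Heuristic check (hypothetical helper): a ZIP archive whose first
    entry is 'mimetype' containing exactly 'application/epub+zip'."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            return (
                bool(names)
                and names[0] == "mimetype"
                and zf.read("mimetype") == b"application/epub+zip"
            )
    except zipfile.BadZipFile:
        return False


# Build a minimal, incomplete EPUB-like archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # The mimetype entry comes first and is stored without compression.
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("META-INF/container.xml", "<container/>", compress_type=zipfile.ZIP_DEFLATED)

print(looks_like_epub(buf.getvalue()))    # True
print(looks_like_epub(b"not a zip file"))  # False
```

A real EPUB also needs a valid META-INF/container.xml and content documents; this check only inspects the container marker.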

See a usage example.

from langchain_community.document_loaders import UnstructuredEPubLoader

UnstructuredExcelLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredExcelLoader

UnstructuredFileIOLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredFileIOLoader

UnstructuredHTMLLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredHTMLLoader

UnstructuredImageLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredImageLoader

UnstructuredMarkdownLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredMarkdownLoader

UnstructuredODTLoader

The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations, and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.
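Because an ODF text document is a ZIP archive of XML files, its paragraph text can be pulled from the content.xml entry with the standard library alone. This is a simplified sketch, not how UnstructuredODTLoader works internally: odt_paragraphs is a hypothetical helper, the in-memory archive is a hand-built stand-in rather than a complete ODT file, and only <text:p> paragraphs are read (headings, tables, and lists are ignored).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(data: bytes) -> list[str]:
    """Return the text of every <text:p> element in an ODT's content.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Hand-built stand-in for an ODT file (not a complete, valid document).
content_xml = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello, OpenDocument.</text:p>"
    "<text:p>Second paragraph.</text:p>"
    "</office:text></office:body></office:document-content>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # The mimetype entry is stored uncompressed; content.xml is deflated.
    zf.writestr(
        "mimetype",
        "application/vnd.oasis.opendocument.text",
        compress_type=zipfile.ZIP_STORED,
    )
    zf.writestr("content.xml", content_xml)

print(odt_paragraphs(buf.getvalue()))
```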

See a usage example.

from langchain_community.document_loaders import UnstructuredODTLoader

UnstructuredOrgModeLoader

Org Mode is a document editing, formatting, and organizing mode, designed for notes, planning, and authoring within the free software text editor Emacs. An Org Mode document is a plain-text file written in that mode's markup.
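In Org Mode files, a headline starts at the beginning of a line with one or more asterisks followed by a space, and the number of asterisks gives the outline level. The sketch below, using a hypothetical org_outline helper and made-up sample text, extracts that outline with a regular expression. It is a simplification for illustration, not what UnstructuredOrgModeLoader does: it keeps TODO keywords in the title and ignores tags, properties, and other Org syntax.

```python
import re

# A headline starts at column 0 with one or more '*' followed by a space;
# the number of stars is the outline level.
HEADLINE = re.compile(r"^(\*+) +(.*)$")


def org_outline(text: str) -> list[tuple[int, str]]:
    """Return (level, title) for every headline in an Org document."""
    outline = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2).strip()))
    return outline


SAMPLE_ORG = """* Project notes
Some body text.
** TODO Draft the loader section
** Reading list
*bold text* is not a headline
"""

print(org_outline(SAMPLE_ORG))
```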

See a usage example.

from langchain_community.document_loaders import UnstructuredOrgModeLoader

UnstructuredPDFLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredPDFLoader

UnstructuredPowerPointLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredPowerPointLoader

UnstructuredRSTLoader

A reStructuredText (RST) file is a file format for textual data used primarily in the Python programming language community for technical documentation.
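In reStructuredText, a section title is a line of text followed by an underline made of one repeated punctuation character, at least as long as the title. The sketch below detects such titles with a hypothetical rst_section_titles helper and invented sample text. It is a simplified heuristic for illustration, not what UnstructuredRSTLoader does; a full parser such as docutils also handles overlines, title hierarchy, and literal blocks.

```python
import re

# A line made of a single ASCII punctuation character repeated, e.g. "=====".
ADORNMENT = re.compile(r"^([!-/:-@\[-`{-~])\1+$")


def rst_section_titles(text: str) -> list[str]:
    """Return lines that are immediately followed by a valid underline."""
    lines = text.splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        if (
            title.strip()
            and not ADORNMENT.match(title)
            and ADORNMENT.match(underline)
            and len(underline) >= len(title)
        ):
            titles.append(title.strip())
    return titles


SAMPLE_RST = """Installation
============

Install the package first.

Usage
-----

Call the loader.
"""

print(rst_section_titles(SAMPLE_RST))
```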

See a usage example.

from langchain_community.document_loaders import UnstructuredRSTLoader

UnstructuredRTFLoader

See a usage example in the API documentation.

from langchain_community.document_loaders import UnstructuredRTFLoader

UnstructuredTSVLoader

A tab-separated values (TSV) file is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters.
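The only structural difference from CSV is the delimiter, so Python's standard csv module can read TSV once delimiter is set to a tab. Here is a minimal sketch with invented sample data, independent of UnstructuredTSVLoader, that reads each record into a dict keyed by the header row.

```python
import csv
import io

# Invented sample data: records separated by newlines, values by tabs.
SAMPLE_TSV = "id\tname\tscore\n1\tAda\t9.5\n2\tLin\t8.0\n"


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Return one dict per data record, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = parse_tsv(SAMPLE_TSV)
print(rows[0])  # {'id': '1', 'name': 'Ada', 'score': '9.5'}
```

All values come back as strings; converting columns such as score to numbers is left to the caller.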

See a usage example.

from langchain_community.document_loaders import UnstructuredTSVLoader

UnstructuredURLLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredURLLoader

UnstructuredWordDocumentLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredWordDocumentLoader

UnstructuredXMLLoader

See a usage example.

from langchain_community.document_loaders import UnstructuredXMLLoader
