Source: https://python.langchain.com/docs/integrations/document_loaders/
Document loaders | 🦜️🔗 LangChain
acreom: acreom is a dev-first knowledge base with tasks running on local mark...
AgentQLLoader: AgentQL's document loader provides structured data extraction from an...
AirbyteLoader: Airbyte is a data integration platform for ELT pipelines from APIs, d...
Airtable: Get your API key here.
Alibaba Cloud MaxCompute: Alibaba Cloud MaxCompute (previously known as ODPS) is a general purp...
Amazon Textract: Amazon Textract is a machine learning (ML) service that automatically...
Apify Dataset: Apify Dataset is a scalable append-only storage with sequential acces...
ArcGIS: This notebook demonstrates the use of the langchain_community.document...
ArxivLoader: arXiv is an open-access archive for 2 million scholarly articles in t...
AssemblyAI Audio Transcripts: The AssemblyAIAudioTranscriptLoader allows you to transcribe audio files ...
AstraDB: DataStax Astra DB is a serverless...
Async Chromium: Chromium is one of the browsers supported by Playwright, a library us...
AsyncHtml: AsyncHtmlLoader loads raw HTML from a list of URLs concurrently.
Athena: Amazon Athena is a serverless, interactive analytics service built...
AWS S3 Directory: Amazon Simple Storage Service (Amazon S3) is an object storage service...
AWS S3 File: Amazon Simple Storage Service (Amazon S3) is an object storage servic...
AZLyrics: AZLyrics is a large, legal, ever-growing collection of lyrics.
Azure AI Data: Azure AI Studio provides the capability to upload data assets to clou...
Azure Blob Storage Container: Azure Blob Storage is Microsoft's object storage solution for the clo...
Azure Blob Storage File: Azure Files offers fully managed file shares in the cloud that are ac...
Azure AI Document Intelligence: Azure AI Document Intelligence (formerly known as Azure Form Recogniz...
BibTeX: BibTeX is a file format and reference management system commonly used...
BiliBili: Bilibili is one of the most beloved long-form video sites in China.
Blackboard: Blackboard Learn (previously the Blackboard Learning Management Syste...
Blockchain: The intention of this notebook is to provide a means of testing funct...
Box: The langchain-box package provides two methods to index your files fr...
Brave Search: Brave Search is a search engine developed by Brave Software.
Browserbase: Browserbase is a developer platform to reliably run, manage, and moni...
Browserless: Browserless is a service that allows you to run headless Chrome insta...
BSHTMLLoader: This notebook provides a quick overview for getting started with Beau...
Cassandra: Cassandra is a NoSQL, row-oriented, highly scalable and highly availa...
ChatGPT Data: ChatGPT is an artificial intelligence (AI) chatbot developed by OpenA...
College Confidential: College Confidential gives information on 3,800+ colleges and univers...
Concurrent Loader: Works just like the GenericLoader but concurrently for those who choo...
Confluence: Confluence is a wiki collaboration platform designed to save and orga...
CoNLL-U: CoNLL-U is a revised version of the CoNLL-X format. Annotations are enc...
Copy Paste: This notebook covers how to load a document object from something you...
Couchbase: Couchbase is an award-winning distributed NoSQL cloud database that d...
CSV: A comma-separated values (CSV) file is a delimited text file that use...
Cube Semantic Layer: This notebook demonstrates the process of retrieving Cube's data mode...
Datadog Logs: Datadog is a monitoring and analytics platform for cloud-scale applic...
Dedoc: This sample demonstrates the use of Dedoc in combination with LangCha...
Diffbot: Diffbot is a suite of ML-based products that make it easy to structur...
Discord: Discord is a VoIP and instant messaging social platform. Users have t...
Docling: Docling parses PDF, DOCX, PPTX, HTML, and other formats into a rich u...
Docugami: This notebook covers how to load documents from Docugami. It provides...
Docusaurus: Docusaurus is a static-site generator which provides out-of-the-box d...
Dropbox: Dropbox is a file hosting service that brings everything-traditional ...
DuckDB: DuckDB is an in-process SQL OLAP database management system.
Email: This notebook shows how to load email (.eml) or Microsoft Outlook (.m...
EPub: EPUB is an e-book file format that uses the ".epub" file extension. T...
Etherscan: Etherscan is the leading blockchain explorer, search, API and analyt...
EverNote: EverNote is intended for archiving and creating notes in which photos...
example_data
Facebook Chat: Messenger is an American proprietary instant messaging app and platf...
Fauna: Fauna is a Document Database.
Figma: Figma is a collaborative web application for interface design.
FireCrawl: FireCrawl crawls and converts any website into LLM-ready data. It craw...
Geopandas: Geopandas is an open-source project to make working with geospatial d...
Git: Git is a distributed version control system that tracks changes in an...
GitBook: GitBook is a modern documentation platform where teams can document e...
GitHub: This notebook shows how you can load issues and pull requests (PRs) ...
Glue Catalog: The AWS Glue Data Catalog is a centralized metadata repository that a...
Google AlloyDB for PostgreSQL: AlloyDB is a fully managed relational database service that offers hi...
Google BigQuery: Google BigQuery is a serverless and cost-effective enterprise data wa...
Google Bigtable: Bigtable is a key-value and wide-column store, ideal for fast access ...
Google Cloud SQL for SQL Server: Cloud SQL is a fully managed relational database service that offers ...
Google Cloud SQL for MySQL: Cloud SQL is a fully managed relational database service that offers ...
Google Cloud SQL for PostgreSQL: Cloud SQL for PostgreSQL is a fully-managed database service that hel...
Google Cloud Storage Directory: Google Cloud Storage is a managed service for storing unstructured da...
Google Cloud Storage File: Google Cloud Storage is a managed service for storing unstructured da...
Google Firestore in Datastore Mode: Firestore in Datastore Mode is a NoSQL document database built for au...
Google Drive: Google Drive is a file storage and synchronization service developed ...
Google El Carro for Oracle Workloads: Google El Carro Oracle Operator
Google Firestore (Native Mode): Firestore is a serverless document-oriented database that scales to m...
Google Memorystore for Redis: Google Memorystore for Redis is a fully-managed service that is power...
Google Spanner: Spanner is a highly scalable database that combines unlimited scalabi...
Google Speech-to-Text Audio Transcripts: The SpeechToTextLoader allows you to transcribe audio files with the Goog...
Grobid: GROBID is a machine learning library for extracting, parsing, and re-...
Gutenberg: Project Gutenberg is an online library of free eBooks.
Hacker News: Hacker News (sometimes abbreviated as HN) is a social news website fo...
Huawei OBS Directory: The following code demonstrates how to load objects from the Huawei O...
Huawei OBS File: The following code demonstrates how to load an object from the Huawei...
HuggingFace dataset: The Hugging Face Hub is home to over 5,000 datasets in more than 100 ...
HyperbrowserLoader: Hyperbrowser is a platform for running and scaling headless browsers....
iFixit: iFixit is the largest, open repair community on the web. The site con...
Images: This covers how to load images into a document format that we can use...
Image captions: By default, the loader utilizes the pre-trained Salesforce BLIP image...
IMSDb: IMSDb is the Internet Movie Script Database.
Iugu: Iugu is a Brazilian services and software as a service (SaaS) company...
Joplin: Joplin is an open-source note-taking app. Capture your thoughts and s...
JSONLoader: This notebook provides a quick overview for getting started with JSON...
Jupyter Notebook: Jupyter Notebook (formerly IPython Notebook) is a web-based interacti...
Kinetica: This notebook goes over how to load documents from Kinetica.
lakeFS: lakeFS provides scalable version control over the data lake, and uses...
LangSmith: This notebook provides a quick overview for getting started with the ...
LarkSuite (FeiShu): LarkSuite is an enterprise collaboration platform developed by ByteDa...
LLM Sherpa: This notebook covers how to use LLM Sherpa to load files of many type...
Mastodon: Mastodon is a federated social media and social networking service.
MathPixPDFLoader: Inspired by Daniel Gross's snippet here: https://gist.github.com/danielgross/...
MediaWiki Dump: MediaWiki XML Dumps contain the content of a wiki (wiki pages with al...
Merge Documents Loader: Merge the documents returned from a set of specified data loaders.
mhtml: MHTML is used both for emails and for archived webpages. MH...
Microsoft Excel: The UnstructuredExcelLoader is used to load Microsoft Excel files. Th...
Microsoft OneDrive: Microsoft OneDrive (formerly SkyDrive) is a file hosting service oper...
Microsoft OneNote: This notebook covers how to load documents from OneNote.
Microsoft PowerPoint: Microsoft PowerPoint is a presentation program by Microsoft.
Microsoft SharePoint: Microsoft SharePoint is a website-based collaboration system that use...
Microsoft Word: Microsoft Word is a word processor developed by Microsoft.
Near Blockchain: The intention of this notebook is to provide a means of testing funct...
Modern Treasury: Modern Treasury simplifies complex payment operations. It is a unifie...
MongoDB: MongoDB is a NoSQL, document-oriented database that supports JSON-li...
Needle Document Loader: Needle makes it easy to create your RAG pipelines with minimal effort.
News URL: This covers how to load HTML news articles from a list of URLs into a...
Notion DB 2/2: Notion is a collaboration platform with modified Markdown support tha...
Nuclia: Nuclia automatically indexes your unstructured data from any internal...
Obsidian: Obsidian is a powerful and extensible knowledge base...
Open Document Format (ODT): The Open Document Format for Office Applications (ODF), also known as...
Open City Data: Socrata provides an API for city open data.
Oracle Autonomous Database: Oracle autonomous database is a cloud database that uses machine lear...
Oracle AI Vector Search: Document Processing: Oracle AI Vector Search is designed for Artificial Intelligence (AI) ...
Org-mode: An Org Mode document is a document editing, formatting, and organizing...
Outline Document Loader: Outline is an open-source collaborative knowledge base platform desig...
Oxylabs: Oxylabs is a web intelligence collection platform that enables compan...
Pandas DataFrame: This notebook goes over how to load data from a pandas DataFrame.
parsers
PDFMinerLoader: This notebook provides a quick overview for getting started with PDFM...
PDFPlumber: Like PyMuPDF, the output Documents contain detailed metadata about th...
Pebblo Safe DocumentLoader: Pebblo enables developers to safely load data and promote their Gen A...
Polars DataFrame: This notebook goes over how to load data from a polars DataFrame.
Dell PowerScale Document Loader: Dell PowerScale is an enterprise scale-out storage system that hosts ...
Psychic: This notebook covers how to load documents from Psychic. See here for...
PubMed: PubMed® by The National Center for Biotechnology Information, Nationa...
PullMdLoader: Loader for converting URLs into Markdown using the pull.md service.
PyMuPDFLoader: This notebook provides a quick overview for getting started with PyMu...
PyMuPDF4LLM: This notebook provides a quick overview for getting started with PyMu...
PyPDFDirectoryLoader: This loader loads all PDF files from a specific directory.
PyPDFium2Loader: This notebook provides a quick overview for getting started with PyPD...
PyPDFLoader: This notebook provides a quick overview for getting started with PyPD...
PySpark: This notebook goes over how to load data from a PySpark DataFrame.
Quip: Quip is a collaborative productivity software suite for mobile and We...
ReadTheDocs Documentation: Read the Docs is an open-sourced free software documentation hosting ...
Recursive URL: The RecursiveUrlLoader lets you recursively scrape all child links fr...
Reddit: Reddit is an American social news aggregation, content rating, and di...
Roam: ROAM is a note-taking tool for networked thought, designed to create ...
Rockset: ⚠️ Deprecation Notice: Rockset Integration Disabled
rspace: This notebook shows how to use the RSpace document loader to import r...
RSS Feeds: This covers how to load HTML news articles from a list of RSS feed UR...
RST: A reStructured Text (RST) file is a file format for textual data used...
scrapfly: ScrapFly is a web scraping API with headless browser capabilities, pr...
ScrapingAnt: ScrapingAnt is a web scraping API with headless browser capabilities,...
SingleStore: The SingleStoreLoader allows you to load documents directly from a Si...
Sitemap: SitemapLoader extends WebBaseLoader and loads a sitemap from a ...
Slack: Slack is an instant messaging program.
Snowflake: This notebook goes over how to load documents from Snowflake.
Source Code: This notebook covers how to load source code files using a special ap...
Spider: Spider is the fastest and most affordable crawler and scraper that re...
Spreedly: Spreedly is a service that allows you to securely store credit cards ...
Stripe: Stripe is an Irish-American financial services and software as a serv...
Subtitle: The SubRip file format is described on the Matroska multimedia contai...
SurrealDB: SurrealDB is an end-to-end cloud-native database designed for modern ...
Telegram: Telegram Messenger is a globally accessible freemium, cross-platform,...
Tencent COS Directory: Tencent Cloud Object Storage (COS) is a distributed...
Tencent COS File: Tencent Cloud Object Storage (COS) is a distributed...
TensorFlow Datasets: TensorFlow Datasets is a collection of datasets ready to use, with Te...
TiDB: TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution...
2Markdown: 2markdown service transforms website content into structured markdown...
TOML: TOML is a file format for configuration files. It is intended to be e...
Trello: Trello is a web-based project management and collaboration tool that ...
TSV: A tab-separated values (TSV) file is a simple, text-based file format...
Twitter: Twitter is an online social media and social networking service.
Unstructured: This notebook covers how to use the Unstructured document loader to load ...
UnstructuredMarkdownLoader: This notebook provides a quick overview for getting started with Unst...
UnstructuredPDFLoader: Unstructured supports a common interface for working with unstructure...
Upstage: This notebook covers how to get started with UpstageDocumentParseLoad...
URL: This example covers how to load HTML documents from a list of URLs in...
Vsdx: A Visio file (with extension .vsdx) is associated with Microsoft Visi...
Weather: OpenWeatherMap is an open-source weather service provider.
WebBaseLoader: This covers how to use WebBaseLoader to load all text from HTML webpa...
WhatsApp Chat: WhatsApp (also called WhatsApp Messenger) is a freeware, cross-platfo...
Wikipedia: Wikipedia is a multilingual free online encyclopedia written and main...
UnstructuredXMLLoader: This notebook provides a quick overview for getting started with Unst...
Xorbits Pandas DataFrame: This notebook goes over how to load data from a xorbits.pandas DataFr...
YouTube audio: Building chat or QA applications on YouTube videos is a topic of high...
YouTube transcripts: YouTube is an online video sharing and social media platform created ...
YoutubeLoaderDL: Loader for YouTube leveraging the yt-dlp library.
Yuque: Yuque is a professional cloud-based knowledge base for team collabora...
ZeroxPDFLoader: ZeroxPDFLoader is a document loader that leverages the Zerox library....
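Most loaders in this index share one pattern: read from a source and emit documents that pair text content with metadata. The sketch below illustrates that pattern with the standard library only, under stated assumptions. `SimpleDocument`, `CsvRowLoader`, and `merge_loaders` are hypothetical names, not LangChain's API. They loosely mirror the `page_content`/`metadata` shape of LangChain's `Document` and the eager `load()` versus lazy `lazy_load()` split. The CSV loader emits one document per row as "column: value" lines, and `merge_loaders` imitates what the Merge Documents Loader entry describes. For real projects, use the actual loaders from `langchain_community.document_loaders`.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loader's output: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class CsvRowLoader:
    """Hypothetical loader: one document per CSV row, as "column: value" lines."""

    def __init__(self, text: str, source: str) -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield documents one at a time so a large input need not be held in memory.
        reader = csv.DictReader(io.StringIO(self.text))
        for row_index, row in enumerate(reader):
            lines = [
                f"{key.strip()}: {(value or '').strip()}"
                for key, value in row.items()
                if key is not None  # skip overflow fields that have no header
            ]
            yield SimpleDocument("\n".join(lines), {"source": self.source, "row": row_index})

    def load(self) -> List[SimpleDocument]:
        # Eager variant: materialize every document into a list.
        return list(self.lazy_load())


def merge_loaders(loaders: Iterable[CsvRowLoader]) -> List[SimpleDocument]:
    """Concatenate the documents returned by several loaders, preserving order."""
    merged: List[SimpleDocument] = []
    for loader in loaders:
        merged.extend(loader.load())
    return merged


if __name__ == "__main__":
    first = CsvRowLoader("name,stars\nalpha, 3\nbeta,5\n", source="a.csv")
    second = CsvRowLoader("name,stars\ngamma,1\n", source="b.csv")
    for doc in merge_loaders([first, second]):
        print(doc.metadata, repr(doc.page_content))
```

Keeping `lazy_load()` as the primitive and building `load()` on top of it means a merging step can consume sources of any size the same way; only the final list is materialized.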