OpenContracts is a GPL-3-licensed enterprise document analytics tool. It supports multiple formats, including PDF and text-based formats (with more on the way). It also supports multiple document ingestion pipelines through a pluggable architecture designed to make adding new formats and ingestion engines easy; see our Docling Integration for an example. Writing your own custom document analytics tools, whose results are displayed directly over the original document, is easy. We also support mass document data extraction with a LlamaIndex wrapper.
PDF Annotation and Analysis
TXT-Based Format Annotation and Analysis
Rapidly Deployable Bespoke Analytics
[DEVELOPING] Document Management

Ok, now tell me more. What Does it Do?

OpenContracts provides several key features:
Corpuses
We recommend you browse our docs via our MkDocs site. You can also view the docs in the repo.
Beyond providing a platform to analyze contracts, the core idea here is an open, standardized architecture that makes data highly portable. This is powered by a set of data standards that describe the text and layout blocks on a PDF page.
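As a minimal sketch of what such a standard can look like, assume a PAWLS-style token layer: each page stores its dimensions plus a flat list of text tokens, each with a bounding box. The class names, field names, and the `tokens_in_box` / `text_for` helpers below are illustrative assumptions, not OpenContracts' exact schema.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One text token and its bounding box in page coordinates (assumed schema)."""
    x: float
    y: float
    width: float
    height: float
    text: str


@dataclass
class PageInfo:
    """Page dimensions and zero-based page index (assumed schema)."""
    width: float
    height: float
    index: int


@dataclass
class PageLayer:
    """One page of a hypothetical PAWLS-style token layer."""
    page: PageInfo
    tokens: list[Token] = field(default_factory=list)

    def tokens_in_box(self, left: float, top: float, right: float, bottom: float) -> list[int]:
        """Return indices of tokens that lie entirely inside the given box."""
        return [
            i
            for i, t in enumerate(self.tokens)
            if t.x >= left and t.y >= top
            and t.x + t.width <= right and t.y + t.height <= bottom
        ]

    def text_for(self, indices: list[int]) -> str:
        """Join the text of the selected tokens with single spaces."""
        return " ".join(self.tokens[i].text for i in indices)


page = PageLayer(
    page=PageInfo(width=612, height=792, index=0),
    tokens=[
        Token(72, 72, 60, 12, "MASTER"),
        Token(136, 72, 80, 12, "AGREEMENT"),
        Token(72, 700, 40, 10, "Page"),
    ],
)
# Select everything in the top 100 points of the page.
hits = page.tokens_in_box(0, 0, 612, 100)
print(hits, page.text_for(hits))
```

Referring to text by page index plus token indices, rather than by character offsets into an extracted string, is one way a format like this stays tied to the visual layout; treat it as an illustration of the idea, not the project's data model.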
Modern, Pluggable Document Processing Pipeline

OpenContracts features a powerful, modular pipeline system for processing documents. The architecture supports easy creation and integration of custom parsers, embedders, and thumbnail generators.
Each pipeline component inherits from a base class that defines a clear interface.
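The interface itself is not spelled out here, so the following is a hypothetical sketch of the general shape of an abstract-base-class contract for the three component types: parsers, embedders, and thumbnail generators. The class names (`BaseParser`, `BaseEmbedder`, `BaseThumbnailGenerator`), method names, and return types are our assumptions, not OpenContracts' actual API.

```python
from abc import ABC, abstractmethod
from typing import ClassVar


class PipelineComponent(ABC):
    """Hypothetical common base: each component advertises what it handles."""
    title: ClassVar[str] = ""
    supported_file_types: ClassVar[tuple[str, ...]] = ()


class BaseParser(PipelineComponent):
    @abstractmethod
    def parse_document(self, content: bytes) -> dict:
        """Turn raw file bytes into a structured document export."""


class BaseEmbedder(PipelineComponent):
    vector_size: ClassVar[int] = 0

    @abstractmethod
    def embed_text(self, text: str) -> list[float]:
        """Return a fixed-length vector for a span of text."""


class BaseThumbnailGenerator(PipelineComponent):
    @abstractmethod
    def generate_thumbnail(self, content: bytes) -> bytes:
        """Return image bytes previewing the document."""


class PlainTextParser(BaseParser):
    """Example concrete parser: splits UTF-8 text into blank-line paragraphs."""
    title = "Plain text parser"
    supported_file_types = ("text/plain",)

    def parse_document(self, content: bytes) -> dict:
        text = content.decode("utf-8")
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        return {"content": text, "paragraphs": paragraphs}


parser = PlainTextParser()
print(parser.parse_document(b"First paragraph.\n\nSecond paragraph."))
```

Because the methods are declared with `@abstractmethod`, instantiating a subclass that forgets one raises `TypeError`, so a missing method is caught as soon as the component is constructed rather than partway through a pipeline run.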
Learn more about:
The modular design makes it easy to add custom processors - just inherit from the appropriate base class and implement the required methods. See our pipeline documentation for details on creating your own components.
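To illustrate the "inherit and implement" workflow, here is a hypothetical sketch in which subclasses register themselves by MIME type through Python's `__init_subclass__` hook, so writing the subclass is the only step needed. The `registry` attribute, the `parse()` helper, and `MarkdownParser` are illustrative names we made up; OpenContracts may discover and select components differently.

```python
from abc import ABC, abstractmethod


class BaseParser(ABC):
    """Hypothetical parser base that registers concrete subclasses by MIME type."""

    registry: dict[str, type["BaseParser"]] = {}
    supported_file_types: tuple[str, ...] = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every MIME type the subclass declares it can handle.
        for mime in cls.supported_file_types:
            BaseParser.registry[mime] = cls

    @abstractmethod
    def parse_document(self, content: bytes) -> dict:
        """Turn raw bytes into a structured document export."""


class MarkdownParser(BaseParser):
    """Custom component: only the required method has to be written."""

    supported_file_types = ("text/markdown",)

    def parse_document(self, content: bytes) -> dict:
        lines = content.decode("utf-8").splitlines()
        headings = [line.lstrip("#").strip() for line in lines if line.startswith("#")]
        return {"headings": headings, "line_count": len(lines)}


def parse(mime_type: str, content: bytes) -> dict:
    """Look up the registered parser for a MIME type and run it."""
    try:
        parser_cls = BaseParser.registry[mime_type]
    except KeyError:
        raise ValueError(f"No parser registered for {mime_type!r}") from None
    return parser_cls().parse_document(content)


result = parse("text/markdown", b"# Title\nSome text\n## Section\n")
print(result)
```

Self-registration keeps the extension story to a single class definition; the trade-off is that the module defining the subclass must be imported before `parse()` can find it.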
At the moment, we only support PDF and text-based formats (like plaintext and Markdown). With our new parsing pipeline, we could easily support other OOXML office formats like docx and xlsx; however, open source viewers and editors for these formats are rare. One possible route is to leverage the many OOXML-to-Markdown conversion tools that now exist. This should be a reasonably good solution for the majority of documents once we add a Markdown viewer and annotator (see our roadmap).
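To give a feel for the OOXML-to-Markdown route, here is a deliberately tiny sketch using only the Python standard library. A .docx file is a ZIP archive whose main body lives in `word/document.xml`, so paragraphs and built-in heading styles can be mapped to Markdown. The `docx_to_markdown` function is hypothetical; it ignores lists, tables, inline formatting, and localized style names, and a dedicated converter (pandoc is one well-known option) would be the practical choice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_markdown(data: bytes) -> str:
    """Toy OOXML -> Markdown sketch: headings and plain paragraphs only."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    blocks = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; concatenate every <w:t>.
        text = "".join(t.text or "" for t in para.iter(f"{W}t"))
        if not text.strip():
            continue
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(f"{W}val", "") if style is not None else ""
        # Assumes English built-in style IDs such as "Heading1", "Heading2".
        if style_id.startswith("Heading") and style_id[7:].isdigit():
            blocks.append("#" * int(style_id[7:]) + " " + text)
        else:
            blocks.append(text)
    return "\n\n".join(blocks) + "\n"


# Build a minimal in-memory .docx containing just word/document.xml.
DOC_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Master Agreement</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>This Agreement is made </w:t></w:r><w:r><w:t>between the parties.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOC_XML)

md = docx_to_markdown(buf.getvalue())
print(md)
```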
Special thanks to AllenAI's PAWLS project and Nlmatics' nlm-ingestor. They've pioneered a number of features and flows, and we are using their code in some parts of the application.
Nlmatics was also the creator of, and the inspiration for, our data extract grid and parsing pipeline UI/UX.
The company was ahead of its time. While its product is no longer available, OpenContracts aims to take some of its best and most innovative features and make them open source and available to the masses!