CoNLL-U
CoNLL-U is a revised version of the CoNLL-X format. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines:
- Word lines containing the annotation of a word/token in 10 fields separated by single tab characters; see below.
- Blank lines marking sentence boundaries.
- Comment lines starting with hash (#).
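The three line types above can be told apart with a few lines of Python. The following is a minimal sketch, not the loader's implementation: the function names `parse_conllu` and `sentence_text` and the `SAMPLE` string are hypothetical, the field names follow the CoNLL-U specification, and multiword-token ranges and empty nodes are ignored for brevity.

```python
# The ten tab-separated fields of a word line, as named in the CoNLL-U spec.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# Hypothetical sample written for this sketch (one sentence, six word lines).
SAMPLE = (
    "# text = They buy and sell books.\n"
    "1\tThey\tthey\tPRON\tPRP\tCase=Nom|Number=Plur\t2\tnsubj\t_\t_\n"
    "2\tbuy\tbuy\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t0\troot\t_\t_\n"
    "3\tand\tand\tCCONJ\tCC\t_\t4\tcc\t_\t_\n"
    "4\tsell\tsell\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t2\tconj\t_\t_\n"
    "5\tbooks\tbook\tNOUN\tNNS\tNumber=Plur\t2\tobj\t_\tSpaceAfter=No\n"
    "6\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences with their comments and word lines."""
    sentences = []
    comments, words = [], []
    for line in text.split("\n"):
        if not line.strip():
            # Blank line: sentence boundary. Flush whatever was collected.
            if comments or words:
                sentences.append({"comments": comments, "words": words})
                comments, words = [], []
        elif line.startswith("#"):
            # Comment line: keep the text after the hash.
            comments.append(line[1:].strip())
        else:
            # Word line: exactly 10 fields separated by single tabs.
            fields = line.split("\t")
            if len(fields) != len(FIELDS):
                raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
            words.append(dict(zip(FIELDS, fields)))
    if comments or words:
        # Tolerate input that lacks the final blank line.
        sentences.append({"comments": comments, "words": words})
    return sentences


def sentence_text(words):
    """Rebuild surface text from FORM values, honoring SpaceAfter=No in MISC."""
    parts = []
    for word in words:
        parts.append(word["FORM"])
        if "SpaceAfter=No" not in word["MISC"].split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()


if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        print(sentence_text(sentence["words"]))
```

Keeping each word line as a dictionary keyed by field name makes it easy to reach any annotation (for example `word["UPOS"]`) without remembering column positions.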
This is an example of how to load a file in CoNLL-U format. The whole file is treated as one document. The example data (conllu.conllu) is based on one of the standard UD/CoNLL-U examples.
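from langchain_community.document_loaders import CoNLLULoader

# Point the loader at the CoNLL-U file, then read it as a single document
loader = CoNLLULoader("example_data/conllu.conllu")
documents = loader.load()
print(documents)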
[Document(page_content='They buy and sell books.', metadata={'source': 'example_data/conllu.conllu'})]