This notebook provides a quick overview for getting started with the UnstructuredXMLLoader document loader. UnstructuredXMLLoader loads .xml files, and the page content of each resulting document is the text extracted from the XML tags.
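To make "text extracted from the XML tags" concrete, here is a rough approximation using only the Python standard library. It is a sketch, not how UnstructuredXMLLoader works internally (the loader delegates parsing to the unstructured library). The sample XML, its element names, and the `extract_text` helper are hypothetical. Joining the element texts with blank lines mirrors the shape of the page content shown in the Load section.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML in the spirit of example_data/factbook.xml; element names are assumed.
SAMPLE_XML = """<factbook>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
    <sport>Hockey</sport>
  </country>
  <country>
    <name>France</name>
    <capital>Paris</capital>
    <sport>Soccer</sport>
  </country>
</factbook>"""


def extract_text(xml_string: str) -> str:
    """Collect the stripped text of every element that has non-blank text.

    Only each element's leading text is used; tail text in mixed content is ignored.
    """
    root = ET.fromstring(xml_string)
    texts = [el.text.strip() for el in root.iter() if el.text and el.text.strip()]
    # Separate each element's text with a blank line.
    return "\n\n".join(texts)


print(extract_text(SAMPLE_XML))
```

Whitespace-only text (the indentation inside `<factbook>` and `<country>`) is skipped, so only the leaf values survive.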
To access the UnstructuredXMLLoader document loader, you'll need to install the langchain-community integration package.
No credentials are needed to use the UnstructuredXMLLoader.
To enable automated tracing of your model calls, set your LangSmith API key:
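One way to do this from Python is to export the key as an environment variable before making any calls. This is a minimal sketch that assumes LangSmith reads the variables `LANGSMITH_API_KEY` and `LANGSMITH_TRACING`; the `enable_langsmith_tracing` helper is hypothetical, and the key shown is a placeholder.

```python
import os


def enable_langsmith_tracing(api_key: str) -> None:
    """Export the environment variables assumed to switch on LangSmith tracing."""
    # Assumption: LangSmith picks up these two variable names from the environment.
    os.environ["LANGSMITH_API_KEY"] = api_key
    os.environ["LANGSMITH_TRACING"] = "true"


# In a notebook you could prompt for the key instead of hard-coding it, e.g.:
#   import getpass
#   enable_langsmith_tracing(getpass.getpass("Enter your LangSmith API key: "))
enable_langsmith_tracing("<your-langsmith-api-key>")  # placeholder, not a real key
```

Prompting with `getpass` keeps the key out of the notebook's saved source.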
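Installation
Install langchain_community. UnstructuredXMLLoader also relies on the unstructured package for parsing, so that package must be installed as well.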
%pip install -qU langchain_community
Initialization
Now we can instantiate our loader object and load documents:
from langchain_community.document_loaders import UnstructuredXMLLoader
loader = UnstructuredXMLLoader(
"./example_data/factbook.xml",
)
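If you don't have the ./example_data/factbook.xml file used above, a small stand-in can be generated with the standard library. This is a sketch: the element names, the values, and the `write_sample_factbook` helper are hypothetical and are not the schema of the real example file. Because the loader extracts element text, the exact tag names matter less than having text inside the elements.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hypothetical sample data; keys become element names in the generated file.
COUNTRIES = [
    {"name": "Canada", "capital": "Ottawa", "sport": "Hockey"},
    {"name": "France", "capital": "Paris", "sport": "Soccer"},
]


def write_sample_factbook(path: Path) -> None:
    """Write a small factbook-style XML file, leaving any existing file untouched."""
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    root = ET.Element("factbook")
    for country in COUNTRIES:
        node = ET.SubElement(root, "country")
        for tag, value in country.items():
            ET.SubElement(node, tag).text = value
    # Write with an XML declaration so the file is a standalone XML document.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


write_sample_factbook(Path("./example_data/factbook.xml"))
```

The existence check means a real example file, if present, is never overwritten.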
Load
docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/factbook.xml'}, page_content='United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\nCanada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\nFrance\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\nTrinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field')
print(docs[0].metadata)
{'source': './example_data/factbook.xml'}
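In the Document shown above, the text of each XML element is separated by a blank line, so individual values can be recovered by splitting on "\n\n". The sketch below works on a copy of the first part of that page content; with a real document you would start from `docs[0].page_content`. Grouping the values four at a time (name, capital, leader, sport) is an assumption inferred from this particular output, not something the loader guarantees.

```python
# Text copied from the page_content shown above (first two countries only).
page_content = (
    "United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\n"
    "Canada\n\nOttawa\n\nJustin Trudeau\n\nHockey"
)

# Each extracted element's text is separated by a blank line.
fields = page_content.split("\n\n")

# Assumption: every country contributes exactly four values in a fixed order.
FIELDS_PER_COUNTRY = 4
countries = [
    fields[i : i + FIELDS_PER_COUNTRY]
    for i in range(0, len(fields), FIELDS_PER_COUNTRY)
]

for name, capital, leader, sport in countries:
    print(f"{name}: capital {capital}, leader {leader}, sport {sport}")
```

If the XML structure were irregular (for example, a country missing a field), fixed-size grouping would misalign, and parsing the XML directly would be more robust.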
Lazy Load
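page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # Process the current page of 10 documents here (e.g. index or embed them).
        page = []
# Any remaining documents (fewer than 10) are still in `page` at this point.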
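In the paging loop above, documents left over after the last full page of 10 are never processed inside the loop. A small generator can yield fixed-size batches, including the final partial one, from any iterator such as `loader.lazy_load()`. This is a sketch; the `batched` helper defined here is our own (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial batch."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items; an empty list means the iterator is exhausted.
    while batch := list(islice(iterator, size)):
        yield batch


# With the loader this would read: for page in batched(loader.lazy_load(), 10): ...
for page in batched(range(23), 10):
    print(len(page))
```

Because the generator consumes the iterator lazily, only one batch of documents is held in memory at a time, which preserves the point of using `lazy_load()`.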
API reference
For detailed documentation of all UnstructuredXMLLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html