Showing content from https://python.langchain.com/docs/integrations/document_loaders/scrapfly/ below:

scrapfly | 🦜️🔗 LangChain

ScrapFly is a web scraping API with headless browser capabilities, proxies, and anti-bot bypass. It extracts web page data into LLM-accessible markdown or text.

from langchain_community.document_loaders import ScrapflyLoader

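scrapfly_loader = ScrapflyLoader(
    ["https://web-scraping.dev/products"],
    api_key="Your ScrapFly API key",  # Get your API key from https://www.scrapfly.io/
    continue_on_failure=True,  # Skip pages that cannot be processed and log the exception
)

# Load the documents from the URLs (markdown is the default format)
documents = scrapfly_loader.load()
print(documents)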
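To inspect what the loader returned, the documents can be reduced to a short summary. The following is a minimal sketch, not part of the ScrapFly integration: `summarize_documents` is a hypothetical helper, and it assumes each document exposes `page_content` and a `metadata` dictionary that may carry the source address under a `"url"` key. The stand-in objects below replace real scraped documents so the sketch runs without an API key.

```python
from types import SimpleNamespace


def summarize_documents(documents, preview_chars=80):
    """Return one small dict per document: source URL, text length, preview.

    Hypothetical helper. Assumes each item has `page_content` (str) and
    `metadata` (dict); the "url" metadata key is an assumption, so it is
    read with .get() and may be None.
    """
    summaries = []
    for doc in documents:
        text = doc.page_content
        summaries.append(
            {
                "url": doc.metadata.get("url"),
                "length": len(text),
                "preview": text[:preview_chars],
            }
        )
    return summaries


if __name__ == "__main__":
    # Stand-in for the list returned by scrapfly_loader.load()
    fake_documents = [
        SimpleNamespace(
            page_content="# Products\n\nBox of Chocolate Candy",
            metadata={"url": "https://web-scraping.dev/products"},
        )
    ]
    for summary in summarize_documents(fake_documents, preview_chars=10):
        print(summary)
```

With real results, the same call would be `summarize_documents(documents)` on the list returned by `scrapfly_loader.load()`.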

The ScrapflyLoader also accepts a scrape configuration (ScrapeConfig parameters, passed here as a dictionary) for customizing the scrape request. See the ScrapFly documentation for full details on each feature and its API parameters: https://scrapfly.io/docs/scrape-api/getting-started

from langchain_community.document_loaders import ScrapflyLoader

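scrapfly_scrape_config = {
    "asp": True,  # Bypass scraping blocks and anti-bot solutions
    "render_js": True,  # Render JavaScript with a cloud headless browser
    "proxy_pool": "public_residential_pool",  # Select a proxy pool (datacenter or residential)
    "country": "us",  # Select a proxy location
    "auto_scroll": True,  # Scroll the page automatically
    "js": "",  # Custom JavaScript for the headless browser to execute (empty here)
}

scrapfly_loader = ScrapflyLoader(
    ["https://web-scraping.dev/products"],
    api_key="Your ScrapFly API key",  # Get your API key from https://www.scrapfly.io/
    continue_on_failure=True,  # Skip pages that cannot be processed and log the exception
    scrape_config=scrapfly_scrape_config,  # Pass the scrape configuration
    scrape_format="markdown",  # Result format: "markdown" (default) or "text"
)

# Load the documents from the URLs as markdown
documents = scrapfly_loader.load()
print(documents)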
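The examples above hardcode the API key. One way to avoid that is to read the key from the environment and assemble the keyword arguments before constructing the loader. This is a sketch under stated assumptions: `build_loader_kwargs` is a hypothetical helper, and the environment variable name `SCRAPFLY_API_KEY` is chosen here for illustration. Only the keyword names shown in the examples above (`api_key`, `continue_on_failure`, `scrape_config`, `scrape_format`) are used.

```python
import os


def build_loader_kwargs(scrape_config=None, scrape_format="markdown",
                        env_var="SCRAPFLY_API_KEY"):
    """Assemble keyword arguments for ScrapflyLoader without hardcoding the key.

    Hypothetical helper. `env_var` is an assumed variable name for this sketch.
    """
    # The source examples describe two result formats: markdown and text.
    if scrape_format not in ("markdown", "text"):
        raise ValueError(f"Unsupported scrape_format: {scrape_format!r}")

    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")

    kwargs = {
        "api_key": api_key,
        "continue_on_failure": True,
        "scrape_format": scrape_format,
    }
    if scrape_config is not None:
        # Copy so later changes to the caller's dict do not leak in.
        kwargs["scrape_config"] = dict(scrape_config)
    return kwargs
```

The loader would then be created as `ScrapflyLoader(["https://web-scraping.dev/products"], **build_loader_kwargs(scrape_config=scrapfly_scrape_config))`, keeping the URL list positional as in the examples above.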
