Showing content from https://python.langchain.com/docs/integrations/tools/scrapeless_universal_scraping/ below:

Scrapeless | 🦜️🔗 LangChain

Scrapeless

Scrapeless provides data acquisition services with extensive parameter customization and multi-format export support. These capabilities let LangChain applications integrate and use external web data more effectively. The core functional modules are:

DeepSerp

Universal Scraping

Crawler

Overview

Integration details

Tool features

Native async: ✅
Returns artifact: ✅
Return data: html, markdown, links, metadata, structured content

Setup

The integration lives in the langchain-scrapeless package.

!pip install langchain-scrapeless

Credentials

You'll need a Scrapeless API key to use this tool. You can set it as an environment variable:

import os

os.environ["SCRAPELESS_API_KEY"] = "your-api-key"
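
To avoid hardcoding the key in a notebook or script, one option is to prompt for it only when the variable is missing. The following is a minimal sketch, not part of the langchain-scrapeless package; the helper name `ensure_scrapeless_api_key` is hypothetical.

```python
import getpass
import os


def ensure_scrapeless_api_key(env_var="SCRAPELESS_API_KEY"):
    """Return the API key from the environment, prompting once if it is unset.

    Hypothetical helper for illustration; not provided by langchain-scrapeless.
    """
    if not os.environ.get(env_var):
        # getpass hides the typed value so the key does not echo to the terminal.
        os.environ[env_var] = getpass.getpass(f"Enter your {env_var}: ")
    return os.environ[env_var]
```

Calling `ensure_scrapeless_api_key()` before creating the tool leaves an existing key untouched and asks for one only when none is set.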
Instantiation

This section shows how to create the Scrapeless Universal Scraping Tool. The tool scrapes a website using a headless browser with JavaScript rendering, customizable output types, and geo-specific proxy support.

The tool accepts the following parameters during instantiation:

Invocation

Basic Usage
from langchain_scrapeless import ScrapelessUniversalScrapingTool

tool = ScrapelessUniversalScrapingTool()


result = tool.invoke("https://example.com")
print(result)
<!DOCTYPE html><html><head>
<title>Example Domain</title>

<meta charset="utf-8">
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<style type="text/css">
body {
background-color: #f0f0f2;
margin: 0;
padding: 0;
font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

}
div {
width: 600px;
margin: 5em auto;
padding: 2em;
background-color: #fdfdff;
border-radius: 0.5em;
box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
}
a:link, a:visited {
color: #38488f;
text-decoration: none;
}
@media (max-width: 700px) {
div {
margin: 0 auto;
width: auto;
}
}
</style>
</head>

<body>
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.</p>
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>


</body></html>
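
Because the basic call returns the page as an HTML string, as in the output above, the result can be post-processed with ordinary Python. The sketch below pulls the page title using only the standard library; `TitleParser` and `extract_title` are hypothetical names for illustration and are not part of the Scrapeless integration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    return title or None
```

Applied to the result above, `extract_title(result)` would be expected to give the text of the `<title>` element, assuming the tool returned the raw HTML string.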
Advanced Usage with Parameters
from langchain_scrapeless import ScrapelessUniversalScrapingTool

tool = ScrapelessUniversalScrapingTool()

result = tool.invoke({"url": "https://exmaple.com", "response_type": "markdown"})
print(result)
# Well hello there.

Welcome to exmaple.com.
Chances are you got here by mistake (example.com, anyone?)
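
When passing a dictionary, it can help to assemble and sanity-check the input before invoking the tool. This is a minimal sketch that uses only the `url` and `response_type` keys shown above; the helper `build_scrape_input` is hypothetical, and it checks only that the URL is an absolute http(s) URL, not that the domain is spelled as intended.

```python
from urllib.parse import urlparse


def build_scrape_input(url, response_type=None):
    """Build the input dict for tool.invoke, rejecting non-absolute URLs.

    Hypothetical helper for illustration; not part of langchain-scrapeless.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got: {url!r}")
    payload = {"url": url}
    # Only include response_type when the caller asked for a specific format.
    if response_type is not None:
        payload["response_type"] = response_type
    return payload
```

The returned dictionary can then be passed straight to `tool.invoke(...)` in place of the literal shown above.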
Use within an agent
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessUniversalScrapingTool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI()

tool = ScrapelessUniversalScrapingTool()


tools = [tool]
agent = create_react_agent(llm, tools)

for chunk in agent.stream(
    {
        "messages": [
            (
                "human",
                "Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.",
            )
        ]
    },
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()
================================ Human Message =================================

Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.
================================== Ai Message ==================================
Tool Calls:
  scrapeless_universal_scraping (call_jBrvMVL2ixhvf6gklhi7Gqtb)
 Call ID: call_jBrvMVL2ixhvf6gklhi7Gqtb
  Args:
    url: https://www.scrapeless.com/en
    outputs: headings
================================= Tool Message =================================
Name: scrapeless_universal_scraping

{"headings":["Effortless Web Scraping Toolkitfor Business and Developers","4.8","4.5","8.5","A Flexible Toolkit for Accessing Public Web Data","Deep SerpApi","Scraping Browser","Universal Scraping API","Customized Services","From Simple Data Scraping to Complex Anti-Bot Challenges, Scrapeless Has You Covered.","Fully Compatible with Key Programming Languages and Tools","Enterprise-level Data Scraping Solution","Customized Data Scraping Solutions","High Concurrency and High-Performance Scraping","Data Cleaning and Transformation","Real-Time Data Push and API Integration","Data Security and Privacy Protection","Enterprise-level SLA","Why Scrapeless: Simplify Your Data Flow Effortlessly.","Articles","Organized Fresh Data","Prices","No need to hassle with browser maintenance","Reviews","Only pay for successful requests","Products","Fully scalable","Unleash Your Competitive Edgein Data within the Industry","Regulate Compliance for All Users","Web Scraping Blog","Scrapeless MCP Server Is Officially Live! Build Your Ultimate AI-Web Connector","Product Updates | New Profile Feature","How to Track Your Ranking on ChatGPT?","For Scraping","For Data","For AI","Top Scraper API","Learning Center","Legal"]}
================================== Ai Message ==================================

The h1 tag extracted from the website https://www.scrapeless.com/en is "Effortless Web Scraping Toolkit for Business and Developers".
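
The tool message in this run is a JSON string with a `headings` list, so the same value can be read without a model in the loop. The sketch below assumes that JSON shape and treats the first entry as the top-level heading, which is what the agent did here but is not guaranteed by the source; `first_heading` is a hypothetical helper.

```python
import json


def first_heading(tool_output):
    """Return the first entry of the "headings" list, or None if there is none.

    Assumes tool_output is a JSON string shaped like the tool message above.
    """
    data = json.loads(tool_output)
    headings = data.get("headings") or []
    return headings[0] if headings else None
```

Note that the raw value keeps the page's own spacing (for example "Toolkitfor"), whereas the model's answer above silently normalized it.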
API reference
