Supercharge your LangChain agents with AI-powered web scraping capabilities. LangChain-ScrapeGraph provides a seamless integration between LangChain and ScrapeGraph AI, enabling your agents to extract structured data from websites using natural language.
🔗 ScrapeGraph API & SDKs

If you are looking for a quick solution to integrate ScrapeGraph into your system, check out our powerful API here!
We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
```bash
pip install langchain-scrapegraph
```
Convert any webpage into clean, formatted markdown.
```python
from langchain_scrapegraph.tools import MarkdownifyTool

tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})

print(markdown)
```
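The markdown output can then be passed to any LangChain chat model. Below is a minimal sketch, assuming you have langchain-openai installed and an OpenAI API key configured; the summarization prompt is only an illustration:

```python
from langchain_openai import ChatOpenAI

from langchain_scrapegraph.tools import MarkdownifyTool

# Convert the page to clean markdown first
tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})

# Feed the markdown to an LLM for a quick summary (illustrative prompt)
llm = ChatOpenAI(temperature=0)
summary = llm.invoke(f"Summarize this page in two sentences:\n\n{markdown}")

print(summary.content)
```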
Extract structured data from any webpage using natural language prompts.
```python
from langchain_scrapegraph.tools import SmartScraperTool

# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SmartScraperTool()

# Extract information using natural language
result = tool.invoke({
    "website_url": "https://www.example.com",
    "user_prompt": "Extract the main heading and first paragraph"
})

print(result)
```
Search and extract structured information from the web using natural language prompts.
```python
from langchain_scrapegraph.tools import SearchScraperTool

# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SearchScraperTool()

# Search and extract information using natural language
result = tool.invoke({
    "user_prompt": "What are the key features and pricing of ChatGPT Plus?"
})

print(result)
# {
#     "product": {
#         "name": "ChatGPT Plus",
#         "description": "Premium version of ChatGPT..."
#     },
#     "features": [...],
#     "pricing": {...},
#     "reference_urls": [
#         "https://openai.com/chatgpt",
#         ...
#     ]
# }
```

🔍 Using Output Schemas with SearchScraperTool
You can define the structure of the output using Pydantic models:
```python
from typing import Any, Dict, List

from pydantic import BaseModel, Field

from langchain_scrapegraph.tools import SearchScraperTool

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    features: List[str] = Field(description="List of product features")
    pricing: Dict[str, Any] = Field(description="Pricing information")
    reference_urls: List[str] = Field(description="Source URLs for the information")

# Initialize with schema
tool = SearchScraperTool(llm_output_schema=ProductInfo)

# The output will conform to the ProductInfo schema
result = tool.invoke({
    "user_prompt": "What are the key features and pricing of ChatGPT Plus?"
})

print(result)
# {
#     "name": "ChatGPT Plus",
#     "features": [
#         "GPT-4 access",
#         "Faster response speed",
#         ...
#     ],
#     "pricing": {
#         "amount": 20,
#         "currency": "USD",
#         "period": "monthly"
#     },
#     "reference_urls": [
#         "https://openai.com/chatgpt",
#         ...
#     ]
# }
```
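Because the schema is a regular Pydantic model, you can also validate the returned dictionary into a typed object. Continuing the example above, this is a sketch that assumes the tool returns a plain dict and that you are on Pydantic v2:

```python
# Validate the raw dict into a typed ProductInfo instance (Pydantic v2)
product = ProductInfo.model_validate(result)

print(product.name)      # e.g. "ChatGPT Plus"
print(product.pricing)   # e.g. {"amount": 20, "currency": "USD", "period": "monthly"}
```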
You can also use the tools inside a LangChain agent:

```python
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

from langchain_scrapegraph.tools import SmartScraperTool

# Initialize tools
tools = [
    SmartScraperTool(),
]

# Create an agent
agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Use the agent
response = agent.run("""
    Visit example.com, make a summary of the content and extract the main heading and first paragraph
""")
```
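The same tools also work with LangGraph-style agents. Here is a minimal sketch, assuming langgraph is installed; the prompt text is only an illustration:

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

from langchain_scrapegraph.tools import SmartScraperTool

# Build a ReAct-style agent with the scraping tool
agent = create_react_agent(ChatOpenAI(temperature=0), [SmartScraperTool()])

# Ask the agent to visit and summarize a page
result = agent.invoke({
    "messages": [("user", "Visit example.com and extract the main heading and first paragraph")]
})

print(result["messages"][-1].content)
```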
Set your ScrapeGraph API key in your environment:
```bash
export SGAI_API_KEY="your-api-key-here"
```
Or set it programmatically:
```python
import os

os.environ["SGAI_API_KEY"] = "your-api-key-here"
```
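If you keep the key in a .env file, you can load it at startup with python-dotenv. This is a sketch, assuming python-dotenv is installed and the file contains a SGAI_API_KEY entry:

```python
from dotenv import load_dotenv

from langchain_scrapegraph.tools import SmartScraperTool

# Reads SGAI_API_KEY (and any other variables) from a local .env file
load_dotenv()

tool = SmartScraperTool()  # picks up SGAI_API_KEY from the environment
```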
This project is licensed under the MIT License - see the LICENSE file for details.
This project is built on top of:

- LangChain
- ScrapeGraph AI
Made with ❤️ by ScrapeGraph AI