This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to markdown format using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookies, captchas, and other interactive elements automatically.
**Now easily runnable via npx!** The recommended way to use this server is via `npx`, which ensures you're running the latest version without needing to clone or manually install.
1. **Prerequisites:** Ensure you have Node.js and npm installed.
2. **Environment Setup:** The server requires an `OPENAI_API_KEY`. You can provide this and other optional configurations in two ways:
   - **`.env` file:** Create a `.env` file in the directory where you will run the `npx` command.
   - **Shell exports:** Export the variables in your shell before running the command.

   Example `.env` file or shell exports:

   ```bash
   # Required
   OPENAI_API_KEY=your_api_key_here

   # Optional (defaults shown)
   # VISION_MODEL=gpt-4.1
   # API_BASE_URL=https://api.openai.com/v1   # Uncomment to override
   # TRANSPORT_TYPE=stdio                     # Options: stdio, sse, http
   # USE_SSE=true                             # Deprecated: use TRANSPORT_TYPE=sse instead
   # PORT=3001                                # Only used in sse/http modes
   # DISABLE_HEADLESS=true                    # Uncomment to see the browser in action
   ```
3. **Run the Server:** Open your terminal and run:

   ```bash
   npx -y puppeteer-vision-mcp-server
   ```

   The `-y` flag automatically confirms any prompts from `npx`. By default, the server runs in `stdio` mode; set `TRANSPORT_TYPE=sse` or `TRANSPORT_TYPE=http` for the HTTP server modes.

This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator. Here's an example configuration snippet:
{ "mcpServers": { "web-scraper": { "command": "npx", "args": ["-y", "puppeteer-vision-mcp-server"], "env": { "OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE", // Optional: // "VISION_MODEL": "gpt-4.1", // "API_BASE_URL": "https://api.example.com/v1", // "TRANSPORT_TYPE": "stdio", // or "sse" or "http" // "DISABLE_HEADLESS": "true" // To see the browser during operations } } // ... other MCP servers } }
When configured this way, the MCP orchestrator will manage the lifecycle of the `puppeteer-vision-mcp-server` process.
## Environment Configuration Details

Regardless of how you run the server (NPX or local development), it uses the following environment variables:

- `OPENAI_API_KEY`: (Required) Your API key for accessing the vision model.
- `VISION_MODEL`: (Optional) The model to use for vision analysis. Default: `gpt-4.1`.
- `API_BASE_URL`: (Optional) Custom API endpoint URL.
- `TRANSPORT_TYPE`: (Optional) The transport protocol to use. Options: `stdio` (default), `sse`, `http`.
  - `stdio`: Direct process communication (recommended for most use cases)
  - `sse`: Server-Sent Events over HTTP (legacy mode)
  - `http`: Streamable HTTP transport with session management
- `USE_SSE`: (Optional, deprecated) Set to `true` to enable SSE mode over HTTP. Use `TRANSPORT_TYPE=sse` instead.
- `PORT`: (Optional) The port for the HTTP server in SSE or HTTP mode. Default: `3001`.
- `DISABLE_HEADLESS`: (Optional) Set to `true` to run the browser in visible mode. Default: `false` (browser runs in headless mode).

The server supports three communication modes:
1. **STDIO mode (default):** Direct process communication over standard input/output; no HTTP port is used.
2. **SSE mode:** Set `TRANSPORT_TYPE=sse` in your environment. The server listens on `PORT` (default: 3001) and exposes an SSE endpoint at `http://localhost:3001/sse`.
3. **HTTP mode:** Set `TRANSPORT_TYPE=http` in your environment. The server listens on `PORT` (default: 3001) and exposes the MCP endpoint at `http://localhost:3001/mcp` (see the connection sketch after this list).
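For orientation, a minimal client connecting to the HTTP mode could use the Streamable HTTP transport from `@modelcontextprotocol/sdk`. This is a rough sketch only; the client code is not part of this project, and the URL assumes the default port:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Assumes the server was started with TRANSPORT_TYPE=http on the default port 3001.
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:3001/mcp"));
const client = new Client({ name: "http-example-client", version: "1.0.0" });

await client.connect(transport);
console.log(await client.listTools()); // should include the scrape-webpage tool

await client.close();
```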
The server provides a `scrape-webpage` tool.
**Tool Parameters:**

- `url` (string, required): The URL of the webpage to scrape.
- `autoInteract` (boolean, optional, default: `true`): Whether to automatically handle interactive elements.
- `maxInteractionAttempts` (number, optional, default: `3`): Maximum number of AI interaction attempts.
- `waitForNetworkIdle` (boolean, optional, default: `true`): Whether to wait for the network to be idle before processing.
**Response Format:**

The tool returns its result in a structured format:

- `content`: An array containing a single text object with the raw markdown of the scraped webpage.
- `metadata`: Contains additional information:
  - `message`: Status message.
  - `success`: Boolean indicating success.
  - `contentSize`: Size of the content in characters (on success).

**Example Success Response:**
{ "content": [ { "type": "text", "text": "# Page Title\n\nThis is the content..." } ], "metadata": { "message": "Scraping successful", "success": true, "contentSize": 8734 } }
**Example Error Response:**
{ "content": [ { "type": "text", "text": "" } ], "metadata": { "message": "Error scraping webpage: Failed to load the URL", "success": false } }
The system uses vision-capable AI models (configurable via `VISION_MODEL` and `API_BASE_URL`) to analyze screenshots of web pages and decide on actions like clicking, typing, or scrolling to bypass overlays and consent forms. This process repeats up to `maxInteractionAttempts` times.
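The actual prompt, action schema, and interaction loop live in `src/ai/vision-analyzer.ts` and `src/ai/page-interactions.ts`; the following is only a simplified sketch of what a screenshot-driven decision step can look like with the `openai` client (the function name, prompt, and return format are made up for illustration):

```typescript
import OpenAI from "openai";

// Illustrative only: the real prompt and action schema live in src/ai/vision-analyzer.ts.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.API_BASE_URL, // undefined falls back to the standard OpenAI endpoint
});

// Ask the vision model what to do with the current page, given a screenshot.
async function suggestNextAction(screenshotBase64: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: process.env.VISION_MODEL ?? "gpt-4.1",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "If an overlay, cookie banner, or captcha blocks the content, describe the click/type/scroll needed to clear it; otherwise reply DONE.",
          },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "DONE";
}
```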
After interactions, Mozilla's Readability extracts the main content, which is then sanitized and converted to Markdown using Turndown with custom rules for code blocks and tables.
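A stripped-down sketch of that extraction pipeline (the project's actual implementation adds its own sanitization settings and custom Turndown rules) looks roughly like this:

```typescript
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";
import sanitizeHtml from "sanitize-html";

// Simplified sketch: extract the main article, sanitize it, and convert it to Markdown.
function htmlToMarkdown(html: string, url: string): string {
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();

  const cleanHtml = sanitizeHtml(article?.content ?? "", {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat(["pre", "img"]),
  });

  const turndown = new TurndownService({ headingStyle: "atx", codeBlockStyle: "fenced" });
  return turndown.turndown(cleanHtml);
}
```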
Installation & Development (for Modifying the Code)If you wish to contribute, modify the server, or run a local development version:
1. **Clone the Repository:**

   ```bash
   git clone https://github.com/djannot/puppeteer-vision-mcp.git
   cd puppeteer-vision-mcp
   ```
2. **Install Dependencies:**

   ```bash
   npm install
   ```
3. **Build the Project:**

   ```bash
   npm run build
   ```
4. **Set Up Environment:** Create a `.env` file in the project's root directory with your `OPENAI_API_KEY` and any other desired configurations (see "Environment Configuration Details" above).
5. **Run for Development:**

   ```bash
   npm start # Starts the server using the local build
   ```
   Or, for automatic rebuilding on changes:

   ```bash
   npm run dev
   ```
You can modify the behavior of the scraper by editing:

- `src/ai/vision-analyzer.ts` (`analyzePageWithAI` function): Customize the AI prompt.
- `src/ai/page-interactions.ts` (`executeAction` function): Add new action types.
- `src/scrapers/webpage-scraper.ts` (`visitWebPage` function): Change Puppeteer options.
- `src/utils/markdown-formatters.ts`: Adjust Turndown rules for Markdown conversion (see the sketch after this list).
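As an illustration of the last point, a custom Turndown rule in the style of `src/utils/markdown-formatters.ts` could look like the following; the rule below is a generic sketch, not the project's actual rule:

```typescript
import TurndownService from "turndown";

const turndown = new TurndownService({ codeBlockStyle: "fenced" });

// Render <pre><code class="language-xyz"> blocks as fenced code with a language hint.
turndown.addRule("fencedCodeWithLanguage", {
  filter: (node) => node.nodeName === "PRE" && node.firstChild?.nodeName === "CODE",
  replacement: (_content, node) => {
    const codeEl = node.firstChild as HTMLElement;
    const language = (codeEl.getAttribute("class") ?? "").replace("language-", "");
    const fence = "`".repeat(3);
    return "\n" + fence + language + "\n" + (codeEl.textContent ?? "") + "\n" + fence + "\n";
  },
});
```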
Key dependencies include:

- `@modelcontextprotocol/sdk`
- `puppeteer`, `puppeteer-extra`
- `@mozilla/readability`, `jsdom`
- `turndown`, `sanitize-html`
- `openai` (or compatible API for vision models)
- `express` (for SSE mode)
- `zod`