Showing content from https://github.com/just-every/mcp-read-website-fast below:

just-every/mcp-read-website-fast: Quickly reads webpages and converts to markdown for fast, token efficient web scraping

@just-every/mcp-read-website-fast

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

Existing MCP web crawlers are slow and consume large quantities of tokens. This stalls development and produces incomplete results, because the LLM has to parse entire web pages.

This MCP package fetches web pages locally, strips noise, and converts content to clean Markdown while preserving links. Designed for Claude Code, IDEs and LLM pipelines with minimal token footprint. Crawl sites locally with minimal dependencies.

Note: This package now uses @just-every/crawl for its core crawling and markdown conversion functionality.

Fast startup using official MCP SDK with lazy loading for optimal performance
Content extraction using Mozilla Readability (same as Firefox Reader View)
HTML to Markdown conversion with Turndown + GFM support
Smart caching with SHA-256 hashed URLs
Polite crawling with robots.txt support and rate limiting
Concurrent fetching with configurable depth crawling
Stream-first design for low memory usage
Link preservation for knowledge graphs
Optional chunking for downstream processing
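To illustrate the "SHA-256 hashed URLs" idea from the list above, here is a minimal sketch of deriving a cache key from a URL. The helper name `cacheKeyForUrl` and the choice to drop the URL fragment before hashing are assumptions for illustration; the package's actual key derivation may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not taken from the package): derive a
// filesystem-safe cache key from a URL using SHA-256.
function cacheKeyForUrl(url: string): string {
  // Parsing with URL normalizes scheme and host casing.
  const parsed = new URL(url);
  // Assumption: fragments never reach the server, so they are
  // dropped to let "#section" variants share one cache entry.
  parsed.hash = "";
  // A hex digest is always 64 characters and safe as a file name.
  return createHash("sha256").update(parsed.toString()).digest("hex");
}

console.log(cacheKeyForUrl("https://example.com/article"));
```

Hashing keeps file names a fixed length regardless of how long or oddly encoded the original URL is.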

claude mcp add read-website-fast -s user -- npx -y @just-every/mcp-read-website-fast

code --add-mcp '{"name":"read-website-fast","command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}'

cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add

Choose “As JSON” and paste:

{"command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}

Or, in the chat window, type /add and fill in the same JSON. Either path adds the server in a single step.

Raw JSON (works in any MCP client)

{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}

Drop this into your client’s mcp.json (e.g. .vscode/mcp.json, ~/.cursor/mcp.json, or .mcp.json for Claude).
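The `config` parameter in the Cursor link above is the same server entry, serialized as JSON and base64-encoded. The sketch below rebuilds that link from the entry. The function name `buildCursorInstallLink` is hypothetical, and the link layout is inferred only from the example link in this document, not from Cursor's documentation.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

// Hypothetical helper: rebuild an install link of the same shape as
// the Cursor example in this README.
function buildCursorInstallLink(name: string, entry: McpServerEntry): string {
  // The payload maps the server name to its launch command.
  const json = JSON.stringify({ [name]: entry });
  // btoa is a global in browsers and Node 16+; it is safe here
  // because the JSON contains only ASCII characters.
  const config = btoa(json);
  // Assumption: the base64 text is appended as-is, matching the
  // example link, which leaves the trailing "=" unescaped.
  return `cursor://anysphere.cursor-deeplink/mcp/install?name=${name}&config=${config}`;
}

console.log(
  buildCursorInstallLink("read-website-fast", {
    command: "npx",
    args: ["-y", "@just-every/mcp-read-website-fast"],
  }),
);
```

Decoding the `config` value of an unfamiliar link the same way (base64, then JSON) is a quick way to check which command it would install.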

read_website - Fetches a webpage and converts it to clean markdown
- Parameters:
  - url (required): The HTTP/HTTPS URL to fetch
  - pages (optional): Maximum number of pages to crawl (default: 1, max: 100)
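The parameter rules above can be sketched as a small validation step. This is an illustration only: `normalizeReadWebsiteArgs` is a hypothetical name, and rejecting out-of-range `pages` values (rather than clamping them) is an assumption, since the documentation states only the default and the maximum.

```typescript
interface ReadWebsiteArgs {
  url: string;
  pages: number;
}

// Hypothetical validator mirroring the documented read_website parameters.
function normalizeReadWebsiteArgs(input: { url?: unknown; pages?: unknown }): ReadWebsiteArgs {
  if (typeof input.url !== "string") {
    throw new Error("url is required");
  }
  // new URL throws a TypeError when the string is not a valid URL.
  const parsed = new URL(input.url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("url must use HTTP or HTTPS");
  }
  // pages is optional and defaults to 1.
  const pages = input.pages === undefined ? 1 : Number(input.pages);
  // Assumption: values outside 1..100 are rejected, not clamped.
  if (!Number.isInteger(pages) || pages < 1 || pages > 100) {
    throw new Error("pages must be an integer between 1 and 100");
  }
  return { url: parsed.toString(), pages };
}

console.log(normalizeReadWebsiteArgs({ url: "https://example.com/article", pages: 5 }));
```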

read-website-fast://status - Get cache statistics
read-website-fast://clear-cache - Clear the cache directory

npm install
npm run build

npm run dev fetch https://example.com/article

npm run dev fetch https://example.com --depth 2 --concurrency 5

# Markdown only (default)
npm run dev fetch https://example.com

# JSON output with metadata
npm run dev fetch https://example.com --output json

# Both URL and markdown
npm run dev fetch https://example.com --output both

-p, --pages <number> - Maximum number of pages to crawl (default: 1)
-c, --concurrency <number> - Max concurrent requests (default: 3)
--no-robots - Ignore robots.txt
--all-origins - Allow cross-origin crawling
-u, --user-agent <string> - Custom user agent
--cache-dir <path> - Cache directory (default: .cache)
-t, --timeout <ms> - Request timeout in milliseconds (default: 30000)
-o, --output <format> - Output format: json, markdown, or both (default: markdown)
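For readers scripting around the CLI, the options above can be summarized as a typed object with the documented defaults. The type and function names are hypothetical. The `robots: true` and `allOrigins: false` defaults are inferred from the `--no-robots` and `--all-origins` flag names, not stated outright.

```typescript
type OutputFormat = "json" | "markdown" | "both";

// Hypothetical shape for the documented fetch options.
interface FetchOptions {
  pages: number;        // -p, --pages
  concurrency: number;  // -c, --concurrency
  robots: boolean;      // false when --no-robots is passed
  allOrigins: boolean;  // true when --all-origins is passed
  userAgent?: string;   // -u, --user-agent (no documented default)
  cacheDir: string;     // --cache-dir
  timeout: number;      // -t, --timeout, in milliseconds
  output: OutputFormat; // -o, --output
}

const DEFAULT_FETCH_OPTIONS: FetchOptions = {
  pages: 1,
  concurrency: 3,
  robots: true,      // inferred default
  allOrigins: false, // inferred default
  cacheDir: ".cache",
  timeout: 30000,
  output: "markdown",
};

// Merge caller overrides on top of the documented defaults.
function resolveFetchOptions(overrides: Partial<FetchOptions> = {}): FetchOptions {
  return { ...DEFAULT_FETCH_OPTIONS, ...overrides };
}

console.log(resolveFetchOptions({ pages: 10, output: "json" }));
```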

The MCP server includes automatic restart capability by default for improved reliability:

Automatically restarts the server if it crashes
Handles unhandled exceptions and promise rejections
Implements exponential backoff (max 10 attempts in 1 minute)
Logs all restart attempts for monitoring
Gracefully handles shutdown signals (SIGINT, SIGTERM)
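The restart limit described above (exponential backoff, at most 10 attempts in 1 minute) can be sketched as a small policy object. This is not the code in serve-restart.ts: the class name, the 1 second base delay, and the 30 second cap are assumptions chosen for illustration.

```typescript
const MAX_ATTEMPTS = 10;     // documented: max 10 attempts
const WINDOW_MS = 60_000;    // documented: within 1 minute
const BASE_DELAY_MS = 1_000; // assumption: first retry after 1 s
const MAX_DELAY_MS = 30_000; // assumption: cap on a single delay

// Hypothetical restart policy using a sliding one-minute window.
class RestartPolicy {
  private attempts: number[] = [];

  // Returns the delay before the next restart, or null when the
  // attempt limit for the current window has been reached.
  nextDelay(now: number): number | null {
    // Forget attempts that have fallen out of the window.
    this.attempts = this.attempts.filter((t) => now - t < WINDOW_MS);
    if (this.attempts.length >= MAX_ATTEMPTS) {
      return null;
    }
    // Double the delay for each recent attempt, up to the cap.
    const delay = Math.min(BASE_DELAY_MS * 2 ** this.attempts.length, MAX_DELAY_MS);
    this.attempts.push(now);
    return delay;
  }
}

const policy = new RestartPolicy();
console.log(policy.nextDelay(Date.now()));
```

A wrapper process would call `nextDelay` each time the server exits unexpectedly, wait for the returned delay before respawning, and give up when it receives `null`.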

For development/debugging without auto-restart:

# Run directly without restart wrapper
npm run serve:dev

mcp/
├── src/
│   ├── crawler/        # URL fetching, queue management, robots.txt
│   ├── parser/         # DOM parsing, Readability, Turndown conversion
│   ├── cache/          # Disk-based caching with SHA-256 keys
│   ├── utils/          # Logger, chunker utilities
│   ├── index.ts        # CLI entry point
│   ├── serve.ts        # MCP server entry point
│   └── serve-restart.ts # Auto-restart wrapper

# Run in development mode
npm run dev fetch https://example.com

# Build for production
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Linting
npm run lint

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

Increase timeout with -t flag
Check network connectivity
Verify URL is accessible

Some sites block automated access
Try custom user agent with -u flag
Check whether the site requires JavaScript to render its content (not supported)

MIT
