The #1 most requested utility for AI agents — professional web scraping, screenshots, and content extraction via MCP + REST API.
🌐 Clean Content Extraction — Extract readable text/markdown from any webpage (like Readability)
🎯 Structured Scraping — Extract specific data using CSS selectors
📸 Screenshots — Capture full-page or viewport screenshots with Playwright
🔗 Link Extraction — Get all links from a page with optional regex filtering
📋 Metadata Extraction — Extract title, description, Open Graph tags, favicon, etc.
🔍 Google Search — Search Google and get results programmatically
Add to your MCP settings file (`cline_mcp_settings.json` or similar):

```json
{
  "mcpServers": {
    "agent-scraper": {
      "url": "https://agent-scraper-mcp.onrender.com/mcp"
    }
  }
}
```

Base URL: https://agent-scraper-mcp.onrender.com
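For clients that speak plain HTTP rather than MCP, the same tools are exposed as REST endpoints under the base URL. A minimal stdlib-only Python sketch of building such a request (the `build_request` helper name is illustrative, not part of the API):

```python
import json
import urllib.request

BASE_URL = "https://agent-scraper-mcp.onrender.com"

def build_request(tool: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the /api/v1 tool endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/{tool}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("scrape_url", {"url": "https://example.com", "format": "markdown"})
# Sending it is then urllib.request.urlopen(req); omitted here to keep the sketch offline.
```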
```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/scrape_url \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "format": "markdown"
  }'
```

Response:

```json
{
  "success": true,
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "# Article Title\n\nClean markdown content...",
  "format": "markdown"
}
```

```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/scrape_structured \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "selectors": {
      "title": "h1.product-title",
      "price": ".price",
      "reviews": ".review-text"
    }
  }'
```

Response:

```json
{
  "success": true,
  "url": "https://example.com/product",
  "data": {
    "title": "Product Name",
    "price": "$29.99",
    "reviews": ["Great product!", "Worth the money"]
  }
}
```

```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/screenshot_url \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "width": 1280,
    "height": 720,
    "full_page": false
  }'
```

Response:

```json
{
  "success": true,
  "url": "https://example.com",
  "image": "iVBORw0KGgoAAAANSUhEUgAA...",
  "width": 1280,
  "height": 720,
  "full_page": false
}
```

```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/extract_links \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "filter": "https://example.com/blog/.*"
  }'
```

```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/extract_meta \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

```bash
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/search_google \
  -H "Content-Type: application/json" \
  -d '{
    "query": "python web scraping",
    "num_results": 10
  }'
```

Free tier:

- 50 requests per IP per day
- All tools included
- No credit card required

After the free tier is exhausted:

- Scraping tools: $0.005/request (scrape_url, scrape_structured, extract_links, extract_meta, search_google)
- Screenshot tool: $0.01/request (higher due to compute cost)

Payment via HTTP 402 with a crypto wallet:

- Wallet address: `0x8E844a7De89d7CfBFe9B4453E65935A22F146aBB`
- Include an `X-Payment` header with payment proof
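The 402 flow can be sketched as a simple retry wrapper. The payment-proof format is not documented here, so the proof is treated as an opaque string and the transport is injected so the flow can be exercised offline; this shows only the header shape, not a real payment.

```python
import json
import urllib.request

def call_with_payment(url, payload, payment_proof, send):
    """POST the payload; on HTTP 402, retry once with an X-Payment header.

    `send(request) -> (status, body)` is an injected transport; in production
    it would wrap urllib.request.urlopen. The proof format is an assumption.
    """
    def make(headers):
        return urllib.request.Request(
            url, data=json.dumps(payload).encode("utf-8"),
            headers=headers, method="POST",
        )

    status, body = send(make({"Content-Type": "application/json"}))
    if status == 402:
        status, body = send(make({"Content-Type": "application/json",
                                  "X-Payment": payment_proof}))
    return status, body

# Fake transport for demonstration: demands payment until the header is present.
# (urllib capitalizes stored header names, hence the "X-payment" lookup.)
def fake_send(req):
    if req.get_header("X-payment"):
        return 200, '{"success": true}'
    return 402, '{"error": "payment required"}'
```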
Extract clean, readable content from any webpage (like Readability).
Parameters:
- `url` (string, required): URL to scrape
- `format` (string, optional): Output format, one of `text`, `markdown`, or `html` (default: `markdown`)

Returns: `{success, url, title, content, format}`
Extract specific data using CSS selectors.
Parameters:
- `url` (string, required): URL to scrape
- `selectors` (object, required): Dict of `name → CSS selector`

Returns: `{success, url, data}`
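In the example response earlier, a selector with a single match comes back as a string and one with several matches as a list. That reading is inferred from the example rather than a documented contract, so a defensive client can normalize:

```python
def as_list(value):
    """Normalize a scrape_structured data field to a list of strings.

    Inferred from the example response: one match -> string, several
    matches -> list (an assumption, not a documented guarantee).
    """
    return value if isinstance(value, list) else [value]

data = {"title": "Product Name", "reviews": ["Great product!", "Worth the money"]}
normalized = {k: as_list(v) for k, v in data.items()}
```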
Example selectors:

```json
{
  "title": "h1.post-title",
  "author": ".author-name",
  "price": "span.price",
  "images": "img.product-image"
}
```

Capture a screenshot of any webpage.
Parameters:
- `url` (string, required): URL to screenshot
- `width` (int, optional): Viewport width (default: 1280)
- `height` (int, optional): Viewport height (default: 720)
- `full_page` (bool, optional): Capture the full scrollable page (default: false)

Returns: `{success, url, image, width, height, full_page}`

The `image` field is a base64-encoded PNG.
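Since the image arrives as base64 text, decoding it takes one call; a small sketch (the `save_screenshot` helper name is my own):

```python
import base64

# Every PNG file starts with this 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def save_screenshot(resp: dict, path: str) -> bytes:
    """Decode the base64 `image` field of a screenshot_url response and write it out."""
    raw = base64.b64decode(resp["image"])
    with open(path, "wb") as f:
        f.write(raw)
    return raw
```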
Extract all links from a webpage.
Parameters:
- `url` (string, required): URL to scrape
- `filter` (string, optional): Regex pattern to filter URLs

Returns: `{success, url, links, count}`

The `links` array contains `{text, href}` objects.
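The same filtering can be done client-side over the returned `{text, href}` objects; a sketch with the standard `re` module (whether the server anchors the pattern at the start of the href is my assumption, anchored matching shown here):

```python
import re

def filter_links(links, pattern):
    """Keep links whose href matches the regex, mirroring the server-side
    `filter` parameter (assuming match-from-start semantics)."""
    rx = re.compile(pattern)
    return [link for link in links if rx.match(link["href"])]

links = [
    {"text": "Post", "href": "https://example.com/blog/post-1"},
    {"text": "About", "href": "https://example.com/about"},
]
```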
Extract metadata from a webpage.
Parameters:
- `url` (string, required): URL to scrape

Returns: `{success, url, meta}`

The `meta` object includes:

- `title`: Page title
- `description`: Meta description
- `canonical`: Canonical URL
- `favicon`: Favicon URL
- `og`: Open Graph tags
- `twitter`: Twitter Card tags
Search Google and get results.
Parameters:
- `query` (string, required): Search query
- `num_results` (int, optional): Number of results (default: 10)

Returns: `{success, query, results, count}`

The `results` array contains `{title, url, snippet}` objects.
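For agents that relay search results to a user, a quick markdown renderer over the `{title, url, snippet}` shape (illustrative helper, not part of the API):

```python
def format_results(results):
    """Render search_google results as a markdown bullet list."""
    return "\n".join(
        f"- [{r['title']}]({r['url']}): {r['snippet']}" for r in results
    )
```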
```bash
# Clone repo
git clone https://github.com/aparajithn/agent-scraper-mcp.git
cd agent-scraper-mcp

# Install dependencies
pip install -e ".[dev]"

# Install Playwright browsers
playwright install chromium --with-deps

# Run server
uvicorn src.main:app --reload --port 8080
```

```bash
# Initialize
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'

# List tools
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

# Call scrape_url
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"scrape_url","arguments":{"url":"https://example.com","format":"markdown"}}}'
```

```bash
# Build
docker build -t agent-scraper-mcp .

# Run
docker run -p 8080:8080 -e PUBLIC_HOST=localhost agent-scraper-mcp
```

Deployed on Render (free tier):

- Service: `agent-scraper-mcp`
- Runtime: Docker
- Region: Ohio
- Auto-deploy from GitHub: `aparajithn/agent-scraper-mcp`

Environment variables:

- `PUBLIC_HOST`: `agent-scraper-mcp.onrender.com`
- `X402_WALLET_ADDRESS`: `0x8E844a7De89d7CfBFe9B4453E65935A22F146aBB`
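The JSON-RPC bodies in the curl tests above can be generated rather than hand-written; a stdlib sketch (the `jsonrpc` helper name is my own):

```python
import json

def jsonrpc(method: str, params: dict, id_: int) -> str:
    """Build a JSON-RPC 2.0 envelope like the ones POSTed to /mcp above."""
    return json.dumps({"jsonrpc": "2.0", "id": id_, "method": method, "params": params})

body = jsonrpc("tools/call",
               {"name": "scrape_url",
                "arguments": {"url": "https://example.com", "format": "markdown"}},
               id_=3)
```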
- Python 3.11 — Modern async/await
- FastAPI — REST API framework
- FastMCP — MCP protocol implementation (Streamable HTTP)
- Playwright — Browser automation for screenshots
- httpx — Fast async HTTP client
- BeautifulSoup4 — HTML parsing
- readability-lxml — Content extraction (like Firefox Reader View)
MIT License — see LICENSE for details.
- Issues: GitHub Issues
- Email: support@agent-scraper.com
- Docs: https://agent-scraper-mcp.onrender.com/docs
Built for AI agents by AI engineers 🤖