Skip to content

MarcinDudekDev/the-data-collector

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The Data Collector

Web scraping APIs for Bluesky, Substack, and Hacker News with x402 micropayment support. Built with FastAPI.

Live: https://frog03-20494.wykr.es

Features

  • Search Bluesky posts (AT Protocol), Substack newsletters, and Hacker News stories
  • Returns structured JSON with engagement metrics
  • x402 micropayments ($0.05 USDC on Base per call) — no account needed
  • API key authentication for regular use
  • A2A Agent Card and MCP discovery endpoints
  • OpenAPI spec with x402 payment metadata

API Endpoints

Method Endpoint Description Price
POST /api/bluesky/search Search Bluesky posts by keyword $0.05
POST /api/substack/search Scrape Substack newsletter articles $0.05
POST /api/hn/search Search Hacker News stories $0.05

Quick Start

# Clone
git clone https://github.com/MarcinDudekDev/the-data-collector.git
cd the-data-collector

# Install
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your APIFY_TOKEN and API_KEY

# Run
uvicorn server:app --host 0.0.0.0 --port 8001

Docker

docker build -t the-data-collector .
docker run -p 8001:8001 --env-file .env the-data-collector

Environment Variables

Variable Required Description
APIFY_TOKEN Yes Apify API token for running scrapers
API_KEY No API key for authenticated access (X-API-Key header)
BASE_URL No Public URL of the server (default: https://frog03-20494.wykr.es)
PAY_TO No Wallet address for x402 payments
PRICE_ATOMIC No Price per call in USDC atomic units (default: 50000 = $0.05)

Authentication

x402 Micropayments (no account needed)

Send a POST request without credentials. You'll receive a 402 response with payment requirements. Pay $0.05 USDC on Base — settlement is instant.

# First call returns 402 with payment details
curl -X POST https://frog03-20494.wykr.es/api/hn/search \
  -H "Content-Type: application/json" \
  -d '{"searchTerms": ["AI agents"]}'

API Key

curl -X POST https://frog03-20494.wykr.es/api/hn/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key" \
  -d '{"searchTerms": ["AI agents"], "maxResults": 5}'

Discovery Endpoints

Endpoint Protocol
/.well-known/mcp.json MCP (Model Context Protocol)
/.well-known/agent-card.json A2A (Agent-to-Agent)
/.well-known/x402 x402 payment discovery
/.well-known/openapi.json OpenAPI 3.1 spec
/health Health check

MCP Client Configuration

{
  "mcpServers": {
    "the-data-collector": {
      "url": "https://frog03-20494.wykr.es/.well-known/mcp.json"
    }
  }
}

License

MIT

About

HN, Bluesky, Substack scraping MCP server with x402 micropayments

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors