WebReaper

WebReaper is a workspace-first web crawling and content filing platform for teams that need to collect, organize, and review website data quickly.

The current product direction is centered on four things:

broad crawling with browser-render fallback
deep page extraction and structured inventory
workspace libraries for filing, labeling, and exporting collected pages
an analyst-facing UI backed by FastAPI and SQLite/Postgres-compatible models

Proxy, repeater, and intruder tooling remain in the product and are useful for analyst workflows, but the primary story is now crawling plus structured content operations.

Core Capabilities

Crawl and extract

Async crawl execution with resumable job tracking
Browser-render fallback for pages that need client-side rendering
Page metadata, headings, contacts, technology, and content extraction
Endpoint and parameter inventory derived from links, forms, and observed browser requests
Duplicate-content, link-health, and content-analysis views

Workspace library

Workspace-scoped crawl boundaries and scope rules
Auto-filing suggestions for category, folder, and labels
Manual filing controls for starring, notes, labels, and folder/category overrides
Workspace-level summaries, recent pages, and filtered library views
JSON and CSV export of library datasets

Analyst tooling

Proxy session management with HTTP history and intercept queue
Repeater for replaying saved requests and comparing responses
Intruder for queued payload fuzzing with result triage
On-demand security findings, triage metadata, and report export

Platform surface

Next.js dashboard with static export support
FastAPI backend with SSE streams and WebSocket/chat plumbing
Alembic migrations and async SQLAlchemy data layer
SQLite by default, with a Postgres-compatible schema layout

Architecture

Backend: FastAPI, SQLAlchemy async ORM, Alembic
Frontend: Next.js App Router, TypeScript
Storage: SQLite local default, Postgres-friendly schema
Streaming: SSE for metrics/logs/progress, WebSocket support for chat/gateway features
Background execution: in-process async job queue

Quick Start

Prerequisites

Python 3.10+
Node.js 20+ with npm/npx
pnpm is preferred, but start.sh will fall back to npx pnpm if pnpm is not installed globally

Start the full local stack

./start.sh

This script:

creates a local virtual environment if needed
installs Python and frontend dependencies
initializes the SQLite database
runs migrations
starts the FastAPI backend on http://localhost:8000
starts the dashboard on http://localhost:3000

Useful endpoints:

Dashboard: http://localhost:3000
API docs: http://localhost:8000/docs
Health check: http://localhost:8000/health

Seed a deterministic demo dataset

export PYTHONPATH=.
export DATABASE_URL='sqlite+aiosqlite:////tmp/webreaper_demo.db'
./.venv/bin/python scripts/seed_demo_data.py

For a fuller walkthrough, see docs/demo-flow.md.

Development Workflow

Backend tests

./.venv/bin/pytest tests

Frontend build and tests

cd web
npx pnpm build
npx pnpm test

Production-style dashboard export

The dashboard is configured with output: 'export'.

cd web
NEXT_PUBLIC_API_URL='http://127.0.0.1:8000' \
NEXT_PUBLIC_WS_URL='ws://127.0.0.1:8000' \
NEXT_PUBLIC_SSE_URL='http://127.0.0.1:8000' \
npx pnpm build

npx pnpm start

pnpm start serves the generated web/out bundle.

Recommended Product Demo

Open the dashboard and verify the live metrics stream.
Start or inspect a crawl from Jobs.
Review extracted content in Data.
Open a workspace and review or edit library filings.
Inspect captured traffic in Proxy.
Replay a request in Repeater.
Review a fuzzing job in Intruder.
Review findings and exports in Security.

Configuration Notes

Local/self-hosted usage does not require license enforcement by default.
To enable the legacy gated behavior explicitly, set WEBREAPER_REQUIRE_LICENSE=1.
Missing Supabase or Stripe configuration will degrade those related features, but the local crawler/library workflow still runs in development mode.

Safety

Use interception, replay, fuzzing, or active security testing features only against systems you own or are explicitly authorized to assess.

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.github/workflows		.github/workflows
.kiro/specs/webreaper-platform-vnext		.kiro/specs/webreaper-platform-vnext
alembic		alembic
config		config
database		database
desktop		desktop
docs		docs
hooks		hooks
output/playwright		output/playwright
scripts		scripts
server		server
tests		tests
web		web
webreaper		webreaper
.env.example		.env.example
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
DESIGN_GENRES_AND_UI.md		DESIGN_GENRES_AND_UI.md
Dockerfile		Dockerfile
FIXSHEET.md		FIXSHEET.md
MERGE_NOTES.md		MERGE_NOTES.md
README.md		README.md
alembic.ini		alembic.ini
docker-compose.yml		docker-compose.yml
install.sh		install.sh
package.json		package.json
pytest.ini		pytest.ini
requirements.dev.txt		requirements.dev.txt
requirements.postgres.txt		requirements.postgres.txt
requirements.txt		requirements.txt
setup.py		setup.py
start.sh		start.sh
webreaper.py		webreaper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebReaper

Core Capabilities

Crawl and extract

Workspace library

Analyst tooling

Platform surface

Architecture

Quick Start

Prerequisites

Start the full local stack

Seed a deterministic demo dataset

Development Workflow

Backend tests

Frontend build and tests

Production-style dashboard export

Recommended Product Demo

Configuration Notes

Safety

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WebReaper

Core Capabilities

Crawl and extract

Workspace library

Analyst tooling

Platform surface

Architecture

Quick Start

Prerequisites

Start the full local stack

Seed a deterministic demo dataset

Development Workflow

Backend tests

Frontend build and tests

Production-style dashboard export

Recommended Product Demo

Configuration Notes

Safety

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages