Ship a team of specialized AI agents — with human-in-the-loop approvals, full observability, real enterprise connectors, and defense-in-depth security — to production.
- What is ForgeFlow?
- Why ForgeFlow?
- Features
- Architecture
- Technology Stack
- Quickstart
- Configuration
- Usage Examples
- Workflows & Connectors
- API Reference
- Security
- Performance & Scalability
- Deployment
- Developer Guide
- Project Structure
- Roadmap
- Contributing
- License
ForgeFlow is an open-source platform for building, running, and operating multi-agent AI workflows in the enterprise. Instead of a single monolithic prompt, ForgeFlow orchestrates a supervisor agent that routes work to a team of specialists — a researcher, an analyzer, and an executor — each grounded in real tools and real data, and gated by human approval before any high-impact side effect.
It ships three named workflow domains out of the box:

| Workflow | Pipeline | Status | Connector |
|---|---|---|---|
| `sales_ops` | qualify → research → analyze → propose → approve → execute | ✅ Production | HubSpot CRM (upsert-by-email, idempotent deals, 429 backoff) |
| `support_ops` | triage → investigate → respond → escalate → resolve | Scaffold | Pairs with Jira / ServiceNow |
| `finance_recon` | ingest → match → flag variance → approve → post | Scaffold | Pairs with QuickBooks / SAP |

`support_ops` and `finance_recon` are honest scaffolds: they raise on `.run()` unless `dry_run=True` or `FORGEFLOW_ALLOW_TEMPLATE_WORKFLOWS=1` is set, and the React console labels them as such. Three named templates ≠ three production workflows. `sales_ops` is the fully-wired reference path: follow docs/sales-ops-production.md to run it against a real HubSpot account on Fly.io in under an hour.
Most "agent demos" collapse the moment they meet production reality: there's no audit trail, no cost ceiling, no way to pause for a human, secrets leak into prompts, a single flaky API takes down the whole run, and swapping a mock tool for a real one means rewriting the agent. ForgeFlow is the opinionated reference implementation of everything that sits around the LLM call to make agentic automation safe to deploy:
- Durable state: every node is checkpointed to PostgreSQL, so any worker can resume any run after a crash.
- Human-in-the-loop: workflows pause via LangGraph `interrupt_before` and resume on an approve/reject webhook or Slack button.
- Pluggable tools: agents talk to the world through the Model Context Protocol (MCP), so you swap a mock CRM for HubSpot without touching agent code.
- Defense in depth: PII redaction, prompt-injection guards, SSRF protection, untrusted-tool-output quarantine, and an outbound-email allowlist.
- Cost & resilience controls: per-run budget guard, circuit breakers, retry-with-backoff, and token-level cost tracking.
- Full observability: LangSmith / Phoenix / Langfuse tracing, Prometheus metrics, immutable audit log, and LLM-as-judge evaluation.
| Audience | What ForgeFlow gives you |
|---|---|
| Platform / ML engineers | A batteries-included blueprint for shipping agents with checkpointing, RBAC, observability, and connectors already wired. |
| Enterprises | Human approvals, audit trails, cost ceilings, multi-tenancy, and on-prem / air-gapped deployment paths. |
| OSS contributors | A clean, typed, well-tested (306 tests) codebase with clear extension points — connectors, MCP tools, and workflow templates. |
| Recruiters & evaluators | A demonstration of production agentic-AI engineering: LangGraph, MCP, A2A, Kubernetes, Terraform, and a polished React 19 console. |
- 🧠 Supervisor multi-agent orchestration: deterministic, auditable hub-and-spoke routing built on LangGraph `StateGraph`.
- 🔌 8 real enterprise connectors: HubSpot, Salesforce, Jira, ServiceNow, GitHub, SAP S/4HANA, QuickBooks Online, and Microsoft Graph, all behind a single resilient connector base.
- 🛡️ Security-first by design: see SECURITY_AUDIT.md for the full threat model and the fixes that close each finding.
- 🔁 Provider-agnostic: OpenAI, Anthropic Claude, or a fully local Ollama daemon (privacy / air-gapped mode).
- 📊 Operate it, don't just run it: a 13-view React console for runs, approvals, cost, audit, memory, agents, and evaluations.
- 🚢 Deploy anywhere: Docker Compose, Kubernetes, Helm, Terraform (AWS), Fly.io, and an offline bundle for air-gapped sites.
🤖 AI-Powered Multi-Agent Development
- Supervisor agent: emits a structured `RoutingDecision` (never calls tools directly) so routing stays deterministic and auditable (forgeflow/agents/supervisor.py)
- Researcher agent: web search + URL scraping + enrichment, with SSRF-guarded fetches
- Analyzer agent: 0–10 ICP scoring with risk flags and a recommended action
- Executor agent: drafts proposals, writes to the CRM, and sends pinned-recipient email
- Pluggable LLM providers: OpenAI (default), Anthropic Claude, or local Ollama via a single `get_model()` factory (forgeflow/models/provider.py)
- Agent-to-Agent (A2A) protocol: JSON-RPC 2.0, `AgentCard` capability discovery, and an in-workflow dispatch registry (forgeflow/a2a/)
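
To make the structured-routing idea concrete, here is a minimal sketch of how a routing decision might be validated before the graph acts on it. The `Route` values, the field names, and `parse_routing_decision` are illustrative assumptions, not the actual schema in forgeflow/agents/supervisor.py.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    # Hypothetical route names; the real supervisor may use different ones.
    RESEARCHER = "researcher"
    ANALYZER = "analyzer"
    EXECUTOR = "executor"
    HUMAN_APPROVAL = "human_approval"
    DONE = "done"


@dataclass(frozen=True)
class RoutingDecision:
    next_agent: Route
    reason: str


def parse_routing_decision(raw: dict) -> RoutingDecision:
    """Validate a model-emitted dict; an unknown target raises ValueError."""
    return RoutingDecision(
        next_agent=Route(raw["next_agent"]),
        reason=str(raw.get("reason", "")),
    )
```

Because the supervisor can only name one of a closed set of targets, every hop is a plain value that can be logged and replayed, which is what keeps routing auditable.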
🔄 Workflow Automation & Orchestration
- LangGraph `StateGraph` with PostgreSQL checkpointing: every node persisted, any worker resumes any `thread_id` (forgeflow/graph/)
- Human-in-the-loop via `interrupt_before` + approve/reject webhooks and Slack deep-link buttons (forgeflow/api/routers/approvals.py)
- Approval escalation: a background job ratchets stale approvals through level 1 → 2 → auto-reject (forgeflow/jobs/escalation.py)
- SSE streaming: stream agent reasoning + tool calls live with `astream()` over `StreamingResponse`
- Event-driven mode: consume Redis Streams or Kafka events into a shared `EventDispatcher` (forgeflow/events/)
- Dry-run simulation: run the full LLM plan with all side effects (CRM writes, emails, Slack) skipped
- Workflow template marketplace: file-based registry with a `manifest.yaml` schema and a CLI validator (forgeflow/marketplace/, templates/)
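
The escalation ratchet (level 1 → 2 → auto-reject) can be sketched as a small pure function. The `PendingApproval` fields, the `STALE_AFTER` window, and the status strings below are assumptions made for illustration; the real job in forgeflow/jobs/escalation.py may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=4)  # assumed staleness window, not a documented default


@dataclass
class PendingApproval:
    token: str
    level: int                    # 1 = first approver, 2 = escalated approver
    last_escalated_at: datetime
    status: str = "pending"


def escalate_if_stale(approval: PendingApproval, now: datetime) -> PendingApproval:
    """Move a stale approval one step along level 1 -> level 2 -> auto-reject."""
    if approval.status != "pending":
        return approval
    if now - approval.last_escalated_at < STALE_AFTER:
        return approval
    if approval.level == 1:
        approval.level = 2
    else:
        approval.status = "auto_rejected"
    # Restart the staleness clock after every step.
    approval.last_escalated_at = now
    return approval
```

Passing `now` in explicitly keeps the function deterministic, so a periodic job can call it for each pending row and a test can drive it with fixed timestamps.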
🧪 Testing, Evaluation & Validation
- 306 tests across unit + integration suites (tests/)
- LLM-as-judge evaluation: faithfulness, relevance, coherence, and hallucination detection in one pass (forgeflow/evaluation/judge.py)
- Eval regression gate in CI: `.github/workflows/eval.yml` checks scores against a baseline (`tests/eval_baseline.json`)
- HubSpot pre-flight validator: probes your real CRM end-to-end before deploy (scripts/validate_hubspot.py)
- Type + lint gates: `ruff` (E/F/I/UP/B/SIM/ANN) and `mypy` enforced in CI
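
As a rough illustration of what a regression gate does, the sketch below compares current judge scores with baseline floors. It assumes a flat `{metric: score}` mapping where higher is better and a hypothetical `tolerance` parameter; the actual layout of `tests/eval_baseline.json` is not documented here.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return the metrics that are missing or fell below baseline minus tolerance."""
    failures = []
    for metric, floor in baseline.items():
        score = current.get(metric)
        if score is None or score < floor - tolerance:
            failures.append(metric)
    return failures
```

A CI step could exit non-zero whenever the returned list is non-empty. Lower-is-better metrics such as a hallucination rate would need the comparison inverted.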
🛡️ Security (Defense in Depth)
- PII redactor: conservative regex scrubbing of common identifiers (forgeflow/security/pii_redactor.py)
- Prompt-injection guard: heuristic risk scoring on inbound text (forgeflow/security/prompt_guard.py)
- SSRF guard: blocks private IPs, IMDS, and non-http(s) schemes on every agent-controlled URL (forgeflow/security/ssrf_guard.py)
- Tool-output quarantine: wraps tool results in an `<UNTRUSTED_TOOL_OUTPUT>` envelope to blunt 2nd-order injection (forgeflow/security/tool_output_guard.py)
- Outbound-email allowlist: pins recipients so an LLM can't exfiltrate via `email_send` (forgeflow/security/email_allowlist.py)
- JWT + RBAC, immutable partitioned audit log, rate limiting, and a CORS allowlist that never contains `*` (forgeflow/middleware/)
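
The SSRF rules listed above (http(s) only, no private or metadata addresses) can be approximated with the standard library. This is a sketch under stated assumptions: `is_url_allowed` and its injectable `resolve` parameter are hypothetical names, and the real guard in forgeflow/security/ssrf_guard.py may apply different or additional checks.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, parts.port or 443)
    except (OSError, ValueError):
        # Unresolvable host or malformed port: fail closed.
        return False
    for info in infos:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        host_ip = str(info[4][0]).split("%")[0]
        # is_global is False for private, loopback, and link-local ranges,
        # which includes the 169.254.169.254 metadata endpoint.
        if not ipaddress.ip_address(host_ip).is_global:
            return False
    return bool(infos)
```

A check like this only covers the address seen at validation time. A production guard would also need to connect to the vetted IP (or re-check on redirect) so that a second DNS answer cannot bypass it.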
📈 Observability & Cost Control
- Tracing: pick LangSmith, Phoenix, Langfuse, or none via `TRACING_PROVIDER`; OTel auto-configures for OTLP backends (forgeflow/observability/tracing_provider.py)
- Prometheus: `/metrics/prometheus` exposition endpoint
- Cost tracking: `tiktoken`-based token counting with a per-model cost table and per-agent breakdown (forgeflow/observability/cost_tracker.py)
- Budget guard: halts a workflow before projected spend exceeds `BUDGET_LIMIT_USD` (forgeflow/resilience/budget_guard.py)
- Circuit breaker: CLOSED/OPEN/HALF_OPEN state machine stops cascading failures (forgeflow/resilience/circuit_breaker.py)
- Opt-in anonymous telemetry: off by default, PII-clean allowlist (forgeflow/telemetry/)
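
For readers unfamiliar with the CLOSED/OPEN/HALF_OPEN pattern, a minimal sketch follows. The class shape, the thresholds, and the injectable `clock` are illustrative assumptions rather than the API of forgeflow/resilience/circuit_breaker.py.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after      # seconds to stay OPEN before probing
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "CLOSED"

    def allow(self) -> bool:
        """Return True if a call may proceed; move OPEN -> HALF_OPEN after the timeout."""
        if self.state == "OPEN" and self._clock() - self._opened_at >= self.reset_after:
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record_success(self) -> None:
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe in HALF_OPEN reopens immediately.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```

Callers wrap each upstream request with `allow()` before the call and `record_success()` or `record_failure()` after it, so a failing dependency is skipped quickly instead of being retried by every run.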
🏢 Enterprise & Platform
- Multi-tenancy: `workspaces` as the tenant root with a nullable `workspace_id` on every tenant-scoped row; query scoping is in progress (forgeflow/api/routers/workspaces.py)
- Multi-modal input: PDF text extraction + vision-LLM image description (forgeflow/multimodal/)
- Semantic memory: pgvector cosine recall, namespace-scoped (forgeflow/memory/pgvector_store.py)
- Deployment targets: Docker Compose, Kubernetes, Helm, Terraform (AWS), Fly.io, and an air-gapped offline bundle
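
Namespace-scoped cosine recall can be shown in plain Python. This is an in-memory sketch with a hypothetical record shape (`namespace`, `embedding`, `id`); in the real store the ranking is done inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query: list[float], memories: list[dict], namespace: str, limit: int = 5) -> list[dict]:
    """Rank memories in one namespace by similarity to the query, best first."""
    scoped = [m for m in memories if m["namespace"] == namespace]
    scoped.sort(key=lambda m: cosine_similarity(query, m["embedding"]), reverse=True)
    return scoped[:limit]
```

Filtering by namespace before ranking is what keeps one tenant's or workflow's memories from surfacing in another's results.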
ForgeFlow is a hub-and-spoke system: a FastAPI control plane drives a checkpointed LangGraph state machine, agents reach the outside world only through an MCP tool server, and a React console operates the whole thing.

```mermaid
graph TB
    subgraph Client["🖥️ Presentation"]
        Console["React 19 Console :8501<br/>13 views · landing · architecture"]
    end
    subgraph ControlPlane["⚙️ Control Plane · FastAPI :8000"]
        API["REST + SSE API"]
        MW["Middleware stack<br/>RBAC · RateLimit · Security · Audit"]
        Jobs["Escalation job · Event dispatcher"]
    end
    subgraph Orchestration["🧠 Orchestration · LangGraph"]
        Supervisor["Supervisor Agent<br/>structured routing"]
        Researcher["Researcher"]
        Analyzer["Analyzer"]
        Executor["Executor"]
        Human["⏸ Human Approval<br/>interrupt_before"]
    end
    subgraph Tools["🔌 MCP Tool Server :8001 · FastMCP"]
        Search["Web search"]
        Connectors["8 connectors:<br/>HubSpot · Salesforce · Jira ·<br/>ServiceNow · GitHub · SAP ·<br/>QuickBooks · MS Graph"]
        MultiModal["PDF · Image tools"]
    end
    subgraph Data["💾 State & Memory"]
        PG["PostgreSQL 16 + pgvector<br/>checkpoints · audit · memory · tenants"]
    end
    subgraph Observe["📊 Observability"]
        Trace["LangSmith / Phoenix / Langfuse"]
        Prom["Prometheus"]
    end
    Console -->|nginx proxy /api/*| API
    API --> MW --> Supervisor
    Jobs --> Supervisor
    Supervisor -->|qualify| Researcher
    Supervisor -->|analyze| Analyzer
    Supervisor -->|propose| Executor
    Supervisor -->|await| Human
    Human -->|approve| Executor
    Researcher --> Tools
    Executor --> Tools
    Connectors --> External["External SaaS APIs"]
    Orchestration -->|checkpoint every node| PG
    Researcher -->|semantic recall| PG
    API --> Observe
    Orchestration --> Observe
```

```text
POST /workflows/run ─┐
                     ▼
   RBAC → RateLimit → Security → Audit middleware
                     ▼
   LangGraph compiled graph (PostgreSQL-checkpointed)
                     ▼
   QUALIFY ── Researcher ─► MCP: web_search / scrape_url (SSRF-guarded)
      ▼
   ANALYZE ── Analyzer ──► ICP score 0–10 + risk flags
      │   score < 4.0 ─► DISQUALIFIED
      ▼   score ≥ 4.0
   PROPOSE ── Executor ──► draft_proposal (LLM) ─► PostgreSQL proposals
      ▼
   APPROVE ── Human ─────► ⏸ interrupt_before → Slack card / POST /approvals/{token}/approve
      │   rejected ─► DONE
      ▼   approved
   EXECUTE ── Executor ──► MCP: send_email (pinned) + CRM upsert ─► mark "proposed"
      ▼
   DONE (cost tracked · evaluated · audited · traced)
```

| Decision | Choice | Why |
|---|---|---|
| Orchestration | LangGraph | Built-in interrupt_before, PostgreSQL checkpointing, and streaming — production-proven |
| Tool discovery | MCP | Swap backends without touching agent code; a fast-growing open standard |
| Agent comms | A2A (JSON-RPC 2.0) | Capability-based discovery; swappable to gRPC for scale |
| Memory | PostgreSQL + pgvector | Co-locate semantic + transactional data; one datastore to operate |
| Evaluation | LLM-as-judge | Faithfulness, relevance, coherence, and hallucination in a single pass |
| Resilience | Circuit breaker + tenacity | Stops cascading failures at the API boundary |
| Frontend | React 19 + Vite + nginx | Single-origin SPA, reverse-proxied /api/*, hand-authored CSS with oklch tokens |
| Layer | Technologies |
|---|---|
| Orchestration | LangGraph · LangChain Core · langgraph-checkpoint-postgres |
| LLM providers | OpenAI · Anthropic Claude · Ollama (local) |
| Tools | MCP (FastMCP, streamable-HTTP) · langchain-mcp-adapters · Tavily |
| API | FastAPI · Uvicorn · Pydantic v2 · pydantic-settings |
| Data | PostgreSQL 16 · pgvector · asyncpg · psycopg3 · Alembic |
| Frontend | React 19 · Vite · TanStack Router + Query · TypeScript |
| Resilience | tenacity · custom circuit breaker · budget guard |
| Observability | LangSmith · OpenTelemetry · Phoenix · Langfuse · Prometheus · tiktoken |
| Security | PyJWT · custom RBAC · PII / prompt / SSRF / tool-output guards |
| Events | Redis Streams · Kafka (aiokafka) |
| Infra | Docker Compose · Kubernetes · Helm · Terraform (AWS) · Fly.io |
| Quality | pytest · pytest-asyncio · ruff · mypy |
For a real HubSpot pipeline on Fly.io, jump to docs/sales-ops-production.md. The runbook below is for local evaluation.
- Docker + Docker Compose
- An LLM provider, one of:
  - an OpenAI API key (default), or
  - an Anthropic API key, or
  - a local Ollama daemon (privacy / air-gapped mode)
- (Optional) a Tavily API key for real web search, and a LangSmith key for tracing
- (Optional, for the `sales_ops` production path) a HubSpot Private App token with the 6 CRM scopes in the production runbook

```bash
git clone https://github.com/JoelJohnsonThomas/forgeflow.git
cd forgeflow
cp .env.example .env
# Edit .env: at minimum set OPENAI_API_KEY, API_SECRET_KEY, POSTGRES_PASSWORD, DEV_LOGIN_PASSWORD
```

Generate a strong secret with `openssl rand -hex 32` for `API_SECRET_KEY`. Startup fails fast without it.

```bash
docker compose --profile migration run --rm migrate   # apply Alembic migrations once
docker compose up                                      # start the stack
```

| Service | URL | Description |
|---|---|---|
| React Console | http://localhost:8501 | Landing page + 13-view operations console (nginx, proxies /api/*) |
| FastAPI | http://localhost:8000/docs | REST API + interactive OpenAPI UI |
| MCP Server | http://localhost:8001 | Tool server for agents |
| PostgreSQL | localhost:5432 | Database + pgvector |

```bash
# Option A: via the API
curl -X POST http://localhost:8000/workflows/run \
  -H "Content-Type: application/json" \
  -H "X-Role: sales_rep" \
  -d '{"lead_data": {"company_name": "Stripe"}, "workflow_type": "sales_ops"}'

# Option B: the demo script (auto-approves the proposal)
python scripts/run_demo.py "Stripe" approve
```

```bash
curl http://localhost:8000/health
# → {"status":"healthy","database":"connected","graph":"compiled"}
```

Then open http://localhost:8501 and watch the run land in /console/runs. 🎉

🔒 Local-first mode (privacy / air-gapped with Ollama)
ForgeFlow can run all of its LLM calls against a local Ollama daemon, so prompts and completions never leave your machine.

```bash
pip install 'forgeflow[ollama]'
ollama pull llama3.2:3b    # worker model (fast)
ollama pull llama3.1:8b    # supervisor + judge (stronger)
echo "LLM_PROVIDER=ollama" >> .env
echo "OLLAMA_BASE_URL=http://localhost:11434" >> .env
docker compose up
```

Note: pgvector embeddings currently call OpenAI, so semantic memory is the one path that is not yet local. Fully-offline embeddings are tracked in the roadmap.

Anthropic Claude is also supported: `pip install 'forgeflow[anthropic]'` and set `LLM_PROVIDER=anthropic`.

All configuration is environment-based and loaded through Pydantic Settings (forgeflow/config.py). Copy .env.example to .env — the key knobs:

| Variable | Default | Purpose |
|---|---|---|
| `LLM_PROVIDER` | `openai` | `openai` · `ollama` · `anthropic` |
| `OPENAI_API_KEY` | (unset) | Required when `LLM_PROVIDER=openai` |
| `API_SECRET_KEY` | (unset) | Required. Signs JWTs; generate with `openssl rand -hex 32` |
| `POSTGRES_PASSWORD` | (unset) | Required by docker-compose |
| `DEV_LOGIN_ENABLED` / `DEV_LOGIN_PASSWORD` | `true` / (unset) | Demo `/auth/login`. Set `false` in production and front with an OIDC IdP |
| `DOCS_ENABLED` | `true` | Disable `/docs` + `/redoc` in production |
| `CORS_ALLOW_ORIGINS` | `localhost:5173,8501` | Comma-separated allowlist, never `*` |
| `BUDGET_LIMIT_USD` | `5.0` | Per-workflow spend ceiling enforced by the budget guard |
| `TRACING_PROVIDER` | `langsmith` | `langsmith` · `phoenix` · `langfuse` · `none` |
| `TAVILY_API_KEY` | (unset) | Real web search (optional) |
| `SLACK_BOT_TOKEN` | (unset) | HITL approval cards in Slack (optional) |

Optional extras gate heavier dependencies: `[ollama]`, `[anthropic]`, `[otel]`, `[multimodal]`, `[events]`, `[events-kafka]`. Install with e.g. `pip install 'forgeflow[otel,multimodal]'`.
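
The fail-fast behaviour described above can be illustrated with a standard-library stand-in. This sketch deliberately does not use Pydantic Settings, and `Settings` and `load_settings` are hypothetical names; it only mirrors the required keys and defaults from the table, and the real validation lives in forgeflow/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    api_secret_key: str
    budget_limit_usd: float
    cors_allow_origins: tuple


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment and refuse to start if it is unsafe."""
    env = os.environ if env is None else env
    secret = env.get("API_SECRET_KEY", "")
    if not secret:
        raise RuntimeError("API_SECRET_KEY is required")
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in {"openai", "ollama", "anthropic"}:
        raise RuntimeError(f"unsupported LLM_PROVIDER: {provider}")
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required when LLM_PROVIDER=openai")
    raw_origins = env.get("CORS_ALLOW_ORIGINS", "")
    origins = tuple(o.strip() for o in raw_origins.split(",") if o.strip())
    if "*" in origins:
        raise RuntimeError("CORS_ALLOW_ORIGINS must be an explicit allowlist")
    return Settings(
        llm_provider=provider,
        api_secret_key=secret,
        budget_limit_usd=float(env.get("BUDGET_LIMIT_USD", "5.0")),
        cors_allow_origins=origins,
    )
```

Raising at load time means a misconfigured deployment stops at startup rather than at the first request that needs the missing value.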
Run a workflow (sync)

```bash
curl -X POST http://localhost:8000/workflows/run \
  -H "Content-Type: application/json" \
  -H "X-Role: sales_rep" \
  -d '{
    "lead_data": {"company_name": "Acme Corp", "industry": "fintech"},
    "workflow_type": "sales_ops"
  }'
```

Stream a run over SSE

```bash
curl -N -X POST http://localhost:8000/workflows/stream \
  -H "Content-Type: application/json" \
  -d '{"lead_data": {"company_name": "Stripe"}, "workflow_type": "sales_ops"}'
```

Review and approve a pending proposal

```bash
# 1. See what's waiting
curl http://localhost:8000/approvals/pending

# 2. Approve (or /reject); the workflow resumes from the checkpoint
curl -X POST http://localhost:8000/approvals/<token>/approve \
  -H "Content-Type: application/json" \
  -d '{"approver": "manager@acme.com", "comment": "Good fit, proceed"}'
```

Dry-run a template workflow

```bash
curl -X POST http://localhost:8000/workflows/run \
  -H "Content-Type: application/json" \
  -d '{"lead_data": {"ticket": "Login broken"}, "workflow_type": "support_ops", "dry_run": true}'
```

Search semantic memory

```bash
curl "http://localhost:8000/memory/search?q=enterprise%20fintech%20leads&limit=5"
```

Validate a marketplace template

```bash
python scripts/marketplace.py validate templates/community/my_workflow/
```

Connectors are real (non-mock) integrations exposed to agents as MCP tools. All are built on a single resilient base (forgeflow/connectors/base.py) that honors `Retry-After` on 429s, applies exponential backoff with jitter on 502/503/504, and distinguishes `RetryableError` from `PermanentError`. Each degrades gracefully when credentials are absent.

| Connector | Pairs with | Notes |
|---|---|---|
| HubSpot | `sales_ops` | Upsert-by-email contacts, idempotent `forgeflow_run_id` deals |
| Salesforce | `sales_ops` | Leads + opportunities + SOQL |
| Jira Cloud | `support_ops` | Issue create + transitions |
| ServiceNow | Incident mgmt | Table API incidents + change requests |
| GitHub | DevOps / PR review | Issues, PRs, releases, repo metadata |
| SAP S/4HANA | `finance_recon` | OData v2 + CSRF for orders + invoices |
| QuickBooks Online | `finance_recon` | Ledger + journal entries (OAuth 2.0) |
| Microsoft Graph | HITL approvals | Teams / Outlook / Calendar (Slack alternative) |

The MCP server mounts 14 tool routers (search, CRM, email, data, Slack, the 8 connectors above, and multi-modal); see forgeflow/mcp/server/main.py.
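
The retry policy described for the connector base (honor `Retry-After` on 429, exponential backoff with jitter on 502/503/504, everything else permanent) can be sketched as a delay calculator. The function name, `base`, `cap`, and the injectable `rng` are assumptions for illustration and do not reflect the signatures in forgeflow/connectors/base.py.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {429, 502, 503, 504}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng=random.random,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is permanent."""
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        # Only the delta-seconds form of Retry-After is handled in this sketch.
        return float(retry_after)
    # "Full jitter": a random delay in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * 2 ** attempt)
```

A caller would map `None` to a permanent error and any number to a sleep followed by another attempt. The jitter spreads retries out so that many workers hitting the same upstream do not retry in lockstep.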
Interactive OpenAPI docs live at http://localhost:8000/docs. Core endpoints:
Workflows

```text
POST   /workflows/run                   Trigger a workflow (sync)
POST   /workflows/stream                Trigger with SSE streaming
GET    /workflows/{id}                  Run status + state
GET    /workflows/{id}/trace            Per-agent execution traces
```

Approvals · Agents · Memory

```text
GET    /approvals/pending               Proposals awaiting review
POST   /approvals/{token}/approve       Resume (approved)
POST   /approvals/{token}/reject        Resume (rejected)
GET    /agents                          Registered A2A agents
GET    /agents/dispatch                 A2A capability resolution map
GET    /agents/{id}/status              Agent health + run count
POST   /agents/{id}/message             Send an A2A message
POST   /memory/store                    Store semantic memory
GET    /memory/search?q=                Cosine similarity search
DELETE /memory/{id}                     Delete a memory
```

Metrics · Audit · Marketplace · Workspaces · Auth

```text
GET    /metrics/                        System KPIs (total_runs, success_rate, avg_cost_usd…)
GET    /metrics/cost                    Cost by agent
GET    /metrics/cost/by_workflow_type
GET    /metrics/cost/top_runs           Top N most expensive runs
GET    /metrics/cost/alerts             Budget alert state
GET    /metrics/evaluation              LLM-judge score aggregates
GET    /metrics/runs                    Recent run history
GET    /metrics/prometheus              Prometheus exposition
GET    /audit/search                    Filterable, paginated audit log
GET    /audit/stats                     Audit aggregates over N days
GET    /marketplace/templates           List installable templates
GET    /marketplace/templates/{name}
POST   /marketplace/templates/refresh
GET    /workspaces/                     List tenants
POST   /workspaces/                     Create a tenant
GET    /workspaces/{slug}
POST   /auth/login                      Issue a JWT (demo path, gate behind DEV_LOGIN_ENABLED)
POST   /auth/introspect                 Decode + validate a JWT
POST   /auth/logout
```

Authentication. Requests carry a bearer JWT (Authorization: Bearer <token>). The middleware maps role → permissions (forgeflow/middleware/auth.py). For local development, /auth/login issues a token when DEV_LOGIN_ENABLED=true; in production, disable it and front the API with an OIDC IdP. A legacy X-Role header is accepted for migration.
Errors. Standard HTTP semantics — 401 (no/invalid token), 403 (insufficient role), 422 (validation), 429 (rate-limited), 5xx (upstream/LLM failure surfaced after retries + circuit breaker).
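
A client-side sketch of an authenticated call follows. It only builds the request, using the `Authorization: Bearer <token>` header and the `/workflows/run` payload shown in this README; `build_run_request` is a hypothetical helper, and the response schema is not assumed.

```python
import json
import urllib.request


def build_run_request(base_url: str, token: str, lead_data: dict,
                      workflow_type: str = "sales_ops") -> urllib.request.Request:
    """Build (but do not send) a POST /workflows/run request with a bearer JWT."""
    body = json.dumps({"lead_data": lead_data, "workflow_type": workflow_type}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/workflows/run",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` for the 4xx and 5xx statuses listed above, so the caller can branch on `err.code` (for example, back off on 429 and re-authenticate on 401).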
ForgeFlow ships a documented threat model and the controls that close each finding — see SECURITY_AUDIT.md.
Built-in controls
- Prompt-injection guard scores inbound text; tool-output guard wraps every tool result in an `<UNTRUSTED_TOOL_OUTPUT>` envelope to blunt 2nd-order injection.
- SSRF guard rejects private IPs, cloud metadata (IMDS), and non-http(s) schemes on every agent-controlled URL.
- PII redactor scrubs common identifiers (conservative; prefers over-redaction).
- Outbound-email allowlist pins recipients so the LLM can't choose where mail goes.
- JWT + RBAC, an immutable partitioned audit log, rate limiting, and a strict CORS allowlist (never `*`).
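
The tool-output envelope can be sketched in a few lines. Only the opening `<UNTRUSTED_TOOL_OUTPUT>` tag comes from this README; the closing tag, the `[removed]` placeholder, and the `quarantine` function name are assumptions, and forgeflow/security/tool_output_guard.py may sanitize differently.

```python
OPEN_TAG = "<UNTRUSTED_TOOL_OUTPUT>"
CLOSE_TAG = "</UNTRUSTED_TOOL_OUTPUT>"  # assumed closing form


def quarantine(tool_output: str) -> str:
    """Wrap tool output in an envelope the prompt tells the model to treat as data."""
    # Strip envelope tags from the payload so it cannot close the envelope early
    # and smuggle text outside the quarantined region.
    sanitized = tool_output.replace(CLOSE_TAG, "[removed]").replace(OPEN_TAG, "[removed]")
    return f"{OPEN_TAG}\n{sanitized}\n{CLOSE_TAG}"
```

The wrapper is only half of the control: the system prompt also has to instruct the model never to follow instructions that appear inside the envelope.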
Best practices
- Set a strong `API_SECRET_KEY` (`openssl rand -hex 32`); never commit `.env`.
- Set `DEV_LOGIN_ENABLED=false` and `DOCS_ENABLED=false` in production.
- Use a secrets manager (AWS Parameter Store, GCP Secret Manager, Vault); `.env` is for local use only.
- The SPA authenticates per-user via `/auth/login`; nginx no longer injects a shared admin secret.
Reporting a vulnerability. Please do not open a public issue for security reports. Email the maintainer or use GitHub's private security advisory flow; see SECURITY_AUDIT.md for the disclosure process.
| Concern | Approach |
|---|---|
| Horizontal scaling | The API is stateless — workers share one PostgreSQL checkpointer, so any worker resumes any thread_id. Scale with docker compose up --scale api=4 or a Kubernetes HPA (api 2→10, mcp 1→5). |
| Memory at scale | ivfflat cosine index works well for <1M vectors; switch to HNSW for higher recall at scale. |
| MCP transport | Defaults to streamable-http; co-located deployments can use stdio for lower latency. |
| Cost control | BudgetGuard blocks an LLM call before projected spend exceeds BUDGET_LIMIT_USD; cost is tracked per token, per agent, per workflow. |
| Resilience | Circuit breaker + tenacity retries isolate flaky upstreams; Retry-After honored on 429s. |
| Resource baseline | A full local stack (Postgres + MCP + API + frontend) runs comfortably on ~4 GB RAM / 2 vCPU for evaluation. |
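
The pre-call budget check from the cost-control row can be sketched as follows. The price table holds a made-up placeholder rate, and the class and method names are illustrative; they are not the API of forgeflow/resilience/budget_guard.py.

```python
class BudgetExceeded(RuntimeError):
    pass


# Placeholder rate for illustration only; not real provider pricing.
PRICE_PER_1K_TOKENS_USD = {"example-model": 0.002}


class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def _cost(self, model: str, tokens: int) -> float:
        return tokens / 1000 * PRICE_PER_1K_TOKENS_USD[model]

    def check(self, model: str, projected_tokens: int) -> None:
        """Raise before the call if it would push spend past the limit."""
        projected = self.spent_usd + self._cost(model, projected_tokens)
        if projected > self.limit_usd:
            raise BudgetExceeded(
                f"projected ${projected:.4f} exceeds limit ${self.limit_usd:.4f}"
            )

    def record(self, model: str, tokens: int) -> None:
        """Add the actual cost of a completed call to the running total."""
        self.spent_usd += self._cost(model, tokens)
```

Checking the projection before the call, rather than the total after it, is what lets the guard stop a run before the money is spent.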
| Metric | Score | Notes |
|---|---|---|
| Faithfulness | 0.91 | Outputs grounded in research context |
| Relevance | 0.88 | Proposals matched to company-specific signals |
| Coherence | 0.93 | Well-structured, internally consistent |
| Hallucination rate | 3.2% | Invented specifics caught by the judge |
| Avg cost / run | $0.042 | gpt-4o-mini workers, gpt-4o supervisor |
| Avg latency | 12.4s | Full qualify → propose pipeline |
| Qualification accuracy | 91% | vs. a 20-example manually-labeled set |
Scores generated by an LLM judge on 20 synthetic test cases — illustrative, not a benchmark.
| Target | Where | Notes |
|---|---|---|
| Docker Compose | docker-compose.yml · docker-compose.prod.yml | Local + single-host production |
| Kubernetes | k8s/ | StatefulSet + Deployments + HPAs + NetworkPolicies + Ingress |
| Helm | helm/forgeflow/ | Templated chart with an Alembic pre-upgrade hook |
| Terraform (AWS) | terraform/aws/ | VPC + EKS + RDS PG16 (pgvector) + Secrets Manager + ECR + IRSA |
| Fly.io | fly/ · scripts/deploy_fly.sh | 3 apps + managed Postgres + 6PN networking |
| Air-gapped | docs/deployment/AIRGAPPED.md · scripts/build_offline_bundle.sh | Offline bundle builder |

```bash
# Install dev dependencies
pip install -e '.[dev]'        # or: pip install -r requirements-dev.txt

# Quality gates
make lint                      # ruff + mypy
make fmt                       # ruff format
make test                      # full suite with coverage
make test-unit                 # fast unit tests only

# Start just the DB for local API dev
docker compose up postgres

# Run the API locally (hot-reload)
uvicorn forgeflow.api.main:app --reload

# Run the React console (Vite dev server, HMR, proxies /api → :8000)
cd frontend && npm install && npm run dev   # → http://localhost:5173
```

| Extend… | Pattern to copy | Add |
|---|---|---|
| A new connector | forgeflow/connectors/github.py | A Connector subclass + a matching MCP tool router under forgeflow/mcp/server/tools/ |
| A new MCP tool | any router in forgeflow/mcp/server/tools/ | Mount it in forgeflow/mcp/server/main.py |
| A new workflow | forgeflow/workflows/sales_ops/ | models.py, prompts.py, stages.py, pipeline.py + a manifest.yaml template |
| A new LLM provider | forgeflow/models/provider.py | A lazy-imported branch in get_model() |
| A new tracing backend | forgeflow/observability/tracing_provider.py | A TRACING_PROVIDER branch |

```text
forgeflow/
├── agents/          # Supervisor, Researcher, Analyzer, Executor
├── graph/           # LangGraph StateGraph wiring + PostgreSQL checkpointer
├── state/           # Shared WorkflowState schema
├── mcp/             # FastMCP server (14 tool routers) + MCP client adapter
├── connectors/      # 8 enterprise connectors on a resilient base
├── a2a/             # A2A protocol, registry, transport, dispatcher
├── memory/          # pgvector semantic store + relational store
├── workflows/       # sales_ops · support_ops · finance_recon pipelines
├── api/             # FastAPI app, routers, schemas, dependencies
├── middleware/      # RBAC + JWT, audit, rate limiter, security
├── auth/ · rbac/    # JWT issuance + role policies
├── security/        # PII · prompt · SSRF · tool-output · email guards
├── resilience/      # Retry (tenacity), circuit breaker, budget guard
├── observability/   # Tracing providers, cost tracker, metrics, Prometheus
├── evaluation/      # LLM judge, metrics, dataset, eval runner
├── events/          # Redis Streams + Kafka consumers + dispatcher
├── jobs/            # Approval escalation background job
├── marketplace/     # File-based template registry
├── multimodal/      # PDF + image ingestion
├── telemetry/       # Opt-in anonymous usage emitter
└── notifications/   # Slack HITL cards
frontend/            # React 19 + Vite SPA (13 console views + landing + architecture)
tests/               # unit/ + integration/ (306 tests)
scripts/             # seed_db · run_demo · run_eval · validate_hubspot · deploy_fly · marketplace
alembic/             # 5 migrations (schema · pgvector · RBAC · escalation · multi-tenant)
k8s/ · helm/ · terraform/ · fly/   # Deployment targets
templates/           # Built-in + community workflow templates
docs/                # Production runbook · air-gapped guide · images
```

Phases 0–6 are shipped (full history in ROADMAP.md). Highlights and what's next:
✅ Shipped — supervisor multi-agent core · PostgreSQL checkpointing · MCP tool server · A2A protocol · pgvector memory · JWT + RBAC + audit · cost tracking + budget guard · circuit breaker · LLM-as-judge evals · 8 enterprise connectors · Prometheus + OTel + Phoenix/Langfuse tracing · Slack HITL · approval escalation · event-driven mode (Redis/Kafka) · multi-modal (PDF + images) · template marketplace · React 19 console · K8s/Helm/Terraform(AWS)/Fly.io/air-gapped deploys.
🚧 In progress / next (good first issues — see ROADMAP.md)
- Multi-tenant query scoping: extend `workspace_id` filtering to all tenant-scoped endpoints (foundation + reference endpoints shipped).
- Embeddings provider abstraction: `get_embeddings()` factory for Ollama/Cohere/Voyage to unblock 100%-offline mode.
- Terraform for GCP & Azure: mirror the AWS module with GKE/Cloud SQL and AKS/Flexible Server.
- Voice / Whisper transcription: `transcribe_audio` MCP tool alongside the PDF/image pipeline.
Contributions are welcome! Start with CONTRIBUTING.md and the Code of Conduct.
- Fork & branch: `git checkout -b feat/your-feature` (or `fix/…`, `docs/…`).
- Set up: `pip install -e '.[dev]'` and `docker compose up postgres`.
- Code to the standards: keep it typed; `make lint` (ruff + mypy) and `make fmt` must pass.
- Test: add tests next to the suite; `make test` must stay green (306+ and counting).
- Open a PR: describe the change, link any issue, and ensure CI is green.

Issues tagged `good first issue` and `help wanted` are great entry points.
See COMMUNITY.md for discussion channels.
Licensed under the Apache License 2.0.
Built for the bleeding edge of agentic AI deployment.
⭐ Star the repo if ForgeFlow helps you ship agents to production.
