Automates the mapping of messy reinsurance spreadsheets (Bordereaux) to a standardized schema using Small Language Models (Groq/Llama 3.3).
**Security prerequisite for any deployment.** RiskFlow's authentication assumes the operator has configured Entra Conditional Access to require phishing-resistant authentication (FIDO2 / passkey / certificate-based / Windows Hello for Business) and to allow access only from compliant devices. Without these policies, the deployment is vulnerable to adversary-in-the-middle phishing kits (evilginx, EvilProxy, Modlishka), which can defeat conventional MFA even when the sign-in flow uses PKCE. The operator-side steps for these policies are in the Entra ID auth operator runbook; see `docs/azure-auth-implementation-plan.md` § "Threat model" for the underlying analysis. This is not optional: it is the deployment's load-bearing security assumption.

**GUI auth status (local dev).** The Streamlit GUI's sign-in flow is currently broken end-to-end (#331). Its replacement is a server-side Backend-For-Frontend (#332, accepted in ADR-0001). For local development today, either leave `ENTRA_TENANT_ID` / `ENTRA_AUDIENCE` unset (the API then engages `NullIdentityProvider`) or mint a bearer token with `az account get-access-token` for direct API calls.
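A minimal sketch of the second option, using only the Python standard library: it shells out to the Azure CLI for a token and builds an authenticated request for the protected `/me` endpoint. It assumes the token's resource is the same value you configure as `ENTRA_AUDIENCE` and that the API listens on `localhost:8000`; `mint_token`, `build_request`, and `fetch_me` are illustrative names, not RiskFlow code.

```python
import json
import subprocess
import urllib.request


def mint_token(audience: str) -> str:
    """Ask the Azure CLI (after `az login`) for an access token for `audience`.

    Assumption: `audience` is the same value the API is configured with
    as ENTRA_AUDIENCE.
    """
    result = subprocess.run(
        ["az", "account", "get-access-token", "--resource", audience,
         "--query", "accessToken", "--output", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying an RFC 6750 bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_me(audience: str, base_url: str = "http://localhost:8000") -> dict:
    """Mint a token, then call the protected /me endpoint with it."""
    request = build_request(base_url, "/me", mint_token(audience))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running and `az login` done, `fetch_me(os.environ["ENTRA_AUDIENCE"])` returns the decoded `/me` response.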
- Getting started tutorial — upload your first file in 5 minutes
- Get started with BFF auth — set up Entra and sign in via your browser in 30 minutes (newcomer-friendly; no prior OAuth knowledge required)
- Features overview — what RiskFlow delivers, with acceptance test checklist
- API reference — all endpoints, parameters, errors
- Architecture decisions — durable design choices in Nygard ADR format (start at ADR-0001)
- Full documentation index — tutorials, how-to guides, explanations, reference
Prerequisites:

- Python 3.12+
- uv
- Docker & Docker Compose (for Redis, or for the whole stack in the Docker quickstart)
- A Groq account and API key (free tier available)
Quick start with Docker:

```bash
# Copy environment template and add your Groq API key
cp .env.example .env
# Edit .env and set GROQ_API_KEY=gsk_your_key_here

# Start everything: API + Redis + GUI
docker compose up -d
```

Open http://localhost:8501 for the GUI, or http://localhost:8000/docs for the API.
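`docker compose up -d` returns before the API is necessarily serving requests. Scripts that drive the API right after startup can poll the open `/ready` probe (listed in the endpoint table below) until it stops returning 503. This is a stdlib-only sketch; the timeout and polling interval are arbitrary assumptions, and the helper names are illustrative.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET, or 0 if the server is not reachable yet."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # e.g. 503 from /ready while Redis is down
        return exc.code
    except OSError:  # connection refused/reset or timeout: API still starting
        return 0


def wait_until_ready(
    probe: Callable[[], int], timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Poll `probe` until it reports a 2xx status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if 200 <= probe() < 300:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Typical use: `wait_until_ready(lambda: http_status("http://localhost:8000/ready"))`. Injecting the probe as a callable keeps the retry loop testable without a live server.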
Local development (API and GUI on the host, Redis in Docker):

```bash
# Install dependencies
uv sync

# Copy environment template and add your Groq API key
cp .env.example .env

# Start Redis (still needs Docker)
docker compose up -d redis

# Run the API
uv run uvicorn src.entrypoint.main:app --reload --port 8000

# Run the GUI (in a separate terminal)
uv run streamlit run gui/app.py
```

Protected routes (e.g. `/upload`) require an Entra bearer token. To exercise them locally without minting tokens (for example, to drive the Groq mapping pipeline or the #197 rate-limit experiment), run the dev-only auth-disabled overlay:

```bash
docker compose -f docker-compose.yml -f docker-compose.dev-auth-disabled.yml up -d
```
⚠️ DEV ONLY. This overlay runs `src.entrypoint.dev_main:app`, which opens every protected route (no token required). Never use it in production or CI; those run `src.entrypoint.main:app`, which is fully auth-gated. The overlay is opt-in (it must be passed explicitly with `-f`) and is never loaded automatically. See ADR-0003.
Running in GitHub Codespaces:

The committed `docker-compose.yml` is tuned for a 4-CPU host (`cpus: "4"`, and the Dockerfile `CMD` sets `--workers 4`). The default Codespaces machine has only 2 CPUs, so the unmodified compose file fails to start with `range of CPUs is from 0.01 to 2.00, as there are only 2 CPUs available`.

`docker-compose.codespaces.yml` is a small overlay that caps the `api` service to 2 CPUs and overrides its command to use `--workers 2`. Pass it alongside the base compose file:

```bash
docker compose -f docker-compose.yml -f docker-compose.codespaces.yml up -d
```

No other changes are required. The overlay only touches the `api` service; Redis and the GUI use the same configuration as the regular Docker quickstart.
Development commands:

```bash
# Run tests
uv run pytest -x -v tests/unit/

# Type checking
uv run mypy src/

# Lint and format
uv run ruff check src/
uv run ruff format src/
```

Development workflow (TDD):

- Red: write a failing test in `tests/unit/`.
- Green: implement the minimum code in `src/domain/` or `src/adapters/` to make it pass.
- Check: run `uv run mypy src/` and `uv run ruff check src/`.
- Commit: if all checks pass, commit with a descriptive message.

Claude Code hooks enforce this: they block any commit where mypy, pytest, `ruff check`, or `ruff format` fails. GitHub Actions CI runs the same checks on PRs and on pushes to `main`.
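One Red/Green cycle, sketched as a single pytest-style file. The file name and `normalise_header` function are hypothetical examples chosen for illustration; they are not taken from the RiskFlow codebase.

```python
# tests/unit/test_normalise_header.py (hypothetical file and function)


# Green: the minimum code that makes the tests below pass. In the project it
# would live under src/domain/; it is inline here so the sketch runs alone.
def normalise_header(raw: str) -> str:
    """Collapse internal whitespace and case-fold a spreadsheet header."""
    return " ".join(raw.split()).casefold()


# Red: written first; both tests fail until normalise_header exists.
def test_collapses_whitespace() -> None:
    assert normalise_header("  Gross   Premium ") == "gross premium"


def test_case_folds() -> None:
    assert normalise_header("POLICY_ID") == "policy_id"
```

After the tests go green, the Check step (mypy and ruff) runs before the commit.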
Hexagonal (Ports & Adapters). Dependencies only point inward.
```mermaid
graph LR
  subgraph External
    Client([Client])
    GUI([Streamlit GUI<br>port 8501])
    Excel[(Excel/CSV)]
    Groq([Groq API])
    Redis[(Redis)]
  end
  subgraph Adapters
    HTTP[HTTP Adapter<br>FastAPI Routes]
    Parser[Parser Adapter<br>Polars Ingestor]
    SchemaLoader[Schema Loader<br>YAML Parser]
    SLM[SLM Adapter<br>Groq Mapper]
    Cache[Cache Adapter<br>Redis Client]
    CorrCache[Correction Cache<br>Redis Hash]
    SessionStore[Session Store<br>Redis + TTL]
    SchemaStore[Schema Store<br>Redis]
    JobStore[Job Store<br>Redis / In-Memory]
  end
  subgraph Ports
    IngestorPort{{IngestorPort}}
    MapperPort{{MapperPort}}
    CachePort{{CachePort}}
    CorrectionCachePort{{CorrectionCachePort}}
    SessionStorePort{{MappingSessionStorePort}}
    SchemaStorePort{{SchemaStorePort}}
    JobStorePort{{JobStorePort}}
    SchemaLoaderPort{{SchemaLoaderPort}}
  end
  subgraph Domain
    Service[MappingService]
    Session[MappingSession<br>CREATED to FINALISED]
    Models[TargetSchema<br>ColumnMapping<br>MappingResult<br>ConfidenceReport<br>Correction]
    RecordFactory[record_factory<br>Dynamic pydantic models]
    DateFormat[date_format<br>Column-level detection]
    Errors[Domain Errors]
  end
  GUI -->|HTTP| HTTP
  Client -->|upload, corrections, schemas| HTTP
  Client -->|sessions, async jobs| HTTP
  HTTP --> Service
  Service --> IngestorPort
  Service --> MapperPort
  Service --> CachePort
  Service --> CorrectionCachePort
  Service --> RecordFactory
  Service --> Models
  Service --> DateFormat
  IngestorPort -.-> Parser
  MapperPort -.-> SLM
  CachePort -.-> Cache
  CorrectionCachePort -.-> CorrCache
  SessionStorePort -.-> SessionStore
  SchemaStorePort -.-> SchemaStore
  JobStorePort -.-> JobStore
  SchemaLoaderPort -.-> SchemaLoader
  Parser --> Excel
  SLM --> Groq
  Cache --> Redis
  CorrCache --> Redis
  SessionStore --> Redis
  SchemaStore --> Redis
```
Data flows:
- Batch: Upload → Parse headers → Check cache → (miss?) Check corrections → SLM maps uncorrected headers → Merge → Validate rows → Return results with confidence report
- Interactive: Upload → SLM suggests → User edits mappings → Finalise → Validate rows → Return results
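The batch flow's cache, corrections, SLM, merge sequence can be sketched as one function. Assumptions not stated above: the cache key is the sorted header list, the SLM adapter is a callable from headers to target fields, and the merged result is written back to the cache. `map_headers_batch` is a hypothetical name.

```python
from collections.abc import Callable

HeaderMapping = dict[str, str]  # source header -> target schema field


def map_headers_batch(
    headers: list[str],
    cache: dict[tuple[str, ...], HeaderMapping],
    corrections: HeaderMapping,
    slm_map: Callable[[list[str]], HeaderMapping],
) -> HeaderMapping:
    """Cache -> corrections -> SLM (uncorrected headers only) -> merge."""
    key = tuple(sorted(headers))
    if key in cache:  # cache hit: no SLM call at all
        return dict(cache[key])

    # Cache miss: human-verified corrections are used as-is ...
    corrected = {h: corrections[h] for h in headers if h in corrections}
    # ... and only the remaining headers are sent to the model.
    uncorrected = [h for h in headers if h not in corrections]
    suggested = slm_map(uncorrected) if uncorrected else {}

    merged = {**suggested, **corrected}  # a correction always wins a conflict
    cache[key] = merged  # assumption: the merged result is cached for next time
    return dict(merged)
```

Sending only uncorrected headers to the model keeps SLM calls (and Groq usage) proportional to what humans have not already verified.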
Endpoints:
Every endpoint except `/health`, `/ready`, `/live`, `/docs`, `/redoc`, and `/openapi.json` requires an `Authorization: Bearer <token>` header. See `docs/reference/api.md` for the authentication contract and `docs/azure-auth-implementation-plan.md` for the Entra ID setup.
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/upload` | POST | required | Synchronous upload with optional `?sheet_name`, `?cedent_id`, `?schema` |
| `/upload/async` | POST | required | Async upload, returns job ID for polling |
| `/jobs` | GET | required | List all async jobs with filename and upload date |
| `/jobs/{id}` | GET | required | Poll async job status and result |
| `/sheets` | POST | required | List sheet names in an Excel file |
| `/corrections` | POST | required | Submit human-verified mapping corrections |
| `/schemas` | GET | required | List available target schemas |
| `/schemas/{name}` | GET | required | View a schema's full definition |
| `/schemas` | POST | required | Create a runtime schema from JSON |
| `/schemas/{name}` | DELETE | required | Delete a runtime schema |
| `/sessions` | POST | required | Upload file, get SLM suggestion + preview (interactive) |
| `/sessions/{id}` | GET | required | Current session state |
| `/sessions/{id}/mappings` | PUT | required | Edit mappings before finalising |
| `/sessions/{id}/target-fields` | PATCH | required | Add custom target fields to a session |
| `/sessions/{id}/finalise` | POST | required | Validate rows with user's mapping |
| `/sessions/{id}` | DELETE | required | Cleanup session + temp file |
| `/me` | GET | required | Authenticated caller's identity + cedent assignments (Phase 1 returns empty list) |
| `/health` | GET | open | Combined health check (includes Redis status) |
| `/ready` | GET | open | Kubernetes readiness probe (503 if Redis unreachable) |
| `/live` | GET | open | Kubernetes liveness probe (200 if process alive) |
Unauthenticated requests to protected endpoints return `401 UNAUTHORIZED` with an RFC 6750 `WWW-Authenticate` header. When the API can't reach Entra's JWKS endpoint, requests fail with `503 AUTH_INFRASTRUCTURE_UNAVAILABLE` rather than 401, because the token isn't necessarily invalid; the API just can't verify it.
Project structure:

```text
src/
  entrypoint/   # FastAPI wiring (composition root), incl. auth + middleware wiring
  domain/
    model/      # TargetSchema, MappingSession, ColumnMapping, date_format, errors,
                # User (authenticated caller identity)
    service/    # MappingService (orchestration)
  ports/
    input/      # IngestorPort
    output/     # MapperPort, CachePort, SessionStorePort, SchemaStorePort,
                # IdentityProviderPort, ...
  adapters/
    http/       # FastAPI routes, RequestIdMiddleware, SecurityHeadersMiddleware,
                # auth dependency (Depends(require_user))
    auth/       # EntraJwtValidator + JwksCache (Entra ID JWT validation)
    slm/        # Groq API mapper
    storage/    # Redis cache, session store, schema store, job store
    parsers/    # Polars ingestor, YAML schema loader
```
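The "dependencies only point inward" rule can be checked mechanically. The README does not say such a check exists; this is a hypothetical stdlib sketch that parses a module's source with `ast` and reports absolute imports from `src.adapters` or `src.entrypoint`. Relative imports are deliberately not resolved.

```python
import ast

# Packages the domain must never import (dependencies only point inward).
OUTER_PACKAGES = ("src.adapters", "src.entrypoint")


def _is_outer(module: str) -> bool:
    return any(module == p or module.startswith(p + ".") for p in OUTER_PACKAGES)


def outward_imports(source: str) -> list[str]:
    """List absolute imports in a module's source that point at outer layers.

    Relative imports (``from ..adapters import x``) are not resolved here.
    """
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found += [a.name for a in node.names if _is_outer(a.name)]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_outer(node.module):
                found.append(node.module)
    return found
```

Running it over every file from `pathlib.Path("src/domain").rglob("*.py")` and failing on any non-empty result would turn the architecture rule into a unit test.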
The default target schema (`schemas/standard_reinsurance.yaml`) maps Bordereaux data to:

| Field | Type | Constraints |
|---|---|---|
| `Policy_ID` | String | Not empty |
| `Inception_Date` | Date | Required |
| `Expiry_Date` | Date | Must not precede `Inception_Date` |
| `Sum_Insured` | Float | Non-negative |
| `Gross_Premium` | Float | Non-negative |
| `Currency` | Currency | USD, GBP, EUR, JPY |
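The constraints in this table can be expressed as a plain row check. The project itself builds dynamic pydantic models (`record_factory`); this stdlib-only stand-in just makes each rule concrete. It assumes dates have already been parsed into `datetime.date` values and that `validate_row` is a hypothetical name.

```python
from datetime import date

ALLOWED_CURRENCIES = {"USD", "GBP", "EUR", "JPY"}


def validate_row(row: dict[str, object]) -> list[str]:
    """Return constraint violations for one row of the default schema."""
    errors: list[str] = []
    policy_id = row.get("Policy_ID")
    if not isinstance(policy_id, str) or not policy_id.strip():
        errors.append("Policy_ID: must not be empty")

    inception, expiry = row.get("Inception_Date"), row.get("Expiry_Date")
    if not isinstance(inception, date):
        errors.append("Inception_Date: required")
    elif isinstance(expiry, date) and expiry < inception:
        # Cross-field rule: same-day expiry is allowed, earlier is not.
        errors.append("Expiry_Date: must not precede Inception_Date")

    for field in ("Sum_Insured", "Gross_Premium"):
        value = row.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            errors.append(f"{field}: must be a non-negative number")

    if row.get("Currency") not in ALLOWED_CURRENCIES:
        errors.append("Currency: must be one of USD, GBP, EUR, JPY")
    return errors
```

Collecting every violation instead of stopping at the first one mirrors what a reviewer needs when fixing a messy bordereau row.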
The schema is configurable via YAML. Custom schemas can define different fields, types, constraints, cross-field rules, and SLM hints. See:
- Schema reference — field types, constraints, YAML format
- How to use a custom schema — step-by-step guide
This project is dual-licensed:
- Open Source: GNU General Public License v3.0 — free for open-source use, modification, and distribution under GPL terms.
- Commercial: For use in proprietary or closed-source products without GPL obligations, a commercial license is available. Contact ricjhill for details.
© 2025-2026 ricjhill. All rights reserved.