Build and test a database-aware storefront assistant using Eval Protocol and a Postgres MCP server. This repo follows a test-driven agent development workflow.
For background and a walkthrough, see the blog post: Test-Driven Agent Development with Eval Protocol.
- Python 3.10+
- Docker (for Postgres + MCP server)
- uv (recommended) or pip
- Create and activate a virtual environment
uv venv .venv
source .venv/bin/activate
- Install in editable mode
uv pip install -e .
- (Optional) Start Postgres + MCP server
# Starts Postgres (Chinook) and the Postgres MCP server
docker compose up -d
- Fast local tests (no external model calls):
pytest -q
- Full MCP/agent integration tests (require Docker up and a model key):
export RUN_MCP_EVAL=1
export FIREWORKS_API_KEY=your_fireworks_api_key
pytest -q
- Run a single test with a summary line printed:
EP_PRINT_SUMMARY=1 pytest tests/pytest/test_storefront_agent_eval.py::test_storefront_agent_browse -q
- Emit a JSON summary artifact for CI:
EP_SUMMARY_JSON=artifacts/ pytest -q
# writes JSON files under ./artifacts/
- RUN_MCP_EVAL=1: enable MCP/agent integration test suite
- FIREWORKS_API_KEY: API key for Fireworks models used in agent tests
- EP_PRINT_SUMMARY=1: print a concise summary line to stdout
- EP_SUMMARY_JSON=<path>: write machine-readable summary JSON file(s) to the given path (e.g., artifacts/)
- EP_MAX_DATASET_ROWS=<N|none>: clamp dataset/messages length per run
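To make the semantics of these variables concrete, here is a minimal sketch of how a test helper might read two of them. The function names `mcp_eval_enabled` and `max_dataset_rows` are hypothetical and are not part of Eval Protocol or this repo. The parsing rules are assumptions based on the descriptions above: "1" enables the suite, and "none" or an unset value means no clamp.

```python
import os
from typing import Mapping, Optional


def mcp_eval_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when RUN_MCP_EVAL is set to "1" (assumed opt-in convention)."""
    return env.get("RUN_MCP_EVAL", "") == "1"


def max_dataset_rows(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Parse EP_MAX_DATASET_ROWS=<N|none>; None means no clamp (assumed semantics)."""
    raw = env.get("EP_MAX_DATASET_ROWS", "").strip().lower()
    if raw in ("", "none"):
        return None
    rows = int(raw)  # raises ValueError on non-numeric input
    if rows < 0:
        raise ValueError("EP_MAX_DATASET_ROWS must be non-negative")
    return rows


if __name__ == "__main__":
    print("MCP eval enabled:", mcp_eval_enabled())
    print("Row clamp:", max_dataset_rows())
```

Passing the environment as a mapping keeps the helpers easy to exercise in the fast local tests without touching the real process environment.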
- docker-compose.yml defines:
  - db: Postgres 16 with Chinook schema/data
  - mcp: Postgres MCP server exposing tools (e.g., execute_sql) on port 8010
- tests/pytest/: evaluation tests (batch and pointwise)
- prompts/: system prompt(s)
- external/: third-party assets (Chinook database SQL, MCP server repo)
- scripts/: helper scripts (MCP proxy, etc.)
- Editable install errors about "Multiple top-level packages" were resolved by explicitly disabling package discovery in pyproject.toml.
- If MCP tests fail to connect, ensure docker compose ps shows both db and mcp healthy.
- The agent tests hit real models; credentials and network access are required.
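When connection failures persist, a quick TCP probe can help separate "container not listening" from problems higher up the stack. The sketch below is a hypothetical helper, not part of this repo. It assumes the mcp service publishes port 8010 on localhost; adjust the host and port if your compose file maps them differently.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: the mcp service publishes port 8010 on localhost.
    host, port = "localhost", 8010
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"MCP server at {host}:{port} is {state}")
```

A successful probe only shows that something accepts connections on that port. It does not confirm that the MCP server is healthy or that its tools respond.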