diff --git a/CHANGELOG.md b/CHANGELOG.md index f20eed6..f9d6f14 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [1.9.0] — 2026-05-21 — Junior quick-start truthing + +The v1.x roadmap is fully closed and the kit ships to GitHub Packages as `@padosoft/agentic-qa-kit`. This release closes the four gaps an external junior would have hit if they tried to follow the README quick-start in v1.8: + +### Added + +- **`aqa install-agent-files --targets …`** CLI verb (PR #52). Cables the existing `renderForTargets()` from `@aqa/adapters` into a real command. Generates `CLAUDE.md` + `AGENTS.md` + `GEMINI.md` + `.github/copilot-instructions.md` plus per-agent skills under `.claude/`, `.agents/`, `.gemini/`, `.github/`. Flags: `--targets `, `--project-name `, `--force`, `--dry-run`. Unknown target fails fast without writing anything. Existing files preserved unless `--force`. +- **`aqa report [--run-id ] [--format md|json|both]`** CLI verb (PR #53). Renders `events.jsonl` + `findings.jsonl` from a run into `report.md` (auditor-friendly) and `report.json` (stable shape consumed by the admin UI). Defaults to the latest run by file mtime so hash-suffixed `--seed` ids work alongside ISO-prefixed ones. Strict on bad inputs: missing artifacts, malformed JSONL, traversal in `--run-id`, symlinked run dirs all error fast. +- **`aqa admin [--port ] [--host ]`** CLI verb (PR #54). Boots the admin SPA + `makeApi()` in a single Node process on `http://127.0.0.1:5173`. The bundled SPA ships inside the kit tarball; the in-memory store is seeded from `.aqa/runs//{events,findings}.jsonl` so the admin shows real local data out of the box. Path-traversal-safe static serving with SPA fallback for client-side routes. +- **`@aqa/pack-author`** new workspace package (PR #54). 
Extracted `runPackNew` from `@aqa/kit` to break the kit↔server build cycle that emerged when kit started depending on server (via the new `aqa admin` command). `@aqa/server`'s `POST /api/packs/scaffold` and `@aqa/kit`'s `aqa pack new` both consume it. Kit keeps a 5-line re-export shim so existing in-kit imports work unchanged. +- **GitHub Packages publish pipeline** (PR #55). New `.github/workflows/publish.yml` runs on every `v*` tag and publishes `@padosoft/agentic-qa-kit` to `https://npm.pkg.github.com` with `--provenance`. The kit publishes as a single bundled `dist/cli.cjs` (~460 KB) via esbuild — every `@aqa/*` workspace dep + every npm dep is inlined; only Node built-ins stay external. `packages/kit/scripts/publish-prep.mjs` swaps `@aqa/kit` → `@padosoft/agentic-qa-kit` and pins `workspace:*` deps to the kit's version at publish time only (the workspace keeps its internal name so other packages can keep referencing it). +- **README + `docs/getting-started.md` rewritten** to match the actually-shipped CLI surface. Adds the `.npmrc` snippet for GH Packages auth, the 10-step quick-start, the `aqa admin` boot path, and a single-command `bun run e2e:ecosystem` pointer for monorepo contributors. + +### Changed + +- **kit `package.json` `name` field policy.** The workspace name stays `@aqa/kit` so other monorepo packages can reference it. The published artifact's `name` is set at publish time from the new `aqa.publishName` declaration (`@padosoft/agentic-qa-kit`). This dual-naming keeps internal imports stable while satisfying GH Packages' ` === ` requirement. +- **Bundle format.** Kit now ships as `dist/cli.cjs` (CJS-in-.cjs) instead of separate per-file ESM modules. The `.cjs` extension overrides the package-level `"type": "module"` so Node loads it as CJS and bundled deps that internally `require('process')` resolve cleanly. +- **`packages/server/src/api.ts`** imports `runPackNew` from `@aqa/pack-author` (was `@aqa/kit`). 
+- **`packages/server/src/index.ts`** re-exports `ApiHandler`, `ApiMethod`, `ApiRequest`, `ApiResponse` so kit can consume them type-only. + +### Fixed + +- **`doctor.ts` hint no longer says `(Task 4)`** — the verb exists now, so the suggestion is the full command a junior can paste. +- **Slugifier caps at 64 chars** (Slug schema max). Previously a long project directory name would slip through `aqa init` / `aqa install-agent-files` and trip `aqa validate` later. Caps then re-strips trailing dashes so the truncated slug stays schema-conformant. +- **`KNOWN_TARGETS` derived from `@aqa/adapters.adapters`** instead of being a hardcoded duplicate — adding a new adapter (e.g. `opencode`) auto-extends `--targets`. + +## [1.8.3] — 2026-05-20 — Live ecosystem e2e + roadmap closure sync + +PR #51 — dedicated ecosystem Playwright smoke (`packages/admin/test/e2e/ecosystem-live.e2e.ts`) with a single-command stack bootstrap (`scripts/ecosystem-stack.mjs`). Boots `examples/bun-api`, runs a real `aqa run --profile smoke`, serves live `/api/*` from `@aqa/server.makeApi` + `MemoryStore`, drives the admin against the live backend, asserts `finding_emitted` is visible from `/api/audit` and chain verification returns `CHAIN OK`. Command: `bun run e2e:ecosystem`. + +## [1.8.2] — 2026-05-20 — Ecosystem smoke e2e hardening + +PR #50 — `scripts/e2e-cli.mjs` no longer stops at version/help/doctor/validate only: boots a local HTTP `/healthz` target, seeds a schema-valid local smoke pack/profile, executes `aqa run --profile smoke` with the real HTTP probe runner, and asserts run artifacts are emitted under `.aqa/runs//`. + +## [1.8.1] — 2026-05-20 — Audit chain canonical reconciliation + +PR #49 — aligned `@aqa/compliance.verifyEventChain` with `@aqa/runner.EventChainWriter`: hash recomputation excludes `prev_hash` from canonical body, and first-record `prev_hash: null` is canonical instead of expecting all-zero literal. 
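The 1.8.1 entry above describes two rules for the audit chain: `prev_hash` is left out of the hashed body, and the first record carries `prev_hash: null`. The sketch below illustrates those two rules only; it is not the code of `verifyEventChain` or `EventChainWriter`. SHA-256, the `hash` field name, the sorted-top-level-keys canonicalization, the `CHAIN BROKEN` message, and the helper names are all assumptions made for illustration. Whether the real writer also mixes the previous hash into the digest is not stated in the changelog.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; only `prev_hash` is named in the changelog.
interface ChainRecord {
  prev_hash: string | null;
  hash: string;
  [key: string]: unknown;
}

// Canonical body: every top-level field except `hash` and `prev_hash`,
// serialized with sorted keys (nested objects are not re-sorted here).
function canonicalBody(record: ChainRecord): string {
  const keys = Object.keys(record)
    .filter((k) => k !== "hash" && k !== "prev_hash")
    .sort();
  return JSON.stringify(keys.map((k) => [k, record[k]]));
}

// Assumption: the digest covers the canonical body only.
function digest(record: ChainRecord): string {
  return createHash("sha256").update(canonicalBody(record)).digest("hex");
}

// Append a record, linking it to the previous one (null for the first).
function appendRecord(records: ChainRecord[], body: Record<string, unknown>): void {
  const prev_hash = records.length === 0 ? null : records[records.length - 1].hash;
  const draft: ChainRecord = { ...body, prev_hash, hash: "" };
  draft.hash = digest(draft);
  records.push(draft);
}

// Walk the chain: each prev_hash must equal the prior record's hash,
// and each stored hash must match the recomputed digest.
function verifyChain(records: ChainRecord[]): string {
  let expectedPrev: string | null = null;
  for (let i = 0; i < records.length; i++) {
    const r = records[i];
    if (r.prev_hash !== expectedPrev) return `CHAIN BROKEN at ${i}: prev_hash mismatch`;
    if (r.hash !== digest(r)) return `CHAIN BROKEN at ${i}: hash mismatch`;
    expectedPrev = r.hash;
  }
  return "CHAIN OK";
}
```

Editing any body field of a stored record changes its recomputed digest, so `verifyChain` stops at that index instead of returning `CHAIN OK`.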
+ ## [1.3.0] — 2026-05-18 ### Added — quality polish (no new packages) diff --git a/README.md b/README.md index 12b546e..d73b1ba 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Coding agents (Claude Code, Codex CLI, Gemini CLI, GitHub Copilot CLI) are great ## Quick start (junior-friendly) -> **Status note:** the kit reached **v1.0 GA** (24-task roadmap complete) and is now at **v1.1**. The 18 workspace packages (`@aqa/schemas`, `@aqa/kit`, `@aqa/runner`, `@aqa/reporter`, `@aqa/server`, `@aqa/admin`, `@aqa/compliance`, `@aqa/methodology`, …) ship from this monorepo. Detailed walk-through: [`docs/getting-started.md`](docs/getting-started.md). +> **Status note:** the kit reached **v1.0 GA** (24-task roadmap complete) and is now at **v1.9**. The `@padosoft/agentic-qa-kit` CLI ships as a single bundled tarball from GitHub Packages. Detailed walk-through: [`docs/getting-started.md`](docs/getting-started.md). ### 1. Install Bun @@ -85,65 +85,111 @@ curl -fsSL https://bun.sh/install | bash powershell -c "irm bun.sh/install.ps1 | iex" ``` -### 2. Install the kit in your project +### 2. Tell your project where to find the kit (GitHub Packages auth) + +GitHub Packages requires authentication even for public packages. One-time setup per machine — create a PAT with `read:packages` scope at [github.com/settings/tokens](https://github.com/settings/tokens), then add it to a per-project `.npmrc`: + +```ini +# .npmrc — at the root of your project +@padosoft:registry=https://npm.pkg.github.com +//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN} +``` + +Export the token in your shell (or your CI secrets): + +```bash +export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX +``` + +### 3. 
Install the kit in your project ```bash cd /path/to/your/project -bun add -d agentic-qa-kit +bun add -d @padosoft/agentic-qa-kit ``` -> _If you don't have a project yet, clone `examples/bun-api` from this repo (available in v0.1.0)._ +> _If you don't have a project yet, clone `examples/bun-api` from this repo as a starting point._ -### 3. Initialize the AQA workspace +### 4. Initialize the AQA workspace + verify ```bash -bunx aqa init +bunx aqa init # scaffold .aqa/{project,risk-map,profiles}.yaml + testing.md +bunx aqa doctor # green/yellow/red checklist of kit health +bunx aqa validate # schema-check every .aqa/* file against @aqa/schemas ``` -Detects your stack and creates `.aqa/` with `testing.md`, `risk-map.yaml`, `profiles.yaml`, and scenarios for the packs your project matches. +`init` detects your stack (Bun/Node, framework, DB, SUT type) and creates a `.aqa/` directory anchored to the packs your project matches. -### 4. Install agent-specific files (pick one or many) +### 5. Install agent-specific files (one or many) ```bash bunx aqa install-agent-files --targets claude,codex,gemini,copilot ``` -This generates `CLAUDE.md` + `.claude/skills/aqa-*`, `AGENTS.md` + `.agents/skills/`, `GEMINI.md` + `.gemini/skills/`, `.github/copilot-instructions.md` + `.github/skills/`. +Generates `CLAUDE.md` + `.claude/skills/aqa-*`, `AGENTS.md` + `.agents/skills/`, `GEMINI.md` + `.gemini/skills/`, and `.github/copilot-instructions.md` + `.github/skills/`. Existing files are preserved unless you pass `--force`. Add `--dry-run` to see what would change first. -### 5. Run your first agentic QA pass +### 6. Edit `.aqa/risk-map.yaml` (declare what must never break) + +Replace the placeholder risk with the one that actually matters for your project. 
**The risk map is the heart of the kit — generic risks produce generic findings.** + +```yaml +- id: r-token-replay + category: auth + title: Tokens remain valid past rotation + severity: critical + likelihood: possible + invariants: + - id: inv-token-rotation + statement: Old tokens become invalid within 60 seconds of rotation. +``` + +### 7. Run your first agentic QA pass ```bash bunx aqa run --profile smoke ``` -A 10-minute, non-destructive sweep. When it finishes: +A fast, non-destructive sweep. Each run is written to `.aqa/runs//` with `events.jsonl`, `findings.jsonl`, and 3-level replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`). + +### 8. Render the report ```bash -bunx aqa report +bunx aqa report # latest run, Markdown + JSON +bunx aqa report --run-id # explicit run +bunx aqa report --format md # just report.md ``` -You'll see findings like: +Writes `report.md` and `report.json` inside the same run directory. You'll see findings like: ``` AQA-2026-0001 [P1] Cross-tenant data leak (verified, 3/3 deterministic replay) AQA-2026-0002 [P3] Missing rate limit on /api/search ``` -### 6. Open the admin panel +### 9. Boot the admin panel (single command) ```bash -bun --filter @aqa/admin dev +bunx aqa admin ``` -Then open the local URL shown by Vite (normally `http://127.0.0.1:5173`) and inspect runs, findings, replay artifacts, and audit chain state. +Opens `http://127.0.0.1:5173`. The admin SPA + API server boot in one process, seeded from your local `.aqa/runs/`. Inspect runs, findings, replay artifacts, and verify the hash-chained audit log in-browser. `Ctrl-C` to stop. + +| Flag | Effect | +|---|---| +| `--port ` | listen on a specific port (default 5173) | +| `--host ` | bind host (default `127.0.0.1`; use `0.0.0.0` to expose on LAN) | -### 7. Reproduce from generated artifacts +### 10. 
Reproduce from generated artifacts ```bash ls .aqa/runs// +# events.jsonl findings.jsonl report.md report.json +# repro.sh repro.curl repro.playwright.ts ``` -Each run stores replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`) so you can reproduce findings deterministically and confirm fixes. +Each finding ships with a deterministic replay artifact so you can reproduce it, hand it to a teammate, or attach it to a PR. + +> **Want the whole ecosystem in one go?** From a clone of `padosoft/agentic-qa-kit`, run `bun run e2e:ecosystem`. It boots `examples/bun-api`, runs a real `aqa run --profile smoke` against it, and opens the admin against the live data. Single command, end-to-end smoke. ## The mental model in 7 words @@ -156,12 +202,13 @@ Every concept in AQA is one of these seven things or a tool that operates on the ## How you use it 1. `aqa init`: detect your repo and scaffold `.aqa/`. -2. Edit `risk-map.yaml`: declare what must never break. -3. Install agent files: Claude/Codex/Gemini/Copilot instructions + skills. +2. `aqa install-agent-files --targets …`: write Claude/Codex/Gemini/Copilot instructions + skills. +3. Edit `risk-map.yaml`: declare what must never break. 4. `aqa run --profile smoke`: execute scenarios with probes + oracles. -5. Open admin: `bun --filter @aqa/admin dev`. -6. Inspect findings, replay deterministically, verify audit chain. -7. Iterate risks + scenarios until `release-gate` is green. +5. `aqa report`: render `report.md` + `report.json` from the latest run. +6. `aqa admin`: boot the SPA + API on `127.0.0.1:5173`, seeded from local runs. +7. Inspect findings, replay deterministically, verify audit chain. +8. Iterate risks + scenarios until `release-gate` is green. 
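Step 5 in the list above renders the report from a run's `events.jsonl` and `findings.jsonl`, and the changelog says malformed JSONL makes the command error fast. A minimal sketch of a strict reader in that spirit follows. It is not the kit's parser: the function names, the error wording, the tolerance for blank lines, and the `severity` key used in the usage note are assumptions.

```typescript
// Parse JSON Lines strictly: every non-blank line must be a JSON object.
// `label` is only used to build readable error messages.
function parseJsonl(text: string, label: string): Record<string, unknown>[] {
  const rows: Record<string, unknown>[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // assumption: blank lines are tolerated
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`${label}:${index + 1}: malformed JSON`);
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`${label}:${index + 1}: expected a JSON object`);
    }
    rows.push(value as Record<string, unknown>);
  });
  return rows;
}

// Count rows by the string value of one key (missing values become "unknown").
function countBy(rows: Record<string, unknown>[], key: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const bucket = String(row[key] ?? "unknown");
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
  }
  return counts;
}
```

For example, `countBy(parseJsonl(text, "findings.jsonl"), "severity")` would give a per-severity tally, provided the findings records actually carry a field by that name.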
## Multi-agent @@ -219,10 +266,12 @@ Full diagram: [`docs/architecture/reference.md`](docs/architecture/reference.md) | `v1.5` | **Admin design integration — shipped** | 30-screen hi-fi prototype bundled, Playwright E2E gate, theme + palette + Findings kanban | | `v1.6` | **`aqa run` + bundled packs — shipped** | Three-tier pack discovery, atomic run-dir, applies_when filtering, agent-mode rejection until driver lands | | `v1.7` | **Pack authoring + admin CRUD — shipped** | `PACK-AUTHORING.md`, `aqa pack new`, admin Create-pack/Import-manifest wizards, full Profile/Risk/Scenario CRUD (Delete/Edit/Clone), Agents wired to `/api/agents`, Operations + Admin pages wired to `/api/audit` / `/api/cost/summary` / `/api/queue` / `/api/notifications` / `/api/tokens` / `/api/orgs`, scenario YAML editor, schema-conforming mock-id migration, `Agent` schema, `agents:read`/`agents:edit` permissions, atomic `Store.createProfile/createScenario` | +| `v1.8` | **Live ecosystem e2e — shipped** | Real HTTP probe runner, release-gate finding enforcement, single-command ecosystem stack (`bun run e2e:ecosystem`), Playwright admin-against-live-API smoke, audit-chain canonical reconciliation | +| `v1.9` | **Junior quick-start truthing — shipped** | `aqa install-agent-files` + `aqa report` + `aqa admin` CLI verbs (previously documented but unwired), `@aqa/pack-author` extracted to break kit↔server build cycle, esbuild bundled `dist/cli.cjs`, GitHub Packages publish workflow on `v*` tags, README quick-start rewritten to match the actually-shipped CLI surface | ## Status -**GA (`v1.0` shipped, `v1.7` current).** The full 24-task roadmap is closed: +**GA (`v1.0` shipped, `v1.9` current).** The full 24-task roadmap is closed: schemas, CLI (`@aqa/kit`), 5 baseline packs, multi-agent adapters (Claude/Codex/Gemini/Copilot), runner with hash-chained audit, reporter with 3-level replay, admin panel, server + runner fleet, on-prem LLM diff --git a/docs/getting-started.md b/docs/getting-started.md index 
6330a47..6df4daa 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -14,14 +14,36 @@ > Windows users: PowerShell 7+ is the supported shell. WSL also works. -## 1. Install the kit in your project (3 min) +## 1. Authenticate to GitHub Packages (2 min, one-time) + +The kit is published as `@padosoft/agentic-qa-kit` on GitHub Packages. Public packages on GH Packages still require auth — create a personal access token (PAT) once and tell `bun`/`npm` how to use it. + +1. Create a PAT at with scope **`read:packages`** (that's the only one you need to install). +2. Add this `.npmrc` to your project root: + +```ini +@padosoft:registry=https://npm.pkg.github.com +//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN} +``` + +3. Export the token in your shell (`~/.zshrc`, `~/.bashrc`, or the equivalent on Windows): + +```bash +export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX +``` + +> For CI, set `GITHUB_TOKEN` as a workflow/runner secret — never commit it. + +## 2. Install the kit in your project (1 min) ```bash cd path/to/your/project -bun add -D agentic-qa-kit +bun add -D @padosoft/agentic-qa-kit ``` -## 2. Bootstrap `.aqa/` (1 min) +The CLI is the only thing that gets installed — a single bundled `cli.cjs` (~460 KB) with all `@aqa/*` workspace deps inlined. The admin SPA and the 5 bundled packs ride along inside the same tarball. + +## 3. Bootstrap `.aqa/` (1 min) ```bash bunx aqa init @@ -37,25 +59,29 @@ This writes four files (non-destructive — existing files are skipped): └── testing.md # human-readable rationale for the QA conventions ``` -Open each file and tailor them to your SUT. **The risk map is the heart of -the kit — generic risks produce generic findings.** +Open each file and tailor them to your SUT. **The risk map is the heart of the kit — generic risks produce generic findings.** -## 3. Verify the install (1 min) +## 4. 
Verify the install (1 min) ```bash bunx aqa doctor # ✓/⚠/✗ checklist bunx aqa validate # schema-check .aqa/* against @aqa/schemas (CI-safe) ``` -## 4. Install agent instruction files (2 min) +## 5. Install agent instruction files (2 min) ```bash bunx aqa install-agent-files --targets claude,codex,gemini,copilot ``` -This generates agent-specific instruction files and skills in your repo. +This generates `CLAUDE.md`, `AGENTS.md`, `GEMINI.md`, `.github/copilot-instructions.md` plus per-agent skills under `.claude/skills/`, `.agents/skills/`, `.gemini/skills/`, and `.github/skills/`. + +Flags worth knowing: +- `--force` — overwrite existing files (default: skip). +- `--dry-run` — preview what would change without touching disk. +- `--project-name ` — override the slug embedded in the headers (default: directory name, slugified, capped at 64 chars). -## 5. Define one real risk (3 min) +## 6. Define one real risk (3 min) Replace the placeholder in `.aqa/risk-map.yaml`: @@ -70,37 +96,47 @@ Replace the placeholder in `.aqa/risk-map.yaml`: statement: Old tokens become invalid within 60 seconds of rotation. ``` -A good invariant is **one sentence**, **falsifiable**, and **independent of -implementation**. +A good invariant is **one sentence**, **falsifiable**, and **independent of implementation**. -## 6. Run the smoke profile (3 min) +## 7. Run the smoke profile (3 min) ```bash bunx aqa run --profile smoke ``` -Optional immediate report: +Each run writes `events.jsonl`, `findings.jsonl`, and per-finding replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`) under `.aqa/runs//`. + +## 8. Render the report (10 sec) ```bash -bunx aqa report +bunx aqa report # latest run, both formats +bunx aqa report --run-id # explicit run +bunx aqa report --format md # just report.md ``` -Then open the admin panel: +Output lands inside the same run directory as `report.md` (auditor-friendly) and `report.json` (machine-readable, same shape the admin UI consumes). + +## 9. 
Boot the admin (10 sec) ```bash -bun --filter @aqa/admin dev +bunx aqa admin ``` +Opens `http://127.0.0.1:5173`. The SPA + API run in one process and the in-memory store is auto-seeded from `.aqa/runs/`. Browse runs, drill into findings, replay deterministically, verify the hash-chained audit log. `Ctrl-C` to stop. + +| Flag | Effect | +|---|---| +| `--port ` | listen on a specific port (default 5173; 0 = OS-assigned) | +| `--host ` | bind host (default `127.0.0.1`; use `0.0.0.0` to expose on LAN) | + ## Where to go next -- **`docs/methodology/agentic-qa.md`** — the Risk × Invariant × Probe × Oracle - methodology, in long form. -- **`docs/ecosystem-explained.md`** — every concept in the kit, with a worked - example. +- **`docs/methodology/agentic-qa.md`** — the Risk × Invariant × Probe × Oracle methodology, in long form. +- **`docs/ecosystem-explained.md`** — every concept in the kit, with a worked example. - **`docs/architecture/reference.md`** — the component map and data flow. +- **`docs/PACK-AUTHORING.md`** — write your own pack (`aqa pack new `). - **`docs/design/admin-panel-template.md`** — the full admin UI spec. - **`docs/RULES.md`** — the hard rules every contribution must obey. - **`docs/adr/`** — architecture decisions (start with ADR-001). -When you hit something the docs don't cover, file an issue. The kit is -junior-friendly **on purpose**. +When you hit something the docs don't cover, file an issue. The kit is junior-friendly **on purpose**.
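Step 5 notes that the default `--project-name` is the directory name, slugified and capped at 64 characters, and the changelog adds that the cap is followed by a second strip of trailing dashes. The sketch below shows why that second strip matters. It is an illustration only: the exact character set of the kit's Slug schema is not given here, so the lowercase-alphanumerics-and-dashes rule and the function name are assumptions.

```typescript
const SLUG_MAX = 64; // cap stated in the docs

// Hypothetical slugifier: lowercase, dash-separated, capped at `max` characters.
function slugify(name: string, max: number = SLUG_MAX): string {
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every other run of characters into one dash
    .replace(/^-+|-+$/g, ""); // strip leading and trailing dashes
  // Cap first, then strip again: the cut can land right after a dash.
  return slug.slice(0, max).replace(/-+$/g, "");
}
```

A name of 63 letters followed by a space and one more letter slugifies to 65 characters; cutting at 64 leaves a trailing dash, which the final strip removes. A name with no letters or digits yields an empty string, which a real implementation would presumably need to reject or replace with a fallback.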