Merged
38 changes: 38 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.9.0] — 2026-05-21 — Junior quick-start truthing

The v1.x roadmap is fully closed and the kit ships to GitHub Packages as `@padosoft/agentic-qa-kit`. This release closes the four gaps an external junior would have hit if they tried to follow the README quick-start in v1.8:

### Added

- **`aqa install-agent-files --targets …`** CLI verb (PR #52). Wires the existing `renderForTargets()` from `@aqa/adapters` into a real command. Generates `CLAUDE.md` + `AGENTS.md` + `GEMINI.md` + `.github/copilot-instructions.md` plus per-agent skills under `.claude/`, `.agents/`, `.gemini/`, `.github/`. Flags: `--targets <csv|repeat>`, `--project-name <slug>`, `--force`, `--dry-run`. An unknown target fails fast without writing anything. Existing files are preserved unless `--force` is passed.
- **`aqa report [--run-id <id>] [--format md|json|both]`** CLI verb (PR #53). Renders `events.jsonl` + `findings.jsonl` from a run into `report.md` (auditor-friendly) and `report.json` (stable shape consumed by the admin UI). Defaults to the latest run by file mtime, so hash-suffixed `--seed` ids work alongside ISO-prefixed ones. Strict on bad input: missing artifacts, malformed JSONL, path traversal in `--run-id`, and symlinked run dirs all fail fast.
- **`aqa admin [--port <n>] [--host <h>]`** CLI verb (PR #54). Boots the admin SPA + `makeApi()` in a single Node process on `http://127.0.0.1:5173`. The bundled SPA ships inside the kit tarball; the in-memory store is seeded from `.aqa/runs/<id>/{events,findings}.jsonl` so the admin shows real local data out of the box. Path-traversal-safe static serving with SPA fallback for client-side routes.
- **`@aqa/pack-author`** new workspace package (PR #54). Extracted `runPackNew` from `@aqa/kit` to break the kit↔server build cycle that emerged when kit started depending on server (via the new `aqa admin` command). `@aqa/server`'s `POST /api/packs/scaffold` and `@aqa/kit`'s `aqa pack new` both consume it. Kit keeps a 5-line re-export shim so existing in-kit imports work unchanged.
- **GitHub Packages publish pipeline** (PR #55). New `.github/workflows/publish.yml` runs on every `v*` tag and publishes `@padosoft/agentic-qa-kit` to `https://npm.pkg.github.com` with `--provenance`. The kit publishes as a single esbuild bundle, `dist/cli.cjs` (~460 KB): every `@aqa/*` workspace dep and every npm dep is inlined, and only Node built-ins stay external. At publish time only, `packages/kit/scripts/publish-prep.mjs` swaps `@aqa/kit` → `@padosoft/agentic-qa-kit` and pins `workspace:*` deps to the kit's version; the workspace keeps its internal name so other packages can keep referencing it.
- **README + `docs/getting-started.md` rewritten** to match the actually-shipped CLI surface. Adds the `.npmrc` snippet for GH Packages auth, the 10-step quick-start, the `aqa admin` boot path, and a single-command `bun run e2e:ecosystem` pointer for monorepo contributors.
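
A minimal TypeScript sketch of the `--targets` handling described above: values arrive as CSV or as a repeated flag, the allow-list is derived from the adapters registry, and any unknown name throws before a file is written. `parseTargets` and the stand-in `adapters` object are hypothetical; the real `@aqa/adapters` export may have a different shape.

```typescript
// Hypothetical stand-in for the adapters registry exported by @aqa/adapters.
// Only the keys matter here; the real export may look different.
const adapters: Record<string, unknown> = {
  claude: {},
  codex: {},
  gemini: {},
  copilot: {},
};

// Derived, not hardcoded: registering a new adapter extends the allow-list.
const KNOWN_TARGETS: readonly string[] = Object.keys(adapters);

// Hypothetical helper: accepts `--targets claude,codex` (CSV) and
// `--targets claude --targets codex` (repeat), and validates before any write.
function parseTargets(rawValues: string[], known: readonly string[] = KNOWN_TARGETS): string[] {
  const requested = rawValues
    .join(",") // repeated flags and CSV collapse into one list
    .split(",")
    .map((target) => target.trim())
    .filter((target) => target.length > 0);

  if (requested.length === 0) {
    throw new Error("--targets needs at least one value");
  }

  const unknown = requested.filter((target) => known.indexOf(target) === -1);
  if (unknown.length > 0) {
    // Fail fast: nothing has been written to disk at this point.
    throw new Error(`unknown target(s): ${unknown.join(", ")}; known: ${known.join(", ")}`);
  }

  // De-duplicate while keeping first-seen order.
  return Array.from(new Set(requested));
}
```

Validating the whole list up front is what makes "fails fast without writing anything" hold: rendering only starts once every target is known.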

### Changed

- **kit `package.json` `name` field policy.** The workspace name stays `@aqa/kit` so other monorepo packages can reference it. The published artifact's `name` is set at publish time from the new `aqa.publishName` declaration (`@padosoft/agentic-qa-kit`). This dual-naming keeps internal imports stable while satisfying GH Packages' `<scope> === <repo-owner>` requirement.
- **Bundle format.** Kit now ships as a single CommonJS bundle, `dist/cli.cjs`, instead of separate per-file ESM modules. The `.cjs` extension overrides the package-level `"type": "module"`, so Node loads the file as CommonJS and bundled deps that internally call `require('process')` resolve cleanly.
- **`packages/server/src/api.ts`** imports `runPackNew` from `@aqa/pack-author` (was `@aqa/kit`).
- **`packages/server/src/index.ts`** re-exports `ApiHandler`, `ApiMethod`, `ApiRequest`, `ApiResponse` so kit can consume them type-only.
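
The dual-naming policy can be pictured as a pure transform over the kit manifest at publish time. This is an illustrative TypeScript sketch, not the contents of `publish-prep.mjs`; `preparePublishManifest`, `pinWorkspaceDeps`, and the trimmed `Manifest` type are assumptions, and only the `aqa.publishName` field and the `workspace:*` pinning come from the entries above.

```typescript
// Trimmed manifest shape for this sketch; a real package.json has more fields.
interface Manifest {
  name: string;
  version: string;
  aqa?: { publishName?: string };
  dependencies?: Record<string, string>;
}

// Replace every `workspace:*` range with a concrete version; leave other ranges alone.
function pinWorkspaceDeps(
  deps: Record<string, string> | undefined,
  version: string,
): Record<string, string> | undefined {
  if (deps === undefined) return undefined;
  const pinned: Record<string, string> = {};
  for (const dep of Object.keys(deps)) {
    pinned[dep] = deps[dep] === "workspace:*" ? version : deps[dep];
  }
  return pinned;
}

// Hypothetical publish-time transform: returns a new object, so the
// workspace manifest (still named @aqa/kit) is never mutated.
function preparePublishManifest(src: Manifest): Manifest {
  const publishName = src.aqa?.publishName;
  if (!publishName) {
    throw new Error(`${src.name}: aqa.publishName is not declared`);
  }
  return {
    ...src,
    name: publishName,
    dependencies: pinWorkspaceDeps(src.dependencies, src.version),
  };
}
```

Keeping the transform pure means the on-disk workspace `package.json` keeps its internal name; a wrapper script could write the result to a copy used only for publishing.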

### Fixed

- **`doctor.ts` hint no longer says `(Task 4)`.** The verb exists now, so the hint shows the full command a junior can paste.
- **Slugifier caps at 64 chars** (Slug schema max). Previously, a long project directory name slipped through `aqa init` / `aqa install-agent-files` and tripped `aqa validate` later. The slugifier now truncates first, then re-strips trailing dashes, so the truncated slug stays schema-conformant.
- **`KNOWN_TARGETS` derived from `@aqa/adapters.adapters`** instead of being a hardcoded duplicate, so adding a new adapter (e.g. `opencode`) automatically extends `--targets`.
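
A sketch of the truncate-then-re-strip order from the slugifier fix, assuming a conventional slugifier (lowercase, each run of non-alphanumerics collapsed to one dash). The character rules are assumptions; only the 64-char cap and the trailing-dash re-strip come from the entry above.

```typescript
const SLUG_MAX = 64; // Slug schema max

// Hypothetical slugifier: the character rules are assumptions.
function slugify(input: string): string {
  const base = input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of other characters becomes one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  // Truncate first, then re-strip: the cut can land right after a dash.
  return base.slice(0, SLUG_MAX).replace(/-+$/, "");
}
```

Stripping only before the cut would let a 64-char slug ending in a dash through, which is the non-conformant case the fix targets.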

## [1.8.3] — 2026-05-20 — Live ecosystem e2e + roadmap closure sync

PR #51 — dedicated ecosystem Playwright smoke (`packages/admin/test/e2e/ecosystem-live.e2e.ts`) with a single-command stack bootstrap (`scripts/ecosystem-stack.mjs`). Boots `examples/bun-api`, runs a real `aqa run --profile smoke`, serves live `/api/*` from `@aqa/server.makeApi` + `MemoryStore`, drives the admin against the live backend, asserts `finding_emitted` is visible from `/api/audit` and chain verification returns `CHAIN OK`. Command: `bun run e2e:ecosystem`.

## [1.8.2] — 2026-05-20 — Ecosystem smoke e2e hardening

PR #50 — `scripts/e2e-cli.mjs` no longer stops at version/help/doctor/validate only: boots a local HTTP `/healthz` target, seeds a schema-valid local smoke pack/profile, executes `aqa run --profile smoke` with the real HTTP probe runner, and asserts run artifacts are emitted under `.aqa/runs/<run-id>/`.

## [1.8.1] — 2026-05-20 — Audit chain canonical reconciliation

PR #49 — aligned `@aqa/compliance.verifyEventChain` with `@aqa/runner.EventChainWriter`: hash recomputation excludes `prev_hash` from canonical body, and first-record `prev_hash: null` is canonical instead of expecting all-zero literal.
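
A sketch of the reconciled verification rule, under stated assumptions: each record carries `hash` and `prev_hash`, the digest covers a sorted-key JSON encoding of the remaining fields, and linkage is checked by comparing `prev_hash` with the previous record's `hash`. How the real `verifyEventChain` folds `prev_hash` into its checks, and which hash algorithm it uses, is not stated here, so the hash function is passed in. `verifyChain`, `canonicalize`, and the record shape are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed record shape: `prev_hash` and `hash` plus arbitrary event fields.
interface ChainRecord {
  prev_hash: string | null;
  hash: string;
  [field: string]: Json;
}

type HashFn = (input: string) => string;

// Deterministic JSON: object keys sorted at every level.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value; // keep the narrowed object type inside the callback
    return `{${Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns the index of the first record that fails, or -1 if the chain is intact.
function verifyChain(records: ChainRecord[], hashFn: HashFn): number {
  for (let i = 0; i < records.length; i++) {
    const { prev_hash, hash, ...body } = records[i];
    // The first record links to nothing: `null` is canonical, not an all-zero literal.
    const expectedPrev = i === 0 ? null : records[i - 1].hash;
    if (prev_hash !== expectedPrev) return i;
    // Recompute over the body only: `prev_hash` (and `hash` itself) are excluded.
    if (hashFn(canonicalize(body)) !== hash) return i;
  }
  return -1;
}
```

In Node, `hashFn` could be `(s) => createHash("sha256").update(s).digest("hex")` with `createHash` from `node:crypto`; SHA-256 is an assumption, not something the entry specifies.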

## [1.3.0] — 2026-05-18

### Added — quality polish (no new packages)
97 changes: 73 additions & 24 deletions README.md
@@ -73,7 +73,7 @@ Coding agents (Claude Code, Codex CLI, Gemini CLI, GitHub Copilot CLI) are great

## Quick start (junior-friendly)

> **Status note:** the kit reached **v1.0 GA** (24-task roadmap complete) and is now at **v1.1**. The 18 workspace packages (`@aqa/schemas`, `@aqa/kit`, `@aqa/runner`, `@aqa/reporter`, `@aqa/server`, `@aqa/admin`, `@aqa/compliance`, `@aqa/methodology`, …) ship from this monorepo. Detailed walk-through: [`docs/getting-started.md`](docs/getting-started.md).
> **Status note:** the kit reached **v1.0 GA** (24-task roadmap complete) and is now at **v1.9**. The `@padosoft/agentic-qa-kit` CLI ships as a single bundled tarball from GitHub Packages. Detailed walk-through: [`docs/getting-started.md`](docs/getting-started.md).

### 1. Install Bun

@@ -85,65 +85,111 @@ curl -fsSL https://bun.sh/install | bash
```
powershell -c "irm bun.sh/install.ps1 | iex"
```

### 2. Install the kit in your project
### 2. Tell your project where to find the kit (GitHub Packages auth)

GitHub Packages requires authentication even for public packages. One-time setup: create a personal access token (PAT) with the `read:packages` scope at [github.com/settings/tokens](https://github.com/settings/tokens), then add a per-project `.npmrc` that routes the `@padosoft` scope to GitHub Packages:

```ini
# .npmrc — at the root of your project
@padosoft:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
```

Export the token in your shell (in CI, store it as a secret instead):

```bash
export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX
```

### 3. Install the kit in your project

```bash
cd /path/to/your/project
bun add -d agentic-qa-kit
bun add -d @padosoft/agentic-qa-kit
```

> _If you don't have a project yet, clone `examples/bun-api` from this repo (available in v0.1.0)._
> _If you don't have a project yet, clone `examples/bun-api` from this repo as a starting point._

### 3. Initialize the AQA workspace
### 4. Initialize the AQA workspace + verify

```bash
bunx aqa init
bunx aqa init # scaffold .aqa/{project,risk-map,profiles}.yaml + testing.md
bunx aqa doctor # green/yellow/red checklist of kit health
bunx aqa validate # schema-check every .aqa/* file against @aqa/schemas
```

Detects your stack and creates `.aqa/` with `testing.md`, `risk-map.yaml`, `profiles.yaml`, and scenarios for the packs your project matches.
`init` detects your stack (Bun/Node, framework, DB, SUT type) and creates a `.aqa/` directory anchored to the packs your project matches.

### 4. Install agent-specific files (pick one or many)
### 5. Install agent-specific files (one or many)

```bash
bunx aqa install-agent-files --targets claude,codex,gemini,copilot
```

> **Review comment (P1), lines +123 to 127: Remove non-existent CLI commands from quick-start.** This step documents `bunx aqa install-agent-files`, but the CLI in this commit does not implement that command (the router in `packages/kit/src/cli/aqa.ts` only handles `init`, `doctor`, `validate`, `run`, and `pack new`). Following the new quick-start now fails with `aqa: unknown command`, which blocks first-time onboarding; the same mismatch also affects the newly documented `aqa report` and `aqa admin` steps in this patch.

This generates `CLAUDE.md` + `.claude/skills/aqa-*`, `AGENTS.md` + `.agents/skills/`, `GEMINI.md` + `.gemini/skills/`, `.github/copilot-instructions.md` + `.github/skills/`.
Generates `CLAUDE.md` + `.claude/skills/aqa-*`, `AGENTS.md` + `.agents/skills/`, `GEMINI.md` + `.gemini/skills/`, and `.github/copilot-instructions.md` + `.github/skills/`. Existing files are preserved unless you pass `--force`. Add `--dry-run` to see what would change first.

### 5. Run your first agentic QA pass
### 6. Edit `.aqa/risk-map.yaml` (declare what must never break)

Replace the placeholder risk with the one that actually matters for your project. **The risk map is the heart of the kit — generic risks produce generic findings.**

```yaml
- id: r-token-replay
  category: auth
  title: Tokens remain valid past rotation
  severity: critical
  likelihood: possible
  invariants:
  - id: inv-token-rotation
    statement: Old tokens become invalid within 60 seconds of rotation.
```

### 7. Run your first agentic QA pass

```bash
bunx aqa run --profile smoke
```

A 10-minute, non-destructive sweep. When it finishes:
A fast, non-destructive sweep. Each run is written to `.aqa/runs/<run-id>/` with `events.jsonl`, `findings.jsonl`, and 3-level replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`).

### 8. Render the report

```bash
bunx aqa report
bunx aqa report # latest run, Markdown + JSON
bunx aqa report --run-id <id> # explicit run
bunx aqa report --format md # just report.md
```

You'll see findings like:
Writes `report.md` and `report.json` inside the same run directory. You'll see findings like:

```
AQA-2026-0001 [P1] Cross-tenant data leak (verified, 3/3 deterministic replay)
AQA-2026-0002 [P3] Missing rate limit on /api/search
```

### 6. Open the admin panel
### 9. Boot the admin panel (single command)

```bash
bun --filter @aqa/admin dev
bunx aqa admin
```

Then open the local URL shown by Vite (normally `http://127.0.0.1:5173`) and inspect runs, findings, replay artifacts, and audit chain state.
Serves the admin at `http://127.0.0.1:5173`. The admin SPA and API server boot in one process, seeded from your local `.aqa/runs/`. Inspect runs, findings, and replay artifacts, and verify the hash-chained audit log in the browser. Press `Ctrl-C` to stop.

| Flag | Effect |
|---|---|
| `--port <n>` | listen on a specific port (default 5173) |
| `--host <h>` | bind host (default `127.0.0.1`; use `0.0.0.0` to expose on LAN) |
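
Per the v1.9.0 changelog, the admin's static serving is path-traversal-safe with an SPA fallback for client-side routes. One way to get that property, sketched in TypeScript under assumptions: reject any `.` or `..` segment after percent-decoding instead of normalising it away, serve a file only if it exists in the bundle, and fall back to `index.html` for extension-less paths. `resolveStatic` and the `hasFile` callback are hypothetical, not the kit's API.

```typescript
// Hypothetical resolver: maps a request path to a file inside the SPA bundle,
// to "index.html" for client-side routes, or to null (respond 404/400).
// `hasFile` stands in for an existence check relative to the bundle root.
function resolveStatic(urlPath: string, hasFile: (relPath: string) => boolean): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPath.split("?", 1)[0] ?? "");
  } catch (err) {
    return null; // malformed percent-encoding
  }

  const segments = decoded.split("/").filter((segment) => segment.length > 0);
  // Reject traversal outright; decoding first means `%2e%2e` is caught too.
  const unsafe = segments.some(
    (segment) =>
      segment === "." || segment === ".." || segment.includes("\\") || segment.includes("\u0000"),
  );
  if (unsafe) return null;

  const relPath = segments.join("/");
  if (relPath !== "" && hasFile(relPath)) return relPath;

  // SPA fallback: extension-less paths are treated as client-side routes.
  const last = segments[segments.length - 1] ?? "";
  return last.includes(".") ? null : "index.html";
}
```

A missing asset such as `/missing.js` returns `null` rather than `index.html`, so a broken script reference surfaces as a 404 instead of HTML served as JavaScript.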

### 7. Reproduce from generated artifacts
### 10. Reproduce from generated artifacts

```bash
ls .aqa/runs/<run-id>/
# events.jsonl findings.jsonl report.md report.json
# repro.sh repro.curl repro.playwright.ts
```

Each run stores replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`) so you can reproduce findings deterministically and confirm fixes.
Each finding ships with a deterministic replay artifact so you can reproduce it, hand it to a teammate, or attach it to a PR.

> **Want the whole ecosystem in one go?** From a clone of `padosoft/agentic-qa-kit`, run `bun run e2e:ecosystem`. It boots `examples/bun-api`, runs a real `aqa run --profile smoke` against it, and drives the admin against the live data with Playwright. One command, end-to-end smoke.

## The mental model in 7 words

@@ -156,12 +202,13 @@ Every concept in AQA is one of these seven things or a tool that operates on the
## How you use it

1. `aqa init`: detect your repo and scaffold `.aqa/`.
2. Edit `risk-map.yaml`: declare what must never break.
3. Install agent files: Claude/Codex/Gemini/Copilot instructions + skills.
2. `aqa install-agent-files --targets …`: write Claude/Codex/Gemini/Copilot instructions + skills.
3. Edit `risk-map.yaml`: declare what must never break.
4. `aqa run --profile smoke`: execute scenarios with probes + oracles.
5. Open admin: `bun --filter @aqa/admin dev`.
6. Inspect findings, replay deterministically, verify audit chain.
7. Iterate risks + scenarios until `release-gate` is green.
5. `aqa report`: render `report.md` + `report.json` from the latest run.
6. `aqa admin`: boot the SPA + API on `127.0.0.1:5173`, seeded from local runs.
7. Inspect findings, replay deterministically, verify audit chain.
8. Iterate risks + scenarios until `release-gate` is green.

## Multi-agent

@@ -219,10 +266,12 @@ Full diagram: [`docs/architecture/reference.md`](docs/architecture/reference.md)
| `v1.5` | **Admin design integration — shipped** | 30-screen hi-fi prototype bundled, Playwright E2E gate, theme + palette + Findings kanban |
| `v1.6` | **`aqa run` + bundled packs — shipped** | Three-tier pack discovery, atomic run-dir, applies_when filtering, agent-mode rejection until driver lands |
| `v1.7` | **Pack authoring + admin CRUD — shipped** | `PACK-AUTHORING.md`, `aqa pack new`, admin Create-pack/Import-manifest wizards, full Profile/Risk/Scenario CRUD (Delete/Edit/Clone), Agents wired to `/api/agents`, Operations + Admin pages wired to `/api/audit` / `/api/cost/summary` / `/api/queue` / `/api/notifications` / `/api/tokens` / `/api/orgs`, scenario YAML editor, schema-conforming mock-id migration, `Agent` schema, `agents:read`/`agents:edit` permissions, atomic `Store.createProfile/createScenario` |
| `v1.8` | **Live ecosystem e2e — shipped** | Real HTTP probe runner, release-gate finding enforcement, single-command ecosystem stack (`bun run e2e:ecosystem`), Playwright admin-against-live-API smoke, audit-chain canonical reconciliation |
| `v1.9` | **Junior quick-start truthing — shipped** | `aqa install-agent-files` + `aqa report` + `aqa admin` CLI verbs (previously documented but unwired), `@aqa/pack-author` extracted to break kit↔server build cycle, esbuild bundled `dist/cli.cjs`, GitHub Packages publish workflow on `v*` tags, README quick-start rewritten to match the actually-shipped CLI surface |

## Status

**GA (`v1.0` shipped, `v1.7` current).** The full 24-task roadmap is closed:
**GA (`v1.0` shipped, `v1.9` current).** The full 24-task roadmap is closed:
schemas, CLI (`@aqa/kit`), 5 baseline packs, multi-agent adapters
(Claude/Codex/Gemini/Copilot), runner with hash-chained audit, reporter
with 3-level replay, admin panel, server + runner fleet, on-prem LLM
80 changes: 58 additions & 22 deletions docs/getting-started.md
@@ -14,14 +14,36 @@

> Windows users: PowerShell 7+ is the supported shell. WSL also works.

## 1. Install the kit in your project (3 min)
## 1. Authenticate to GitHub Packages (2 min, one-time)

The kit is published as `@padosoft/agentic-qa-kit` on GitHub Packages. Public packages on GH Packages still require authentication, so create a personal access token (PAT) once and tell `bun`/`npm` how to use it.

1. Create a PAT at <https://github.com/settings/tokens> with the **`read:packages`** scope (the only scope needed to install).
2. Add this `.npmrc` to your project root:

```ini
@padosoft:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
```

3. Export the token in your shell (`~/.zshrc`, `~/.bashrc`, or the equivalent on Windows):

```bash
export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX
```

> For CI, set `GITHUB_TOKEN` as a workflow/runner secret — never commit it.

## 2. Install the kit in your project (1 min)

```bash
cd path/to/your/project
bun add -D agentic-qa-kit
bun add -D @padosoft/agentic-qa-kit
```

## 2. Bootstrap `.aqa/` (1 min)
The package installs a single bundled CLI, `cli.cjs` (~460 KB), with all `@aqa/*` workspace deps inlined. The admin SPA and the 5 bundled packs ship inside the same tarball.

## 3. Bootstrap `.aqa/` (1 min)

```bash
bunx aqa init
@@ -37,25 +59,29 @@ This writes four files (non-destructive — existing files are skipped):
└── testing.md # human-readable rationale for the QA conventions
```

Open each file and tailor them to your SUT. **The risk map is the heart of
the kit — generic risks produce generic findings.**
Open each file and tailor them to your SUT. **The risk map is the heart of the kit — generic risks produce generic findings.**

## 3. Verify the install (1 min)
## 4. Verify the install (1 min)

```bash
bunx aqa doctor # ✓/⚠/✗ checklist
bunx aqa validate # schema-check .aqa/* against @aqa/schemas (CI-safe)
```

## 4. Install agent instruction files (2 min)
## 5. Install agent instruction files (2 min)

```bash
bunx aqa install-agent-files --targets claude,codex,gemini,copilot
```

This generates agent-specific instruction files and skills in your repo.
This generates `CLAUDE.md`, `AGENTS.md`, `GEMINI.md`, and `.github/copilot-instructions.md`, plus per-agent skills under `.claude/skills/`, `.agents/skills/`, `.gemini/skills/`, and `.github/skills/`.

Flags worth knowing:
- `--force` — overwrite existing files (default: skip).
- `--dry-run` — preview what would change without touching disk.
- `--project-name <slug>` — override the slug embedded in the headers (default: directory name, slugified, capped at 64 chars).

## 5. Define one real risk (3 min)
## 6. Define one real risk (3 min)

Replace the placeholder in `.aqa/risk-map.yaml`:

@@ -70,37 +96,47 @@ Replace the placeholder in `.aqa/risk-map.yaml`:
```yaml
statement: Old tokens become invalid within 60 seconds of rotation.
```

A good invariant is **one sentence**, **falsifiable**, and **independent of
implementation**.
A good invariant is **one sentence**, **falsifiable**, and **independent of implementation**.

## 6. Run the smoke profile (3 min)
## 7. Run the smoke profile (3 min)

```bash
bunx aqa run --profile smoke
```

Optional immediate report:
Each run writes `events.jsonl`, `findings.jsonl`, and per-finding replay artifacts (`repro.sh`, `repro.curl`, `repro.playwright.ts`) under `.aqa/runs/<run-id>/`.

## 8. Render the report (10 sec)

```bash
bunx aqa report
bunx aqa report # latest run, both formats
bunx aqa report --run-id <id> # explicit run
bunx aqa report --format md # just report.md
```

Then open the admin panel:
Output lands inside the same run directory as `report.md` (auditor-friendly) and `report.json` (machine-readable, same shape the admin UI consumes).
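
Per the v1.9.0 changelog, `aqa report` without `--run-id` picks the latest run by file mtime, and it rejects traversal in `--run-id` as well as symlinked run dirs. A TypeScript sketch of that selection logic, assuming the run-id rule below and a pre-collected list of run dirs; `selectRun`, `RunDirEntry`, and the `RUN_ID` pattern are hypothetical.

```typescript
interface RunDirEntry {
  id: string; // directory name under .aqa/runs/
  mtimeMs: number; // e.g. fs.statSync(dir).mtimeMs in Node
  isSymlink: boolean; // e.g. fs.lstatSync(dir).isSymbolicLink() in Node
}

// Hypothetical id rule: one path segment of safe characters, never starting with ".".
const RUN_ID = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function findRequested(entries: RunDirEntry[], runId: string): RunDirEntry {
  if (!RUN_ID.test(runId)) throw new Error(`invalid --run-id: ${runId}`);
  const match = entries.find((entry) => entry.id === runId);
  if (!match) throw new Error(`run not found: ${runId}`);
  return match;
}

function findLatest(entries: RunDirEntry[]): RunDirEntry {
  if (entries.length === 0) throw new Error("no runs under .aqa/runs/");
  // Latest by mtime, not by name: hash-suffixed and ISO-prefixed ids do not sort together lexically.
  return entries.reduce((newest, entry) => (entry.mtimeMs > newest.mtimeMs ? entry : newest));
}

function selectRun(entries: RunDirEntry[], runId?: string): RunDirEntry {
  const run = runId === undefined ? findLatest(entries) : findRequested(entries, runId);
  if (run.isSymlink) throw new Error(`refusing symlinked run dir: ${run.id}`);
  return run;
}
```

A lexical sort would order runs by id prefix rather than by age, which is why the default goes through `mtimeMs`.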

## 9. Boot the admin (10 sec)

```bash
bun --filter @aqa/admin dev
bunx aqa admin
```

Serves the admin at `http://127.0.0.1:5173`. The SPA and API run in one process, and the in-memory store is auto-seeded from `.aqa/runs/`. Browse runs, drill into findings, replay deterministically, and verify the hash-chained audit log. Press `Ctrl-C` to stop.

| Flag | Effect |
|---|---|
| `--port <n>` | listen on a specific port (default 5173; 0 = OS-assigned) |
| `--host <h>` | bind host (default `127.0.0.1`; use `0.0.0.0` to expose on LAN) |

## Where to go next

- **`docs/methodology/agentic-qa.md`** — the Risk × Invariant × Probe × Oracle
methodology, in long form.
- **`docs/ecosystem-explained.md`** — every concept in the kit, with a worked
example.
- **`docs/methodology/agentic-qa.md`** — the Risk × Invariant × Probe × Oracle methodology, in long form.
- **`docs/ecosystem-explained.md`** — every concept in the kit, with a worked example.
- **`docs/architecture/reference.md`** — the component map and data flow.
- **`docs/PACK-AUTHORING.md`** — write your own pack (`aqa pack new <slug>`).
- **`docs/design/admin-panel-template.md`** — the full admin UI spec.
- **`docs/RULES.md`** — the hard rules every contribution must obey.
- **`docs/adr/`** — architecture decisions (start with ADR-001).

When you hit something the docs don't cover, file an issue. The kit is
junior-friendly **on purpose**.
When you hit something the docs don't cover, file an issue. The kit is junior-friendly **on purpose**.