Releases: padosoft/agentic-qa-kit
v1.9.0 — Junior quick-start truthing + GitHub Packages publish
What's new
Closes the junior quick-start truthing macro. The README's promised CLI surface now exists end-to-end, and the kit publishes to GitHub Packages as `@padosoft/agentic-qa-kit`.
CLI verbs newly wired (previously documented but unimplemented)
- `aqa install-agent-files --targets claude,codex,gemini,copilot` — writes `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` / `.github/copilot-instructions.md` plus per-agent skills. `--force` / `--dry-run` / `--project-name` flags. (PR #52)
- `aqa report [--run-id ] [--format md|json|both]` — renders `events.jsonl` + `findings.jsonl` from a run into `report.md` + `report.json` (the same JSON shape the admin UI consumes). Defaults to the latest run by file mtime. (PR #53)
- `aqa admin [--port ] [--host ]` — boots the admin SPA + `makeApi()` in a single Node process on `127.0.0.1:5173`. SPA + API ship in the kit tarball. In-memory store auto-seeded from `.aqa/runs/`. (PR #54)
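The "latest run by file mtime" default for `aqa report` can be pictured with a small selection helper. This is only a sketch: the `RunEntry` shape and the `pickLatestRun` name are hypothetical and not taken from the kit's source.

```typescript
// Hypothetical shape: one entry per run directory under .aqa/runs/.
interface RunEntry {
  runId: string;
  mtimeMs: number; // last-modified time in milliseconds
}

// Returns the id of the most recently modified run,
// or undefined when no runs exist yet.
function pickLatestRun(entries: readonly RunEntry[]): string | undefined {
  let latest: RunEntry | undefined;
  for (const entry of entries) {
    if (latest === undefined || entry.mtimeMs > latest.mtimeMs) {
      latest = entry;
    }
  }
  return latest?.runId;
}
```

An explicit `--run-id` would bypass this selection entirely.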
Publishing
- GitHub Packages publish pipeline. On every `v*` tag, `.github/workflows/publish.yml` runs the esbuild bundler (`dist/cli.cjs`, ~570 KB CJS-in-.cjs with every `@aqa/` workspace + npm dep inlined), swaps the package name from `@aqa/kit` to `@padosoft/agentic-qa-kit`, strips bundled `@aqa/` deps from the published manifest, and runs `npm publish --provenance --access public` against `https://npm.pkg.github.com`. (PR #55)
- New `@aqa/pack-author` workspace package extracted to break the `@aqa/kit` ↔ `@aqa/server` build cycle that emerged when kit started depending on server. Both kit and server depend on pack-author for `runPackNew` now; the cycle is gone from the dep graph entirely.
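The manifest rewrite in the publish pipeline (rename the package, drop the inlined `@aqa/` dependencies) might look roughly like the following. The `Manifest` type and the `toPublishedManifest` function are illustrative assumptions; the actual workflow script is not shown in these notes.

```typescript
// Minimal package.json shape for this sketch (assumption, not the kit's type).
interface Manifest {
  name: string;
  dependencies?: Record<string, string>;
  [key: string]: unknown;
}

// Returns a copy of the manifest with a new name and without any
// dependency that lives under the bundled scope (e.g. "@aqa").
function toPublishedManifest(
  manifest: Manifest,
  publishedName: string,
  bundledScope: string,
): Manifest {
  const dependencies: Record<string, string> = {};
  for (const [dep, range] of Object.entries(manifest.dependencies ?? {})) {
    // Bundled workspace packages are already inlined into dist/cli.cjs,
    // so they must not appear as installable dependencies.
    if (!dep.startsWith(`${bundledScope}/`)) {
      dependencies[dep] = range;
    }
  }
  return { ...manifest, name: publishedName, dependencies };
}
```

Returning a copy rather than mutating the input keeps the workspace `package.json` untouched, so local development continues to resolve `@aqa/kit`.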
Docs
- README + `docs/getting-started.md` rewritten 1:1 with shipped verbs. Adds the GH Packages auth `.npmrc` snippet (the single biggest junior trap on first install — public packages on GH Packages still require auth). 10-step quick-start, `aqa admin` single-command boot, `bun run e2e:ecosystem` pointer for monorepo contributors. (PR #56)
Consume flow (works once this release is published)
```bash
# 1. One-time GitHub Packages auth setup in your project
cat > .npmrc <<'NPMRC'
@padosoft:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
NPMRC
export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX # needs read:packages
# 2. Install + run
bun add -d @padosoft/agentic-qa-kit
bunx aqa init
bunx aqa install-agent-files --targets claude,codex,gemini,copilot
bunx aqa run --profile smoke
bunx aqa report
bunx aqa admin # → http://127.0.0.1:5173
```
Stats
- 12 packages (added: `@aqa/pack-author`)
- 270 tests pass (102 in kit, 89 in server, etc.)
- 5 sub-task PRs (#52 #53 #54 #55 #56), 11 Copilot review iterations across them
- Macro PR #57
Full changelog
See CHANGELOG.md § 1.9.0.
v1.8.3 — live ecosystem e2e + roadmap closure
What's Changed
Full Changelog: v1.8.2...v1.8.3
v1.8.2 — ecosystem smoke e2e hardening
What's Changed
- docs(v1.7.1): record slice 4g in PROGRESS.md by @lopadova in #43
- feat(v1.7): wire Admin SSO config page end-to-end (slice 4h) by @lopadova in #44
- feat(v1.7): enable SSO config save flow (slice 4i) by @lopadova in #45
- feat(v1.7): autoload AuditChainViewer from /api/audit initial chain (slice 4j) by @lopadova in #46
- feat(v1.8): real HTTP probe runner + release-gate strict findings by @lopadova in #47
- docs(v1.x): README/docs closure refresh for current GA workflow by @lopadova in #48
- fix(v1.8.1): align audit verifier with EventChainWriter canonical chain by @lopadova in #49
- test(v1.8.2): extend CLI smoke to real HTTP run + artifact checks by @lopadova in #50
Full Changelog: v1.7.1...v1.8.2
v1.7.1 — Users + Roles admin wiring
- New `GET /api/users` (`settings:read`) returning the store user-directory snapshot.
- New `GET /api/roles` (`settings:read`) returning the `@aqa/auth` role/permission matrix plus `all_permissions` from `Permission.options`.
- New `StoreProvider.listUsers()` contract and shared `StoreUserDirectoryEntry` type in `@aqa/store`.
- Admin `PageUsers` now reads `/api/users` with fixture fallback.
- Admin `PageRoles` now reads `/api/roles` and renders the live permission matrix (`admin:everything` wildcard), with fixture fallback.
- Tests added: server route coverage + admin users/roles e2e.
Tag points to commit 77f3b1c (PR #42).
v1.7.0 — pack authoring + admin CRUD
v1.7.0 — Pack authoring + Admin CRUD
The final v1.7 release closes the loop on pack creation, full Profile/Risk/Scenario CRUD wizards in the admin, end-to-end Agents wiring, and live reads for every Operations + Admin page that has a matching server route.
Pack authoring (slices 1–3)
- New community guide `docs/PACK-AUTHORING.md`.
- New CLI `aqa pack new <slug>` (atomic backup-rename `--force`, slug + symlink + traversal rejection, in-memory schema validation of generated Scenario/RiskMap/PackManifest).
- Admin Create-pack wizard wired over the new CLI (`POST /api/packs/scaffold`).
Admin CRUD (slice 4c)
- Profiles: Delete (#29), Edit/Save (#30), Clone (#31). Atomic `Store.createProfile`. App-level `deletedProfiles` / `updatedProfiles` / `createdProfiles` Maps with `aqa:profile-*` CustomEvents. Tombstone-clear on re-create.
- Risks: Delete (#32, returns `{ id, deleted: true }`), Edit/Save (#33). Mock risk + invariant ids migrated to schema-conforming dashed Slugs.
- Scenarios: Delete (#34, new `Store.deleteScenario`), Edit/Save (#35), Clone (#36, atomic `Store.createScenario`), shared YAML-textarea Edit/Clone wizard (#37) with debounced parse + prototype-pollution guard. Mock scenario ids migrated to dashed slugs; explicit `category` field for tree grouping.
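A prototype-pollution guard of the kind mentioned for the YAML Edit/Clone wizard could work by rejecting parsed documents that contain dangerous keys at any depth. The sketch below is one plausible approach, not the admin's actual code; `hasForbiddenKey` and the key list are assumptions.

```typescript
// Keys that can reach Object.prototype when merged or assigned naively.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Walks a parsed document (objects and arrays) and reports whether any
// own key, at any depth, is one of the forbidden names.
function hasForbiddenKey(value: unknown): boolean {
  if (Array.isArray(value)) {
    return value.some((item) => hasForbiddenKey(item));
  }
  if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      if (FORBIDDEN_KEYS.has(key)) return true;
      if (hasForbiddenKey((value as Record<string, unknown>)[key])) return true;
    }
  }
  return false;
}
```

A wizard would run this check on the parsed result before saving and surface a validation error instead of persisting the scenario.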
Agents (slice 4d)
- New `@aqa/schemas` `Agent` with `SafeRepoPath`-validated `files[]` (rejects `..`, drive letters, UNC roots).
- New `agents:read` / `agents:edit` permissions (`agents:install` aliased for back-compat).
- New server routes: `GET /api/agents`, `GET /api/agents/:id`, `POST /api/agents/:id/install`, `POST /api/agents/:id/uninstall`.
- PageAgents fetches the live list with fixture fallback; install/uninstall buttons wired with in-flight guard + toasts.
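The three rejections listed for `SafeRepoPath` (`..` segments, drive letters, UNC roots) can be expressed as a small predicate. This is a sketch under the assumption that those are the only rules; the real schema in `@aqa/schemas` may check more, and `isSafeRepoPath` is a hypothetical name.

```typescript
// Returns true when the path is acceptable as a repo-relative file path
// under the three rules named in the release notes.
function isSafeRepoPath(path: string): boolean {
  if (path.length === 0) return false;
  // Drive letters such as "C:\..." or "d:/..."
  if (/^[A-Za-z]:/.test(path)) return false;
  // UNC roots such as "\\server\share" (or the forward-slash variant)
  if (path.startsWith("\\\\") || path.startsWith("//")) return false;
  // Any ".." segment, with either separator
  const segments = path.split(/[\\/]/);
  return !segments.includes("..");
}
```

Checking whole segments rather than substrings keeps legitimate names such as `notes..md` valid while still rejecting traversal.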
Operations + Admin sections (slices 4e, 4f)
- PageAudit / PageAdminAudit → `GET /api/audit` with shared `normalizeAuditEventsForViewer` helper.
- PageQueue → `GET /api/queue` with EnqueuedJob → fixture-shape adapter (filters terminal `done` jobs).
- PageCost → `GET /api/cost/summary` with explicit MTD `from` / `to`, tenant headers `padosoft` / `gescat`.
- PageNotifications → `GET /api/notifications`; SELF resolves from `SESSION_USER.id`; filter kinds derived from `NotificationKind` enum.
- PageTokens → `GET /api/tokens` with ApiToken → fixture-shape adapter (heuristic owner-prefix → kind).
- PageOrg → `GET /api/orgs`; subtitle joins live slugs.
- `fmtDate` / `fmtDateTime` / `fmtRelative` made null-safe (em-dash for missing timestamps).
- Users/Roles/SSO admin pages deferred: no server scaffolding yet.
Stats
- 13 PRs merged this slice (#28..#40 + #41 docs).
- Server: 86 unit tests (+19 from v1.6).
- Admin E2E: 132 tests (+45 from v1.6).
- New schemas: `Agent`.
- New permissions: `agents:read`, `agents:edit`.
- New store methods: `createProfile`, `createScenario`, `deleteScenario`, `listAgents`, `loadAgent`, `installAgent`, `uninstallAgent`.
See `docs/PROGRESS.md` for the per-slice breakdown and architecture lessons.
v1.7.0-rc.1 — pack authoring (slices 1+2)
First release candidate for v1.7. Delivers the pack-authoring story (slices 1 and 2 of 4 planned).
What's in
📖 docs/PACK-AUTHORING.md
End-to-end tutorial for community pack authors:
- Directory layout, manifest schema, scenarios + risks structure
- Three distribution patterns (workspace pack / vendored copy / npm scope alias)
- Programmatic validation (`aqa validate`)
- Honest about current limitations: no-network `NO_NETWORK_PROBE` stub returns `{probe_id, status: 200, body: null}` for every probe kind, only `http_status` / `response_contains` / `response_not_contains` oracles are wired, no custom oracle/probe loader yet
🛠 aqa pack new
CLI to scaffold a runnable pack at <cwd>/packs/<slug>/:
```bash
aqa pack new pack-myapp --sut-type api
aqa pack new pack-frontend --sut-type web --description "Smoke tests for the marketing site"
```
The scaffold produces a starter scenario whose `http_status: 200` oracle passes cleanly against the stub probe out of the box (avoiding the iter-17 footgun where bundled packs emitted synthetic findings). Supports:
- `--sut-type` (api / web / cli / lib / agent / pipeline)
- `--force`: atomic backup-rename overwrite (non-destructive on failure)
- `--description`, `--author`, `--license` (SPDX)
Hardened against:
- Symlinks at `packs/` parent and packDir (`lstatSync` checks)
- Non-directory parent (regular file at `<root>/packs`)
- Over-length slugs (cap at 52 chars to keep every derived ID within the 64-char Slug schema cap)
- TOCTOU on the existence check (uses non-recursive mkdir + explicit lstat)
- Schema-invalid generated output (in-memory `PackManifest` / `Scenario` / `RiskMap` validation before writing)
- Validation failures destroying the existing pack (backup-rename + restore-on-error)
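The slug length rule can be illustrated as follows: capping the pack slug at 52 characters leaves room for a separator and a short suffix before the 64-character Slug limit is reached. The slug grammar, the `checkPackSlug` and `deriveId` helpers, and the suffix format are all assumptions made for this sketch; only the 52 and 64 limits come from the notes above.

```typescript
const MAX_PACK_SLUG = 52; // cap stated in the release notes
const MAX_SLUG = 64; // Slug schema cap stated in the release notes

// Assumed slug grammar: lowercase alphanumerics joined by single dashes.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns null when the slug is acceptable, otherwise a reason string.
function checkPackSlug(slug: string): string | null {
  if (slug.length > MAX_PACK_SLUG) {
    return `slug exceeds ${MAX_PACK_SLUG} characters`;
  }
  if (!SLUG_PATTERN.test(slug)) {
    return "slug must be lowercase alphanumerics joined by dashes";
  }
  return null;
}

// Hypothetical derived id, e.g. a starter scenario id built from the slug.
function deriveId(slug: string, suffix: string): string {
  const id = `${slug}-${suffix}`;
  if (id.length > MAX_SLUG) {
    throw new Error(`derived id exceeds ${MAX_SLUG} characters`);
  }
  return id;
}
```

Because the pattern admits only alphanumerics and dashes, path separators and `..` are rejected by the same check.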
🧪 Tests
54 in @aqa/kit (50 pack-new + 42 run-cmd subset). Lint + typecheck clean.
What's still pending in v1.7
- Slice 3: Admin "Create pack" wizard (future PR)
- Slice 4: Audit + wire/implement/document all 81 silent placeholder buttons across `packages/admin/src/app.tsx` (slices 4a–4f, future PRs; plan documented in `docs/internal/admin-placeholder-audit.md`)
- Final v1.7.0 tag after slices 3 + 4 ship
Review
19 iterations on PR #25 with both Copilot and Codex review bots. All real issues addressed; final iter returned 0 new must-fix items.
v1.6.0 — aqa run CLI + bundled packs
v1.6 — aqa run + ecosystem foundation
The missing piece between aqa init and a real audit trail. After 21 review iterations with Copilot + Codex (every one surfaced a real bug or coverage gap, zero false alarms), the inner loop is end-to-end usable.
What lands
- `aqa run` CLI command. Loads `.aqa/project.yaml` + `.aqa/profiles.yaml` via the canonical `@aqa/schemas` shapes, resolves packs from three discovery tiers (project's `packs/*`, `node_modules/@aqa/*`, kit-bundled `dist/packs/*`), filters scenarios by the selected profile's `tags`, and runs each one via `@aqa/runner.runScenario`. Streams events + findings into `.aqa/runs/<run_id>/`.
- Flags: `--profile <name>` (defaults to `smoke` if present, else first profile) and `--seed <string>` (deterministic `run_id` for tests + replay).
- Bundled packs. All 5 baseline packs (`pack-core`, `pack-api-core`, `pack-web-ui`, `pack-llm-agent`, `pack-security`) now ship inside `@aqa/kit`'s npm tarball via a `bundle-packs.mjs` build step. A fresh `aqa init` + `aqa run --profile smoke` works with only `@aqa/kit` installed.
- SUT-aware init. `aqa init` picks the right packs from the detected `sut_type` (api → `pack-api-core`, web → `pack-web-ui`, agent → `pack-llm-agent`, else → `pack-core`). The `framework` clause on `pack-api-core` was dropped so plain Node/Bun APIs without a recognized framework still get coverage.
- Hardened orchestration: atomic run-dir creation (no TOCTOU on concurrent seeded runs), pack-manifest scenario discovery (no glob-scanning), path-traversal + symlink-escape rejection, `applies_when` filtering, manifest-name dedup with priority (project > node_modules > bundled), legacy bare-slug pack-name aliasing, agent-mode profile rejection until that driver lands, unrelated-broken-pack tolerance with structured warnings.
- Structured `RunResult` with `ok`, `runId`, `runDir`, `scenariosRun`, `findingsCount`, capped `error` string (`MAX_DETAIL_PER_KIND` + "…+N more" truncation), and a `warnings` array for non-fatal diagnostics. Detail samples (`pack_error_samples`, `scenario_error_samples`, …) live in the `run_finished` audit event for auditors.
- 42 TDD tests in `packages/kit/test/run-cmd.test.ts`: every behavior above is covered, written before the code existed.
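Two of the behaviors above lend themselves to a short illustration: manifest-name dedup with tier priority, and the capped error string. The sketch below is one way they could be written; the `DiscoveredPack` shape, the function names, the `"; "` separator, and passing the cap as a parameter are assumptions rather than the kit's actual implementation.

```typescript
type Tier = "project" | "node_modules" | "bundled";

// Hypothetical record for a pack found during discovery.
interface DiscoveredPack {
  name: string; // manifest name
  tier: Tier;
  dir: string;
}

// Lower number wins: project > node_modules > bundled.
const TIER_PRIORITY: Record<Tier, number> = {
  project: 0,
  node_modules: 1,
  bundled: 2,
};

// Keeps one pack per manifest name, preferring the highest-priority tier.
function dedupePacks(packs: readonly DiscoveredPack[]): DiscoveredPack[] {
  const winners = new Map<string, DiscoveredPack>();
  for (const pack of packs) {
    const current = winners.get(pack.name);
    if (
      current === undefined ||
      TIER_PRIORITY[pack.tier] < TIER_PRIORITY[current.tier]
    ) {
      winners.set(pack.name, pack);
    }
  }
  return [...winners.values()];
}

// Joins at most `max` details and summarizes the rest as "…+N more".
function capDetails(details: readonly string[], max: number): string {
  const shown = details.slice(0, max);
  const hidden = details.length - shown.length;
  const joined = shown.join("; ");
  return hidden > 0 ? `${joined} …+${hidden} more` : joined;
}
```

Keeping the summary string short while storing full samples elsewhere matches the split described above between `RunResult.error` and the `run_finished` audit event.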
Known scoped follow-ups (v1.7)
- Real HTTP probe runner. Today's `runScenario` still uses the no-network probe stub (`@aqa/runner`'s `NO_NETWORK_PROBE`). The release-gate "fail on any finding" semantic (from `require_deterministic_replay: true`) is deferred until probes hit a real SUT; every finding the stub produces is synthetic.
- `EventChainWriter` ↔ `verifyEventChain` reconciliation. Writer omits `prev_hash` from the canonical body and emits `null` for seq=0; `@aqa/compliance.verifyEventChain` includes `prev_hash` and expects `"0…"`. Tests ship a local writer-matching verifier; reconciling the two implementations is a separate cleanup.
- Pack authoring story. v1.7 will add `docs/PACK-AUTHORING.md` (community tutorial), `aqa pack new <slug>` (scaffolding CLI), and an admin "Create pack" wizard, plus a full audit pass on every placeholder button in the admin panel so nothing renders as a "muted click".
- Browser-driven ecosystem smoke. Playwright test that starts admin + runs `aqa run` against `examples/bun-api` and asserts findings appear in the admin UI end-to-end.
v1.5.0 — admin design integration
v1.5 — Admin design integration
The hi-fi prototype shipped by Claude Design (30 screens) is now the official admin web panel. The bundled prototype was ported to Vite + React 19 + TypeScript strict, all 30 screens render in production, and a Playwright suite drives every screen.
What landed
- 30 screens, real markup: 8.9k LOC ported to `packages/admin/src/app.tsx` (bundled, `@ts-nocheck` for the prototype's design-tool conventions). Dark-themed, token-driven CSS.
- Vite production build: replaced design-tool CDN React/Babel scripts with a regular Vite SPA. `bun run dev` boots in <500ms; `bun run build` ships a static bundle.
- Playwright suite: `packages/admin/test/e2e/*.e2e.ts` covers per-screen smoke, audit chain verify (OK + tampered), Findings views (Clusters/List/Kanban), Replay tabs, risk-map matrix, theme, palette. Real DOM, no mocks. `bun run test:e2e` runs it.
- CI gating: new `E2E (Playwright, admin UI)` job in `.github/workflows/ci.yml` builds the admin and runs the Playwright suite against the dev server.
- Quality: Biome ignores the bundled prototype to keep lint targeted; smoke filter tolerates the prototype's intentional `console.error` demo calls.
Known scoped tradeoffs
- In-memory routing only (the prototype was never URL-driven); reading `window.location` on boot is deferred to a follow-up.
- Live-mode currently animates time but still reads in-file mock data; wiring `VITE_AQA_SERVER_URL` to a real fetch layer is deferred to the next macro task.
What's next (v1.6)
Full end-to-end ecosystem smoke via Playwright: boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin and the audit chain stays valid end-to-end.
v1.4.0 — Admin API surface + issue #3 closed
Backend gap closure ahead of the parallel admin v2 design integration.
Server expansion
packages/server/src/api.ts makeApi() grows from 4 → 28 routes:
- Runs: list, detail, events, create
- Findings: list, detail, status mutation (with audit reason)
- Packs: list, detail, install, uninstall
- Profiles: list, detail, save, delete
- Risks: list, detail, save, delete
- Scenarios: list, detail, save
- Audit: scoped event query
- Cost: per-window summary aggregation
- Queue: snapshot + runner tap
- Notifications: list, mark-read
- Saved views: list, save, delete
- API tokens: list, create, revoke
- Tenancy: list orgs, list projects, create org, create project
All routes are permission-gated via @aqa/auth, tenant-scoped via
x-aqa-org / x-aqa-project headers, and return shape-compliant
@aqa/schemas objects. Multi-tenant fail-closed: missing scope → 400;
cross-tenant ID lookup → 404 (so probing for IDs in other projects
gains no information).
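The fail-closed tenant rule (missing scope gives 400, a cross-tenant id gives 404) can be sketched as a pure lookup function. The `StoredRun` shape, the `lookupRun` name, and the error bodies are hypothetical; only the header names and status codes come from the notes above.

```typescript
// Hypothetical stored record, stamped with its owning tenant.
interface StoredRun {
  id: string;
  org: string;
  project: string;
}

type RouteResult =
  | { status: 200; body: StoredRun }
  | { status: 400 | 404; body: { error: string } };

// Resolves a run by id within the tenant named by the request headers.
function lookupRun(
  headers: Record<string, string | undefined>,
  runs: readonly StoredRun[],
  id: string,
): RouteResult {
  const org = headers["x-aqa-org"];
  const project = headers["x-aqa-project"];
  // Fail closed: a request without a full scope is rejected outright.
  if (!org || !project) {
    return { status: 400, body: { error: "missing tenant scope" } };
  }
  // A run owned by another tenant is indistinguishable from a missing one.
  const run = runs.find(
    (r) => r.id === id && r.org === org && r.project === project,
  );
  if (run === undefined) {
    return { status: 404, body: { error: "not found" } };
  }
  return { status: 200, body: run };
}
```

Returning 404 rather than 403 for another tenant's id is what prevents id probing: the caller cannot tell "exists elsewhere" from "does not exist".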
Schemas
6 new @aqa/schemas namespaces with Draft 2020-12 JSON Schemas
emitted (schemas/v1/ now ships 15 files):
- `Notification`
- `SavedView`
- `ApiToken`
- `CostSummary`
- `Tenancy.Org` + `Tenancy.ProjectRef`
Store
StoreProvider extended with 15+ methods covering the new endpoints.
MemoryStore implements all of them; PostgresStore retains the
explicit not implemented pattern so a misconfigured production
deployment fails loudly.
RunnerQueue gains snapshot(), requeue(id), kill(id) for the
admin queue ops screen.
Issue #3 closed
Three remaining Zod superRefines mirrored into JSON Schema:
- `Finding.status='duplicate'` ⇒ `duplicate_of` required
- `ReproLevel.deterministic=true` ⇒ `attempts >= 1`
- `ProfilesFile.profile.name === key` (via `$comment`, cross-field)
Cross-field invariants JSON Schema cannot express
(duplicate_of !== id, successes === attempts,
finished_at >= started_at, profile.name === key) surfaced via
$comment on the emitted schemas.
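To make the distinction concrete, the sketch below checks two `Finding` rules in plain TypeScript: one that JSON Schema can express as a conditional requirement, and one cross-field comparison that it cannot. The `FindingLike` type and `findingIssues` function are illustrative assumptions; the real checks live in Zod `superRefine` calls.

```typescript
// Reduced Finding shape for this sketch; status values follow the enum
// listed in these release notes.
interface FindingLike {
  id: string;
  status: "draft" | "verified" | "rejected" | "duplicate" | "fixed";
  duplicate_of?: string;
}

// Returns human-readable issues; an empty array means the finding passes.
function findingIssues(finding: FindingLike): string[] {
  const issues: string[] = [];
  // Conditional requirement: expressible in JSON Schema.
  if (finding.status === "duplicate" && finding.duplicate_of === undefined) {
    issues.push("duplicate_of is required when status is 'duplicate'");
  }
  // Cross-field comparison: documented only via $comment in the schema.
  if (finding.duplicate_of !== undefined && finding.duplicate_of === finding.id) {
    issues.push("duplicate_of must differ from id");
  }
  return issues;
}
```

A consumer validating with the JSON Schema alone would catch the first issue but not the second.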
Ajv 2020 round-trip test (packages/schemas/test/ajv-roundtrip.test.ts)
validates every fixture against the emitted schema — catches Zod ↔
JSON-Schema divergence at build time.
All 6 emitter patches now resolve the #/definitions/<name>
indirection that zod-to-json-schema emits.
Docs
- `docs/design/admin-panel-spec-v2.md`: full enterprise design brief (tokens, 30 screens, component library, interactions, a11y, perf, deliverables) for the external designer who builds the React template in parallel.
- `docs/PROGRESS.md` updated with v1.4 entry, the post-design Playwright smoke roadmap, and the final closing step (README + docs refresh pass: audit v0.x references, finalise quick-start, write the "How you use it" workflow section, prune obsolete docs).
Review loop
Codex + Copilot iterated 2 times before merge, surfacing and addressing
5 must-fix items across schema enum alignment, tenant-scope enforcement
on runs / findings detail, and MemoryStore audit filter leniency on
unstamped events. CI 14/15 green throughout.
Numbers
- 28 server routes (was 4)
- 15 JSON Schemas (was 9)
- 205 tests (was 165)
- 19 packages (added `@aqa/compliance` previously; this release adds no new package)
PR: #22.
v1.3.0 — Quality batch
Six post-v1.2 polish items + an extended review-and-fix loop. No new packages; all quality / coverage / docs / correctness.
What landed
1. Admin server↔UI mapping
- `packages/admin/src/data/api.ts` fetches from `VITE_AQA_SERVER_URL` (real `@aqa/server` shape) with explicit error surfacing; no silent mock fallback in live mode.
- `mapRun()` / `mapFinding()` translate `Run.Run` (`state`, `totals.findings`, `totals.llm_cost_usd`) and `Finding.Finding` (`status` enum `draft|verified|rejected|duplicate|fixed`, `verification_floor` enum `bug_level|scenario_level|agent_level`, `discovered_at`) into the UI types. Screens stay source-agnostic.
- `live` / `mock` badge + red error banner on Runs and Findings.
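A mapping layer of this kind might look like the sketch below. The server-side fields (`state`, `totals.findings`, `totals.llm_cost_usd`) come from the notes above; the `id` field, the `UiRun` shape, and the `mapRuns` wrapper are assumptions, since the admin's real UI types are not shown here.

```typescript
// Server-side run: fields named in the release notes, plus an assumed id.
interface ServerRun {
  id: string;
  state: string;
  totals: { findings: number; llm_cost_usd: number };
}

// Hypothetical UI-side shape consumed by the screens.
interface UiRun {
  id: string;
  status: string;
  findingCount: number;
  costUsd: number;
}

function isServerRun(value: unknown): value is ServerRun {
  if (typeof value !== "object" || value === null) return false;
  const run = value as Partial<ServerRun>;
  return (
    typeof run.id === "string" &&
    typeof run.state === "string" &&
    typeof run.totals === "object" &&
    run.totals !== null &&
    typeof run.totals.findings === "number" &&
    typeof run.totals.llm_cost_usd === "number"
  );
}

// Translates one server run into the UI shape.
function mapRun(run: ServerRun): UiRun {
  return {
    id: run.id,
    status: run.state,
    findingCount: run.totals.findings,
    costUsd: run.totals.llm_cost_usd,
  };
}

// Maps a raw payload, throwing on an unexpected shape so the caller can
// show an error banner instead of silently falling back to mock data.
function mapRuns(payload: unknown): UiRun[] {
  if (!Array.isArray(payload) || !payload.every(isServerRun)) {
    throw new Error("unexpected runs payload from server");
  }
  return payload.map(mapRun);
}
```

Keeping the translation in one place is what lets screens stay source-agnostic: they only ever see `UiRun`, whether the data is live or mocked.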
2. Admin sub-screens (6 detail routes)
/runs/$runId, /findings/$findingId, /risk-map/$riskId, /profiles/$profileName, /packs/$packSlug, /scenarios/$scenarioId. Each with Breadcrumb + PageHeader. Runs table rows are clickable links.
3. Admin unit tests (12 new, 176 total)
- `test/audit.test.ts` (5): `parseEventLines` ×2, `verifyEventChain` ×3 (good chain, tampered, vacuous truth).
- `test/cluster.test.ts` (6): `signatureOf` (identity, normalisation, divergence), `clusterFindings` (grouping, worst-severity, sort).
4. CLI E2E smoke gate
scripts/e2e-cli.mjs runs against a fresh tmpdir sandbox (seeded with a minimal package.json + aqa init). All four checks (--version, --help, doctor, validate) must exit 0. Wired into CI as a new e2e-cli job in .github/workflows/ci.yml.
5. Threat model expansion
docs/security/threat-model.md from 12-line stub to full STRIDE catalog: trust-boundary diagram, 20 specific threats with current mitigation + status, agentic-specific cross-cutting threats (tool-result poisoning, confirmation bypass, supply chain, cost-based DoS).
6. CHANGELOG.md backfill
Entries for v0.2.0 → v1.3.0 in Keep-a-Changelog format.
Review loop
Codex + Copilot review iterated 3 times before merge, surfacing and addressing 21 must-fix items across schema enum alignment, fake-live fallback, error-vs-not-found splitting, CLI E2E hardness, threat-model precision (S-03 narrower scope, D-01 / S-01 / I-03 downgraded from Mitigated to Partial / Unmitigated to reflect actual code). All inline comments addressed. CI 15/15 green.