Releases: padosoft/agentic-qa-kit

v1.9.0 — Junior quick-start truthing + GitHub Packages publish

21 May 00:40
697bbb9

What's new

Closes the junior quick-start truthing macro. The README's promised CLI surface now exists end-to-end, and the kit publishes to GitHub Packages as `@padosoft/agentic-qa-kit`.

CLI verbs newly wired (previously documented but unimplemented)

  • `aqa install-agent-files --targets claude,codex,gemini,copilot` — writes `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` / `.github/copilot-instructions.md` plus per-agent skills. `--force` / `--dry-run` / `--project-name` flags. (PR #52)
  • `aqa report [--run-id ] [--format md|json|both]` — renders `events.jsonl` + `findings.jsonl` from a run into `report.md` + `report.json` (the same JSON shape the admin UI consumes). Defaults to the latest run by file mtime. (PR #53)
  • `aqa admin [--port ] [--host ]` — boots the admin SPA + `makeApi()` in a single Node process on `127.0.0.1:5173`. SPA + API ship in the kit tarball. In-memory store auto-seeded from `.aqa/runs/`. (PR #54)
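
`aqa report` falls back to the latest run by file mtime when `--run-id` is omitted. Here is a minimal sketch of that fallback. It assumes the caller has already listed `.aqa/runs/` into `{ runId, mtimeMs }` pairs. `RunDirEntry`, `pickLatestRun` and `resolveRunId` are hypothetical names, not the kit's exported API.

```typescript
// Hypothetical shape: one entry per directory under .aqa/runs/.
interface RunDirEntry {
  runId: string;
  mtimeMs: number; // e.g. taken from fs.statSync(dir).mtimeMs
}

// Returns the run with the newest mtime, or undefined when there are no runs.
// On ties, the entry that appears first in the input wins.
function pickLatestRun(entries: readonly RunDirEntry[]): string | undefined {
  let latest: RunDirEntry | undefined;
  for (const entry of entries) {
    if (latest === undefined || entry.mtimeMs > latest.mtimeMs) {
      latest = entry;
    }
  }
  return latest?.runId;
}

// An explicit --run-id wins; otherwise fall back to the newest run.
function resolveRunId(
  explicit: string | undefined,
  entries: readonly RunDirEntry[],
): string | undefined {
  return explicit ?? pickLatestRun(entries);
}
```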

Publishing

  • GitHub Packages publish pipeline. On every `v*` tag, `.github/workflows/publish.yml` runs the esbuild bundler (`dist/cli.cjs`, a ~570 KB CommonJS bundle with every `@aqa/` workspace + npm dep inlined), swaps the package name from `@aqa/kit` to `@padosoft/agentic-qa-kit`, strips the bundled `@aqa/` deps from the published manifest, and runs `npm publish --provenance --access public` against `https://npm.pkg.github.com`. (PR #55)
  • New `@aqa/pack-author` workspace package extracted to break the `@aqa/kit` ↔ `@aqa/server` build cycle that emerged when kit started depending on server. Both kit and server depend on pack-author for `runPackNew` now; the cycle is gone from the dep graph entirely.
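
The manifest rewrite in the publish step can be pictured as a pure transformation of the parsed `package.json`. Here is a minimal sketch. It assumes only `dependencies` holds `@aqa/` entries. `Manifest` and `rewriteManifest` are hypothetical and do not reproduce `publish.yml`.

```typescript
// Hypothetical subset of package.json fields touched by the rewrite.
interface Manifest {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
  [key: string]: unknown;
}

// Hypothetical: rename @aqa/kit and drop workspace deps that esbuild already inlined.
function rewriteManifest(pkg: Manifest): Manifest {
  if (pkg.name !== "@aqa/kit") {
    throw new Error(`unexpected package name: ${pkg.name}`);
  }
  const dependencies: Record<string, string> = {};
  for (const [dep, range] of Object.entries(pkg.dependencies ?? {})) {
    // Bundled @aqa/* workspaces are not published, so they must not remain as deps.
    if (!dep.startsWith("@aqa/")) dependencies[dep] = range;
  }
  return { ...pkg, name: "@padosoft/agentic-qa-kit", dependencies };
}
```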

Docs

  • README + `docs/getting-started.md` rewritten 1:1 with shipped verbs. Adds the GH Packages auth `.npmrc` snippet (the single biggest junior trap on first install — public packages on GH Packages still require auth). 10-step quick-start, `aqa admin` single-command boot, `bun run e2e:ecosystem` pointer for monorepo contributors. (PR #56)

Consume flow (works once this release is published)

```bash
# 1. One-time GitHub Packages auth setup in your project
cat > .npmrc <<'NPMRC'
@padosoft:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
NPMRC
export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX # needs read:packages

# 2. Install + run
bun add -d @padosoft/agentic-qa-kit
bunx aqa init
bunx aqa install-agent-files --targets claude,codex,gemini,copilot
bunx aqa run --profile smoke
bunx aqa report
bunx aqa admin # → http://127.0.0.1:5173
```

Stats

  • 12 packages (added: `@aqa/pack-author`)
  • 270 tests pass (102 in kit, 89 in server, etc.)
  • 5 sub-task PRs (#52 #53 #54 #55 #56), 11 Copilot review iterations across them
  • Macro PR #57

Full changelog

See CHANGELOG.md § 1.9.0.

v1.8.3 — live ecosystem e2e + roadmap closure

20 May 11:34
374b2e3

What's Changed

  • test(v1.8.3): live ecosystem e2e + roadmap closure sync by @lopadova in #51

Full Changelog: v1.8.2...v1.8.3

v1.8.2 — ecosystem smoke e2e hardening

20 May 11:00
4d87647

What's Changed

  • docs(v1.7.1): record slice 4g in PROGRESS.md by @lopadova in #43
  • feat(v1.7): wire Admin SSO config page end-to-end (slice 4h) by @lopadova in #44
  • feat(v1.7): enable SSO config save flow (slice 4i) by @lopadova in #45
  • feat(v1.7): autoload AuditChainViewer from /api/audit initial chain (slice 4j) by @lopadova in #46
  • feat(v1.8): real HTTP probe runner + release-gate strict findings by @lopadova in #47
  • docs(v1.x): README/docs closure refresh for current GA workflow by @lopadova in #48
  • fix(v1.8.1): align audit verifier with EventChainWriter canonical chain by @lopadova in #49
  • test(v1.8.2): extend CLI smoke to real HTTP run + artifact checks by @lopadova in #50

Full Changelog: v1.7.1...v1.8.2

v1.7.1 — Users + Roles admin wiring

20 May 08:06
77f3b1c

  • New GET /api/users (settings:read) returning the store user-directory snapshot.
  • New GET /api/roles (settings:read) returning the @aqa/auth role/permission matrix plus all_permissions from Permission.options.
  • New StoreProvider.listUsers() contract and shared StoreUserDirectoryEntry type in @aqa/store.
  • Admin PageUsers now reads /api/users with fixture fallback.
  • Admin PageRoles now reads /api/roles and renders the live permission matrix (admin:everything wildcard), with fixture fallback.
  • Tests added: server route coverage + admin users/roles e2e.
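
`GET /api/roles` returns a role/permission matrix in which `admin:everything` acts as a wildcard. Here is a hedged sketch of how a check against such a matrix could work. The role names, the `ROLE_PERMISSIONS` table and `hasPermission` are hypothetical. Only the permission strings are taken from these notes.

```typescript
type Permission = "settings:read" | "agents:read" | "agents:edit" | "admin:everything";

// Hypothetical matrix; the real one comes from @aqa/auth.
const ROLE_PERMISSIONS: Record<string, readonly Permission[]> = {
  admin: ["admin:everything"],
  viewer: ["settings:read", "agents:read"],
};

function hasPermission(role: string, needed: Permission): boolean {
  // Unknown roles get no permissions at all.
  const granted: readonly Permission[] = ROLE_PERMISSIONS[role] ?? [];
  // admin:everything short-circuits every other check.
  return granted.includes("admin:everything") || granted.includes(needed);
}
```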

Tag points to commit 77f3b1c (PR #42).

v1.7.0 — pack authoring + admin CRUD

20 May 03:12
15b9bc6

The final v1.7 release closes the loop on pack creation and adds full Profile/Risk/Scenario CRUD wizards in the admin, end-to-end Agents wiring, and live reads for every Operations + Admin page that has a matching server route.

Pack authoring (slices 1–3)

  • New community guide docs/PACK-AUTHORING.md.
  • New CLI aqa pack new <slug> (atomic backup-rename --force, slug + symlink + traversal rejection, in-memory schema validation of generated Scenario/RiskMap/PackManifest).
  • Admin Create-pack wizard wired over the new CLI (POST /api/packs/scaffold).

Admin CRUD (slice 4c)

  • Profiles: Delete (#29), Edit/Save (#30), Clone (#31). Atomic Store.createProfile. App-level deletedProfiles / updatedProfiles / createdProfiles Maps with aqa:profile-* CustomEvents. Tombstone-clear on re-create.
  • Risks: Delete (#32 — returns { id, deleted: true }), Edit/Save (#33). Mock risk + invariant ids migrated to schema-conforming dashed Slugs.
  • Scenarios: Delete (#34 — new Store.deleteScenario), Edit/Save (#35), Clone (#36 — atomic Store.createScenario), shared YAML-textarea Edit/Clone wizard (#37) with debounced parse + prototype-pollution guard. Mock scenario ids migrated to dashed slugs; explicit category field for tree grouping.
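
The shared YAML Edit/Clone wizard runs a prototype-pollution guard on parsed input. Here is a minimal sketch of such a guard. It assumes the guard walks the plain object a YAML or JSON parser returns. `assertNoPollutionKeys` is a hypothetical name, and the admin's own guard may differ.

```typescript
// Keys that could reach Object.prototype if merged into another object.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Walks parsed YAML/JSON and throws on the first dangerous key it finds.
function assertNoPollutionKeys(value: unknown, path = "$"): void {
  if (Array.isArray(value)) {
    value.forEach((item, i) => assertNoPollutionKeys(item, `${path}[${i}]`));
    return;
  }
  if (value !== null && typeof value === "object") {
    // Object.keys also returns an own "__proto__" key, such as the one JSON.parse creates.
    for (const key of Object.keys(value)) {
      if (FORBIDDEN_KEYS.has(key)) {
        throw new Error(`forbidden key "${key}" at ${path}`);
      }
      assertNoPollutionKeys((value as Record<string, unknown>)[key], `${path}.${key}`);
    }
  }
}
```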

Agents (slice 4d)

  • New @aqa/schemas Agent with SafeRepoPath-validated files[] (rejects .., drive letters, UNC roots).
  • New agents:read / agents:edit permissions (agents:install aliased for back-compat).
  • New server routes: GET /api/agents, GET /api/agents/:id, POST /api/agents/:id/install, POST /api/agents/:id/uninstall.
  • PageAgents fetches the live list with fixture fallback; install/uninstall buttons wired with in-flight guard + toasts.
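
`SafeRepoPath` rejects `..` segments, drive letters and UNC roots in `files[]`. Here is a hedged sketch of those three rules. It also rejects absolute POSIX paths, which is an assumption of this sketch. `isSafeRepoPath` is hypothetical and is not the `@aqa/schemas` definition.

```typescript
// Hypothetical re-implementation of the rules listed for SafeRepoPath.
function isSafeRepoPath(p: string): boolean {
  if (p.length === 0) return false;
  // UNC roots such as \\server\share or //server/share.
  if (/^[\\/]{2}/.test(p)) return false;
  // Windows drive letters such as C:\ or c:foo.
  if (/^[A-Za-z]:/.test(p)) return false;
  // Absolute paths (assumption: repo paths must be relative).
  if (p.startsWith("/") || p.startsWith("\\")) return false;
  // Any ".." segment, with either separator.
  return !p.split(/[\\/]/).includes("..");
}
```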

Operations + Admin sections (slices 4e, 4f)

  • PageAudit / PageAdminAudit → GET /api/audit with shared normalizeAuditEventsForViewer helper.
  • PageQueue → GET /api/queue with EnqueuedJob → fixture-shape adapter (filters terminal done jobs).
  • PageCost → GET /api/cost/summary with explicit MTD from/to, tenant headers padosoft/gescat.
  • PageNotifications → GET /api/notifications; SELF resolves from SESSION_USER.id; filter kinds derived from NotificationKind enum.
  • PageTokens → GET /api/tokens with ApiToken → fixture-shape adapter (heuristic owner-prefix → kind).
  • PageOrg → GET /api/orgs; subtitle joins live slugs.
  • fmtDate / fmtDateTime / fmtRelative made null-safe (em-dash for missing timestamps).
  • Users/Roles/SSO admin pages deferred — no server scaffolding yet.
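
The null-safe formatters render an em-dash when a timestamp is missing. Here is a minimal sketch of that pattern for `fmtDate`. It assumes ISO-8601 string inputs and a `YYYY-MM-DD` output. Treating unparseable strings as missing is also an assumption.

```typescript
const MISSING = "\u2014"; // em-dash placeholder for missing timestamps

// Accepts the nullable timestamp fields an API response may carry.
function fmtDate(value: string | null | undefined): string {
  if (!value) return MISSING; // null, undefined or ""
  const d = new Date(value);
  // Assumption: unparseable input is shown like a missing value, not "Invalid Date".
  if (Number.isNaN(d.getTime())) return MISSING;
  return d.toISOString().slice(0, 10);
}
```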

Stats

  • 13 PRs merged this slice (#28..#40 + #41 docs).
  • Server: 86 unit tests (+19 from v1.6).
  • Admin E2E: 132 tests (+45 from v1.6).
  • New schemas: Agent.
  • New permissions: agents:read, agents:edit.
  • New store methods: createProfile, createScenario, deleteScenario, listAgents, loadAgent, installAgent, uninstallAgent.

See `docs/PROGRESS.md` for the per-slice breakdown and architecture lessons.

v1.7.0-rc.1 — pack authoring (slices 1+2)

18 May 20:57
6cc0013

First release candidate for v1.7. Delivers the pack-authoring story (slices 1 and 2 of 4 planned).

What's in

📖 docs/PACK-AUTHORING.md

End-to-end tutorial for community pack authors:

  • Directory layout, manifest schema, scenarios + risks structure
  • Three distribution patterns (workspace pack / vendored copy / npm scope alias)
  • Programmatic validation (aqa validate)
  • Honest about current limitations: the no-network NO_NETWORK_PROBE stub returns {probe_id, status: 200, body: null} for every probe kind, only the http_status / response_contains / response_not_contains oracles are wired, and there is no custom oracle/probe loader yet
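
Because the stub always answers `{probe_id, status: 200, body: null}`, the three wired oracles behave predictably against it. Here is a hedged sketch of that evaluation. The `Oracle` union, its `expected` / `text` fields, `noNetworkProbe` and `oraclePasses` are hypothetical shapes, not the `@aqa/runner` types.

```typescript
interface ProbeResult {
  probe_id: string;
  status: number;
  body: string | null;
}

// Hypothetical oracle shapes for the three wired kinds.
type Oracle =
  | { kind: "http_status"; expected: number }
  | { kind: "response_contains"; text: string }
  | { kind: "response_not_contains"; text: string };

// Mirrors the documented stub output: status 200 and a null body for every probe.
function noNetworkProbe(probeId: string): ProbeResult {
  return { probe_id: probeId, status: 200, body: null };
}

function oraclePasses(oracle: Oracle, result: ProbeResult): boolean {
  const body = result.body ?? ""; // a null body is treated as empty text
  if (oracle.kind === "http_status") return result.status === oracle.expected;
  if (oracle.kind === "response_contains") return body.includes(oracle.text);
  return !body.includes(oracle.text);
}
```

Under these assumptions an `http_status: 200` oracle always passes against the stub, while any `response_contains` oracle with non-empty text always fails on the `null` body.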

🛠 aqa pack new

CLI to scaffold a runnable pack at <cwd>/packs/<slug>/:

aqa pack new pack-myapp --sut-type api
aqa pack new pack-frontend --sut-type web --description "Smoke tests for the marketing site"

The scaffold produces a starter scenario whose http_status: 200 oracle passes cleanly against the stub probe out of the box (avoiding the iter-17 footgun where bundled packs emitted synthetic findings). Supports:

  • --sut-type (api / web / cli / lib / agent / pipeline)
  • --force — atomic backup-rename overwrite (non-destructive on failure)
  • --description, --author, --license (SPDX)

Hardened against:

  • Symlinks at packs/ parent and packDir (lstatSync checks)
  • Non-directory parent (regular file at <root>/packs)
  • Over-length slugs (cap at 52 chars to keep every derived ID within the 64-char Slug schema cap)
  • TOCTOU on the existence check (uses non-recursive mkdir + explicit lstat)
  • Schema-invalid generated output (in-memory PackManifest/Scenario/RiskMap validation before writing)
  • Validation failures destroying the existing pack (backup-rename + restore-on-error)
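
The 52-char cap leaves 12 characters of headroom under the 64-char `Slug` cap for IDs derived from the pack slug. Here is a minimal sketch of the up-front slug check. It assumes a lowercase dashed pattern, since the real `Slug` regex is not quoted in these notes. `validatePackSlug` is hypothetical.

```typescript
// 52 keeps IDs derived from the pack slug within the 64-char Slug schema cap.
const PACK_SLUG_MAX = 52;
// Assumption: lowercase alphanumeric words joined by single dashes.
const SLUG_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns an error message, or null when the slug is acceptable.
function validatePackSlug(slug: string): string | null {
  if (!SLUG_RE.test(slug)) return "slug must be lowercase words joined by dashes";
  if (slug.length > PACK_SLUG_MAX) return `slug longer than ${PACK_SLUG_MAX} chars`;
  return null;
}
```

A pattern like this rejects `/`, `\` and `..` outright, so slug-based traversal fails before any filesystem call. The symlink and non-directory checks still need `lstat` on disk.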

🧪 Tests

54 in @aqa/kit (50 pack-new + 42 run-cmd subset). Lint + typecheck clean.

What's still pending in v1.7

  • Slice 3 — Admin "Create pack" wizard (future PR)
  • Slice 4 — Audit + wire/implement/document all 81 silent placeholder buttons across packages/admin/src/app.tsx (slices 4a–4f, future PRs; plan documented in docs/internal/admin-placeholder-audit.md)
  • Final v1.7.0 tag after slices 3 + 4 ship

Review

19 iterations on PR #25 with both Copilot and Codex review bots. All real issues addressed; final iter returned 0 new must-fix items.

v1.6.0 — aqa run CLI + bundled packs

18 May 16:36
21d7b10

v1.6 — aqa run + ecosystem foundation

The missing piece between aqa init and a real audit trail. After 21 review iterations with Copilot + Codex (every one surfaced a real bug or coverage gap, zero false alarms), the inner loop is end-to-end usable.

What lands

  • aqa run CLI command. Loads .aqa/project.yaml + .aqa/profiles.yaml via the canonical @aqa/schemas shapes, resolves packs from three discovery tiers (project's packs/*, node_modules/@aqa/*, kit-bundled dist/packs/*), filters scenarios by the selected profile's tags, and runs each one via @aqa/runner.runScenario. Streams events + findings into .aqa/runs/<run_id>/.
  • Flags: --profile <name> (defaults to smoke if present, else first profile) and --seed <string> (deterministic run_id for tests + replay).
  • Bundled packs. All 5 baseline packs (pack-core, pack-api-core, pack-web-ui, pack-llm-agent, pack-security) now ship inside @aqa/kit's npm tarball via a bundle-packs.mjs build step. A fresh aqa init + aqa run --profile smoke works with only @aqa/kit installed.
  • SUT-aware init. aqa init picks the right packs from the detected sut_type (api → pack-api-core, web → pack-web-ui, agent → pack-llm-agent, else → pack-core). The framework clause on pack-api-core was dropped so plain Node/Bun APIs without a recognized framework still get coverage.
  • Hardened orchestration: atomic run-dir creation (no TOCTOU on concurrent seeded runs), pack-manifest scenario discovery (no glob-scanning), path-traversal + symlink-escape rejection, applies_when filtering, manifest-name dedup with priority (project > node_modules > bundled), legacy bare-slug pack-name aliasing, agent-mode profile rejection until that driver lands, unrelated-broken-pack tolerance with structured warnings.
  • Structured RunResult with ok, runId, runDir, scenariosRun, findingsCount, capped error string (MAX_DETAIL_PER_KIND + "…+N more" truncation), and a warnings array for non-fatal diagnostics. Detail samples (pack_error_samples, scenario_error_samples, …) live in the run_finished audit event for auditors.
  • 42 TDD tests in packages/kit/test/run-cmd.test.ts — every behavior above is covered, written before the code existed.
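
Pack discovery deduplicates by manifest name with priority project > node_modules > bundled. Here is a minimal sketch of that rule. `DiscoveredPack`, the tier labels and `dedupPacks` are hypothetical names.

```typescript
type Tier = "project" | "node_modules" | "bundled";

interface DiscoveredPack {
  name: string; // manifest name used as the dedup key
  tier: Tier;
  dir: string;
}

// Lower rank wins: project packs shadow installed ones, which shadow kit-bundled ones.
const TIER_RANK: Record<Tier, number> = { project: 0, node_modules: 1, bundled: 2 };

function dedupPacks(found: readonly DiscoveredPack[]): DiscoveredPack[] {
  const byName = new Map<string, DiscoveredPack>();
  for (const pack of found) {
    const current = byName.get(pack.name);
    if (current === undefined || TIER_RANK[pack.tier] < TIER_RANK[current.tier]) {
      byName.set(pack.name, pack);
    }
  }
  return Array.from(byName.values());
}
```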

Known scoped follow-ups (v1.7)

  • Real HTTP probe runner. Today's runScenario still uses the no-network probe stub (@aqa/runner's NO_NETWORK_PROBE). The release-gate "fail on any finding" semantic (from require_deterministic_replay: true) is deferred until probes hit a real SUT — every finding the stub produces is synthetic.
  • EventChainWriter ↔ verifyEventChain reconciliation. Writer omits prev_hash from the canonical body and emits null for seq=0; @aqa/compliance.verifyEventChain includes prev_hash and expects "0…". Tests ship a local writer-matching verifier; reconciling the two implementations is a separate cleanup.
  • Pack authoring story. v1.7 will add docs/PACK-AUTHORING.md (community tutorial), aqa pack new <slug> (scaffolding CLI), and an admin "Create pack" wizard — plus a full audit pass on every placeholder button in the admin panel so nothing renders as a "muted click".
  • Browser-driven ecosystem smoke. Playwright test that starts admin + runs aqa run against examples/bun-api and asserts findings appear in the admin UI end-to-end.
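
The reconciliation problem comes down to the writer and the verifier hashing different bytes. Here is a minimal sketch of a chain in which both sides share one `hashEvent`. It assumes SHA-256 over a JSON encoding of `seq`, `prev_hash` and the body, with a `null` genesis `prev_hash`. This canonical form is an assumption, not the `EventChainWriter` format.

```typescript
import { createHash } from "node:crypto";

interface ChainedEvent {
  seq: number;
  prev_hash: string | null; // null for seq = 0 (genesis)
  body: Record<string, unknown>;
  hash: string;
}

// Single canonicalization shared by writer and verifier (assumed format).
function hashEvent(seq: number, prevHash: string | null, body: Record<string, unknown>): string {
  const canonical = JSON.stringify({ seq, prev_hash: prevHash, body });
  return createHash("sha256").update(canonical).digest("hex");
}

// Writer side: links each new event to the previous event's hash.
function appendEvent(chain: ChainedEvent[], body: Record<string, unknown>): void {
  const seq = chain.length;
  const prevHash = seq === 0 ? null : chain[seq - 1].hash;
  chain.push({ seq, prev_hash: prevHash, body, hash: hashEvent(seq, prevHash, body) });
}

// Verifier side: recomputes every hash with the same function.
function verifyChain(chain: readonly ChainedEvent[]): boolean {
  return chain.every((ev, i) => {
    const expectedPrev = i === 0 ? null : chain[i - 1].hash;
    return ev.seq === i && ev.prev_hash === expectedPrev && ev.hash === hashEvent(ev.seq, ev.prev_hash, ev.body);
  });
}
```

What matters is that the writer and the verifier share one hashing function. Whether `prev_hash` is part of the hashed bytes matters less than both sides agreeing on it.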

v1.5.0 — admin design integration

18 May 11:34
f7b879f

The hi-fi prototype shipped by Claude Design (30 screens) is now the official admin web panel. The bundled prototype was ported to Vite + React 19 + TypeScript strict, all 30 screens render in production, and a Playwright suite drives every screen.

What landed

  • 30 screens, real markup — 8.9k LOC ported to packages/admin/src/app.tsx (bundled, @ts-nocheck for the prototype's design-tool conventions). Dark-themed, token-driven CSS.
  • Vite production build — replaced design-tool CDN React/Babel scripts with a regular Vite SPA. bun run dev boots in <500ms; bun run build ships a static bundle.
  • Playwright suite: packages/admin/test/e2e/*.e2e.ts covers per-screen smoke, audit chain verify (OK + tampered), Findings views (Clusters/List/Kanban), Replay tabs, risk-map matrix, theme, palette. Real DOM, no mocks. bun run test:e2e runs it.
  • CI gating — new E2E (Playwright, admin UI) job in .github/workflows/ci.yml builds the admin and runs the Playwright suite against the dev server.
  • Quality — Biome ignores the bundled prototype to keep lint targeted; smoke filter tolerates the prototype's intentional console.error demo calls.

Known scoped tradeoffs

  • In-memory routing only (the prototype was never URL-driven); reading window.location on boot is deferred to a follow-up.
  • Live-mode currently animates time but still reads in-file mock data; wiring VITE_AQA_SERVER_URL to a real fetch layer is deferred to the next macro task.

What's next (v1.6)

Full end-to-end ecosystem smoke via Playwright: boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin and the audit chain stays valid end-to-end.

v1.4.0 — Admin API surface + issue #3 closed

18 May 09:21
bb0f7c6

Backend gap closure ahead of the parallel admin v2 design integration.

Server expansion

packages/server/src/api.ts makeApi() grows from 4 to 28 routes:

  • Runs: list, detail, events, create
  • Findings: list, detail, status mutation (with audit reason)
  • Packs: list, detail, install, uninstall
  • Profiles: list, detail, save, delete
  • Risks: list, detail, save, delete
  • Scenarios: list, detail, save
  • Audit: scoped event query
  • Cost: per-window summary aggregation
  • Queue: snapshot + runner tap
  • Notifications: list, mark-read
  • Saved views: list, save, delete
  • API tokens: list, create, revoke
  • Tenancy: list orgs, list projects, create org, create project

All routes are permission-gated via @aqa/auth, tenant-scoped via x-aqa-org / x-aqa-project headers, and return shape-compliant @aqa/schemas objects. Multi-tenant access fails closed: a missing scope returns 400, and a cross-tenant ID lookup returns 404 (so probing for IDs in other projects gains no information).
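
The fail-closed rule (missing scope → 400, cross-tenant ID → 404) can be expressed as a small lookup helper. Here is a minimal sketch. The header names come from these notes. `readScope`, `lookupScoped` and the result shape are hypothetical and framework-agnostic.

```typescript
interface Scope { org: string; project: string }
interface Scoped { id: string; org: string; project: string }
interface ApiResult<T> { status: 200 | 400 | 404; body: T | { error: string } }

// Reads x-aqa-org / x-aqa-project; any missing or empty header means no scope.
function readScope(headers: Record<string, string | undefined>): Scope | null {
  const org = headers["x-aqa-org"];
  const project = headers["x-aqa-project"];
  return org && project ? { org, project } : null;
}

function lookupScoped<T extends Scoped>(
  headers: Record<string, string | undefined>,
  id: string,
  store: ReadonlyMap<string, T>,
): ApiResult<T> {
  const scope = readScope(headers);
  if (scope === null) return { status: 400, body: { error: "missing tenant scope" } };
  const item = store.get(id);
  // A record owned by another tenant is reported exactly like a missing one.
  if (item === undefined || item.org !== scope.org || item.project !== scope.project) {
    return { status: 404, body: { error: "not found" } };
  }
  return { status: 200, body: item };
}
```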

Schemas

6 new @aqa/schemas namespaces with Draft 2020-12 JSON Schemas emitted (schemas/v1/ now ships 15 files):

  • Notification
  • SavedView
  • ApiToken
  • CostSummary
  • Tenancy.Org + Tenancy.ProjectRef

Store

StoreProvider extended with 15+ methods covering the new endpoints. MemoryStore implements all of them; PostgresStore retains the explicit "not implemented" pattern so a misconfigured production deployment fails loudly.

RunnerQueue gains snapshot(), requeue(id), kill(id) for the admin queue ops screen.

Issue #3 closed

Three remaining Zod superRefines mirrored into JSON Schema:

  • Finding.status='duplicate' ⇒ duplicate_of required
  • ReproLevel.deterministic=true ⇒ attempts >= 1
  • ProfilesFile.profile.name === key (via $comment — cross-field)

Cross-field invariants that JSON Schema cannot express (duplicate_of !== id, successes === attempts, finished_at >= started_at, profile.name === key) are surfaced via $comment on the emitted schemas.
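
Invariants that JSON Schema cannot express still have to be enforced in code after schema validation. Here is a hedged sketch for the Finding and ReproLevel invariants, assuming the field names quoted above. Tying `successes === attempts` to `deterministic: true` is an assumption, and `findingErrors` / `reproErrors` are hypothetical.

```typescript
interface FindingLike { id: string; status: string; duplicate_of?: string }
interface ReproLevelLike { deterministic: boolean; attempts: number; successes: number }

function findingErrors(f: FindingLike): string[] {
  const errors: string[] = [];
  // Expressible in JSON Schema (mirrored from the Zod superRefine); repeated for completeness.
  if (f.status === "duplicate" && !f.duplicate_of) errors.push("duplicate_of required when status is duplicate");
  // Not expressible in JSON Schema: compares two values of the same instance.
  if (f.duplicate_of !== undefined && f.duplicate_of === f.id) errors.push("duplicate_of must differ from id");
  return errors;
}

function reproErrors(r: ReproLevelLike): string[] {
  const errors: string[] = [];
  if (r.deterministic && r.attempts < 1) errors.push("deterministic requires attempts >= 1");
  // Assumption: successes === attempts applies to deterministic repro levels.
  if (r.deterministic && r.successes !== r.attempts) errors.push("deterministic requires successes === attempts");
  return errors;
}
```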

Ajv 2020 round-trip test (packages/schemas/test/ajv-roundtrip.test.ts) validates every fixture against the emitted schema, catching Zod ↔ JSON-Schema divergence at build time.

All 6 emitter patches now resolve the #/definitions/<name> indirection that zod-to-json-schema emits.
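
When given a schema name, zod-to-json-schema emits a root `$ref` that points into `definitions`. Here is a minimal sketch of hoisting that target to the root. It assumes the `{ $ref: "#/definitions/<name>", definitions: {...} }` shape. `inlineRootRef` is hypothetical and is not one of the six emitter patches.

```typescript
type JsonObject = { [key: string]: unknown };

// Hoists the referenced definition to the root and keeps all definitions
// so nested "#/definitions/..." refs still resolve.
function inlineRootRef(doc: JsonObject): JsonObject {
  const prefix = "#/definitions/";
  const ref = doc["$ref"];
  const defs = doc["definitions"] as Record<string, JsonObject> | undefined;
  if (typeof ref !== "string" || !ref.startsWith(prefix) || defs === undefined) {
    return doc; // nothing to resolve
  }
  const target = defs[ref.slice(prefix.length)];
  if (target === undefined) throw new Error(`dangling $ref: ${ref}`);
  // Copy root keywords such as $schema, dropping the indirection itself.
  const root: JsonObject = {};
  for (const [key, val] of Object.entries(doc)) {
    if (key !== "$ref" && key !== "definitions") root[key] = val;
  }
  return { ...root, ...target, definitions: defs };
}
```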

Docs

  • docs/design/admin-panel-spec-v2.md: full enterprise design brief (tokens, 30 screens, component library, interactions, a11y, perf, deliverables) for the external designer who builds the React template in parallel.
  • docs/PROGRESS.md updated with the v1.4 entry, the post-design Playwright smoke roadmap, and the final closing step (README + docs refresh pass: audit v0.x references, finalise quick-start, write the "How you use it" workflow section, prune obsolete docs).

Review loop

Codex + Copilot iterated 2 times before merge, surfacing and addressing 5 must-fix items across schema enum alignment, tenant-scope enforcement on runs / findings detail, and MemoryStore audit filter leniency on unstamped events. CI 14/15 green throughout.

Numbers

  • 28 server routes (was 4)
  • 15 JSON Schemas (was 9)
  • 205 tests (was 165)
  • 19 packages (added @aqa/compliance previously; this release adds no new package)

PR: #22.

v1.3.0 — Quality batch

18 May 01:04
a1408f3

Six post-v1.2 polish items + an extended review-and-fix loop. No new packages; all quality / coverage / docs / correctness.

What landed

1. Admin server↔UI mapping

  • packages/admin/src/data/api.ts fetches from VITE_AQA_SERVER_URL (real @aqa/server shape) with explicit error surfacing — no silent mock fallback in live mode.
  • mapRun() / mapFinding() translate Run.Run (state, totals.findings, totals.llm_cost_usd) and Finding.Finding (status enum draft|verified|rejected|duplicate|fixed, verification_floor enum bug_level|scenario_level|agent_level, discovered_at) into the UI types. Screens stay source-agnostic.
  • live/mock badge + red error banner on Runs and Findings.

2. Admin sub-screens (6 detail routes)

/runs/$runId, /findings/$findingId, /risk-map/$riskId, /profiles/$profileName, /packs/$packSlug, /scenarios/$scenarioId. Each with Breadcrumb + PageHeader. Runs table rows are clickable links.

3. Admin unit tests (12 new, 176 total)

  • test/audit.test.ts (5): parseEventLines ×2, verifyEventChain ×3 (good chain, tampered, vacuous truth).
  • test/cluster.test.ts (6): signatureOf (identity, normalisation, divergence), clusterFindings (grouping, worst-severity, sort).
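
`signatureOf` and `clusterFindings` group findings by a normalised signature and keep the worst severity per cluster. Here is a hedged sketch of the idea. The normalisation rules (case, digit runs, whitespace), the severity scale and the sort order are assumptions, not the admin's implementation.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

interface FindingLite { id: string; title: string; severity: Severity }
interface Cluster { signature: string; worst: Severity; ids: string[] }

// Assumed normalisation: case, digit runs and repeated whitespace do not split clusters.
function signatureOf(f: FindingLite): string {
  return f.title.toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ").trim();
}

function clusterFindings(findings: readonly FindingLite[]): Cluster[] {
  const bySig = new Map<string, Cluster>();
  for (const f of findings) {
    const sig = signatureOf(f);
    const c = bySig.get(sig);
    if (c === undefined) {
      bySig.set(sig, { signature: sig, worst: f.severity, ids: [f.id] });
    } else {
      c.ids.push(f.id);
      if (SEVERITY_RANK[f.severity] > SEVERITY_RANK[c.worst]) c.worst = f.severity;
    }
  }
  // Worst severity first, then larger clusters first.
  return Array.from(bySig.values()).sort(
    (a, b) => SEVERITY_RANK[b.worst] - SEVERITY_RANK[a.worst] || b.ids.length - a.ids.length,
  );
}
```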

4. CLI E2E smoke gate

scripts/e2e-cli.mjs runs against a fresh tmpdir sandbox (seeded with a minimal package.json + aqa init). All four checks (--version, --help, doctor, validate) must exit 0. Wired into CI as a new e2e-cli job in .github/workflows/ci.yml.

5. Threat model expansion

docs/security/threat-model.md from 12-line stub to full STRIDE catalog: trust-boundary diagram, 20 specific threats with current mitigation + status, agentic-specific cross-cutting threats (tool-result poisoning, confirmation bypass, supply chain, cost-based DoS).

6. CHANGELOG.md backfill

Entries for v0.2.0 → v1.3.0 in Keep-a-Changelog format.

Review loop

Codex + Copilot review iterated 3 times before merge, surfacing and addressing 21 must-fix items across schema enum alignment, fake-live fallback, error-vs-not-found splitting, CLI E2E hardness, threat-model precision (S-03 narrower scope, D-01 / S-01 / I-03 downgraded from Mitigated to Partial / Unmitigated to reflect actual code). All inline comments addressed. CI 15/15 green.

PR: #20 + docs follow-up #21.