21 May 00:40

lopadova

697bbb9

v1.9.0 — Junior quick-start truthing + GitHub Packages publish

Latest

What's new

Closes the junior quick-start truthing macro. The README's promised CLI surface now exists end-to-end, and the kit publishes to GitHub Packages as `@padosoft/agentic-qa-kit`.

CLI verbs newly wired (previously documented but unimplemented)

`aqa install-agent-files --targets claude,codex,gemini,copilot` — writes `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` / `.github/copilot-instructions.md` plus per-agent skills. `--force` / `--dry-run` / `--project-name` flags. (PR #52)
`aqa report [--run-id ] [--format md|json|both]` — renders `events.jsonl` + `findings.jsonl` from a run into `report.md` + `report.json` (the same JSON shape the admin UI consumes). Defaults to the latest run by file mtime. (PR #53)
`aqa admin [--port ] [--host ]` — boots the admin SPA + `makeApi()` in a single Node process on `127.0.0.1:5173`. SPA + API ship in the kit tarball. In-memory store auto-seeded from `.aqa/runs/`. (PR #54)
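`aqa report` falls back to the latest run by file mtime when no run id is given. Below is a minimal sketch of that selection. It assumes one directory per run under `.aqa/runs/` and uses the directory's own mtime as the ordering signal, which may not be the file the kit actually stats. `pickLatestRun` and `latestRunId` are hypothetical names, not the kit's API.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape: one entry per run directory under .aqa/runs/.
interface RunDirEntry {
  runId: string;
  mtimeMs: number;
}

// Pure selection step: the newest mtime wins; on a tie the first entry seen is kept.
function pickLatestRun(entries: RunDirEntry[]): string | undefined {
  let latest: RunDirEntry | undefined;
  for (const entry of entries) {
    if (latest === undefined || entry.mtimeMs > latest.mtimeMs) latest = entry;
  }
  return latest?.runId;
}

// Filesystem wrapper: list run directories, stat each one, delegate to the pure step.
function latestRunId(runsDir: string = join(".aqa", "runs")): string | undefined {
  const entries = readdirSync(runsDir, { withFileTypes: true })
    .filter((d) => d.isDirectory())
    .map((d) => ({ runId: d.name, mtimeMs: statSync(join(runsDir, d.name)).mtimeMs }));
  return pickLatestRun(entries);
}
```

Keeping the choice in a pure function means you can test it without creating real run directories.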

Publishing

GitHub Packages publish pipeline. On every `v*` tag, `.github/workflows/publish.yml` runs the esbuild bundler (`dist/cli.cjs`, ~570 KB CJS-in-.cjs with every `@aqa/` workspace + npm dep inlined), swaps the package name from `@aqa/kit` to `@padosoft/agentic-qa-kit`, strips bundled `@aqa/` deps from the published manifest, and runs `npm publish --provenance --access public` against `https://npm.pkg.github.com`. (PR #55)
New `@aqa/pack-author` workspace package extracted to break the `@aqa/kit` ↔ `@aqa/server` build cycle that emerged when kit started depending on server. Both kit and server depend on pack-author for `runPackNew` now; the cycle is gone from the dep graph entirely.
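The manifest rewrite in the publish step can be modelled as a pure transform. This is only a sketch. `toPublishManifest` is a hypothetical helper, and it covers just the two manifest changes named above: the name swap, and dropping the `@aqa/` deps that the bundle already inlines. It is not the actual workflow script.

```typescript
// Minimal view of the manifest fields this step touches.
interface Manifest {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
  [key: string]: unknown;
}

// Hypothetical publish-time transform: rename for the @padosoft scope and drop the
// @aqa/* workspace deps, since esbuild has already inlined them into dist/cli.cjs.
function toPublishManifest(src: Manifest, publishName = "@padosoft/agentic-qa-kit"): Manifest {
  const dependencies = Object.fromEntries(
    Object.entries(src.dependencies ?? {}).filter(([dep]) => !dep.startsWith("@aqa/")),
  );
  return { ...src, name: publishName, dependencies };
}
```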

Docs

README + `docs/getting-started.md` rewritten 1:1 with shipped verbs. Adds the GH Packages auth `.npmrc` snippet (the single biggest junior trap on first install — public packages on GH Packages still require auth). 10-step quick-start, `aqa admin` single-command boot, `bun run e2e:ecosystem` pointer for monorepo contributors. (PR #56)

Consume flow (works once this release is published)

```bash
# 1. One-time GitHub Packages auth setup in your project.
#    The quoted 'NPMRC' delimiter keeps ${GITHUB_TOKEN} literal in the file,
#    so the token is read from the environment instead of being written to disk.
cat > .npmrc <<'NPMRC'
@padosoft:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}
NPMRC
export GITHUB_TOKEN=ghp_XXXXXXXXXXXXXXXXXXXX # needs read:packages

# 2. Install + run
bun add -d @padosoft/agentic-qa-kit
bunx aqa init
bunx aqa install-agent-files --targets claude,codex,gemini,copilot
bunx aqa run --profile smoke
bunx aqa report
bunx aqa admin # → http://127.0.0.1:5173
```

Stats

12 packages (added: `@aqa/pack-author`)
270 tests pass (102 in kit, 89 in server, etc.)
5 sub-task PRs (#52 #53 #54 #55 #56), 11 Copilot review iterations across them
Macro PR #57

Full changelog

See CHANGELOG.md § 1.9.0.

Contributors

padosoft


20 May 11:34

lopadova

v1.8.3

374b2e3

v1.8.3 — live ecosystem e2e + roadmap closure

What's Changed

test(v1.8.3): live ecosystem e2e + roadmap closure sync by @lopadova in #51

Full Changelog: v1.8.2...v1.8.3

Contributors

lopadova


20 May 11:00

lopadova

v1.8.2

4d87647

v1.8.2 — ecosystem smoke e2e hardening

What's Changed

docs(v1.7.1): record slice 4g in PROGRESS.md by @lopadova in #43
feat(v1.7): wire Admin SSO config page end-to-end (slice 4h) by @lopadova in #44
feat(v1.7): enable SSO config save flow (slice 4i) by @lopadova in #45
feat(v1.7): autoload AuditChainViewer from /api/audit initial chain (slice 4j) by @lopadova in #46
feat(v1.8): real HTTP probe runner + release-gate strict findings by @lopadova in #47
docs(v1.x): README/docs closure refresh for current GA workflow by @lopadova in #48
fix(v1.8.1): align audit verifier with EventChainWriter canonical chain by @lopadova in #49
test(v1.8.2): extend CLI smoke to real HTTP run + artifact checks by @lopadova in #50

Full Changelog: v1.7.1...v1.8.2

Contributors

lopadova


20 May 08:06

lopadova

v1.7.1

77f3b1c

v1.7.1 — Users + Roles admin wiring

New GET /api/users (settings:read) returning the store user-directory snapshot.
New GET /api/roles (settings:read) returning the @aqa/auth role/permission matrix plus all_permissions from Permission.options.
New StoreProvider.listUsers() contract and shared StoreUserDirectoryEntry type in @aqa/store.
Admin PageUsers now reads /api/users with fixture fallback.
Admin PageRoles now reads /api/roles and renders the live permission matrix (admin:everything wildcard), with fixture fallback.
Tests added: server route coverage + admin users/roles e2e.
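The "fixture fallback" used by PageUsers and PageRoles can be sketched as a loader that records where its data came from, so the UI can still show a live/fixture badge and never passes fixture data off as live. `loadWithFixtureFallback`, the `source` field, and the injected `fetchImpl` are hypothetical. Injecting `fetchImpl` lets you exercise the helper without a server.

```typescript
// Tag the origin so the UI can badge fixture data instead of hiding the fallback.
type Loaded<T> = { source: "live" | "fixture"; data: T; error?: string };

type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function loadWithFixtureFallback<T>(
  url: string,
  fixture: T,
  fetchImpl: FetchLike,
): Promise<Loaded<T>> {
  try {
    const res = await fetchImpl(url);
    // Non-2xx: fall back, but keep the HTTP status for the error banner.
    if (!res.ok) return { source: "fixture", data: fixture, error: `HTTP ${res.status}` };
    return { source: "live", data: (await res.json()) as T };
  } catch (err) {
    // Network failure (server down, CORS, etc.): fall back and record why.
    return { source: "fixture", data: fixture, error: String(err) };
  }
}
```

In the browser, the global `fetch` should work as `fetchImpl`, because a `Response` exposes `ok`, `status` and `json()`.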

Tag points to commit 77f3b1c (PR #42).


20 May 03:12

lopadova

v1.7.0

15b9bc6

v1.7.0 — pack authoring + admin CRUD

v1.7.0 — Pack authoring + Admin CRUD

The final v1.7 release closes the loop on pack creation and adds full Profile/Risk/Scenario CRUD wizards in the admin, end-to-end Agents wiring, and live reads for every Operations + Admin page that has a matching server route.

Pack authoring (slices 1–3)

New community guide docs/PACK-AUTHORING.md.
New CLI aqa pack new <slug> (atomic backup-rename --force, slug + symlink + traversal rejection, in-memory schema validation of generated Scenario/RiskMap/PackManifest).
Admin Create-pack wizard wired over the new CLI (POST /api/packs/scaffold).

Admin CRUD (slice 4c)

Profiles: Delete (#29), Edit/Save (#30), Clone (#31). Atomic Store.createProfile. App-level deletedProfiles / updatedProfiles / createdProfiles Maps with aqa:profile-* CustomEvents. Tombstone-clear on re-create.
Risks: Delete (#32 — returns { id, deleted: true }), Edit/Save (#33). Mock risk + invariant ids migrated to schema-conforming dashed Slugs.
Scenarios: Delete (#34 — new Store.deleteScenario), Edit/Save (#35), Clone (#36 — atomic Store.createScenario), shared YAML-textarea Edit/Clone wizard (#37) with debounced parse + prototype-pollution guard. Mock scenario ids migrated to dashed slugs; explicit category field for tree grouping.

Agents (slice 4d)

New @aqa/schemas Agent with SafeRepoPath-validated files[] (rejects .., drive letters, UNC roots).
New agents:read / agents:edit permissions (agents:install aliased for back-compat).
New server routes: GET /api/agents, GET /api/agents/:id, POST /api/agents/:id/install, POST /api/agents/:id/uninstall.
PageAgents fetches the live list with fixture fallback; install/uninstall buttons wired with in-flight guard + toasts.
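Here is a minimal sketch of the kind of check SafeRepoPath applies to `files[]`. It implements the three rejections named above directly: `..` segments, drive letters, and UNC roots. Rejecting rooted paths such as `/etc` as well is an extra assumption of this sketch. `isSafeRepoPath` is a hypothetical name.

```typescript
// Hypothetical stand-in for the SafeRepoPath refinement: accept only relative,
// repo-contained paths such as "skills/qa/SKILL.md".
function isSafeRepoPath(p: string): boolean {
  if (p.length === 0) return false;
  if (/^[A-Za-z]:/.test(p)) return false; // drive letter, e.g. C:\ or C:foo
  if (/^[\\/]{2}/.test(p)) return false; // UNC root: \\server\share or //server/share
  if (p.startsWith("/") || p.startsWith("\\")) return false; // rooted path (assumption)
  // Split on both separators so "a\..\b" is caught on POSIX hosts too;
  // only an exact ".." segment is traversal, so "..notes.md" stays allowed.
  return !p.split(/[\\/]/).some((segment) => segment === "..");
}
```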

Operations + Admin sections (slices 4e, 4f)

PageAudit / PageAdminAudit → GET /api/audit with shared normalizeAuditEventsForViewer helper.
PageQueue → GET /api/queue with EnqueuedJob → fixture-shape adapter (filters terminal done jobs).
PageCost → GET /api/cost/summary with explicit MTD from/to, tenant headers padosoft/gescat.
PageNotifications → GET /api/notifications; SELF resolves from SESSION_USER.id; filter kinds derived from NotificationKind enum.
PageTokens → GET /api/tokens with ApiToken → fixture-shape adapter (heuristic owner-prefix → kind).
PageOrg → GET /api/orgs; subtitle joins live slugs.
fmtDate / fmtDateTime / fmtRelative made null-safe (em-dash for missing timestamps).
Users/Roles/SSO admin pages deferred — no server scaffolding yet.

Stats

13 PRs merged this slice (#28..#40 + #41 docs).
Server: 86 unit tests (+19 from v1.6).
Admin E2E: 132 tests (+45 from v1.6).
New schemas: Agent.
New permissions: agents:read, agents:edit.
New store methods: createProfile, createScenario, deleteScenario, listAgents, loadAgent, installAgent, uninstallAgent.

See `docs/PROGRESS.md` for the per-slice breakdown and architecture lessons.


18 May 20:57

lopadova

v1.7.0-rc.1

6cc0013

v1.7.0-rc.1 — pack authoring (slices 1+2)

Pre-release

First release candidate for v1.7. Delivers the pack-authoring story (slices 1 and 2 of 4 planned).

What's in

📖 docs/PACK-AUTHORING.md

End-to-end tutorial for community pack authors:

Directory layout, manifest schema, scenarios + risks structure
Three distribution patterns (workspace pack / vendored copy / npm scope alias)
Programmatic validation (aqa validate)
Honest about current limitations: the no-network NO_NETWORK_PROBE stub returns {probe_id, status: 200, body: null} for every probe kind; only the http_status / response_contains / response_not_contains oracles are wired; there is no custom oracle/probe loader yet.
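This hypothetical model of the stub probe and the three wired oracles shows why a scaffold that asserts only `http_status: 200` passes against the stub. It treats a `null` body as empty text, which is an assumption of this sketch and not documented behaviour. Under that assumption, `response_contains` can never pass until probes hit a real SUT.

```typescript
interface ProbeResult {
  probe_id: string;
  status: number;
  body: string | null;
}

// Hypothetical stand-in for the no-network stub: the same answer for every probe kind.
function noNetworkProbe(probeId: string): ProbeResult {
  return { probe_id: probeId, status: 200, body: null };
}

type Oracle =
  | { kind: "http_status"; expect: number }
  | { kind: "response_contains"; text: string }
  | { kind: "response_not_contains"; text: string };

// Returns true when the oracle passes. A null body is read as "" (sketch assumption).
function checkOracle(oracle: Oracle, result: ProbeResult): boolean {
  const body = result.body ?? "";
  switch (oracle.kind) {
    case "http_status":
      return result.status === oracle.expect;
    case "response_contains":
      return body.includes(oracle.text);
    case "response_not_contains":
      return !body.includes(oracle.text);
  }
}
```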

🛠 aqa pack new

CLI to scaffold a runnable pack at <cwd>/packs/<slug>/:

aqa pack new pack-myapp --sut-type api
aqa pack new pack-frontend --sut-type web --description "Smoke tests for the marketing site"

The scaffold produces a starter scenario whose http_status: 200 oracle passes cleanly against the stub probe out of the box (avoiding the iter-17 footgun where bundled packs emitted synthetic findings). Supports:

--sut-type (api / web / cli / lib / agent / pipeline)
--force — atomic backup-rename overwrite (non-destructive on failure)
--description, --author, --license (SPDX)

Hardened against:

Symlinks at packs/ parent and packDir (lstatSync checks)
Non-directory parent (regular file at <root>/packs)
Over-length slugs (cap at 52 chars to keep every derived ID within the 64-char Slug schema cap)
TOCTOU on the existence check (uses non-recursive mkdir + explicit lstat)
Schema-invalid generated output (in-memory PackManifest/Scenario/RiskMap validation before writing)
Validation failures destroying the existing pack (backup-rename + restore-on-error)
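The backup-rename + restore-on-error behaviour behind `--force` can be sketched with plain `node:fs` calls. `scaffoldWithBackup` is a hypothetical helper, and the backup-directory naming is an assumption. The sketch also leaves out the lstat symlink checks and the in-memory schema validation listed above.

```typescript
import { existsSync, mkdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical --force overwrite that is non-destructive on failure: move the old pack
// aside, write the new files, and put the old pack back if anything throws.
function scaffoldWithBackup(packDir: string, files: Record<string, string>): void {
  const backupDir = `${packDir}.bak-${process.pid}-${Date.now()}`;
  const hadExisting = existsSync(packDir);
  if (hadExisting) renameSync(packDir, backupDir); // same parent dir, so the rename is atomic
  try {
    mkdirSync(packDir); // non-recursive: fails rather than silently reusing a directory
    for (const [rel, content] of Object.entries(files)) {
      writeFileSync(join(packDir, rel), content);
    }
  } catch (err) {
    rmSync(packDir, { recursive: true, force: true }); // drop the partial scaffold
    if (hadExisting) renameSync(backupDir, packDir); // restore the original pack
    throw err;
  }
  if (hadExisting) rmSync(backupDir, { recursive: true, force: true });
}
```

The old pack is deleted only after every new file has been written, so a crash mid-write leaves either the original pack or its backup on disk.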

🧪 Tests

54 in @aqa/kit (50 pack-new + 42 run-cmd subset). Lint + typecheck clean.

What's still pending in v1.7

Slice 3 — Admin "Create pack" wizard (future PR)
Slice 4 — Audit + wire/implement/document all 81 silent placeholder buttons across packages/admin/src/app.tsx (slices 4a–4f, future PRs; plan documented in docs/internal/admin-placeholder-audit.md)
Final v1.7.0 tag after slices 3 + 4 ship

Review

19 iterations on PR #25 with both Copilot and Codex review bots. All real issues were addressed; the final iteration returned 0 new must-fix items.


18 May 16:36

lopadova

v1.6.0

21d7b10

v1.6.0 — aqa run CLI + bundled packs

v1.6 — `aqa run` + ecosystem foundation

The missing piece between aqa init and a real audit trail. After 21 review iterations with Copilot + Codex (every one surfaced a real bug or coverage gap, zero false alarms), the inner loop is end-to-end usable.

What lands

aqa run CLI command. Loads .aqa/project.yaml + .aqa/profiles.yaml via the canonical @aqa/schemas shapes, resolves packs from three discovery tiers (project's packs/*, node_modules/@aqa/*, kit-bundled dist/packs/*), filters scenarios by the selected profile's tags, and runs each one via @aqa/runner.runScenario. Streams events + findings into .aqa/runs/<run_id>/.
Flags: --profile <name> (defaults to smoke if present, else first profile) and --seed <string> (deterministic run_id for tests + replay).
Bundled packs. All 5 baseline packs (pack-core, pack-api-core, pack-web-ui, pack-llm-agent, pack-security) now ship inside @aqa/kit's npm tarball via a bundle-packs.mjs build step. A fresh aqa init + aqa run --profile smoke works with only @aqa/kit installed.
SUT-aware init. aqa init picks the right packs from the detected sut_type (api → pack-api-core, web → pack-web-ui, agent → pack-llm-agent, else → pack-core). The framework clause on pack-api-core was dropped so plain Node/Bun APIs without a recognized framework still get coverage.
Hardened orchestration: atomic run-dir creation (no TOCTOU on concurrent seeded runs), pack-manifest scenario discovery (no glob-scanning), path-traversal + symlink-escape rejection, applies_when filtering, manifest-name dedup with priority (project > node_modules > bundled), legacy bare-slug pack-name aliasing, agent-mode profile rejection until that driver lands, unrelated-broken-pack tolerance with structured warnings.
Structured RunResult with ok, runId, runDir, scenariosRun, findingsCount, capped error string (MAX_DETAIL_PER_KIND + "…+N more" truncation), and a warnings array for non-fatal diagnostics. Detail samples (pack_error_samples, scenario_error_samples, …) live in the run_finished audit event for auditors.
42 TDD tests in packages/kit/test/run-cmd.test.ts — every behavior above is covered, written before the code existed.
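Manifest-name dedup across the three discovery tiers amounts to keeping, for each manifest name, the copy from the highest-priority tier. The sketch below assumes each discovered pack carries its manifest name and its tier. `DiscoveredPack`, `TIER_RANK` and `dedupePacks` are hypothetical names.

```typescript
type Tier = "project" | "node_modules" | "bundled";

interface DiscoveredPack {
  name: string; // manifest name, the dedup key
  tier: Tier;
  dir: string;
}

// Lower rank wins: project > node_modules > bundled.
const TIER_RANK: Record<Tier, number> = { project: 0, node_modules: 1, bundled: 2 };

// Keep one pack per manifest name, preferring the highest-priority tier.
function dedupePacks(found: DiscoveredPack[]): DiscoveredPack[] {
  const byName = new Map<string, DiscoveredPack>();
  for (const pack of found) {
    const current = byName.get(pack.name);
    if (current === undefined || TIER_RANK[pack.tier] < TIER_RANK[current.tier]) {
      byName.set(pack.name, pack);
    }
  }
  return [...byName.values()];
}
```

Because the rank comparison decides the winner, the result does not depend on the order in which tiers are scanned.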

Known scoped follow-ups (v1.7)

Real HTTP probe runner. Today's runScenario still uses the no-network probe stub (@aqa/runner's NO_NETWORK_PROBE). The release-gate "fail on any finding" semantic (from require_deterministic_replay: true) is deferred until probes hit a real SUT — every finding the stub produces is synthetic.
EventChainWriter ↔ verifyEventChain reconciliation. Writer omits prev_hash from the canonical body and emits null for seq=0; @aqa/compliance.verifyEventChain includes prev_hash and expects "0…". Tests ship a local writer-matching verifier; reconciling the two implementations is a separate cleanup.
Pack authoring story. v1.7 will add docs/PACK-AUTHORING.md (community tutorial), aqa pack new <slug> (scaffolding CLI), and an admin "Create pack" wizard — plus a full audit pass on every placeholder button in the admin panel so nothing renders as a "muted click".
Browser-driven ecosystem smoke. Playwright test that starts admin + runs aqa run against examples/bun-api and asserts findings appear in the admin UI end-to-end.
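The writer/verifier mismatch above is the classic way a hash chain breaks: both sides must hash the same canonical body. Below is a minimal chain in which `prev_hash` is part of the hashed body and one function is shared by writer and verifier. Three details are assumptions, not the kit's format: the 64-zero genesis value, the field set, and `JSON.stringify` with a fixed top-level key order as the canonical form.

```typescript
import { createHash } from "node:crypto";

interface ChainEvent {
  seq: number;
  type: string;
  payload: unknown;
  prev_hash: string;
  hash: string;
}

// Assumption for this sketch: the first event links to 64 zeros rather than null.
const GENESIS_PREV_HASH = "0".repeat(64);

// One canonical form shared by writer AND verifier. prev_hash is inside the hashed body,
// so rewriting any earlier event breaks every later link.
function canonicalHash(e: Omit<ChainEvent, "hash">): string {
  const body = JSON.stringify({ seq: e.seq, type: e.type, payload: e.payload, prev_hash: e.prev_hash });
  return createHash("sha256").update(body).digest("hex");
}

function appendEvent(chain: ChainEvent[], type: string, payload: unknown): ChainEvent[] {
  const prev_hash = chain.length === 0 ? GENESIS_PREV_HASH : chain[chain.length - 1].hash;
  const draft = { seq: chain.length, type, payload, prev_hash };
  return [...chain, { ...draft, hash: canonicalHash(draft) }];
}

function verifyChain(chain: ChainEvent[]): boolean {
  return chain.every((e, i) => {
    const expectedPrev = i === 0 ? GENESIS_PREV_HASH : chain[i - 1].hash;
    return e.seq === i && e.prev_hash === expectedPrev && e.hash === canonicalHash(e);
  });
}
```

An empty chain verifies trivially. This is the vacuous-truth case listed among the v1.3 admin audit tests.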


18 May 11:34

lopadova

v1.5.0

f7b879f

v1.5.0 — admin design integration

v1.5 — Admin design integration

The hi-fi prototype shipped by Claude Design (30 screens) is now the official admin web panel. The bundled prototype was ported to Vite + React 19 + TypeScript strict, all 30 screens render in production, and a Playwright suite drives every screen.

What landed

30 screens, real markup — 8.9k LOC ported to packages/admin/src/app.tsx (bundled, @ts-nocheck for the prototype's design-tool conventions). Dark-themed, token-driven CSS.
Vite production build — replaced design-tool CDN React/Babel scripts with a regular Vite SPA. bun run dev boots in <500ms; bun run build ships a static bundle.
Playwright suite — packages/admin/test/e2e/*.e2e.ts covers per-screen smoke, audit chain verify (OK + tampered), Findings views (Clusters/List/Kanban), Replay tabs, risk-map matrix, theme, palette. Real DOM, no mocks. bun run test:e2e runs it.
CI gating — new E2E (Playwright, admin UI) job in .github/workflows/ci.yml builds the admin and runs the Playwright suite against the dev server.
Quality — Biome ignores the bundled prototype to keep lint targeted; smoke filter tolerates the prototype's intentional console.error demo calls.

Known scoped tradeoffs

In-memory routing only (the prototype was never URL-driven); reading window.location on boot is deferred to a follow-up.
Live-mode currently animates time but still reads in-file mock data; wiring VITE_AQA_SERVER_URL to a real fetch layer is deferred to the next macro task.

What's next (v1.6)

Full end-to-end ecosystem smoke via Playwright: boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin and the audit chain stays valid end-to-end.


18 May 09:21

lopadova

v1.4.0

bb0f7c6

v1.4.0 — Admin API surface + issue #3 closed

Backend gap closure ahead of the parallel admin v2 design integration.

Server expansion

packages/server/src/api.ts makeApi() grows from 4 → 28 routes:

Runs: list, detail, events, create
Findings: list, detail, status mutation (with audit reason)
Packs: list, detail, install, uninstall
Profiles: list, detail, save, delete
Risks: list, detail, save, delete
Scenarios: list, detail, save
Audit: scoped event query
Cost: per-window summary aggregation
Queue: snapshot + runner tap
Notifications: list, mark-read
Saved views: list, save, delete
API tokens: list, create, revoke
Tenancy: list orgs, list projects, create org, create project

All routes are permission-gated via @aqa/auth, tenant-scoped via x-aqa-org / x-aqa-project headers, and return shape-compliant @aqa/schemas objects. Multi-tenant fail-closed: missing scope → 400; cross-tenant ID lookup → 404 (so probing for IDs in other projects gains no information).
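The fail-closed scoping rule can be sketched as a lookup helper. The record fields `org` / `project`, the simplified header map, and the name `getScoped` are assumptions for illustration. What the sketch shows is that "not found" and "belongs to another tenant" both produce the same 404.

```typescript
interface TenantScope {
  org: string;
  project: string;
}

type Lookup<T> =
  | { status: 200; body: T }
  | { status: 400 | 404; body: { error: string } };

// Simplified header map (lower-case names, single string values).
type HeaderMap = Record<string, string | undefined>;

function readScope(headers: HeaderMap): TenantScope | undefined {
  const org = headers["x-aqa-org"];
  const project = headers["x-aqa-project"];
  return org && project ? { org, project } : undefined;
}

// Fail closed: no scope → 400. A missing record and a record owned by another tenant
// both return the same 404, so probing IDs across tenants reveals nothing.
function getScoped<T extends TenantScope>(
  headers: HeaderMap,
  find: (id: string) => T | undefined,
  id: string,
): Lookup<T> {
  const scope = readScope(headers);
  if (scope === undefined) {
    return { status: 400, body: { error: "missing x-aqa-org / x-aqa-project" } };
  }
  const record = find(id);
  if (record === undefined || record.org !== scope.org || record.project !== scope.project) {
    return { status: 404, body: { error: "not found" } };
  }
  return { status: 200, body: record };
}
```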

Schemas

6 new @aqa/schemas namespaces with Draft 2020-12 JSON Schemas emitted (schemas/v1/ now ships 15 files):

Notification
SavedView
ApiToken
CostSummary
Tenancy.Org + Tenancy.ProjectRef

Store

StoreProvider extended with 15+ methods covering the new endpoints. MemoryStore implements all of them; PostgresStore retains the explicit "not implemented" pattern so a misconfigured production deployment fails loudly.
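Here is a minimal illustration of the "fails loudly" pattern: an unimplemented backend whose methods reject with an explicit error instead of returning empty data. The one-resource contract and every name below are hypothetical.

```typescript
interface StoredNotification {
  id: string;
  read: boolean;
}

// One-resource slice of a store contract, for illustration only.
interface NotificationStore {
  listNotifications(userId: string): Promise<StoredNotification[]>;
  markRead(userId: string, id: string): Promise<void>;
}

class NotImplementedError extends Error {
  constructor(method: string) {
    super(`${method} is not implemented`);
    this.name = "NotImplementedError";
  }
}

// Every method rejects explicitly, so a deployment wired to this backend fails on first
// use instead of answering with empty lists that look like "no data yet".
class PostgresStoreStub implements NotificationStore {
  async listNotifications(_userId: string): Promise<StoredNotification[]> {
    throw new NotImplementedError("PostgresStoreStub.listNotifications");
  }

  async markRead(_userId: string, _id: string): Promise<void> {
    throw new NotImplementedError("PostgresStoreStub.markRead");
  }
}
```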

RunnerQueue gains snapshot(), requeue(id), kill(id) for the admin queue ops screen.

Issue #3 closed

Three remaining Zod superRefines mirrored into JSON Schema:

Finding.status='duplicate' ⇒ duplicate_of required
ReproLevel.deterministic=true ⇒ attempts >= 1
ProfilesFile.profile.name === key (via $comment — cross-field)

Cross-field invariants that JSON Schema cannot express (duplicate_of !== id, successes === attempts, finished_at >= started_at, profile.name === key) are surfaced via $comment on the emitted schemas.

Ajv 2020 round-trip test (packages/schemas/test/ajv-roundtrip.test.ts) validates every fixture against the emitted schema — catches Zod ↔ JSON-Schema divergence at build time.

All 6 emitter patches now resolve the #/definitions/<name> indirection that zod-to-json-schema emits.

Docs

docs/design/admin-panel-spec-v2.md — full enterprise design brief (tokens, 30 screens, component library, interactions, a11y, perf, deliverables) for the external designer who builds the React template in parallel.
docs/PROGRESS.md updated with the v1.4 entry, the post-design Playwright smoke roadmap, and the final closing step (README + docs refresh pass: audit v0.x references, finalise quick-start, write the "How you use it" workflow section, prune obsolete docs).

Review loop

Codex + Copilot iterated 2 times before merge, surfacing and addressing 5 must-fix items across schema enum alignment, tenant-scope enforcement on runs / findings detail, and MemoryStore audit-filter leniency on unstamped events. CI 14/15 green throughout.

Numbers

28 server routes (was 4)
15 JSON Schemas (was 9)
205 tests (was 165)
19 packages (added @aqa/compliance previously; this release adds no new package)

PR: #22.


18 May 01:04

lopadova

v1.3.0

a1408f3

v1.3.0 — Quality batch

Six post-v1.2 polish items + an extended review-and-fix loop. No new packages; all quality / coverage / docs / correctness.

What landed

1. Admin server↔UI mapping

packages/admin/src/data/api.ts fetches from VITE_AQA_SERVER_URL (real @aqa/server shape) with explicit error surfacing — no silent mock fallback in live mode.
mapRun() / mapFinding() translate Run.Run (state, totals.findings, totals.llm_cost_usd) and Finding.Finding (status enum draft|verified|rejected|duplicate|fixed, verification_floor enum bug_level|scenario_level|agent_level, discovered_at) into the UI types. Screens stay source-agnostic.
live/mock badge + red error banner on Runs and Findings.

2. Admin sub-screens (6 detail routes)

/runs/$runId, /findings/$findingId, /risk-map/$riskId, /profiles/$profileName, /packs/$packSlug, /scenarios/$scenarioId. Each with Breadcrumb + PageHeader. Runs table rows are clickable links.

3. Admin unit tests (12 new, 176 total)

test/audit.test.ts (5): parseEventLines ×2, verifyEventChain ×3 (good chain, tampered, vacuous truth).
test/cluster.test.ts (6): signatureOf (identity, normalisation, divergence), clusterFindings (grouping, worst-severity, sort).
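Here is one way `signatureOf` and `clusterFindings` could work. The severity scale, the `Finding` fields, and the normalisation rules (lower-casing, collapsing digits and whitespace) are all assumptions, chosen to make the grouping, worst-severity and sort behaviours concrete.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  id: string;
  title: string;
  scenario_id: string;
  severity: Severity;
}

interface Cluster {
  signature: string;
  worst: Severity;
  findings: Finding[];
}

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Same scenario + same normalised title → same signature, so "Timeout after 30s"
// and "timeout after 45s" collapse into one cluster.
function signatureOf(f: Finding): string {
  const title = f.title.toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ").trim();
  return `${f.scenario_id}::${title}`;
}

// Group by signature, keep each group's worst severity, then sort worst-first and,
// within equal severity, larger clusters first.
function clusterFindings(findings: Finding[]): Cluster[] {
  const groups = new Map<string, Cluster>();
  for (const f of findings) {
    const sig = signatureOf(f);
    const group: Cluster = groups.get(sig) ?? { signature: sig, worst: f.severity, findings: [] };
    group.findings.push(f);
    if (SEVERITY_RANK[f.severity] > SEVERITY_RANK[group.worst]) group.worst = f.severity;
    groups.set(sig, group);
  }
  return [...groups.values()].sort(
    (a, b) =>
      SEVERITY_RANK[b.worst] - SEVERITY_RANK[a.worst] || b.findings.length - a.findings.length,
  );
}
```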

4. CLI E2E smoke gate

scripts/e2e-cli.mjs runs against a fresh tmpdir sandbox (seeded with a minimal package.json + aqa init). All four checks (--version, --help, doctor, validate) must exit 0. Wired into CI as a new e2e-cli job in .github/workflows/ci.yml.

5. Threat model expansion

docs/security/threat-model.md grows from a 12-line stub to a full STRIDE catalog: trust-boundary diagram, 20 specific threats with current mitigation + status, and agentic-specific cross-cutting threats (tool-result poisoning, confirmation bypass, supply chain, cost-based DoS).

6. CHANGELOG.md backfill

Entries for v0.2.0 → v1.3.0 in Keep-a-Changelog format.

Review loop

Codex + Copilot review iterated 3 times before merge, surfacing and addressing 21 must-fix items across schema enum alignment, fake-live fallback, error-vs-not-found splitting, CLI E2E hardness, threat-model precision (S-03 narrower scope, D-01 / S-01 / I-03 downgraded from Mitigated to Partial / Unmitigated to reflect actual code). All inline comments addressed. CI 15/15 green.

PR: #20 + docs follow-up #21.


Releases: padosoft/agentic-qa-kit
