21 commits
a9005e7
docs: add architecture and placeholder reference to CLAUDE.md
boinger Mar 18, 2026
3664832
test: add codebase-audit to skill validation and evals
boinger Mar 20, 2026
01407ce
docs: add codebase-audit to README, skill docs, and changelog
boinger Mar 20, 2026
b723f76
feat: add post-audit next steps (fix, quick-fix, or done)
boinger Mar 20, 2026
9913bca
fix: route substantive audit fixes through gstack review pipeline
boinger Mar 20, 2026
5a4bdfb
fix: enforce review pipeline for substantive fixes, reduce grep verbo…
boinger Mar 21, 2026
e6b2dde
fix: explicit plan mode exit before report generation
boinger Mar 21, 2026
baa6b99
refactor: rewrite post-audit flow to match gstack review chaining pat…
boinger Mar 21, 2026
0740a03
fix: use Bash heredoc for report writing, ban content-mode grep
boinger Mar 21, 2026
7cc298f
fix: context-dependent review recommendation in plan banner
boinger Mar 21, 2026
c5e47b9
fix: apply mechanical fixes before plan, plan contains only substanti…
boinger Mar 21, 2026
1cba305
refactor: embrace plan mode — audit is planning-for-a-plan
boinger Mar 21, 2026
9815c4d
feat: add review chaining AskUserQuestion after plan is written
boinger Mar 21, 2026
d685d7a
chore: regenerate Codex SKILL.md files after rebase
boinger Mar 21, 2026
a290cde
fix: rule numbering and changelog language
boinger Mar 21, 2026
07c2feb
fix: force Skill tool invocation before plan mode takes over
boinger Mar 21, 2026
23520da
fix: patch 4 bugs found by running /codebase-audit on itself
boinger Mar 21, 2026
969a3f6
chore: regenerate SKILL.md files and restore missing audit assets aft…
boinger Mar 22, 2026
a036b17
chore: regenerate SKILL.md files after rebase onto upstream/main
boinger Mar 23, 2026
8f6fc84
fix: update CI workflow to match upstream skill-docs.yml
boinger Mar 25, 2026
de14b0e
chore: merge upstream/main to resolve conflicts
boinger Mar 27, 2026
765 changes: 765 additions & 0 deletions .agents/skills/gstack-codebase-audit/SKILL.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -542,7 +542,7 @@ Thanks to @osc, @Explorer1092, @Qike-Li, @francoisaubert1, @itstimwhite, @yinanl
- **Claude now has an adversarial mode.** A fresh Claude subagent with no checklist bias reviews your code like an attacker — finding edge cases, race conditions, security holes, and silent data corruption that the structured review might miss. Findings are classified as FIXABLE (auto-fixed) or INVESTIGATE (your call).
- **Review dashboard shows "Adversarial" instead of "Codex Review."** The dashboard row reflects the new multi-model reality — it tracks whichever adversarial passes actually ran, not just Codex.

## [0.9.5.0] - 2026-03-21 — Builder Ethos
## [0.9.5.0] - 2026-03-21 — Builder Ethos + Codebase Audit

### Added

@@ -554,6 +554,7 @@ Thanks to @osc, @Explorer1092, @Qike-Li, @francoisaubert1, @itstimwhite, @yinanl
- **`/investigate` searches on hypothesis failure.** When your first debugging hypothesis is wrong, gstack searches for the exact error message and known framework issues before guessing again.
- **`/design-consultation` three-layer synthesis.** Competitive research now uses the structured Layer 1/2/3 framework to find where your product should deliberately break from category norms.
- **CEO review saves context when handing off to `/office-hours`.** When `/plan-ceo-review` suggests running `/office-hours` first, it now saves a handoff note with your system audit findings and any discussion so far. When you come back and re-invoke `/plan-ceo-review`, it picks up that context automatically — no more starting from scratch.
- **`/codebase-audit` — full codebase health check with a fix pipeline.** Run it against any project — new to you, old code, or code you wrote yesterday — and get a structured audit: bugs, security issues, architecture problems, tech debt, test gaps, and improvement opportunities. When it's done, it writes a fix plan and offers to chain into `/plan-eng-review` for the substantive items. Three modes: full audit, quick smoke test (2 min), and regression (diff against previous audit with score tracking). Includes health scoring (100-point scale, calibrated against real projects), dependency CVE scanning, git churn analysis, and machine-readable baseline output.

## [0.9.4.1] - 2026-03-20

83 changes: 83 additions & 0 deletions CLAUDE.md
@@ -1,3 +1,7 @@
# CLAUDE.md

Project instructions for Claude Code working on gstack — a skill and tooling suite for Claude Code.

# gstack development

## Commands
@@ -307,3 +311,82 @@ The active skill lives at `~/.claude/skills/gstack/`. After making changes:
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`

## Template placeholder reference

The generator (`scripts/gen-skill-docs.ts`) resolves `{{PLACEHOLDER}}` tokens in
`.tmpl` files. The full set (defined in `RESOLVERS` at line ~1090):

| Placeholder | Source | What it expands to |
|---|---|---|
| `COMMAND_REFERENCE` | `browse/src/commands.ts` | Categorized command table (Navigation, Reading, etc.) |
| `SNAPSHOT_FLAGS` | `browse/src/snapshot.ts` | Flag reference table for `snapshot` command |
| `PREAMBLE` | inline in generator | Shared skill preamble (session awareness, project context) |
| `BROWSE_SETUP` | inline in generator | `$B` alias setup + binary detection block |
| `BASE_BRANCH_DETECT` | inline in generator | Shell snippet to detect `main`/`master` dynamically |
| `QA_METHODOLOGY` | inline in generator | QA health rubric + bug-severity taxonomy |
| `DESIGN_METHODOLOGY` | inline in generator | Design review rubric + severity levels |
| `DESIGN_REVIEW_LITE` | inline in generator | Lightweight design review pass for `/review` |
| `REVIEW_DASHBOARD` | inline in generator | Review summary dashboard format |
| `TEST_BOOTSTRAP` | inline in generator | Test discovery + run block for `/ship` |

To add a new placeholder: add a resolver function + entry in `RESOLVERS`.
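
As a rough illustration of the resolver pattern described above, here is a minimal sketch. It is not the code in `scripts/gen-skill-docs.ts`: the `Resolver` type, the `resolveTemplate` function, the stand-in resolver bodies, and the choice to throw on an unknown token are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and behavior are illustrative assumptions.
type Resolver = () => string;

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const RESOLVERS = new Map<string, Resolver>([
  // Stand-in bodies; the real resolvers emit much larger blocks.
  ["PREAMBLE", () => "[preamble]"],
  ["BROWSE_SETUP", () => "[browse setup]"],
]);

function resolveTemplate(template: string): string {
  // Replace each {{NAME}} token with the output of its resolver.
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const resolver = RESOLVERS.get(name);
    if (resolver === undefined) {
      // Assumption: failing loudly is preferable to leaving a raw token behind.
      throw new Error(`Unknown placeholder: {{${name}}}`);
    }
    return resolver();
  });
}
```

Under this sketch, adding a placeholder means adding one more entry to the map, which mirrors the "resolver function + entry in `RESOLVERS`" step.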

## Browse architecture

The `browse/` subsystem is a client-server headless browser built on Playwright.

**Daemon model:** The CLI (`cli.ts`) is a thin HTTP client. The server (`server.ts`)
is a persistent Chromium daemon that stays alive across commands (auto-shutdown
after 30 min idle). CLI auto-starts/restarts the server as needed.

**State file:** `.gstack/browse.json` stores `{ pid, port, token, startedAt,
binaryVersion }`. The CLI reads this to find the server; the server writes it on
startup. Token is a random UUID for auth.
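
To make the state-file shape concrete, the sketch below types the five fields and validates a raw JSON string. Only the field names come from the description above; the field types, the `parseBrowseState` helper, and the "return `null` on bad input" policy are assumptions.

```typescript
// Field types are assumptions; the source lists only the field names.
interface BrowseState {
  pid: number;
  port: number;
  token: string;
  startedAt: string;
  binaryVersion: string;
}

// Hypothetical helper: returns null when the content is unusable, which a
// caller could treat as "no server running, start one".
function parseBrowseState(raw: string): BrowseState | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const state = data as Partial<BrowseState>;
  // Check only the fields a client needs to reach and authenticate to the server.
  if (
    typeof state.pid !== "number" ||
    typeof state.port !== "number" ||
    typeof state.token !== "string"
  ) {
    return null;
  }
  return state as BrowseState;
}
```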

**Command dispatch:** Commands are split into 3 sets in `commands.ts`:
- `READ_COMMANDS` — page inspection (text, html, links, js, console, etc.)
- `WRITE_COMMANDS` — page mutation (goto, click, fill, scroll, etc.)
- `META_COMMANDS` — server/tab/visual ops (screenshot, tabs, snapshot, chain, etc.)

The server routes each command to `handleReadCommand`, `handleWriteCommand`, or
`handleMetaCommand` based on set membership.
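
Routing by set membership can be sketched as follows. The three set names and the sample commands come from the list above; the `routeCommand` function, its string return values, and the error for unknown commands are assumptions standing in for the real handler calls.

```typescript
// Abbreviated command sets; the real sets in commands.ts contain more entries.
const READ_COMMANDS = new Set<string>(["text", "html", "links", "js", "console"]);
const WRITE_COMMANDS = new Set<string>(["goto", "click", "fill", "scroll"]);
const META_COMMANDS = new Set<string>(["screenshot", "tabs", "snapshot", "chain"]);

type HandlerKind = "read" | "write" | "meta";

// Hypothetical router: returns which handler family a command belongs to.
function routeCommand(command: string): HandlerKind {
  if (READ_COMMANDS.has(command)) return "read";
  if (WRITE_COMMANDS.has(command)) return "write";
  if (META_COMMANDS.has(command)) return "meta";
  // Assumption: an unrecognized command is an error rather than a no-op.
  throw new Error(`Unknown command: ${command}`);
}
```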

**Ref system:** `snapshot` assigns `@e1`/`@e2`/`@c1` refs to elements, stored in
`BrowserManager.refMap`. Later commands resolve `@e3` → the Playwright `Locator`
from the last snapshot. `-C` flag adds `@c` refs for non-ARIA clickable elements.
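
The ref lookup can be pictured as a map that each snapshot replaces wholesale. This is a sketch only: the `RefMap` class, its method names, and the key format (refs stored with their `@` prefix) are assumptions, and the generic parameter `L` stands in for Playwright's `Locator` so the example has no dependencies.

```typescript
// Hypothetical stand-in for BrowserManager.refMap; L would be a Playwright Locator.
class RefMap<L> {
  private refs = new Map<string, L>();

  // Each snapshot discards earlier refs, so stale refs stop resolving.
  setFromSnapshot(entries: Array<[string, L]>): void {
    this.refs = new Map(entries);
  }

  // Resolve a ref such as "@e3" to the locator recorded by the last snapshot.
  resolve(ref: string): L {
    const locator = this.refs.get(ref);
    if (locator === undefined) {
      throw new Error(`Unknown ref ${ref}; take a new snapshot first`);
    }
    return locator;
  }
}
```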

**Logging:** 3 `CircularBuffer` instances (console, network, dialog) in `buffers.ts`,
flushed to `.gstack/browse-{console,network,dialog}.log`. Ring buffer with fixed
capacity — old entries are overwritten.
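
The overwrite behavior of a fixed-capacity ring buffer is easy to see in a small sketch. The `RingBuffer` class below is a generic illustration and is not the `CircularBuffer` in `buffers.ts`; its method names are assumptions.

```typescript
// Generic fixed-capacity ring buffer; illustrative, not the project's class.
class RingBuffer<T> {
  private readonly slots: Array<T | undefined>;
  private readonly capacity: number;
  private next = 0; // index the next push writes to
  private count = 0; // number of live entries, capped at capacity

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  // Once the buffer is full, each push overwrites the oldest entry.
  push(item: T): void {
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Returns the live entries from oldest to newest.
  toArray(): T[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as T);
    }
    return out;
  }
}
```

With a capacity of 3, pushing the values 1 through 5 leaves `[3, 4, 5]`: the two oldest entries have been overwritten.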

**Tests:** Direct handler invocation against `BrowserManager` (no HTTP layer),
using a shared test server in `browse/test/test-server.ts`.

## Test infrastructure internals

Key helpers in `test/helpers/` that the test system depends on:

**`touchfiles.ts`** — Diff-based test selection. Maps test names → file glob
patterns. `selectTests()` checks `git diff` against base branch, runs only tests
whose dependencies changed. `GLOBAL_TOUCHFILES` (session-runner, eval-store,
llm-judge, gen-skill-docs, touchfiles itself, test-server) trigger all tests.
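
The selection rule can be sketched as a pure function over a list of changed files. This is an assumption-heavy illustration: the signature below differs from the real `selectTests()` (which reads `git diff` itself), the glob matcher handles only `*` and `**`, and the test names and patterns in the usage note are made up.

```typescript
// Minimal glob matcher for the sketch: "**" crosses directories, "*" does not.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Hypothetical signature: changed files are passed in instead of read from git.
function selectTests(
  changedFiles: string[],
  touchfiles: Map<string, string[]>,
  globalTouchfiles: string[],
): string[] {
  const touched = (globs: string[]): boolean =>
    globs.some((glob) => {
      const pattern = globToRegExp(glob);
      return changedFiles.some((file) => pattern.test(file));
    });

  const allTests = Array.from(touchfiles.keys());
  // A change to any global touchfile selects every test.
  if (touched(globalTouchfiles)) return allTests;
  return allTests.filter((name) => touched(touchfiles.get(name) ?? []));
}
```

For example, with a hypothetical map of `browse-basic` to `browse/src/**` and `qa-flow` to `qa/**`, a change to `browse/src/cli.ts` selects only `browse-basic`, while a change to a file listed as a global touchfile selects both.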

**`session-runner.ts`** — Spawns `claude -p` as a subprocess (not Agent SDK),
streams NDJSON output for real-time progress. Returns `SkillTestResult` with tool
calls, browse errors, cost estimate, exit reason, and full transcript.

**`eval-store.ts`** — `EvalCollector` accumulates test results, writes them to
`~/.gstack-dev/evals/{version}-{branch}-{tier}-{timestamp}.json`. Prints summary
table and auto-compares with the previous run. Comparison functions exported for
`eval:compare` CLI.

**`llm-judge.ts`** — Two judge types via `callJudge()` (claude-sonnet-4-6):
- `judge()` — doc quality scorer (clarity, completeness, actionability; 1-5 each)
- `outcomeJudge()` — planted-bug detection scorer (detection rate, false positives,
evidence quality)

**Observability files:**
- `~/.gstack-dev/e2e-live.json` — heartbeat updated during E2E runs
- Partial results persisted during long runs for crash recovery
- NDJSON transcripts saved per-test for debugging
9 changes: 5 additions & 4 deletions README.md
@@ -46,11 +46,11 @@ Fork it. Improve it. Make it yours. And if you want to hate on free open source

Open Claude Code and paste this. Claude does the rest.

> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.
> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codebase-audit, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.

### Step 2: Add to your repo so teammates get it (optional)

> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codebase-audit, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.

Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.

@@ -156,6 +156,7 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-
| `/canary` | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. |
| `/benchmark` | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. |
| `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
| `/codebase-audit` | **Code Auditor** | Full codebase audit from cold. Finds bugs, security issues, tech debt, architecture problems, and test gaps. Report only — never touches code. |
| `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. `/retro global` runs across all your projects and AI tools (Claude Code, Codex, Gemini). |
| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. `$B connect` launches your real Chrome as a headed window — watch every action live. |
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
@@ -266,8 +267,8 @@ Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__*
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse,
/qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro,
/investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard,
/unfreeze, /gstack-upgrade.
/investigate, /document-release, /codebase-audit, /codex, /cso, /autoplan, /careful,
/freeze, /guard, /unfreeze, /gstack-upgrade.
```

## License