garrytan · garrytan · Mar 17, 2026 · Mar 16, 2026 · Mar 16, 2026 · Mar 16, 2026
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -202,6 +202,7 @@ Templates contain the workflows, tips, and examples that require human judgment.
 | `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions |
 | `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
 | `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
+| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
 
 This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,20 @@
 # Changelog
 
+## 0.5.0 — 2026-03-16
+
+- **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches.
+- **It can fix what it finds, too.** `/qa-design-review` runs the same designer's eye audit, then iteratively fixes design issues in your source code with atomic `style(design):` commits and before/after screenshots. CSS-safe by default, with a stricter self-regulation heuristic tuned for styling changes.
+- **Know your actual design system.** Both skills extract your live site's fonts, colors, heading scale, and spacing patterns via JS — then offer to save the inferred system as a `DESIGN.md` baseline. Finally know how many fonts you're actually using.
+- **AI Slop detection is a headline metric.** Every report opens with two scores: Design Score and AI Slop Score. The AI slop checklist catches the 10 most recognizable AI-generated patterns — the 3-column feature grid, purple gradients, decorative blobs, emoji bullets, generic hero copy.
+- **Design regression tracking.** Reports write a `design-baseline.json`. Next run auto-compares: per-category grade deltas, new findings, resolved findings. Watch your design score improve over time.
+- **80-item design audit checklist** across 10 categories: visual hierarchy, typography, color/contrast, spacing/layout, interaction states, responsive, motion, content/microcopy, AI slop, and performance-as-design. Distilled from Vercel's 100+ rules, Anthropic's frontend design skill, and 6 other design frameworks.
+
+### For contributors
+
+- Added `{{DESIGN_METHODOLOGY}}` resolver to `gen-skill-docs.ts` — shared design audit methodology injected into both `/plan-design-review` and `/qa-design-review` templates, following the `{{QA_METHODOLOGY}}` pattern.
+- Added `~/.gstack-dev/plans/` as a local plans directory for long-range vision docs (not checked in). CLAUDE.md and TODOS.md updated.
+- Added `/setup-design-md` to TODOS.md (P2) for interactive DESIGN.md creation from scratch.
+
 ## 0.4.5 — 2026-03-16
 
 - **Review findings now actually get fixed, not just listed.** `/review` and `/ship` used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding gets action: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see `[AUTO-FIXED] file:line Problem → what was done` for each auto-fix.

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -43,6 +43,8 @@ gstack/
 │   ├── skill-llm-eval.test.ts   # Tier 3: LLM-as-judge (~$0.15/run)
 │   └── skill-e2e.test.ts         # Tier 2: E2E via claude -p (~$3.85/run)
 ├── qa-only/         # /qa-only skill (report-only QA, no fixes)
+├── plan-design-review/  # /plan-design-review skill (report-only design audit)
+├── qa-design-review/    # /qa-design-review skill (design audit + fix loop)
 ├── ship/            # Ship workflow skill
 ├── review/          # PR review skill
 ├── plan-ceo-review/ # /plan-ceo-review skill
@@ -119,6 +121,12 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
 - No jargon: say "every question now tells you which project and branch you're in" not
   "AskUserQuestion format standardized across skill templates via preamble resolver."
 
+## Local plans
+
+Contributors can store long-range vision docs and design documents in `~/.gstack-dev/plans/`.
+These are local-only (not checked in). When reviewing TODOS.md, check `plans/` for candidates
+that may be ready to promote to TODOs or implement.
+
 ## E2E eval failure blame protocol
 
 When an E2E eval fails during `/ship` or any other workflow, **never claim "not

diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**
 
-Ten opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands.
+Twelve opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, design review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands.
 
 ### Without gstack
 
@@ -19,11 +19,13 @@ Ten opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/
 |-------|------|--------------|
 | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
 | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
+| `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. |
 | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
 | `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
 | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
 | `/qa` | QA + fix engineer | Test app, find bugs, fix them with atomic commits, re-verify. Before/after health scores and ship-readiness summary. Three tiers: Quick, Standard, Exhaustive. |
 | `/qa-only` | QA reporter | Report-only QA testing. Same methodology as /qa but never fixes anything. Use when you want a pure bug report without code changes. |
+| `/qa-design-review` | Designer + frontend engineer | Same design audit as /plan-design-review, then fixes what it finds. Atomic `style(design):` commits, before/after screenshots, CSS-safe self-regulation. |
 | `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
 | `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
 | `/document-release` | Technical writer | Update README, ARCHITECTURE, CONTRIBUTING, and project docs to match what you just shipped. |
@@ -79,6 +81,30 @@ You:   /qa https://staging.myapp.com --quick
 
 Claude: [Smoke test: homepage + 5 pages, 30 seconds]
         Health Score: 91/100. No critical issues. 1 medium: mobile nav overlap.
+
+You:   /plan-design-review https://staging.myapp.com
+
+Claude: Design Score: B  |  AI Slop Score: C
+
+        "The site communicates competence but not confidence."
+        "I notice the spacing between sections is uniform — nothing feels important."
+        "The first 3 things my eye goes to are: the gradient hero, a 3-column
+         feature grid with icons in circles, and a cookie-cutter CTA."
+        "If I had to describe this in one word: template."
+
+        Top issues: generic typography (Inter everywhere), AI slop patterns
+        (3-column icon grid, purple gradient), flat heading scale.
+
+        [Full report with letter grades, 12 findings, inferred design system]
+        Want me to save this as your DESIGN.md?
+
+You:   /qa-design-review
+
+Claude: [Runs the same audit, then fixes 8 design issues]
+        style(design): FINDING-001 — replace Inter with system font stack
+        style(design): FINDING-003 — add heading scale contrast
+        style(design): FINDING-007 — break 3-column grid into asymmetric layout
+        Design Score: B → A-  |  AI Slop Score: C → A
 ```
 
 ## Who this is for
@@ -105,11 +131,11 @@ This is the setup I use. One person, ten parallel agents, each with the right co
 
 Open Claude Code and paste this. Claude will do the rest.
 
-> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /qa-only, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.
+> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.
 
 ### Step 2: Add to your repo so teammates get it (optional)
 
-> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
+> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
 
 Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use).
 
@@ -258,6 +284,99 @@ Not "make the idea smaller."
 
 ---
 
+## `/plan-design-review`
+
+This is my **senior designer mode**.
+
+Most developers cannot tell whether their site looks AI-generated. I could not, until I started paying attention. There is a growing class of sites that are functional but soulless — they work fine but scream "an AI built this and nobody with taste looked at it." Purple gradients, 3-column icon grids, uniform bubbly border-radius on everything, centered text on every section, decorative blobs floating in the background. The ChatGPT aesthetic.
+
+`/plan-design-review` gives the agent a designer's eye.
+
+It opens your site and reacts to it the way a Stripe or Linear designer would — immediately, viscerally, with opinions. The first output is a structured gut reaction: what the site communicates at a glance, what the eye is drawn to, and a one-word verdict. That is the most valuable part. Everything after is supporting evidence.
+
+Then it extracts your actual design system by reading the CSS: how many fonts are you really using? What is your heading scale? How many colors? Is your spacing on a grid or arbitrary? It structures this as an **Inferred Design System** and offers to save it as a `DESIGN.md` — so now you have a design system you never had before, derived from what you actually shipped.
+
+The audit covers 80 items across 10 categories. Typography, spacing, hierarchy, color, responsive design, interaction states, motion, content quality, performance feel, and — the headline feature — **AI Slop Detection**. Ten specific anti-patterns that mark a site as machine-generated.
+
+Every finding gets a letter grade (A through F) per category. The report opens with two headline scores: **Design Score** and **AI Slop Score**. "Your site got a B on design but a D on AI Slop" tells you exactly where to focus.
+
+### Example
+
+```
+You:   /plan-design-review https://myapp.com
+
+Claude: Design Score: C  |  AI Slop Score: D
+
+        "The site communicates generic SaaS energy."
+        "I notice the hero section uses a blue-to-purple gradient
+         with a 3-column feature grid below it — the single most
+         recognizable AI-generated layout pattern."
+        "The first 3 things my eye goes to are: the gradient, the
+         symmetrical icon grid, and a 'Get Started' button that
+         looks like every other 'Get Started' button."
+        "If I had to describe this in one word: template."
+
+        Inferred Design System:
+        - Fonts: Inter (body), Inter (headings) — 1 font family
+        - Colors: 14 unique non-gray colors (flag: >12)
+        - Heading scale: 36/24/18/16 — flat, no drama between sizes
+        - Spacing: mostly 16px and 24px — no rhythm
+
+        Top 5 Design Improvements:
+        1. Break the 3-column icon grid (High, AI Slop)
+        2. Add heading scale contrast — at least 2:1 ratio (High, Typography)
+        3. Replace uniform border-radius with intentional hierarchy (Medium, Spacing)
+        4. Kill the gradient hero — use a real image or bold typography (High, AI Slop)
+        5. Add a second font for headings — Inter-only reads as generic (Medium, Typography)
+
+        [Full report saved to .gstack/design-reports/]
+        Want me to save this inferred design system as your DESIGN.md?
+```
+
+This is report only — it never touches your code. Use `/qa-design-review` when you want it to fix what it finds.
+
+---
+
+## `/qa-design-review`
+
+This is my **designer who codes mode**.
+
+`/plan-design-review` tells you what is wrong. `/qa-design-review` fixes it.
+
+It runs the same 80-item audit, then enters a fix loop: for each design finding, it locates the source file, makes the minimal CSS/styling change, commits with `style(design): FINDING-NNN`, re-navigates to verify, and takes before/after screenshots. One commit per fix, fully bisectable.
+
+The self-regulation heuristic is tuned for design work — CSS-only changes get a free pass (they are inherently safe and reversible), but changes to component JSX/TSX files count against the risk budget. Hard cap at 30 fixes. If the risk score exceeds 20%, it stops and asks.
+
+```
+You:   /qa-design-review https://myapp.com
+
+Claude: [Runs full design audit — same output as /plan-design-review]
+        Design Score: C  |  AI Slop Score: D
+        12 findings (4 high, 5 medium, 3 polish)
+
+        Fixing 9 design issues...
+
+        style(design): FINDING-001 — replace 3-column icon grid with asymmetric layout
+        style(design): FINDING-002 — add heading scale 48/32/24/18/16
+        style(design): FINDING-003 — remove gradient hero, use bold typography
+        style(design): FINDING-004 — add second font for headings
+        style(design): FINDING-005 — vary border-radius by element role
+        style(design): FINDING-006 — left-align body text, reserve center for headings
+        style(design): FINDING-007 — add hover/focus states to all interactive elements
+        style(design): FINDING-008 — add prefers-reduced-motion media query
+        style(design): FINDING-009 — set max content width to 680px for body text
+
+        Final audit:
+        Design Score: C → B+  |  AI Slop Score: D → A
+        9 fixes applied (8 verified, 1 best-effort). 3 deferred.
+
+        [Report with before/after screenshots saved to .gstack/design-reports/]
+```
+
+Nine commits, each touching one concern. The AI Slop score went from D to A because the three most recognizable patterns (gradient hero, 3-column grid, uniform radius) are gone. The design score improved two grades because the typography now has a scale, the spacing has hierarchy, and interactive elements have proper states.
+
+---
+
 ## `/review`
 
 This is my **paranoid staff engineer mode**.
@@ -638,7 +757,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically
 
 Paste this into Claude Code:
 
-> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
+> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
 
 ## Development
 

diff --git a/TODOS.md b/TODOS.md
@@ -374,6 +374,14 @@
 **Priority:** P3
 **Depends on:** Ref staleness Parts 1+2 (shipped)
 
+## Design Review
+
+### /design-consultation interactive skill — SHIPPED
+
+~~**What:** Interactive skill that walks user through creating a DESIGN.md from scratch.~~
+
+Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setup-design-md` to reflect the consultant approach (agent proposes a complete coherent system, user adjusts). Includes competitive research via WebSearch, combined font+color preview page, coherence validation, and LLM-judged E2E tests.
+
 ## Document-Release
 
 ### Auto-invoke /document-release from /ship

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.4.5
+0.5.0