garrytan · schneidermr · Mar 20, 2026
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -351,6 +351,33 @@ The `EvalCollector` accumulates test results and writes them in two ways:
 
 Tier 1 runs on every `bun test`. Tiers 2+3 are gated behind `EVALS=1`. The idea: catch 95% of issues for free, use LLMs only for judgment calls and integration testing.
 
+## Synthetic memory
+
+Long-running skills (`/review`, `/qa`, `/investigate`) accumulate findings, decisions, and session state that exceed the context window. When Claude compacts earlier messages, critical details are silently lost — specific findings get summarized away, user decisions are forgotten, and the agent re-tests hypotheses it already disproved.
+
+The synthetic memory layer uses two storage locations — session state is private, team knowledge is shareable:
+
+```
+~/.gstack/projects/$SLUG/           ← private, per-user session state
+├── state.md                         ← skill, phase, turn (plain markdown)
+├── findings-$BRANCH.md              ← branch-scoped finding registry
+├── handoff.md                       ← inter-skill context transfer
+└── $BRANCH-reviews.jsonl            ← upstream review/ship logs (unchanged)
+
+.gstack/                             ← repo-level, optionally committed
+├── decisions.log                    ← append-only user decision log
+└── anti-patterns.md                 ← failed fixes (never re-attempt)
+```
+
+**Key design decisions:**
+- **Two layers.** Session state (ephemeral, single-user) lives in `~/.gstack/` alongside upstream's existing JSONL. Team knowledge (decisions, anti-patterns) lives in `.gstack/` where teams can optionally commit it.
+- **Markdown everywhere.** Claude writes markdown reliably but writes JSON with arrays unreliably. A corrupted markdown line doesn't break the file; a corrupted JSON bracket does.
+- **Branch-scoped findings.** `findings-feat-auth.md` and `findings-feat-payments.md` don't interfere. Uses the same `$SLUG/$BRANCH` scoping as upstream's review logs.
+- **Checkpoint = print, not copy.** Every 5 tool calls, re-read files and print status to re-inject state. No file snapshots — the value is in the context injection.
+- **Anti-patterns from PR #403.** Failed fix attempts are recorded so future `/investigate` sessions never re-attempt the same broken approach.
+
+The protocol is defined in `lib/memory.md` and included by reference in each skill's SKILL.md.tmpl. Scripts in `scripts/` handle initialization, status display, and reset.
+
 ## What's intentionally not here
 
 - **No WebSocket streaming.** HTTP request/response is simpler, debuggable with curl, and fast enough. Streaming would add complexity for marginal benefit.
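
Note on the "Checkpoint = print, not copy" decision in the ARCHITECTURE.md hunk above: a minimal sketch of what such a checkpoint could look like, assuming it only prints the memory files back into the conversation. The function name `checkpoint`, the choice of which three files to print, and the `== path ==` labels are assumptions for illustration; the diff does not show the actual protocol text in `lib/memory.md`.

```bash
#!/usr/bin/env bash
# Sketch of a "print, not copy" checkpoint. The function name, the set of
# files printed, and the section labels are hypothetical; only the file
# names come from the layout in the hunk above.

# Print the current contents of the memory files so they re-enter context.
# Nothing is copied or snapshotted; missing files are reported, not created.
checkpoint() {
  local session_dir="$1" branch="$2" team_dir="$3"
  local f
  for f in \
    "$session_dir/state.md" \
    "$session_dir/findings-$branch.md" \
    "$team_dir/decisions.log"; do
    printf '== %s ==\n' "$f"
    if [ -f "$f" ]; then
      cat "$f"
    else
      printf '(missing)\n'
    fi
  done
}
```

Under these assumptions a call might look like `checkpoint "$HOME/.gstack/projects/$SLUG" "$BRANCH" .gstack`. Because the function only reads and prints, running it more often than every 5 tool calls costs context but never changes state on disk.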

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,16 @@
 # Changelog
 
+## [0.11.19.0] - 2026-03-25 — Synthetic Memory
+
+### Added
+
+- **Skills now remember what they found, even when the context window forgets.** During long `/review`, `/qa`, and `/investigate` sessions, Claude's context window silently compresses away details like specific findings, user decisions, and what's been checked. Now, every finding is written to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` immediately, every decision goes to `.gstack/decisions.log`, and a checkpoint re-syncs state every 5 tool calls. The files are the source of truth, not memory.
+- **`/ship` blocks on unresolved P0 findings.** If `/review` or `/qa` found a critical bug and it wasn't fixed, `/ship` will catch it and block — even if the finding was discovered in a previous session.
+- **Skills hand off context to each other.** When `/review` finishes, it writes a summary to `~/.gstack/projects/$SLUG/handoff.md` that `/qa` picks up automatically. No more "I already told you about those bugs in the review."
+- **`/investigate` tracks hypotheses and fix attempts on disk.** After context compaction, the agent won't re-test a hypothesis it already disproved or forget how many fixes it's tried. The Iron Law's "3 failed fixes" rule now actually works across compaction boundaries.
+- **`/retro` incorporates findings patterns.** Weekly retrospectives now include systemic issue detection from accumulated findings (e.g., "4 SQL injection findings across sessions — consider a linter rule").
+- **New utility scripts:** `gstack-status` shows current session state at a glance, `gstack-reset` archives and clears memory for a fresh start.
+
 ## [0.11.18.2] - 2026-03-24
 
 ### Fixed

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -91,9 +91,12 @@ gstack/
 ├── retro/           # Retrospective skill (includes /retro global cross-project mode)
 ├── bin/             # CLI utilities (gstack-repo-mode, gstack-slug, gstack-config, etc.)
 ├── document-release/ # /document-release skill (post-ship doc updates)
+├── lib/             # Shared instruction fragments for skill templates
+│   └── memory.md    # Synthetic memory protocol (included by reference)
 ├── cso/             # /cso skill (OWASP Top 10 + STRIDE security audit)
 ├── design-consultation/ # /design-consultation skill (design system from scratch)
 ├── setup-deploy/    # /setup-deploy skill (one-time deploy config)
+├── scripts/         # Build + DX tooling (also init-memory.sh, gstack-status.sh, gstack-reset.sh)
 ├── .github/         # CI workflows + Docker image
 │   ├── workflows/   # evals.yml (E2E on Ubicloud), skill-docs.yml, actionlint.yml
 │   └── docker/      # Dockerfile.ci (pre-baked toolchain + Playwright/Chromium)
@@ -307,3 +310,26 @@ The active skill lives at `~/.claude/skills/gstack/`. After making changes:
 3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
 
 Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
+
+## gstack Synthetic Memory
+
+gstack uses two storage layers for file-backed memory that survives context
+window compaction during long sessions:
+
+**Session state** (`~/.gstack/projects/$SLUG/`, private, per-user):
+- `state.md` — current skill, phase, turn count (plain markdown)
+- `findings-$BRANCH.md` — branch-scoped finding registry (source of truth)
+- `handoff.md` — skill-to-skill context transfer (deleted after consumption)
+
+**Team knowledge** (`.gstack/` in project root, optionally committed):
+- `decisions.log` — append-only user decision audit trail
+- `anti-patterns.md` — failed fix attempts that should never be re-tried
+
+Key rules:
+- `findings-$BRANCH.md` is the source of truth for all findings — not conversation
+- Skills run checkpoints every 5 tool calls to re-inject state into context
+- `/ship` reads findings and blocks on unresolved P0 issues
+- `/investigate` searches anti-patterns before attempting any fix
+
+Skills auto-initialize via `scripts/init-memory.sh`. Session state uses the same
+`$SLUG/$BRANCH` scoping as upstream's review JSONL. The full protocol is in `lib/memory.md`.
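
Note on the two storage layers in the CLAUDE.md hunk above: the path scheme can be sketched as three small helpers. This is an illustration under stated assumptions, not the contents of `scripts/init-memory.sh` (which this diff does not show). The function names and the `GSTACK_HOME` override are hypothetical, and the sketch assumes `$BRANCH` has already been made filename-safe, as in the `findings-feat-auth.md` example.

```bash
#!/usr/bin/env bash
# Sketch only: resolve the two storage locations described above.
# GSTACK_HOME and the function names are hypothetical, not part of gstack.

# Private, per-user session directory: ~/.gstack/projects/$SLUG
session_dir() {
  local slug="$1"
  printf '%s/projects/%s\n' "${GSTACK_HOME:-$HOME/.gstack}" "$slug"
}

# Branch-scoped findings file inside the session directory.
findings_file() {
  local slug="$1" branch="$2"
  printf '%s/findings-%s.md\n' "$(session_dir "$slug")" "$branch"
}

# Repo-level team knowledge directory: .gstack/ under the project root.
team_dir() {
  local repo_root="$1"
  printf '%s/.gstack\n' "$repo_root"
}
```

Keeping the two resolvers separate mirrors the split in the text: only `team_dir` ever points inside the repository, so nothing under `session_dir` can be committed by accident.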
diff --git a/TODOS.md b/TODOS.md
@@ -311,6 +311,18 @@ Linux cookie import shipped in v0.11.11.0 (Wave 3). Supports Chrome, Chromium, B
 
 ## Infrastructure
 
+### Add touchfiles for synthetic memory E2E tests
+
+**What:** Update `test/helpers/touchfiles.ts` so changes to `lib/memory.md`, `scripts/init-memory.sh`, or `scripts/gstack-*.sh` trigger relevant E2E tests (qa-*, review-*, investigate-*).
+
+**Why:** Currently, changes to memory protocol files don't trigger any E2E tests via diff-based selection. A regression in memory.md wording could silently break skill behavior.
+
+**Context:** Added as part of the synthetic memory layer (v0.11.19.0). Deferred because memory-specific E2E tests don't exist yet; adding touchfiles without corresponding tests would just trigger unrelated E2E runs.
+
+**Effort:** S
+**Priority:** P3
+**Depends on:** Memory-specific E2E test cases
+
 ### /setup-gstack-upload skill (S3 bucket)
 
 **What:** Configure S3 bucket for image hosting. One-time setup for visual PR annotations.

diff --git a/investigate/SKILL.md b/investigate/SKILL.md
@@ -313,6 +313,63 @@ plan's living status.
 
 # Systematic Debugging
 
+## Session Memory
+
+Before starting the investigation, initialize synthetic memory:
+
+```bash
+eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
+bash ~/.claude/skills/gstack/scripts/init-memory.sh
+```
+
+1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
+2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
+3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
+4. **Search `.gstack/anti-patterns.md`** for patterns matching the reported bug — if a
+   match is found, skip that approach and note: "Skipping — tried before (AP{NNN})"
+5. Update `state.md`: skill=investigate, phase=collecting_symptoms, turn=0
+
+Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this investigation.
+Specifically:
+- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
+- Run a CHECKPOINT every 5 tool calls
+- Log EVERY user decision to `.gstack/decisions.log`
+- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill
+
+### Hypothesis Tracking
+
+Maintain a hypothesis log in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` using this format:
+
+```markdown
+### H{N} — {description}
+- **Status:** testing | confirmed | disproven
+- **Evidence:** {what was tried and what happened}
+- **Tested:** {timestamp}
+```
+
+Before proposing a fix, read `findings-$BRANCH.md` hypotheses — do NOT
+re-test a hypothesis that was already disproven. This is critical because
+after compaction you may forget you already tried something.
+
+### Fix Attempt Log
+
+The Iron Law says stop after 3 failed fixes. Track them in `findings-$BRANCH.md`:
+
+```markdown
+### Fix Attempt {N} — {hypothesis}
+- **Action:** {what was changed}
+- **Result:** FAILED | SUCCESS
+- **Timestamp:** {timestamp}
+```
+
+Before each fix attempt, read `findings-$BRANCH.md` to count previous attempts.
+After compaction, you WILL forget how many fixes you've tried. The file knows.
+
+When a fix attempt fails, also append to `.gstack/anti-patterns.md` so future
+sessions never re-attempt the same approach.
+
+---
+
 ## Iron Law
 
 **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
@@ -462,9 +519,21 @@ Status:          DONE | DONE_WITH_CONCERNS | BLOCKED
 
 ---
 
+## Investigation Completion
+
+Before presenting the debug report:
+1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings, hypotheses, and fix attempts
+2. Write `~/.gstack/projects/$SLUG/handoff.md` with the investigation summary, root cause, and fix details
+3. Present the report based on the FILES, not your memory
+
+The debug report MUST match what's in `findings-$BRANCH.md`. If your memory
+of hypotheses tested or fix attempts differs from the file, the file is correct.
+
+---
+
 ## Important Rules
 
-- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
+- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis. Read `findings-$BRANCH.md` fix attempts to verify the count — do NOT rely on memory.
 - **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
 - **Never say "this should fix it."** Verify and prove it. Run the tests.
 - **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
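
Note on the Fix Attempt Log in the investigate/SKILL.md hunks above: since the count is meant to come from the file rather than from memory, it can be derived mechanically from the documented `- **Result:** FAILED` line. The sketch below assumes entries follow that format exactly; the function names `count_failed_fixes` and `may_attempt_fix` are hypothetical and do not appear in the PR.

```bash
#!/usr/bin/env bash
# Sketch: count failed fix attempts recorded in a findings file.
# The function names are hypothetical; the matched line is the
# "- **Result:** FAILED" entry from the Fix Attempt Log format.

count_failed_fixes() {
  local findings="$1"
  # A missing file means no recorded attempts.
  [ -f "$findings" ] || { echo 0; return 0; }
  # grep -c prints the count; it exits 1 on zero matches, so ignore status.
  grep -c '^- \*\*Result:\*\* FAILED[[:space:]]*$' "$findings" || true
}

# Succeeds (exit 0) while another attempt is still allowed.
may_attempt_fix() {
  local findings="$1" limit="${2:-3}"
  [ "$(count_failed_fixes "$findings")" -lt "$limit" ]
}
```

The pattern is anchored at both ends so that a copied template line such as `- **Result:** FAILED | SUCCESS` is not counted as a failure.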

diff --git a/investigate/SKILL.md.tmpl b/investigate/SKILL.md.tmpl
@@ -36,6 +36,63 @@ hooks:
 
 # Systematic Debugging
 
+## Session Memory
+
+Before starting the investigation, initialize synthetic memory:
+
+```bash
+eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
+bash ~/.claude/skills/gstack/scripts/init-memory.sh
+```
+
+1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
+2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
+3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
+4. **Search `.gstack/anti-patterns.md`** for patterns matching the reported bug — if a
+   match is found, skip that approach and note: "Skipping — tried before (AP{NNN})"
+5. Update `state.md`: skill=investigate, phase=collecting_symptoms, turn=0
+
+Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this investigation.
+Specifically:
+- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
+- Run a CHECKPOINT every 5 tool calls
+- Log EVERY user decision to `.gstack/decisions.log`
+- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill
+
+### Hypothesis Tracking
+
+Maintain a hypothesis log in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` using this format:
+
+```markdown
+### H{N} — {description}
+- **Status:** testing | confirmed | disproven
+- **Evidence:** {what was tried and what happened}
+- **Tested:** {timestamp}
+```
+
+Before proposing a fix, read `findings-$BRANCH.md` hypotheses — do NOT
+re-test a hypothesis that was already disproven. This is critical because
+after compaction you may forget you already tried something.
+
+### Fix Attempt Log
+
+The Iron Law says stop after 3 failed fixes. Track them in `findings-$BRANCH.md`:
+
+```markdown
+### Fix Attempt {N} — {hypothesis}
+- **Action:** {what was changed}
+- **Result:** FAILED | SUCCESS
+- **Timestamp:** {timestamp}
+```
+
+Before each fix attempt, read `findings-$BRANCH.md` to count previous attempts.
+After compaction, you WILL forget how many fixes you've tried. The file knows.
+
+When a fix attempt fails, also append to `.gstack/anti-patterns.md` so future
+sessions never re-attempt the same approach.
+
+---
+
 ## Iron Law
 
 **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
@@ -185,9 +242,21 @@ Status:          DONE | DONE_WITH_CONCERNS | BLOCKED
 
 ---
 
+## Investigation Completion
+
+Before presenting the debug report:
+1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings, hypotheses, and fix attempts
+2. Write `~/.gstack/projects/$SLUG/handoff.md` with the investigation summary, root cause, and fix details
+3. Present the report based on the FILES, not your memory
+
+The debug report MUST match what's in `findings-$BRANCH.md`. If your memory
+of hypotheses tested or fix attempts differs from the file, the file is correct.
+
+---
+
 ## Important Rules
 
-- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
+- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis. Read `findings-$BRANCH.md` fix attempts to verify the count — do NOT rely on memory.
 - **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
 - **Never say "this should fix it."** Verify and prove it. Run the tests.
 - **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
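
Note on Hypothesis Tracking in the investigate/SKILL.md.tmpl hunks above: the rule "do NOT re-test a hypothesis that was already disproven" implies reading the disproven entries back out of `findings-$BRANCH.md`. A minimal sketch, assuming each entry uses the documented `### H{N}` heading followed by a `- **Status:**` line; the function name `disproven_hypotheses` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list hypotheses already marked disproven in a findings file.
# The function name is hypothetical; the heading and status lines follow
# the Hypothesis Tracking format shown in the hunk above.

disproven_hypotheses() {
  local findings="$1"
  # No file yet means nothing has been disproven.
  [ -f "$findings" ] || return 0
  awk '
    # Remember the most recent hypothesis heading, minus the "### " prefix.
    /^### H[0-9]+/ { title = substr($0, 5); next }
    # Any other heading (e.g. a fix attempt) ends the current entry.
    /^### /        { title = ""; next }
    # Print the remembered heading when its status is exactly "disproven".
    /^- \*\*Status:\*\* disproven[ \t]*$/ && title != "" { print title }
  ' "$findings"
}
```

Printing this list before proposing a fix gives the agent a compact reminder of dead ends without re-reading the full evidence for each one.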

diff --git a/qa/SKILL.md b/qa/SKILL.md
@@ -324,6 +324,40 @@ branch name wherever the instructions say "the base branch."
 
 You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.
 
+## Session Memory
+
+Before starting QA, initialize synthetic memory:
+
+```bash
+eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
+bash ~/.claude/skills/gstack/scripts/init-memory.sh
+```
+
+1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
+2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
+3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
+4. Update `state.md`: skill=qa, phase=setup, turn=0
+
+Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this QA session.
+Specifically:
+- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
+- Run a CHECKPOINT every 5 tool calls
+- Log EVERY user decision to `.gstack/decisions.log`
+- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill
+
+### Screenshot Memory
+
+Browser screenshots consume enormous context and are aggressively compacted.
+After EVERY screenshot that reveals a bug or important state:
+
+1. Extract the key observation into text IMMEDIATELY
+2. Write it to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` if it's a bug
+
+NEVER rely on "I saw in the screenshot earlier that..." — if you didn't write
+it down, you don't know it. The screenshot may have been compacted away.
+
+---
+
 ## Setup
 
 **Parse the user's request for these parameters:**
@@ -925,6 +959,12 @@ $B snapshot -D
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
 - **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"
 
+When a fix is classified, update the finding in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` with:
+- Status: RESOLVED (or DEFERRED if reverted)
+- Resolution: description of fix
+- Commit SHA (if fixed)
+Update `~/.gstack/projects/$SLUG/state.md` accordingly.
+
 ### 8e.5. Regression Test
 
 Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
@@ -973,6 +1013,7 @@ Use auto-incrementing names to avoid collisions: check existing `{name}.regressi
 
 **4. Evaluate:**
 - Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
+  Update the finding in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` with: `Regression test: {test file and test name}`
 - Fails → fix test once. Still failing → delete test, defer.
 - Taking >2 min exploration → skip and defer.
 
@@ -1046,6 +1087,18 @@ If the repo has a `TODOS.md`:
 
 ---
 
+## QA Completion
+
+Before presenting the final QA report:
+1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings list
+2. Write `~/.gstack/projects/$SLUG/handoff.md` with the full QA summary, unresolved findings, and recommendations
+3. Present the report based on the FILES, not your memory
+
+The final report MUST match what's in `~/.gstack/projects/$SLUG/findings-$BRANCH.md`. If your memory
+of the findings differs from the file, the file is correct.
+
+---
+
 ## Additional Rules (qa-specific)
 
 11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.