27 changes: 27 additions & 0 deletions ARCHITECTURE.md
@@ -351,6 +351,33 @@ The `EvalCollector` accumulates test results and writes them in two ways:

Tier 1 runs on every `bun test`. Tiers 2+3 are gated behind `EVALS=1`. The idea: catch 95% of issues for free, use LLMs only for judgment calls and integration testing.

## Synthetic memory

Long-running skills (`/review`, `/qa`, `/investigate`) accumulate findings, decisions, and session state that exceed the context window. When Claude compacts earlier messages, critical details are silently lost — specific findings get summarized away, user decisions are forgotten, and the agent re-tests hypotheses it already disproved.

The synthetic memory layer uses two storage locations — session state is private, team knowledge is shareable:

```
~/.gstack/projects/$SLUG/ ← private, per-user session state
├── state.md ← skill, phase, turn (plain markdown)
├── findings-$BRANCH.md ← branch-scoped finding registry
├── handoff.md ← inter-skill context transfer
└── $BRANCH-reviews.jsonl ← upstream review/ship logs (unchanged)

.gstack/ ← repo-level, optionally committed
├── decisions.log ← append-only user decision log
└── anti-patterns.md ← failed fixes (never re-attempt)
```

**Key design decisions:**
- **Two layers.** Session state (ephemeral, single-user) lives in `~/.gstack/` alongside upstream's existing JSONL. Team knowledge (decisions, anti-patterns) lives in `.gstack/` where teams can optionally commit it.
- **Markdown everywhere.** Claude writes markdown reliably but JSON with arrays unreliably. A corrupted markdown line leaves the rest of the file readable; a corrupted JSON bracket makes the whole file unparseable.
- **Branch-scoped findings.** `findings-feat-auth.md` and `findings-feat-payments.md` don't interfere. Uses the same `$SLUG/$BRANCH` scoping as upstream's review logs.
- **Checkpoint = print, not copy.** Every 5 tool calls, re-read files and print status to re-inject state. No file snapshots — the value is in the context injection.
- **Anti-patterns from PR #403.** Failed fix attempts are recorded so future `/investigate` sessions never re-attempt the same broken approach.
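The checkpoint rule can be sketched in a few lines of bash. This is a minimal illustration that assumes `state.md` stores its counter as a plain `turn: N` line. That layout and the function names `bump_turn` and `maybe_checkpoint` are hypothetical; `lib/memory.md` remains the authority on the real protocol. The checkpoint only prints. Nothing is copied or snapshotted.

```bash
# Hypothetical sketch: assumes state.md stores the counter as a line "turn: N".

# Increment the turn counter in a state file and print the new value.
bump_turn() {
  local state="$1" turn
  [ -f "$state" ] || : > "$state"
  turn=$(sed -n 's/^turn: *\([0-9][0-9]*\)$/\1/p' "$state")
  turn=$(( ${turn:-0} + 1 ))
  if grep -q '^turn:' "$state"; then
    # Rewrite via a temp file instead of sed -i, whose flags differ across platforms.
    sed "s/^turn: *[0-9]*\$/turn: $turn/" "$state" > "$state.tmp" && mv "$state.tmp" "$state"
  else
    printf 'turn: %s\n' "$turn" >> "$state"
  fi
  echo "$turn"
}

# Every 5th turn, print state and findings so they re-enter the context window.
# Nothing is copied or snapshotted; the files remain the single source of truth.
maybe_checkpoint() {
  local state="$1" findings="$2" turn
  turn=$(bump_turn "$state")
  if [ $((turn % 5)) -eq 0 ]; then
    echo "=== CHECKPOINT (turn $turn) ==="
    cat "$state"
    [ -f "$findings" ] && cat "$findings"
  fi
  return 0
}
```

On the four calls between checkpoints, `maybe_checkpoint` produces no output. It only advances the counter.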

The protocol is defined in `lib/memory.md` and included by reference in each skill's SKILL.md.tmpl. Scripts in `scripts/` handle initialization, status display, and reset.
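To make the two-layer layout concrete, here is a hypothetical sketch of the idempotent setup an init script might perform. It is not the contents of `scripts/init-memory.sh`. The helpers `branch_suffix` and `init_memory` are assumptions, and so is the rule that slashes in branch names become dashes, which is inferred from the `findings-feat-auth.md` example.

```bash
# Hypothetical sketch, not the real scripts/init-memory.sh.

# Turn a branch name into a filename-safe suffix: feat/auth -> feat-auth (assumed rule).
branch_suffix() {
  printf '%s\n' "$1" | tr '/' '-'
}

# Create any missing memory files; existing files are never truncated (idempotent).
init_memory() {
  local home_dir="$1" repo_root="$2" slug="$3" branch="$4"
  local session="$home_dir/.gstack/projects/$slug"   # private, per-user layer
  local team="$repo_root/.gstack"                     # repo-level, optionally committed
  local suffix f
  suffix=$(branch_suffix "$branch")
  mkdir -p "$session" "$team"
  for f in "$session/state.md" \
           "$session/findings-$suffix.md" \
           "$team/decisions.log" \
           "$team/anti-patterns.md"; do
    [ -e "$f" ] || : > "$f"
  done
  # handoff.md is left absent: skills only read it if it exists.
}
```

A call such as `init_memory "$HOME" "$(git rev-parse --show-toplevel)" "$SLUG" "$BRANCH"` would create only the files that are missing, so running it at the start of every skill is harmless.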

## What's intentionally not here

- **No WebSocket streaming.** HTTP request/response is simpler, debuggable with curl, and fast enough. Streaming would add complexity for marginal benefit.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,16 @@
# Changelog

## [0.11.19.0] - 2026-03-25 — Synthetic Memory

### Added

- **Skills now remember what they found, even when the context window forgets.** During long `/review`, `/qa`, and `/investigate` sessions, Claude's context window silently compresses away details like specific findings, user decisions, and what's been checked. Now, every finding is written immediately to the branch-scoped `~/.gstack/projects/$SLUG/findings-$BRANCH.md`, every decision goes to `.gstack/decisions.log`, and a checkpoint re-syncs state every 5 tool calls. The files are the source of truth, not memory.
- **`/ship` blocks on unresolved P0 findings.** If `/review` or `/qa` found a critical bug and it wasn't fixed, `/ship` will catch it and block — even if the finding was discovered in a previous session.
- **Skills hand off context to each other.** When `/review` finishes, it writes a summary to `~/.gstack/projects/$SLUG/handoff.md` that `/qa` picks up automatically. No more "I already told you about those bugs in the review."
- **`/investigate` tracks hypotheses and fix attempts on disk.** After context compaction, the agent won't re-test a hypothesis it already disproved or forget how many fixes it's tried. The Iron Law's "3 failed fixes" rule now actually works across compaction boundaries.
- **`/retro` incorporates findings patterns.** Weekly retrospectives now include systemic issue detection from accumulated findings (e.g., "4 SQL injection findings across sessions — consider a linter rule").
- **New utility scripts:** `gstack-status` shows current session state at a glance, `gstack-reset` archives and clears memory for a fresh start.

## [0.11.18.2] - 2026-03-24

### Fixed
26 changes: 26 additions & 0 deletions CLAUDE.md
@@ -91,9 +91,12 @@ gstack/
├── retro/ # Retrospective skill (includes /retro global cross-project mode)
├── bin/ # CLI utilities (gstack-repo-mode, gstack-slug, gstack-config, etc.)
├── document-release/ # /document-release skill (post-ship doc updates)
├── lib/ # Shared instruction fragments for skill templates
│ └── memory.md # Synthetic memory protocol (included by reference)
├── cso/ # /cso skill (OWASP Top 10 + STRIDE security audit)
├── design-consultation/ # /design-consultation skill (design system from scratch)
├── setup-deploy/ # /setup-deploy skill (one-time deploy config)
├── scripts/ # Build + DX tooling (also init-memory.sh, gstack-status.sh, gstack-reset.sh)
├── .github/ # CI workflows + Docker image
│ ├── workflows/ # evals.yml (E2E on Ubicloud), skill-docs.yml, actionlint.yml
│ └── docker/ # Dockerfile.ci (pre-baked toolchain + Playwright/Chromium)
@@ -307,3 +310,26 @@ The active skill lives at `~/.claude/skills/gstack/`. After making changes:
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`

## gstack Synthetic Memory

gstack uses two storage layers for file-backed memory that survives context
window compaction during long sessions:

**Session state** (`~/.gstack/projects/$SLUG/`, private, per-user):
- `state.md` — current skill, phase, turn count (plain markdown)
- `findings-$BRANCH.md` — branch-scoped finding registry (source of truth)
- `handoff.md` — skill-to-skill context transfer (deleted after consumption)

**Team knowledge** (`.gstack/` in project root, optionally committed):
- `decisions.log` — append-only user decision audit trail
- `anti-patterns.md` — failed fix attempts that should never be re-tried

Key rules:
- `findings-$BRANCH.md` is the source of truth for all findings — not conversation
- Skills run checkpoints every 5 tool calls to re-inject state into context
- `/ship` reads findings and blocks on unresolved P0 issues
- `/investigate` searches anti-patterns before attempting any fix
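The `/ship` rule can be illustrated with a small gate. This sketch assumes each finding is a `### ` heading carrying a `[P0]`-style severity tag, followed by a `- **Status:** ...` line. It treats anything other than `RESOLVED`, including `DEFERRED`, as unresolved. Both that format and the `unresolved_p0` helper are hypothetical.

```bash
# Hypothetical /ship gate. Prints every unresolved P0 heading; returns 1 if any exist.
unresolved_p0() {
  local findings="$1"
  [ -f "$findings" ] || return 0   # no findings file: nothing to block on
  awk '
    # Emit the previous finding if it is P0 and not RESOLVED.
    function flush() {
      if (title != "" && is_p0 && status !~ /Status:\*\* *RESOLVED/) { print title; bad++ }
      title = ""
    }
    /^### / { flush(); title = $0; is_p0 = ($0 ~ /\[P0\]/); status = ""; next }
    /^- \*\*Status:\*\*/ { status = $0 }
    END { flush(); exit (bad > 0) }
  ' "$findings"
}
```

`/ship` could then block with something like `unresolved_p0 "$FINDINGS" || exit 1`, which prints the offending headings before stopping.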

Skills auto-initialize via `scripts/init-memory.sh`. Session state uses the same
`$SLUG/$BRANCH` scoping as upstream's review JSONL. The full protocol is in `lib/memory.md`.
12 changes: 12 additions & 0 deletions TODOS.md
@@ -311,6 +311,18 @@ Linux cookie import shipped in v0.11.11.0 (Wave 3). Supports Chrome, Chromium, B

## Infrastructure

### Add touchfiles for synthetic memory E2E tests

**What:** Update `test/helpers/touchfiles.ts` so changes to `lib/memory.md`, `scripts/init-memory.sh`, or `scripts/gstack-*.sh` trigger relevant E2E tests (qa-*, review-*, investigate-*).

**Why:** Currently, changes to memory protocol files don't trigger any E2E tests via diff-based selection. A regression in memory.md wording could silently break skill behavior.

**Context:** Added as part of the synthetic memory layer (v0.11.19.0). Deferred because memory-specific E2E tests don't exist yet; adding touchfiles without corresponding tests would just trigger unrelated E2E runs.

**Effort:** S
**Priority:** P3
**Depends on:** Memory-specific E2E test cases
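This illustrates the proposed mapping only. The real change belongs in `test/helpers/touchfiles.ts`, whose structure is not shown here. This hypothetical bash function, `memory_touched_tests`, reads changed paths on stdin and prints the E2E test-name globs they should select.

```bash
# Hypothetical illustration of the proposed touchfiles mapping (not the real touchfiles.ts).
# Reads changed paths on stdin; prints the E2E test-name globs they should trigger.
memory_touched_tests() {
  local path hit=0
  while IFS= read -r path; do
    case "$path" in
      lib/memory.md|scripts/init-memory.sh|scripts/gstack-*.sh) hit=1 ;;
    esac
  done
  if [ "$hit" -eq 1 ]; then
    printf '%s\n' 'qa-*' 'review-*' 'investigate-*'
  fi
  return 0
}
```

For example, `git diff --name-only main...HEAD | memory_touched_tests` prints nothing unless a memory protocol file changed.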

### /setup-gstack-upload skill (S3 bucket)

**What:** Configure S3 bucket for image hosting. One-time setup for visual PR annotations.
71 changes: 70 additions & 1 deletion investigate/SKILL.md
@@ -313,6 +313,63 @@ plan's living status.

# Systematic Debugging

## Session Memory

Before starting the investigation, initialize synthetic memory:

```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
bash ~/.claude/skills/gstack/scripts/init-memory.sh
```

1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
4. **Search `.gstack/anti-patterns.md`** for patterns matching the reported bug — if a
match is found, skip that approach and note: "Skipping — tried before (AP{NNN})"
5. Update `state.md`: skill=investigate, phase=collecting_symptoms, turn=0
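Step 4 can be sketched as a keyword search over the anti-pattern log. The entry format assumed here is hypothetical: each entry starts with a heading such as `### AP001 <summary>`. The helper name `matching_anti_patterns` is also hypothetical. The match is a deliberately crude case-insensitive substring test.

```bash
# Hypothetical sketch: print IDs of anti-pattern entries mentioning a keyword.
# Assumes entries start with "### AP<digits> <summary>".
matching_anti_patterns() {
  local file="$1" keyword="$2"
  [ -f "$file" ] || return 0
  awk -v kw="$keyword" '
    /^### AP[0-9]+/ { id = $2 }                        # remember the current entry ID
    id != "" && index(tolower($0), tolower(kw)) { print id }
  ' "$file" | sort -u
}
```

`index()` is used instead of a regex so that keywords containing characters like `.` or `(` are matched literally.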

Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this investigation.
Specifically:
- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
- Run a CHECKPOINT every 5 tool calls
- Log EVERY user decision to `.gstack/decisions.log`
- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill

### Hypothesis Tracking

Maintain a hypothesis log in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` using this format:

```markdown
### H{N} — {description}
- **Status:** testing | confirmed | disproven
- **Evidence:** {what was tried and what happened}
- **Tested:** {timestamp}
```

Before proposing a fix, read `findings-$BRANCH.md` hypotheses — do NOT
re-test a hypothesis that was already disproven. This is critical because
after compaction you may forget you already tried something.
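This is a hypothetical pair of helpers for the hypothesis log. One appends the next `H{N}` entry in the format above. The other lists hypotheses already marked `disproven` so they are not re-tested. The function names and the UTC timestamp format are assumptions.

```bash
# Hypothetical helpers for the hypothesis log format.

# Append the next H{N} entry with status "testing" (heading matches the documented format).
add_hypothesis() {
  local file="$1" desc="$2" n
  n=$(grep -c '^### H[0-9]' "$file" 2>/dev/null || true)
  n=$(( ${n:-0} + 1 ))
  {
    printf '\n### H%s — %s\n' "$n" "$desc"
    printf '%s\n' '- **Status:** testing' '- **Evidence:** (pending)'
    printf '%s %s\n' '- **Tested:**' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } >> "$file"
}

# Print the heading of every hypothesis whose status line says "disproven".
disproven_hypotheses() {
  local file="$1"
  [ -f "$file" ] || return 0
  awk '
    /^### H[0-9]/ { h = $0; next }
    h != "" && /^- \*\*Status:\*\* *disproven/ { print h; h = "" }
    /^### / { h = "" }                                 # any other heading ends the entry
  ' "$file"
}
```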

### Fix Attempt Log

The Iron Law says stop after 3 failed fixes. Track them in `findings-$BRANCH.md`:

```markdown
### Fix Attempt {N} — {hypothesis}
- **Action:** {what was changed}
- **Result:** FAILED | SUCCESS
- **Timestamp:** {timestamp}
```

Before each fix attempt, read `findings-$BRANCH.md` to count previous attempts.
After compaction, you WILL forget how many fixes you've tried. The file knows.

When a fix attempt fails, also append to `.gstack/anti-patterns.md` so future
sessions never re-attempt the same approach.
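The three-strike rule can be enforced from the file rather than from memory. In this sketch, whose helper names are hypothetical, `failed_fix_count` counts `FAILED` results under `### Fix Attempt` headings. `may_attempt_fix` returns non-zero once three failures are on disk.

```bash
# Hypothetical helpers for the Fix Attempt log.

# Count fix attempts whose Result line says FAILED.
failed_fix_count() {
  local file="$1"
  if [ ! -f "$file" ]; then echo 0; return 0; fi
  awk '
    /^### Fix Attempt/ { in_fix = 1; next }
    /^### /            { in_fix = 0 }                  # leaving a fix-attempt entry
    in_fix && /^- \*\*Result:\*\* *FAILED/ { n++ }
    END { print n + 0 }
  ' "$file"
}

# Refuse another attempt once 3 or more failures are recorded on disk.
may_attempt_fix() {
  local failed
  failed=$(failed_fix_count "$1")
  if [ "$failed" -ge 3 ]; then
    echo "STOP: $failed failed fix attempts recorded; question the architecture." >&2
    return 1
  fi
  return 0
}
```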

---

## Iron Law

**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
@@ -462,9 +519,21 @@ Status: DONE | DONE_WITH_CONCERNS | BLOCKED

---

## Investigation Completion

Before presenting the debug report:
1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings, hypotheses, and fix attempts
2. Write `~/.gstack/projects/$SLUG/handoff.md` with the investigation summary, root cause, and fix details
3. Present the report based on the FILES, not your memory

The debug report MUST match what's in `findings-$BRANCH.md`. If your memory
of hypotheses tested or fix attempts differs from the file, the file is correct.
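Here is one way the handoff could round-trip, sketched under assumptions. The section names inside `handoff.md` are illustrative, and `write_handoff` and `consume_handoff` are hypothetical helpers. The producer writes via a temp file and `mv` so a reader never sees a half-written file. The consumer deletes the file after reading it, matching the "deleted after consumption" note in CLAUDE.md.

```bash
# Hypothetical sketch of the handoff round trip; field names are assumptions.

# Producer: write handoff.md atomically (temp file + mv).
write_handoff() {
  local dir="$1" from_skill="$2" summary="$3" findings="$4"
  local tmp="$dir/handoff.md.tmp"
  {
    printf '# Handoff from /%s\n\n' "$from_skill"
    printf 'Written: %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '## Summary\n%s\n\n' "$summary"
    printf '## Findings file\n%s\n' "$findings"
  } > "$tmp" && mv "$tmp" "$dir/handoff.md"
}

# Consumer: print the handoff if present, then delete it so it is consumed once.
consume_handoff() {
  local f="$1/handoff.md"
  [ -f "$f" ] || return 0
  cat "$f" && rm -f "$f"
}
```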

---

## Important Rules

- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis. Read `findings-$BRANCH.md` fix attempts to verify the count — do NOT rely on memory.
- **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
- **Never say "this should fix it."** Verify and prove it. Run the tests.
- **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
71 changes: 70 additions & 1 deletion investigate/SKILL.md.tmpl
@@ -36,6 +36,63 @@ hooks:

# Systematic Debugging

## Session Memory

Before starting the investigation, initialize synthetic memory:

```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
bash ~/.claude/skills/gstack/scripts/init-memory.sh
```

1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
4. **Search `.gstack/anti-patterns.md`** for patterns matching the reported bug — if a
match is found, skip that approach and note: "Skipping — tried before (AP{NNN})"
5. Update `state.md`: skill=investigate, phase=collecting_symptoms, turn=0

Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this investigation.
Specifically:
- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
- Run a CHECKPOINT every 5 tool calls
- Log EVERY user decision to `.gstack/decisions.log`
- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill

### Hypothesis Tracking

Maintain a hypothesis log in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` using this format:

```markdown
### H{N} — {description}
- **Status:** testing | confirmed | disproven
- **Evidence:** {what was tried and what happened}
- **Tested:** {timestamp}
```

Before proposing a fix, read `findings-$BRANCH.md` hypotheses — do NOT
re-test a hypothesis that was already disproven. This is critical because
after compaction you may forget you already tried something.

### Fix Attempt Log

The Iron Law says stop after 3 failed fixes. Track them in `findings-$BRANCH.md`:

```markdown
### Fix Attempt {N} — {hypothesis}
- **Action:** {what was changed}
- **Result:** FAILED | SUCCESS
- **Timestamp:** {timestamp}
```

Before each fix attempt, read `findings-$BRANCH.md` to count previous attempts.
After compaction, you WILL forget how many fixes you've tried. The file knows.

When a fix attempt fails, also append to `.gstack/anti-patterns.md` so future
sessions never re-attempt the same approach.

---

## Iron Law

**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
@@ -185,9 +242,21 @@ Status: DONE | DONE_WITH_CONCERNS | BLOCKED

---

## Investigation Completion

Before presenting the debug report:
1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings, hypotheses, and fix attempts
2. Write `~/.gstack/projects/$SLUG/handoff.md` with the investigation summary, root cause, and fix details
3. Present the report based on the FILES, not your memory

The debug report MUST match what's in `findings-$BRANCH.md`. If your memory
of hypotheses tested or fix attempts differs from the file, the file is correct.

---

## Important Rules

- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis. Read `findings-$BRANCH.md` fix attempts to verify the count — do NOT rely on memory.
- **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
- **Never say "this should fix it."** Verify and prove it. Run the tests.
- **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
53 changes: 53 additions & 0 deletions qa/SKILL.md
@@ -324,6 +324,40 @@ branch name wherever the instructions say "the base branch."

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

## Session Memory

Before starting QA, initialize synthetic memory:

```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
bash ~/.claude/skills/gstack/scripts/init-memory.sh
```

1. Read `~/.gstack/projects/$SLUG/state.md` — if a previous skill left state, note it
2. Read `~/.gstack/projects/$SLUG/handoff.md` if it exists — incorporate prior context
3. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` for any existing findings
4. Update `state.md`: skill=qa, phase=setup, turn=0

Follow the Synthetic Memory Protocol from `lib/memory.md` throughout this QA session.
Specifically:
- Write EVERY finding to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` IMMEDIATELY upon discovery
- Run a CHECKPOINT every 5 tool calls
- Log EVERY user decision to `.gstack/decisions.log`
- On completion, write `~/.gstack/projects/$SLUG/handoff.md` for the next skill

### Screenshot Memory

Browser screenshots consume enormous context and are aggressively compacted.
After EVERY screenshot that reveals a bug or important state:

1. Extract the key observation into text IMMEDIATELY
2. Write it to `~/.gstack/projects/$SLUG/findings-$BRANCH.md` if it's a bug

NEVER rely on "I saw in the screenshot earlier that..." — if you didn't write
it down, you don't know it. The screenshot may have been compacted away.

---

## Setup

**Parse the user's request for these parameters:**
@@ -925,6 +959,12 @@ $B snapshot -D
- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

When a fix is classified, update the finding in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` with:
- Status: RESOLVED (or DEFERRED if reverted)
- Resolution: description of fix
- Commit SHA (if fixed)
Update `~/.gstack/projects/$SLUG/state.md` accordingly.
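This is a hypothetical helper for the update. It assumes findings are headed `### ISSUE-NNN ...`, the ID style used in the regression-test commit message, with a single `- **Status:** ...` line. It rewrites that line and adds resolution and commit lines beneath it. `resolve_finding` and the layout are illustrative. Values pass through `awk -v`, so any backslashes in them would be interpreted as escapes.

```bash
# Hypothetical sketch: set a finding's status and record how it was resolved.
resolve_finding() {
  local file="$1" id="$2" status="$3" resolution="$4" sha="${5:-}"
  awk -v id="$id" -v st="$status" -v res="$resolution" -v sha="$sha" '
    /^### / { inside = ($2 == id) }                    # are we in the target finding?
    inside && /^- \*\*Status:\*\*/ {
      print "- **Status:** " st
      print "- **Resolution:** " res
      if (sha != "") print "- **Commit:** " sha        # only fixed findings have a SHA
      next
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

A verified fix might then be recorded with `resolve_finding "$F" ISSUE-003 RESOLVED "Escape search input" "$(git rev-parse --short HEAD)"`, and a reverted one with status `DEFERRED` and no SHA.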

### 8e.5. Regression Test

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
@@ -973,6 +1013,7 @@ Use auto-incrementing names to avoid collisions: check existing `{name}.regressi

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
Update the finding in `~/.gstack/projects/$SLUG/findings-$BRANCH.md` with: `Regression test: {test file and test name}`
- Fails → fix test once. Still failing → delete test, defer.
- Taking >2 min exploration → skip and defer.

@@ -1046,6 +1087,18 @@ If the repo has a `TODOS.md`:

---

## QA Completion

Before presenting the final QA report:
1. Read `~/.gstack/projects/$SLUG/findings-$BRANCH.md` to get the complete findings list
2. Write `~/.gstack/projects/$SLUG/handoff.md` with the full QA summary, unresolved findings, and recommendations
3. Present the report based on the FILES, not your memory

The final report MUST match what's in `~/.gstack/projects/$SLUG/findings-$BRANCH.md`. If your memory
of the findings differs from the file, the file is correct.

---

## Additional Rules (qa-specific)

11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.