fix(memory): deterministic deflection scrub for diary writes + bulk clean button by isair · Pull Request #323 · isair/jarvis

isair · 2026-05-02T18:13:34Z

Summary

Closes #259.

The summariser prompt's deflection rule (added in #232) reduces but does not eliminate "the assistant could not / offered to search / did not have…" leaks on small models. Field measurement on a real diary showed roughly 40% of post-rule writes on gemma4:e2b still contained banned phrasing, and pre-rule rows had no way to be cleaned. Once recalled by enrichment, those entries prime the next reply to repeat the deflection.

This PR brings the diary up to parity with the knowledge graph's two-layer defence (extractor BANNED FACT FORMS at write-time, deterministic merge-time rewrite for historical data) by:

Write-time scrub — scrub_deflection_sentences() runs on every diary write and drops whole sentences matching narrow patterns. Keeps the original if scrubbing would empty the row; an empty diary entry is worse than a slightly-leaky one. Idempotent.
Bulk sweep clean button — scrub_all_diary_summaries() walks every row in conversation_summaries. Surfaced via POST /api/diary/scrub-deflections (NDJSON streaming) and a "Clean up deflection narration" button under a new Maintenance section in the diary sidebar. Preserves the row's original ts_utc so the audit trail survives the sweep, and refreshes the vector embedding inline if an embed model is configured.
Privacy contract — the streaming progress endpoint emits only counts and dates, never raw summary text. The error value is the exception class name only (e.g. "RuntimeError"), never the stringified message, because Python exception messages can echo offending input back to the caller. The progress event shape is locked behind a whitelist regression test so any future field addition forces deliberate review.
Modal copy is explicit — tells users exactly what is removed (deflection sentences) and what stays (everything else, including the row itself if it would otherwise be emptied), so the clean button does not feel like a silent data-loss action. Modal log entries are constructed with textContent on real DOM nodes so a malformed date_utc or error class name cannot inject markup.
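
For readers who want a feel for the write-time scrub's contract (drop whole sentences, keep the original if nothing would be left, idempotent), here is a minimal sketch. The two patterns and the sentence-boundary regex are illustrative assumptions, not the pattern set shipped in this PR; only the function name comes from the description above.

```python
import re

# Hypothetical, deliberately narrow patterns in the "the assistant <failure verb>"
# shape. The real pattern set in the PR is not reproduced here.
_DEFLECTION_PATTERNS = [
    re.compile(
        r"\bthe assistant (?:could not|was unable to|did not have|lacks access)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bthe assistant offered to search\b", re.IGNORECASE),
]

# Assumed sentence boundary: whitespace after terminal punctuation, or newlines.
# The capturing group makes re.split() return the separators as well.
_SENTENCE_BOUNDARY = re.compile(r"((?<=[.!?])\s+|\n+)")


def scrub_deflection_sentences(summary: str) -> str:
    """Drop whole sentences that match a deflection pattern."""
    parts = _SENTENCE_BOUNDARY.split(summary)
    kept = []
    dropped = False
    # parts alternates: sentence, separator, sentence, separator, ...
    for i in range(0, len(parts), 2):
        sentence = parts[i]
        separator = parts[i + 1] if i + 1 < len(parts) else ""
        if any(p.search(sentence) for p in _DEFLECTION_PATTERNS):
            dropped = True
            continue
        kept.append(sentence + separator)
    cleaned = "".join(kept).strip()
    # Keep the original when nothing matched, or when scrubbing would empty the row.
    if not dropped or not cleaned:
        return summary
    return cleaned
```

Because a second pass sees only sentences that already survived the first, running the function on its own output returns the text unchanged.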

PR #281's caching scope is intentionally left untouched per the original brief.

Why this approach (vs. enrichment-side filter)

#259 originally proposed (A) summariser hygiene + (B) enrichment-side filter. With (A) hardened to a two-layer defence (prompt + scrub) and the bulk sweep cleaning historical poisoning, (B) becomes unnecessary by construction: there is no poisoned content left at recall time.

Field-driven design

The pattern set is intentionally narrow (English-first, "the assistant <failure verb>" canonical shape). False positives erase real content, which poisons the diary in a different way (silent fact loss). Precision over recall: a few residual leaks are survivable, an erased preference is not. The summariser prompt rule itself remains language-agnostic.

Field bug caught during review

Initial commit wired the diary maintenance click handler inside initGraph(), which only runs when the user opens the Knowledge tab. The diary tab is the default — meaning a user who clicked the button without ever visiting Knowledge first got no response. Fixed in the second commit (handler now lives on the always-run page setup path) and a regression test asserts the wiring is structurally outside initGraph() so a future refactor cannot reintroduce the bug.
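
A structural check of this kind can be sketched as below. This is an assumption-laden illustration rather than the test in the PR: the element id `diary-scrub-btn` and the helper names are hypothetical, and the brace counting is naive (it ignores braces inside strings and comments).

```python
from typing import Tuple


def function_body_span(source: str, name: str) -> Tuple[int, int]:
    """Return (start, end) offsets of the brace-delimited body of `function name(`."""
    header = source.index(f"function {name}(")
    start = source.index("{", header)
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return start, i + 1
    raise ValueError(f"unbalanced braces in {name}")


def is_wired_outside(source: str, function_name: str, marker: str) -> bool:
    """True when `marker` appears in the script but never inside the named function."""
    start, end = function_body_span(source, function_name)
    inside = source[start:end]
    outside = source[:start] + source[end:]
    return marker in outside and marker not in inside


# Hypothetical page scripts: one with the handler on the always-run path,
# one reproducing the original bug.
GOOD_PAGE = """
function initGraph() {
  if (ready) { draw(); }
}
document.getElementById('diary-scrub-btn').addEventListener('click', runScrub);
"""

BAD_PAGE = """
function initGraph() {
  document.getElementById('diary-scrub-btn').addEventListener('click', runScrub);
}
"""
```

`is_wired_outside(GOOD_PAGE, "initGraph", "diary-scrub-btn")` holds, while the same call on `BAD_PAGE` does not.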

Test plan

30 unit tests for the scrub function (tests/test_diary_deflection_scrub.py): pattern coverage across 13 phrase variants including the stated… and indicated… branches, edge cases (empty, idempotence, total-deflection, multi-line summaries with \n, summaries without terminal punctuation), and a write-path integration test that proves the scrub fires before the row is persisted.
9 DB-integration tests for the bulk sweep (tests/test_diary_scrub_sweep.py): every row visited, idempotent, fail-open per row, no diary content in event payloads, total-deflection rows kept rather than emptied, ts_utc preservation, error field contains exception class name only, embedding refresh behaviour without an embed model.
5 endpoint tests (tests/test_memory_viewer_diary_scrub_api.py): NDJSON streaming, write-back, privacy regression asserting diary content cannot leak via the streaming UI, progress-event key whitelist locking the shape, aggregate-count contract, regression test that the click handler is wired outside initGraph().
Live eval (evals/test_diary_summariser_hygiene.py): defence-in-depth check that the post-scrub output is clean even when the live LLM leaks.
373 tests pass across the diary/memory/graph/engine surface. Pre-existing flake on tests/test_dialogue_memory.py::test_update_diary_preserves_new_messages_during_slow_llm reproduces on develop with no changes — not introduced by this PR.
Memory viewer HTML structure verified end-to-end via Flask test client (button id, label, modal copy, endpoint wiring, CSS class, handler placement outside initGraph(), XSS-hardened log construction).
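
To illustrate what the privacy regression and the key whitelist are guarding, here is a rough sketch. The key names other than `date_utc`, `error`, and `embedding_refreshed` are assumptions, as are the helper names; the point is only that the error field carries a class name and that any unlisted key is detectable.

```python
import json
from typing import Any, Dict, Optional, Set

# Assumed whitelist. Only date_utc, error and embedding_refreshed are named in
# this PR; the remaining keys are placeholders for counts and event type.
ALLOWED_EVENT_KEYS = frozenset(
    {"type", "date_utc", "changed", "embedding_refreshed", "error", "total", "cleaned"}
)


def build_row_event(
    date_utc: str,
    changed: bool,
    embedding_refreshed: bool,
    error: Optional[BaseException] = None,
) -> Dict[str, Any]:
    """Build a per-row progress event that never carries summary text."""
    event: Dict[str, Any] = {
        "type": "row",
        "date_utc": date_utc,
        "changed": changed,
        "embedding_refreshed": embedding_refreshed,
    }
    if error is not None:
        # Class name only: str(error) may echo offending input back to the caller.
        event["error"] = type(error).__name__
    return event


def unexpected_keys(event: Dict[str, Any]) -> Set[str]:
    """Return any keys that are not on the whitelist."""
    return set(event) - ALLOWED_EVENT_KEYS
```

A whitelist test then reduces to asserting that `unexpected_keys(event)` is empty for every event the sweep emits, and that `json.dumps(event)` contains none of the seeded summary text.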

Spec / docs

src/jarvis/memory/summariser.spec.md: post-process scrub, bulk-sweep UI, privacy contract, audit-trail and embedding-refresh contracts documented alongside the existing rules; eval/regression-guard table extended.
docs/llm_contexts.md: summariser data flow updated to note the scrub between LLM output and DB.
CLAUDE.md spec registry: summariser entry refreshed to reflect the two-layer defence.
README.md: troubleshooting note for users hitting poisoned diary rows pointing at the new clean button.

…lean button

The summariser prompt rule against narrating assistant failures (added in #232) reduces but does not eliminate the leak on small models. Field measurement on a real diary showed roughly 40% of post-rule writes on gemma4:e2b still contained banned phrasing, and pre-rule rows remained poisoned with no way to clean them.

Adds a deterministic safety net mirroring the knowledge graph's two-layer defence (extractor BANNED FACT FORMS at write-time + merge-time rewrite for historical data):

- `scrub_deflection_sentences()`: drops whole sentences matching narrow regex patterns ("the assistant could not...", "offered to search...", "lacks access...", etc.). Keeps the original if scrubbing would empty the row; an empty diary entry is worse than a slightly-leaky one. Idempotent.
- Wired into `update_daily_conversation_summary` so every new write is cleaned before it lands in `conversation_summaries`.
- `scrub_all_diary_summaries()`: bulk sweep over every row; same semantics, fail-open per row, streams privacy-safe events (counts only, never raw summary text).
- `POST /api/diary/scrub-deflections`: NDJSON-streaming endpoint backed by the bulk sweep.
- Memory viewer: "Clean up deflection narration" button under a new Maintenance section in the diary sidebar. Modal copy is explicit about what is removed and what stays, so users do not worry about silent data loss.

Spec updates: `summariser.spec.md` documents the post-process layer + UI; `docs/llm_contexts.md` notes the scrub in the summariser data flow; CLAUDE.md spec registry refreshed.

Tests: 30 unit tests covering scrub patterns, edge cases (empty input, total-deflection rows, idempotence), DB integration of the sweep, streaming endpoint contract, and a privacy regression that asserts diary content cannot leak through the streaming UI. Eval extended with a defence-in-depth check that asserts post-scrub output is clean even when the live LLM leaks.

Closes #259

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
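
As a rough illustration of the NDJSON streaming contract mentioned in this commit, the sketch below serialises progress events one JSON object per line and parses them back from arbitrarily chunked input. The function names and event fields are hypothetical; the endpoint's actual implementation is not shown in this PR description.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_ndjson(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Serialise each event as one compact JSON object followed by a newline."""
    for event in events:
        yield json.dumps(event, separators=(",", ":")) + "\n"


def parse_ndjson(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Reassemble events from a stream whose chunk boundaries may split lines."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line:
                yield json.loads(line)
```

A generator like `to_ndjson` can be handed to a streaming HTTP response so the client sees progress row by row instead of waiting for the whole sweep.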

…ntics

Field bug: clicking "Clean up deflection narration" did nothing. The handler was wired inside `initGraph()`, which only runs when the user opens the Knowledge tab. Diary is the default tab, so a user who never visited Knowledge first never had the click handler attached.

Wires the handler in the always-run page setup section. Adds a regression test that asserts the wiring is structurally outside `initGraph()` so a future refactor cannot reintroduce the bug.

While here, addresses the verified findings from the multi-agent review:

- Preserve `ts_utc` on bulk-sweep writes. `db.upsert_conversation_summary` gains an optional `ts_utc` parameter; the sweep passes through the row's original write time so a maintenance pass cannot stomp the audit trail (every cleaned row would otherwise look as though it had been written today).
- Refresh the vector embedding inline when the bulk sweep rewrites a row. Without this the embedding stays anchored to the pre-scrub text and vector search drifts from FTS results. Best-effort: an embedding service failure is logged but does not roll back the summary write. When the caller has no embed model configured (offline-only setup), the sweep skips re-embedding and reports `embedding_refreshed=False`.
- Surface only the exception class name through the streaming UI's `error` field (and the modal's status copy on terminal errors). A stringified Python exception can echo offending input back to the caller; class names cannot.
- Construct modal log entries with `textContent` on a real DOM node rather than `innerHTML +=`. Defence-in-depth: even with the privacy whitelist, no path should be able to inject markup into the modal log.
- Close the database connection in `finally` so a mid-iteration exception cannot orphan it until GC.
- Lock the streaming progress event shape behind a whitelist test (`test_progress_event_keys_are_a_known_whitelist`). Any future field added to the sweep's event dict that could carry summary text now trips this test, forcing deliberate review.

Adds tests for the `stated…` and `indicated…` regex branches that had zero coverage, multi-line summaries with `\n` between sentences, and summaries without terminal punctuation.

README troubleshooting entry for users hitting poisoned diary rows; spec updated with the audit-trail and embedding-refresh contracts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
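
The `ts_utc` pass-through and the `finally` close can be pictured with the following sketch against a throwaway SQLite file. The three-column schema, the `INSERT OR REPLACE` statement, and the `sweep_preserving_timestamps` helper are assumptions for illustration; only the table name, the `ts_utc` column, and the optional `ts_utc` parameter come from this commit message.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional


def upsert_conversation_summary(
    conn: sqlite3.Connection,
    date_utc: str,
    summary: str,
    ts_utc: Optional[str] = None,
) -> None:
    """Write a summary row; default to 'now' unless the caller supplies ts_utc."""
    if ts_utc is None:
        ts_utc = datetime.now(timezone.utc).isoformat()
    # Assumed schema: date_utc is the primary key, so REPLACE overwrites the row.
    conn.execute(
        "INSERT OR REPLACE INTO conversation_summaries (date_utc, summary, ts_utc) "
        "VALUES (?, ?, ?)",
        (date_utc, summary, ts_utc),
    )
    conn.commit()


def sweep_preserving_timestamps(db_path: str, scrub: Callable[[str], str]) -> int:
    """Scrub every row, passing the original ts_utc through; return rows changed."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT date_utc, summary, ts_utc FROM conversation_summaries"
        ).fetchall()
        changed = 0
        for date_utc, summary, ts_utc in rows:
            cleaned = scrub(summary)
            if cleaned != summary:
                upsert_conversation_summary(conn, date_utc, cleaned, ts_utc=ts_utc)
                changed += 1
        return changed
    finally:
        # Runs even if scrub() or a write raises mid-iteration.
        conn.close()
```

Without the explicit `ts_utc=ts_utc` argument, every rewritten row would pick up the current time and the audit trail would show the sweep date instead of the original write.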

Audit triggered by the hot-cache surface introduced in #281: does the deflection scrub need a listener-style invalidation hook similar to the graph's invalidate_warm_profile? Verified the answer is no, and pinned the contract so a future change that starts caching diary content has to opt in deliberately:

- DialogueMemory's hot cache holds three values: warm_profile_block (graph-derived, not diary), the per-query router decision, and the per-query memory-extractor parameters. None contain diary text.
- search_conversation_memory_by_keywords reads SQLite live on every enrichment-bearing turn; the engine never stashes diary entries across turns.
- Concurrency between the sweep and an in-flight reply is handled by SQLite WAL: separate Database instances on different connections, WAL serialises writes against reads at the file level.

Spec gains a "Cache invariant" paragraph documenting the rule and the counterpart relationship to the graph's listener pattern.

Regression test seeds a row, populates the engine's hot cache with extractor params, runs the scrub without invalidating the cache, and asserts the follow-up search returns cleaned content. If a future change adds a diary-content cache without an invalidation path, this test still passes while the user observes stale results, so the spec note is the load-bearing piece; the test catches the obvious regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
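
The "reads SQLite live" half of this invariant can be illustrated with two independent connections to the same WAL-mode file. This is a simplified stand-in, assuming a one-column-of-interest schema and a hypothetical `live_search` helper; it does not reproduce `search_conversation_memory_by_keywords` or the engine's hot cache.

```python
import sqlite3
from typing import List


def open_wal(path: str) -> sqlite3.Connection:
    """Open a connection and switch the database file to write-ahead logging."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def live_search(conn: sqlite3.Connection, needle: str) -> List[str]:
    """Query the table on every call; nothing is cached between calls."""
    rows = conn.execute(
        "SELECT summary FROM conversation_summaries WHERE summary LIKE ?",
        (f"%{needle}%",),
    ).fetchall()
    return [row[0] for row in rows]
```

With one connection acting as the sweep and another as the reply path, a search issued after the sweep commits sees the cleaned text without any invalidation call, which is the behaviour the regression test pins.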

isair and others added 3 commits May 2, 2026 19:12

isair merged commit 04ec7d5 into develop May 2, 2026
2 checks passed

isair deleted the fix/diary-deflection-scrub branch May 2, 2026 18:53

isair merged 3 commits into develop from fix/diary-deflection-scrub
