Merged
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -24,6 +24,7 @@ Any code change must either adhere to our spec files perfectly or you should ask
| `src/jarvis/utils/location.spec.md` | GeoIP location detection | Privacy-first; local GeoLite2 DB only |
| `src/jarvis/memory/graph.spec.md` | Node graph memory (v2), self-organising tree, UI explorer | Dynamic structure; access-aware; auto-split/merge (future) |
| `src/jarvis/memory/summariser.spec.md` | Diary summariser prompt contract and hygiene rules (deflection, attribution, topic separation) | Summariser is the source; corrupted summaries poison every downstream consumer |
| `src/jarvis/memory/recall_gate.spec.md` | Deterministic skip-enrichment heuristic when the hot window covers a follow-up | Fail-open; language-agnostic via `\w{3,}` + `re.UNICODE`; planner intent always wins |

The LLM contexts graph at `docs/llm_contexts.md` maps every LLM call in the app (model, gating, inputs, outputs, limits, flow). Keep it up-to-date at all times: any change that adds, removes, or alters an LLM context (model resolution, timeout, cap, prompt source, gating flag, data-flow edge) must update `docs/llm_contexts.md` in the same PR.

16 changes: 14 additions & 2 deletions docs/llm_contexts.md
@@ -11,9 +11,9 @@ Every distinct LLM call in Jarvis, what feeds it, what consumes it, and how it i
- **Model / gating**: `cfg.ollama_chat_model` (the big model). Not optional. No size branching on the loop itself — size branching affects the digests/evaluator around it.
- **Inputs**:
- Redacted user query
- Recent dialogue (last 5 minutes)
- Recent dialogue (last 5 minutes), including in-loop tool-call + tool-role messages from prior replies within the active conversation (tool carryover, `DialogueMemory.record_tool_turn` / `get_recent_turns_with_tools` in [src/jarvis/memory/conversation.py](src/jarvis/memory/conversation.py); per-prompt cap via `cfg.tool_carryover_max_turns` / `tool_carryover_per_entry_chars`; storage cap `_tool_turns_max_storage = 16`; cleared on `stop` signal AND on new-conversation entry; UNTRUSTED WEB EXTRACT fence markers preserved on truncation; both `content` and `tool_calls[*].function.arguments` scrubbed on write)
- Unified system prompt from [src/jarvis/system_prompt.py](src/jarvis/system_prompt.py) + ASR note + tool-protocol guidance
- **Warm profile block** (query-agnostic User + Directives excerpt from the knowledge graph, composed by `build_warm_profile()` / `format_warm_profile_block()` in [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) at Step 3.5 of `reply()`; no LLM call, pure SQLite read; injected unconditionally so personalisation is the default)
- **Warm profile block** (query-agnostic User + Directives excerpt from the knowledge graph, composed by `build_warm_profile()` / `format_warm_profile_block()` in [src/jarvis/memory/graph_ops.py](src/jarvis/memory/graph_ops.py) at Step 3.5 of `reply()`; no LLM call, pure SQLite read; injected unconditionally so personalisation is the default; result cached in `DialogueMemory._hot_cache` under `DialogueMemory.WARM_PROFILE_CACHE_KEY` for the lifetime of the active conversation. Invalidated on `stop`, on new-conversation entry, AND on User/Directives graph mutations via the listener registered in [src/jarvis/daemon.py](src/jarvis/daemon.py) against `register_graph_mutation_listener` in [src/jarvis/memory/graph.py](src/jarvis/memory/graph.py); World-branch writes are ignored)
- Digested memory enrichment (optional, see #4)
- Time + location context (re-injected each turn)
- Tool schema: native via `generate_tools_json_schema()` ([src/jarvis/tools/registry.py](src/jarvis/tools/registry.py)) or text fallback via `_text_tool_call_guidance()` ([engine.py:68](src/jarvis/reply/engine.py:68))
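
The per-entry truncation described for tool carryover (fence markers preserved) could be sketched as follows — a minimal illustration only. `truncate_tool_entry`, its signature, and the marker strings are assumptions; the real logic lives inside `DialogueMemory` in [src/jarvis/memory/conversation.py](src/jarvis/memory/conversation.py):

```python
def truncate_tool_entry(text: str, max_chars: int,
                        open_marker: str, close_marker: str) -> str:
    """Trim a carried-over tool payload to max_chars while keeping the
    fence markers of an UNTRUSTED WEB EXTRACT block intact, so the model
    still sees where untrusted content begins and ends."""
    if len(text) <= max_chars:
        return text
    stripped = text.rstrip()
    if text.startswith(open_marker) and stripped.endswith(close_marker):
        # Trim only the body between the fences; reserve room for both
        # markers plus the newline re-joining them.
        body = text[len(open_marker):stripped.rfind(close_marker)]
        keep = max(0, max_chars - len(open_marker) - len(close_marker) - 1)
        return open_marker + body[:keep] + "\n" + close_marker
    return text[:max_chars]
```

A sketch like this degenerates when `max_chars` is smaller than the two markers combined (it keeps the fences and drops the whole body); the config floor of 200 chars makes that case unreachable in practice.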
@@ -44,6 +44,17 @@ Every distinct LLM call in Jarvis, what feeds it, what consumes it, and how it i
- **System prompt**: inline at [enrichment.py:35-63](src/jarvis/reply/enrichment.py:35).
- **Output**: `{keywords, from?, to?, questions?}`. Consumed by memory search in the reply engine.
- **Limits**: up to 2 retries; timeout from `llm_tools_timeout_sec`.
- **Caching**: result cached in `DialogueMemory._hot_cache` under key `enrichment:{redacted_query[+topic_hint]}` for the lifetime of the active conversation. Identical follow-ups within the same conversation reuse the dict and skip the LLM hop. Cleared by `clear_hot_cache()` on the `stop` signal and on new-conversation entry.

## 3b. Recall Gate (pre-enrichment short-circuit)

- **File**: [src/jarvis/memory/recall_gate.py](src/jarvis/memory/recall_gate.py) — `should_recall()`.
- **Trigger**: once per reply, before diary/graph/digest enrichment runs (after the planner has decided memory is potentially needed).
- **Model / gating**: NO LLM — deterministic keyword-coverage heuristic. Cheap.
- **Inputs**: query, recent dialogue (incl. tool carryover rows).
- **Output**: `False` only if the hot window contains a fresh tool result AND ≥50% of the query's content words appear in the hot-window transcript → skips diary, graph, and memory digest for this reply. Otherwise `True`. Fail-open on any exception. Content-word extraction uses `\w{3,}` with `re.UNICODE`, so the gate works for Latin, Cyrillic, CJK, Arabic, Hebrew, etc. (per CLAUDE.md "no hardcoded language patterns"). Overlap words are run through `redact()` before being written to debug logs.
- **Planner precedence**: when the planner explicitly emitted a `searchMemory` step, the gate is bypassed — the planner has more signal than coverage and overriding it would silently drop intent. The gate only short-circuits the fail-open empty-plan path.
- **Rationale**: prevents re-running diary/graph lookups when the hot window already grounds the follow-up (e.g. "his most famous song" after a Bieber webSearch).
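
The coverage heuristic can be sketched in a few lines — an illustrative simplification, not the real implementation: the parameter names are assumptions (the actual `should_recall()` in recall_gate.py reads the dialogue rows itself), and plain substring containment stands in for whatever matching the real gate does:

```python
import re

_WORD_RE = re.compile(r"\w{3,}", re.UNICODE)  # language-agnostic content words


def should_recall(query: str, hot_window: str, fresh_tool_result: bool) -> bool:
    """Return False (skip enrichment) only when the hot window already
    grounds the follow-up; True means run diary/graph/digest as usual."""
    try:
        if not fresh_tool_result:
            return True
        words = {w.casefold() for w in _WORD_RE.findall(query)}
        if not words:
            return True
        window = hot_window.casefold()
        covered = sum(1 for w in words if w in window)
        # Skip (False) only at >= 50% coverage; otherwise enrich.
        return covered / len(words) < 0.5
    except Exception:
        return True  # fail-open: when in doubt, run enrichment
```

For the Bieber example: after a webSearch whose result sits in the hot window, a follow-up like "what is his most famous song" has most of its content words covered, so the gate skips enrichment.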

## 4. Memory Digest (optional, SMALL models)

@@ -84,6 +95,7 @@ Every distinct LLM call in Jarvis, what feeds it, what consumes it, and how it i
- **System prompt**: inline (~lines 260-315). Teaches the model to pick up to 5 tools or `none`.
- **Output**: comma-separated tool names or `none`. Capped at `_LLM_MAX_SELECTED` (5). Always-included tools (`stop`, `toolSearchTool`) are unioned in regardless.
- **Limits**: `llm_timeout_sec`. On failure → all tools.
- **Caching**: `routed_tools` cached in `DialogueMemory._hot_cache` under key `router:{redacted_query}|{strategy}|{builtin-names}|{mcp-names}` for the lifetime of the active conversation. The catalogue signature lets a mid-conversation MCP refresh invalidate the cache; `context_hint` is intentionally excluded so time/location drift inside one conversation doesn't bust it. Cleared by `clear_hot_cache()` on the `stop` signal and on new-conversation entry.
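
Key construction could look like this — a hedged sketch of the documented key shape; the `router_cache_key` helper and its signature are hypothetical, and sorting the names is one plausible way to make the catalogue signature deterministic:

```python
def router_cache_key(redacted_query: str, strategy: str,
                     builtin_names: list[str], mcp_names: list[str]) -> str:
    # Documented shape: "router:{query}|{strategy}|{builtin-names}|{mcp-names}".
    # A sorted, comma-joined signature is order-insensitive, so only a real
    # catalogue change (e.g. an MCP refresh adding a tool) busts the cache.
    return "router:{}|{}|{}|{}".format(
        redacted_query, strategy,
        ",".join(sorted(builtin_names)),
        ",".join(sorted(mcp_names)),
    )
```

Note what is deliberately absent: `context_hint` never enters the key, so time/location drift within one conversation cannot invalidate the entry.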

## 8. Tool Searcher (mid-loop escape hatch)

6 changes: 6 additions & 0 deletions src/desktop_app/settings_window.py
@@ -264,6 +264,12 @@ def f(key, label, desc, cat, ftype, **kw):
f("memory_enrichment_source", "Enrichment Source",
"Which memory system enriches replies: all (diary + graph), diary only, or graph only",
"memory", "choice", choices=[("diary", "Diary only"), ("graph", "Graph only"), ("all", "All (diary + graph)")])
f("tool_carryover_max_turns", "Tool Carryover Turns",
"How many prior replies' tool results to keep visible for follow-up questions",
"memory", "int", min_val=0, max_val=10)
f("tool_carryover_per_entry_chars", "Tool Carryover Length",
"Chars kept per carried-over tool result (UNTRUSTED fence markers preserved)",
"memory", "int", min_val=200, max_val=8000, step=100)
f("agentic_max_turns", "Agentic Max Turns",
"Maximum turns in agentic tool-use loops",
"memory", "int", min_val=1, max_val=30)
14 changes: 14 additions & 0 deletions src/jarvis/config.py
@@ -169,6 +169,13 @@ class Settings:
dialogue_memory_timeout: float
memory_enrichment_max_results: int
memory_enrichment_source: str # "all", "diary", or "graph"
# Tool-call + tool-result messages from prior replies in the hot window
# are re-injected into the next turn so follow-ups can reuse them instead
# of re-fetching. These knobs cap how many prior tool turns survive and
# how much of each tool payload is retained (the fence markers of
# UNTRUSTED WEB EXTRACT blocks are preserved on truncation).
tool_carryover_max_turns: int
tool_carryover_per_entry_chars: int
# Distil diary + graph into a short relevance-filtered note via a cheap
# LLM pass before injecting into the reply system prompt. When None
# (the default), it auto-enables for SMALL models (≤7B) and stays off
@@ -470,6 +477,9 @@ def get_default_config() -> Dict[str, Any]:
"dialogue_memory_timeout": 300.0,
"memory_enrichment_max_results": 3,
"memory_enrichment_source": "all", # "all", "diary", or "graph"
# Tool carryover: cap re-injected prior tool turns + chars per entry.
"tool_carryover_max_turns": 2,
"tool_carryover_per_entry_chars": 1200,
# None = auto (on for small models ≤7B, off for large). Set true/false to force.
"memory_digest_enabled": None,
# Distil raw tool results (e.g. webSearch extracts) into a short
@@ -658,6 +668,8 @@ def load_settings() -> Settings:
memory_enrichment_source = str(merged.get("memory_enrichment_source", "all")).lower()
if memory_enrichment_source not in ("all", "diary", "graph"):
memory_enrichment_source = "all"
tool_carryover_max_turns = max(0, int(merged.get("tool_carryover_max_turns", 2)))
tool_carryover_per_entry_chars = max(200, int(merged.get("tool_carryover_per_entry_chars", 1200)))
_digest_raw = merged.get("memory_digest_enabled", None)
memory_digest_enabled: Optional[bool]
if _digest_raw is None:
@@ -818,6 +830,8 @@ def load_settings() -> Settings:
dialogue_memory_timeout=dialogue_memory_timeout,
memory_enrichment_max_results=memory_enrichment_max_results,
memory_enrichment_source=memory_enrichment_source,
tool_carryover_max_turns=tool_carryover_max_turns,
tool_carryover_per_entry_chars=tool_carryover_per_entry_chars,
memory_digest_enabled=memory_digest_enabled,
tool_result_digest_enabled=tool_result_digest_enabled,
agentic_max_turns=agentic_max_turns,
69 changes: 69 additions & 0 deletions src/jarvis/daemon.py
@@ -44,6 +44,7 @@
# Global instances for coordination between modules
_global_dialogue_memory: Optional[DialogueMemory] = None
_global_stop_requested: bool = False
_warm_profile_graph_listener = None # registered callback, kept for shutdown unregister
_global_tts_engine = None # TTS engine reference for face animation polling
_global_dictation_engine = None # Dictation engine reference for history UI

@@ -294,6 +295,7 @@ def on_token_handler(token: str):
def main() -> None:
"""Main daemon entry point."""
global _global_dialogue_memory, _global_stop_requested, _global_tts_engine, _global_dictation_engine
global _warm_profile_graph_listener

# Reset stop flag at start (in case of restart)
_global_stop_requested = False
@@ -348,6 +350,60 @@ def main() -> None:
)
print("✓ Dialogue memory initialized", flush=True)

# Wire the conversation-scoped warm-profile cache to graph mutations.
# When the User or Directives branch is mutated mid-conversation, the
# cached warm profile is dropped so the next reply rebuilds it from
# the current graph state. World-branch writes (typical webSearch
# extractions) do not touch warm profile, so they are ignored.
try:
from .memory.graph import (
BRANCH_DIRECTIVES,
BRANCH_USER,
register_graph_mutation_listener,
)

_wp_relevant_branches = {BRANCH_USER, BRANCH_DIRECTIVES}

# Read the DialogueMemory ref through the module global at fire
# time, not via closure capture, so a future singleton swap (tests
# or hot-reload) routes invalidation to the live instance instead
# of the freed one.
def _invalidate_wp_on_graph_mutation(*, action, node_id, branch):
del action, node_id # Only the branch matters for warm-profile filtering.
if branch not in _wp_relevant_branches:
return
dm = _global_dialogue_memory
if dm is None:
return
try:
dm.invalidate_warm_profile()
debug_log(
f"warm profile invalidated by {branch} graph mutation",
"memory",
)
except Exception as exc:
debug_log(
f"warm profile invalidation failed (non-fatal): {exc}",
"memory",
)

# If a previous run left a listener registered (re-entry without
# full process restart), drop it before installing the new one so
# the registry never accumulates stale closures.
if _warm_profile_graph_listener is not None:
try:
from .memory.graph import unregister_graph_mutation_listener
unregister_graph_mutation_listener(_warm_profile_graph_listener)
except Exception:
pass
register_graph_mutation_listener(_invalidate_wp_on_graph_mutation)
_warm_profile_graph_listener = _invalidate_wp_on_graph_mutation
except Exception as exc:
debug_log(
f"warm profile mutation listener wiring failed (non-fatal): {exc}",
"memory",
)

# Knowledge graph: wipe + re-seed if the on-disk shape predates the
# User/Directives/World taxonomy. Non-destructive to the diary —
# users can re-import via the memory viewer.
@@ -567,6 +623,19 @@ def stdin_monitor():
if tts is not None:
tts.stop()
db.close()

# Drop the warm-profile graph listener so the module registry does
# not retain a closure pointing at this run's DialogueMemory after
# shutdown — relevant for tests and any embedder that re-runs the
# daemon in-process.
if _warm_profile_graph_listener is not None:
try:
from .memory.graph import unregister_graph_mutation_listener
unregister_graph_mutation_listener(_warm_profile_graph_listener)
except Exception:
pass
_warm_profile_graph_listener = None

debug_log("daemon stopped", "jarvis")
print("👋 Daemon stopped", flush=True)
