triplecloudtech · frisbeeman · May 18, 2026 · May 17, 2026 · May 18, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -38,22 +38,23 @@ Override the LLM endpoint with `OPENAI_BASE_URL` (defaults to `https://api.opena
 
 **Module map:**
 - [src/lib.rs](src/lib.rs) — module re-exports.
-- [src/builder.rs](src/builder.rs) — `AgentWorkerBuilder` fluent builder; wires LLM + tools into a Temporal `Worker`.
+- [src/builder.rs](src/builder.rs) — `AgentWorkerBuilder` fluent builder; wires LLM + tools + memory provider into a Temporal `Worker`.
 - [src/workflow.rs](src/workflow.rs) — `AgentWorkflow` with `#[run]`, `#[signal] add_user_message`, `#[query] get_state`, `#[query] turn_count`. Owns the ReAct loop.
 - [src/activities.rs](src/activities.rs) — `AgentActivities::llm_chat` and `AgentActivities::execute_tool`. The *only* place LLM providers and tool implementations execute.
 - [src/llm.rs](src/llm.rs) — translation between local `Message`/`ToolSchema` types and AutoAgents `ChatMessage`/`LlmTool`; native-tool-call parsing with fenced-JSON fallback. The only file that touches `autoagents_llm` types in the hot path (`src/llm.rs:6`).
-- [src/state.rs](src/state.rs) — `AgentInput`, `AgentOutput`, `AgentState`, `Message`, `ToolCall`, `ToolResult`, `ToolSchema`, `LlmResponse`, `StopReason`, plus `compact()`.
+- [src/state.rs](src/state.rs) — `AgentInput`, `AgentOutput`, `AgentState`, `Message`, `ToolCall`, `ToolResult`, `ToolSchema`, `LlmResponse`, `StopReason`.
+- [src/memory.rs](src/memory.rs) — `MemoryProvider` trait, default `SlidingWindowMemory` impl, and the `compact_sliding_window` kernel. Pluggable compaction strategy consulted by the workflow before every turn.
 - [src/tool.rs](src/tool.rs) — `ToolRegistry` (immutable name→impl map) and its builder.
 - [src/error.rs](src/error.rs) — `AgentError` with `is_retryable()` to distinguish transient vs. permanent.
-- [src/prelude.rs](src/prelude.rs) — convenience re-exports including AutoAgents traits (`ToolT`, `LLMProvider`, `ToolRuntime`, `ToolCallError`).
+- [src/prelude.rs](src/prelude.rs) — convenience re-exports including AutoAgents traits (`ToolT`, `LLMProvider`, `ToolRuntime`, `ToolCallError`) and memory types (`MemoryProvider`, `SlidingWindowMemory`).
 
-**Public API surface (what a user actually touches):** `AgentWorkerBuilder`, `AgentWorkflow`, `AgentInput`/`AgentOutput`, `ToolRegistry`. Users supply their own `Arc<dyn LLMProvider>` and `Arc<dyn ToolT>` from AutoAgents.
+**Public API surface (what a user actually touches):** `AgentWorkerBuilder`, `AgentWorkflow`, `AgentInput`/`AgentOutput`, `ToolRegistry`, `MemoryProvider`/`SlidingWindowMemory`. Users supply their own `Arc<dyn LLMProvider>` and `Arc<dyn ToolT>` from AutoAgents.
 
 **Non-obvious behaviors to preserve when editing:**
 
-- **History compaction.** When `AgentState::history.len()` exceeds `CONTINUE_AS_NEW_THRESHOLD = 200` (`src/workflow.rs:36`), the workflow calls `continue_as_new` with a compacted state: summary prepended to the system prompt, last 20 messages kept (`src/state.rs::compact`). Any change to the message shape needs to round-trip through `compact()`.
+- **History compaction is pluggable.** The workflow consults `MemoryProvider::should_compact` before every turn; on `true` it calls `MemoryProvider::compact` and `continue_as_new` with the returned `AgentInput`. Default provider is `SlidingWindowMemory` (`compact_threshold = 200`, `keep_recent = 20`), preserving the legacy hardcoded behavior. Override via `AgentWorkerBuilder::memory(Arc::new(SlidingWindowMemory::new().with_compact_threshold(N).with_keep_recent(K)))` or supply your own `Arc<dyn MemoryProvider>`. Trait impls MUST be pure and sync — they run inside the deterministic workflow body. The kernel summarizer lives at `src/memory.rs::compact_sliding_window`; any change to the `Message` shape needs to round-trip through it.
 - **Tool error semantics.** Tool-side failures return `Ok(ToolResult { error: Some(...) })` so the LLM can see and recover from them (`src/activities.rs:59-88`). Only infrastructure errors (missing tool, serde failure) surface as activity `Err`, which Temporal retries.
-- **`WORKER_TOOL_CATALOG`.** A process-global `OnceCell` set once at worker init in `build_worker` (`src/builder.rs:34`). The deterministic workflow body reads it on every replay, so it must be set before the worker starts and never mutated after.
+- **Process-global worker config (`WORKER_TOOL_CATALOG`, `WORKER_MEMORY`).** Two `OnceCell`s in `src/builder.rs` published by `build_worker`. The deterministic workflow body reads them on every replay, so they must be set before the worker starts and never mutated after. Building a second worker in the same process with a *different* catalog (compared by `PartialEq`) or a different memory `Arc` (compared by `Arc::ptr_eq`) returns `AgentError::Other` — multi-worker setups in one process must share the same `Arc<dyn MemoryProvider>` and register identical tools in the same order.
 - **Activity timeouts.** Set inside `AgentWorkflow::run` at `src/workflow.rs:66-74`: LLM activity 120s start-to-close / 30s heartbeat, tool activity **3600s** start-to-close (generous on purpose — supports human-in-the-loop tools that block on stdin/HTTP/async-completion).
 - **Mid-conversation user input.** The `add_user_message` signal pushes into `pending_user_messages`, drained at the top of each loop iteration (`src/workflow.rs:145-152`). Don't mutate `history` directly from signal handlers — that races with the in-flight `llm_chat` activity.
 - **Dual LLM response parsing.** `src/llm.rs` tries native tool calls first, then falls back to a fenced `\`\`\`tool_calls` JSON block so non-OpenAI providers still work.
@@ -62,6 +63,34 @@ Override the LLM endpoint with `OPENAI_BASE_URL` (defaults to `https://api.opena
 - Tools must be side-effect-safe on retry.
 - `LLMProvider` and `ToolT` impls must be `Send + Sync + 'static`.
 - Never invoke `LLMProvider` or `ToolT` from workflow code — only from activities.
+- `MemoryProvider` impls must be pure, sync, and stateless (config-only) — `should_compact` and `compact` are called inside the workflow body and must return identical results on replay for the same `AgentState`.
+
+## Documentation maintenance
+
+After any change large enough to alter the public API surface, observable
+behavior, defaults, or feature set, update the user-facing docs in the same
+PR — stale docs are worse than no docs because they actively mislead.
+Specifically:
+
+- **[AGENTS.md](AGENTS.md)** (this file) — update the module map, public API
+  surface line, non-obvious behaviors, and determinism contract whenever any
+  of them change. Add new module entries here as soon as you create them.
+- **[README.md](README.md)** — update the features list, examples list,
+  user-facing determinism contract, and any code snippets affected by API
+  changes. If you add a feature with its own knobs (caching, fallback,
+  memory backends, etc.), give it a short dedicated section like
+  "Pluggable memory backends" so users can find it without reading the
+  source.
+- **[examples/](examples/)** — when adding a new top-level feature, ship
+  one runnable example that exercises it (model the new example on
+  `simple_math_agent` — same worker/client/status mode template, same
+  three-terminal flow). Register it in `Cargo.toml` under a new
+  `[[example]]` entry and add a one-line description plus a runnable
+  invocation to README.md's "Running the examples" block.
+
+Rule of thumb: if you touched `src/lib.rs` re-exports, `src/prelude.rs`, or
+`AgentWorkerBuilder`'s public API, you owe at least one edit to each of the
+three above.
 
 ## Version pins
 

diff --git a/Cargo.toml b/Cargo.toml
@@ -70,3 +70,7 @@ path = "examples/pipelined_math_agent/main.rs"
 [[example]]
 name = "structured_output_agent"
 path = "examples/structured_output_agent/main.rs"
+
+[[example]]
+name = "tunable_memory_agent"
+path = "examples/tunable_memory_agent/main.rs"
diff --git a/README.md b/README.md
@@ -59,6 +59,10 @@ LLM tokens.
 - `AgentWorkerBuilder` for one-line worker setup.
 - Provider-agnostic: bring your own `Arc<dyn LLMProvider>` (OpenAI,
   Anthropic, Ollama, etc. — anything supported by `autoagents_llm`).
+- **Pluggable memory backends** via the `MemoryProvider` trait — default
+  `SlidingWindowMemory` matches the legacy hardcoded behavior; swap in
+  custom strategies through `AgentWorkerBuilder::memory`. See
+  [Pluggable memory backends](#pluggable-memory-backends).
 - **Human-in-the-loop as a regular tool** — the library does not
   special-case any tool name. See
   [Human-in-the-loop tools](#human-in-the-loop-tools).
@@ -159,16 +163,62 @@ by it.
 See `examples/pipelined_math_agent` for a runnable demo (`add` tool +
 `PipelineBuilder(CacheLayer → FallbackLayer)` around two OpenAI models).
 
+## Pluggable memory backends
+
+History compaction is governed by an `Arc<dyn MemoryProvider>` published
+to the worker via `AgentWorkerBuilder::memory`. The default — used when
+`.memory(...)` is not called — is `SlidingWindowMemory` with
+`compact_threshold = 200` and `keep_recent = 20`, which matches the
+legacy hardcoded behavior.
+
+```rust,ignore
+use std::sync::Arc;
+use temporal_agent_rs::prelude::*;
+
+let memory: Arc<dyn MemoryProvider> = Arc::new(
+    SlidingWindowMemory::new()
+        .with_compact_threshold(50)
+        .with_keep_recent(10),
+);
+
+AgentWorkerBuilder::new(client)
+    .llm(llm)
+    .tool(my_tool)
+    .memory(memory)
+    .build_worker(&runtime)?;
+```
+
+**Trait contract.** Implementations MUST be pure and synchronous —
+`should_compact` and `compact` run inside the deterministic workflow
+body and must return identical results on every replay for the same
+`AgentState`. Per-conversation state belongs in `AgentState` (which
+Temporal persists in workflow history), never in fields on the provider.
+
+**Multi-worker setups.** Running multiple workers in the same process on
+the same queue requires sharing the *same* `Arc<dyn MemoryProvider>` —
+the builder fails fast (via `Arc::ptr_eq`) on mismatching instances to
+prevent the second worker from silently inheriting the first worker's
+provider while replay diverges.
+
+See `examples/tunable_memory_agent` for a runnable demo of a tuned
+`SlidingWindowMemory` plus a minimal custom `MemoryProvider` impl
+(`KeepEverythingMemory`) gated behind a `KEEP_EVERYTHING=1` env switch.
+
 ## Running the examples
 
-Three examples ship with the crate:
+Five examples ship with the crate:
 
 - `simple_math_agent` — minimal autonomous loop with a single `add` tool.
 - `interactive_math_agent` — adds an `ask_user` tool so the agent can pause
   for human input on the worker's stdin.
 - `pipelined_math_agent` — same `add` tool, but the provider is wrapped with
   `PipelineBuilder → CacheLayer → FallbackLayer` to demonstrate the
   composition pattern described above.
+- `structured_output_agent` — forces a JSON-schema-shaped final answer via
+  `AgentInput::output_schema`.
+- `tunable_memory_agent` — demonstrates a tuned `SlidingWindowMemory` and a
+  custom `MemoryProvider` impl; aggressive thresholds make `continue_as_new`
+  compaction observable on a short conversation.
 
 ```bash
 # Terminal 1: local Temporal dev server (install via `brew install temporal` or temporal.io)
@@ -190,6 +240,17 @@ cargo run --example interactive_math_agent -- client
 # run the client twice with the same prompt to observe the cache layer.
 OPENAI_API_KEY=sk-... cargo run --example pipelined_math_agent -- worker
 cargo run --example pipelined_math_agent -- client
+
+# Structured output — final answer constrained by a JSON schema.
+OPENAI_API_KEY=sk-... cargo run --example structured_output_agent -- worker
+cargo run --example structured_output_agent -- client
+
+# Pluggable memory backends — aggressive SlidingWindowMemory so compaction
+# fires mid-run. Use the `status` sub-command to watch history.len() and the
+# "Prior conversation summary" marker appear in the system prompt.
+OPENAI_API_KEY=sk-... cargo run --example tunable_memory_agent -- worker
+cargo run --example tunable_memory_agent -- client
+cargo run --example tunable_memory_agent -- status
 ```
 
 The Temporal Web UI is at http://localhost:8233. Click into the workflow to
@@ -319,6 +380,10 @@ When you write tools and provider configs:
 - Never call your `LLMProvider` or your `ToolT` from inside workflow code.
   The workflow holds tools by name; the only path to invocation is the
   `execute_tool` activity.
+- `MemoryProvider` impls must be **pure, sync, and stateless** (config
+  only) — `should_compact` and `compact` run inside the workflow body and
+  must return identical results on replay for the same `AgentState`. Keep
+  conversation state in `AgentState`, never on the provider.
 
 ## Version compatibility