Status:
0.1.0— early; APIs may break before 1.0.
Durable AI agent execution on Temporal using AutoAgents for LLM provider and tool abstractions.
The headline export is AgentWorkflow: a Temporal workflow that runs a
ReAct-style agent loop where every LLM call and every tool invocation is
checkpointed as a Temporal activity. If the worker crashes mid-loop, the
workflow resumes from the last completed activity without re-paying for prior
LLM tokens.
Inspired by Temporal's blog post, Of Course You Can Build Dynamic AI Agents with Temporal.
┌──────────────────────── AgentWorkflow (deterministic) ────────────────────────┐
│ │
│ while not done: │
│ ┌──────────┐ ┌─────────────┐ │
│ │ history │ ───────▶│ llm_chat │ ── LlmResponse ─┐ │
│ └──────────┘ │ (activity) │ │ │
│ └─────────────┘ ▼ │
│ ┌────────────────┐ │
│ │ Final? Tools? │ │
│ └────────────────┘ │
│ │ │ │
│ return ▼ │
│ ┌─────────────┐ │
│ │ execute_tool│ │
│ │ (activity) │ × N │
│ └─────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────────┘
- Workflow = orchestration. Deterministic, replayable, holds the conversation history.
- Activities = the only place LLM providers and tool implementations are called. Non-deterministic, retryable, observable in the Temporal UI.
AgentWorkflowwith a ReAct loop, signals, queries, andcontinue_as_newhistory compaction.AgentActivitieswithllm_chatandexecute_tool.ToolRegistrythat accepts any AutoAgentsArc<dyn ToolT>(use the#[tool]derive macro).AgentWorkerBuilderfor one-line worker setup.- Provider-agnostic: bring your own
Arc<dyn LLMProvider>(OpenAI, Anthropic, Ollama, etc. — anything supported byautoagents_llm). - Pluggable memory backends via the
MemoryProvidertrait — defaultSlidingWindowMemorymatches the legacy hardcoded behavior; swap in custom strategies throughAgentWorkerBuilder::memory. See Pluggable memory backends. - Human-in-the-loop as a regular tool — the library does not special-case any tool name. See Human-in-the-loop tools.
- Rust ≥ 1.95 (install via rustup).
- Temporal CLI — for running the examples against a local dev server.
Install with
brew install temporalor follow the official install guide. - Docker — only required to run the integration test suite, which
spins up Temporal and Ollama containers automatically via
testcontainers. - An OpenAI-compatible API key for the examples (set
OPENAI_API_KEY; override the endpoint withOPENAI_BASE_URLto point at Ollama or any other compatible server).
use std::sync::Arc;
use temporal_agent_rs::prelude::*;
# async fn run(
# client: temporalio_client::Client,
# llm: Arc<dyn LLMProvider>,
# my_tool: Arc<dyn ToolT>,
# ) -> anyhow::Result<()> {
let runtime = temporalio_sdk::CoreRuntime::new_assume_tokio(Default::default())?;
let mut worker = AgentWorkerBuilder::new(client)
.llm(llm)
.tool(my_tool)
.queue("agents")
.build_worker(&runtime)?;
worker.run().await?;
# Ok(())
# }Starting a workflow from a client:
use temporal_agent_rs::prelude::*;
use temporalio_client::{WorkflowGetResultOptions, WorkflowStartOptions};
let handle = client.start_workflow(
AgentWorkflow::run,
AgentInput {
system_prompt: "You are a math assistant.".into(),
user_message: "What is 17.5 + 4.2?".into(),
max_turns: 5,
},
WorkflowStartOptions::new("agents", "math-1").build(),
).await?;
let out: AgentOutput = handle.get_result(WorkflowGetResultOptions::default()).await?;
println!("{}", out.final_answer);AgentWorkerBuilder::llm accepts any Arc<dyn LLMProvider>, so you can wrap
your provider with composable layers from autoagents_llm before passing it
to the worker. The prelude re-exports the relevant types:
use std::sync::Arc;
use std::time::Duration;
use temporal_agent_rs::prelude::*;
# fn build(primary: Arc<dyn LLMProvider>, fallback: Arc<dyn LLMProvider>) -> Arc<dyn LLMProvider> {
let llm: Arc<dyn LLMProvider> = PipelineBuilder::new(primary)
.add_layer(CacheLayer::new(CacheConfig {
ttl: Some(Duration::from_secs(300)),
..Default::default()
}))
.add_layer(FallbackLayer::new(vec![fallback]))
.build();
# llm
# }Layer ordering. The first-added layer is outermost. In the snippet above
a cache hit short-circuits before any network call; fallback wraps only the
primary, so it triggers exclusively on a primary error after a cache miss.
Reorder the add_layer calls and the semantics change accordingly.
Why no retry layer? autoagents_llm also ships a RetryLayer, but this
crate deliberately does not re-export it. Every LLM call already runs inside
a Temporal activity, and Temporal's activity
RetryPolicy owns
retry. Layering retry inside one activity attempt would hide failures from
Temporal's history, amplify rate-limit pressure, and not honour workflow
cancellation. Configure retry on your activity options instead. Fallback is
kept because it switches providers — orthogonal to retry and not replaceable
by it.
See examples/pipelined_math_agent for a runnable demo (add tool +
PipelineBuilder(CacheLayer → FallbackLayer) around two OpenAI models).
History compaction is governed by an Arc<dyn MemoryProvider> published
to the worker via AgentWorkerBuilder::memory. The default — used when
.memory(...) is not called — is SlidingWindowMemory with
compact_threshold = 200 and keep_recent = 20, which matches the
legacy hardcoded behavior.
use std::sync::Arc;
use temporal_agent_rs::prelude::*;
let memory: Arc<dyn MemoryProvider> = Arc::new(
SlidingWindowMemory::new()
.with_compact_threshold(50)
.with_keep_recent(10),
);
AgentWorkerBuilder::new(client)
.llm(llm)
.tool(my_tool)
.memory(memory)
.build_worker(&runtime)?;Trait contract. Implementations MUST be pure and synchronous —
should_compact and compact run inside the deterministic workflow
body and must return identical results on every replay for the same
AgentState. Per-conversation state belongs in AgentState (which
Temporal persists in workflow history), never in fields on the provider.
Multi-worker setups. Running multiple workers in the same process on
the same queue requires sharing the same Arc<dyn MemoryProvider> —
the builder fails fast (via Arc::ptr_eq) on mismatching instances to
prevent the second worker from silently inheriting the first worker's
provider while replay diverges.
See examples/tunable_memory_agent for a runnable demo of a tuned
SlidingWindowMemory plus a minimal custom MemoryProvider impl
(KeepEverythingMemory) gated behind a KEEP_EVERYTHING=1 env switch.
Five examples ship with the crate:
simple_math_agent— minimal autonomous loop with a singleaddtool.interactive_math_agent— adds anask_usertool so the agent can pause for human input on the worker's stdin.pipelined_math_agent— sameaddtool, but the provider is wrapped withPipelineBuilder → CacheLayer → FallbackLayerto demonstrate the composition pattern described above.structured_output_agent— forces a JSON-schema-shaped final answer viaAgentInput::output_schema.tunable_memory_agent— demonstrates a tunedSlidingWindowMemoryand a customMemoryProviderimpl; aggressive thresholds makecontinue_as_newcompaction observable on a short conversation.
# Terminal 1: local Temporal dev server (install via `brew install temporal` or temporal.io)
temporal server start-dev
# Simple autonomous agent — single `add` tool, no human-in-the-loop.
# Terminal 2:
OPENAI_API_KEY=sk-... cargo run --example simple_math_agent -- worker
# Terminal 3:
cargo run --example simple_math_agent -- client
# Same workflow, but the agent can pause to ask the user for missing info.
# The worker terminal also accepts typed answers on stdin.
OPENAI_API_KEY=sk-... cargo run --example interactive_math_agent -- worker
cargo run --example interactive_math_agent -- client
# Same workflow with a cache+fallback pipeline around two OpenAI models.
# Override OPENAI_MODEL_PRIMARY / OPENAI_MODEL_FALLBACK to demo fallback;
# run the client twice with the same prompt to observe the cache layer.
OPENAI_API_KEY=sk-... cargo run --example pipelined_math_agent -- worker
cargo run --example pipelined_math_agent -- client
# Structured output — final answer constrained by a JSON schema.
OPENAI_API_KEY=sk-... cargo run --example structured_output_agent -- worker
cargo run --example structured_output_agent -- client
# Pluggable memory backends — aggressive SlidingWindowMemory so compaction
# fires mid-run. Use the `status` sub-command to watch history.len() and the
# "Prior conversation summary" marker appear in the system prompt.
OPENAI_API_KEY=sk-... cargo run --example tunable_memory_agent -- worker
cargo run --example tunable_memory_agent -- client
cargo run --example tunable_memory_agent -- statusThe Temporal Web UI is at http://localhost:8233. Click into the workflow to
see every llm_chat and execute_tool as a separate activity event.
To witness durability: kill the worker mid-loop (Ctrl-C in terminal 2),
restart it, and the workflow picks up from the last completed activity.
The library treats every tool uniformly — there is no built-in "ask the user"
primitive, no AskUser response variant, no awaiting_user flag baked into
the workflow state. Pause-and-wait semantics are implemented inside the
user's tool, not inside the agent loop.
When the LLM emits a tool call, the workflow dispatches it as an
execute_tool activity. If that activity's execute() blocks on a channel
waiting for an external answer, Temporal happily keeps it in-flight up to the
configured start_to_close_timeout (the library default is 1 hour;
override per-deployment if you need longer). When the answer arrives, the
tool returns it as a normal serde_json::Value. The LLM observes it on the
next llm_chat turn as a standard tool result. No special workflow code
needed; the diagram above already covers it.
Define a ToolT whose execute() publishes the question to an out-of-band
channel and awaits an answer. Three concrete delivery mechanisms, in order
of increasing production-readiness:
| Mechanism | When to use | Crash-durable? |
|---|---|---|
Stdin → in-process channel (used in examples/interactive_math_agent) |
Local dev, single-user demos | No — pending question lost on worker restart |
| HTTP / Unix socket sidecar | Multi-user UIs, multi-process clients | No — pending question lost unless persisted externally |
Temporal async activity completion (task token + client.complete_activity_…) |
Production | Yes — survives worker restarts |
The example uses the stdin variant for brevity. Production deployments should
use Temporal async activity completion: the tool persists (task_token, question) to a queue/UI, returns ActivityError::WillCompleteAsync, and an
external client completes the activity with the answer later. (This requires
the tool to access the ActivityContext, which today means writing the
activity directly rather than going through our execute_tool dispatcher — a
future library enhancement.)
use tokio::sync::broadcast;
use autoagents_derive::{tool, ToolInput};
use autoagents_core::tool::{ToolRuntime, ToolCallError};
#[derive(serde::Deserialize, ToolInput)]
struct AskUserArgs {
#[input(description = "The question to put to the user")]
question: String,
}
#[tool(
name = "ask_user",
description = "Ask the human user a follow-up question. The agent will \
pause until the user replies.",
input = AskUserArgs
)]
#[derive(Clone)]
struct AskUserTool {
answers: broadcast::Sender<String>,
}
#[async_trait::async_trait]
impl ToolRuntime for AskUserTool {
async fn execute(&self, args: serde_json::Value)
-> Result<serde_json::Value, ToolCallError>
{
let parsed: AskUserArgs = serde_json::from_value(args)?;
println!(">>> AGENT ASKS: {}", parsed.question);
let mut rx = self.answers.subscribe();
let answer = rx.recv().await.map_err(|e| {
ToolCallError::RuntimeError(Box::new(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
e.to_string(),
)))
})?;
Ok(serde_json::json!({ "answer": answer }))
}
}Register it like any other tool:
let (answer_tx, _) = broadcast::channel::<String>(16);
// spawn a stdin reader (or HTTP listener, etc.) that publishes to answer_tx
AgentWorkerBuilder::new(client)
.llm(llm)
.tool(Arc::new(MyComputeTool))
.tool(Arc::new(AskUserTool::new(answer_tx)))
.queue("agents")
.build_worker(&runtime)?;The default start_to_close_timeout for tool activities is set generously
(1 hour) so that human-in-the-loop tools don't trip the timeout. Tools that
complete quickly are unaffected. See
src/workflow.rs (tool_opts) to tweak it.
With the in-process answer mechanisms (stdin, local socket), if the worker
process crashes while a question is pending, the answer channel state is
lost. Temporal will retry the execute_tool activity on the new worker; the
tool will reprint the question and ask again. For full crash durability,
use the Temporal async activity completion approach.
When you write tools and provider configs:
- Tools must be side-effect-safe-on-retry by default. Tool errors are reported back to the LLM, not retried by Temporal, but infrastructure errors do retry up to 3 times.
- The LLM provider must be
Send + Sync + 'static.Arc<dyn LLMProvider>already satisfies this for AutoAgents' built-in providers. - Never call your
LLMProvideror yourToolTfrom inside workflow code. The workflow holds tools by name; the only path to invocation is theexecute_toolactivity. MemoryProviderimpls must be pure, sync, and stateless (config only) —should_compactandcompactrun inside the workflow body and must return identical results on replay for the sameAgentState. Keep conversation state inAgentState, never on the provider.
| Crate | Version |
|---|---|
temporalio-sdk |
0.4.x (prerelease) |
autoagents |
0.3.x |
| Rust edition | 2024 |
| MSRV | 1.95 |
The Temporal Rust SDK is prerelease; API breaks are expected on minor
version bumps. This crate pins to 0.4.x for now.
See CONTRIBUTING.md for setup, build/test commands, and PR conventions. By participating you agree to abide by our Code of Conduct.
See the Releases page for per-version notes auto-generated from merged PRs.
Please report vulnerabilities privately — see SECURITY.md.
MIT