temporal-agent-rs

Status: 0.1.0 — early; APIs may break before 1.0.

Durable AI agent execution on Temporal using AutoAgents for LLM provider and tool abstractions.

The headline export is AgentWorkflow: a Temporal workflow that runs a ReAct-style agent loop where every LLM call and every tool invocation is checkpointed as a Temporal activity. If the worker crashes mid-loop, the workflow resumes from the last completed activity without re-paying for prior LLM tokens.

Inspired by Temporal's blog post, Of Course You Can Build Dynamic AI Agents with Temporal.

Architecture

┌──────────────────────── AgentWorkflow (deterministic) ────────────────────────┐
│                                                                                │
│   while not done:                                                              │
│      ┌──────────┐         ┌─────────────┐                                      │
│      │ history  │ ───────▶│  llm_chat   │ ── LlmResponse ─┐                    │
│      └──────────┘         │  (activity) │                 │                    │
│                           └─────────────┘                 ▼                    │
│                                                  ┌────────────────┐            │
│                                                  │ Final?  Tools? │            │
│                                                  └────────────────┘            │
│                                                       │       │                │
│                                                  return       ▼                │
│                                                          ┌─────────────┐       │
│                                                          │ execute_tool│       │
│                                                          │  (activity) │ × N   │
│                                                          └─────────────┘       │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

Workflow = orchestration. Deterministic, replayable, holds the conversation history.
Activities = the only place LLM providers and tool implementations are called. Non-deterministic, retryable, observable in the Temporal UI.

Features

AgentWorkflow with a ReAct loop, signals, queries, and continue_as_new history compaction.
AgentActivities with llm_chat and execute_tool.
ToolRegistry that accepts any AutoAgents Arc<dyn ToolT> (use the #[tool] derive macro).
AgentWorkerBuilder for one-line worker setup.
Provider-agnostic: bring your own Arc<dyn LLMProvider> (OpenAI, Anthropic, Ollama, etc. — anything supported by autoagents_llm).
Pluggable memory backends via the MemoryProvider trait — default SlidingWindowMemory matches the legacy hardcoded behavior; swap in custom strategies through AgentWorkerBuilder::memory. See Pluggable memory backends.
Human-in-the-loop as a regular tool — the library does not special-case any tool name. See Human-in-the-loop tools.

Prerequisites

Rust ≥ 1.95 (install via rustup).
Temporal CLI — for running the examples against a local dev server. Install with brew install temporal or follow the official install guide.
Docker — only required to run the integration test suite, which spins up Temporal and Ollama containers automatically via testcontainers.
An OpenAI-compatible API key for the examples (set OPENAI_API_KEY; override the endpoint with OPENAI_BASE_URL to point at Ollama or any other compatible server).

Quick start

use std::sync::Arc;
use temporal_agent_rs::prelude::*;

# async fn run(
#     client: temporalio_client::Client,
#     llm: Arc<dyn LLMProvider>,
#     my_tool: Arc<dyn ToolT>,
# ) -> anyhow::Result<()> {
let runtime = temporalio_sdk::CoreRuntime::new_assume_tokio(Default::default())?;
let mut worker = AgentWorkerBuilder::new(client)
    .llm(llm)
    .tool(my_tool)
    .queue("agents")
    .build_worker(&runtime)?;
worker.run().await?;
# Ok(())
# }

Starting a workflow from a client:

use temporal_agent_rs::prelude::*;
use temporalio_client::{WorkflowGetResultOptions, WorkflowStartOptions};

let handle = client.start_workflow(
    AgentWorkflow::run,
    AgentInput {
        system_prompt: "You are a math assistant.".into(),
        user_message: "What is 17.5 + 4.2?".into(),
        max_turns: 5,
    },
    WorkflowStartOptions::new("agents", "math-1").build(),
).await?;

let out: AgentOutput = handle.get_result(WorkflowGetResultOptions::default()).await?;
println!("{}", out.final_answer);

Production primitives: caching and fallback

AgentWorkerBuilder::llm accepts any Arc<dyn LLMProvider>, so you can wrap your provider with composable layers from autoagents_llm before passing it to the worker. The prelude re-exports the relevant types:

use std::sync::Arc;
use std::time::Duration;
use temporal_agent_rs::prelude::*;

# fn build(primary: Arc<dyn LLMProvider>, fallback: Arc<dyn LLMProvider>) -> Arc<dyn LLMProvider> {
let llm: Arc<dyn LLMProvider> = PipelineBuilder::new(primary)
    .add_layer(CacheLayer::new(CacheConfig {
        ttl: Some(Duration::from_secs(300)),
        ..Default::default()
    }))
    .add_layer(FallbackLayer::new(vec![fallback]))
    .build();
# llm
# }

Layer ordering. The first-added layer is outermost. In the snippet above a cache hit short-circuits before any network call; fallback wraps only the primary, so it triggers exclusively on a primary error after a cache miss. Reorder the add_layer calls and the semantics change accordingly.

Why no retry layer? autoagents_llm also ships a RetryLayer, but this crate deliberately does not re-export it. Every LLM call already runs inside a Temporal activity, and Temporal's activity RetryPolicy owns retry. Layering retry inside one activity attempt would hide failures from Temporal's history, amplify rate-limit pressure, and not honour workflow cancellation. Configure retry on your activity options instead. Fallback is kept because it switches providers — orthogonal to retry and not replaceable by it.

See examples/pipelined_math_agent for a runnable demo (add tool + PipelineBuilder(CacheLayer → FallbackLayer) around two OpenAI models).

Pluggable memory backends

History compaction is governed by an Arc<dyn MemoryProvider> published to the worker via AgentWorkerBuilder::memory. The default — used when .memory(...) is not called — is SlidingWindowMemory with compact_threshold = 200 and keep_recent = 20, which matches the legacy hardcoded behavior.

use std::sync::Arc;
use temporal_agent_rs::prelude::*;

let memory: Arc<dyn MemoryProvider> = Arc::new(
    SlidingWindowMemory::new()
        .with_compact_threshold(50)
        .with_keep_recent(10),
);

AgentWorkerBuilder::new(client)
    .llm(llm)
    .tool(my_tool)
    .memory(memory)
    .build_worker(&runtime)?;

Trait contract. Implementations MUST be pure and synchronous — should_compact and compact run inside the deterministic workflow body and must return identical results on every replay for the same AgentState. Per-conversation state belongs in AgentState (which Temporal persists in workflow history), never in fields on the provider.

Multi-worker setups. Running multiple workers in the same process on the same queue requires sharing the same Arc<dyn MemoryProvider> — the builder fails fast (via Arc::ptr_eq) on mismatching instances to prevent the second worker from silently inheriting the first worker's provider while replay diverges.

See examples/tunable_memory_agent for a runnable demo of a tuned SlidingWindowMemory plus a minimal custom MemoryProvider impl (KeepEverythingMemory) gated behind a KEEP_EVERYTHING=1 env switch.

Running the examples

Five examples ship with the crate:

simple_math_agent — minimal autonomous loop with a single add tool.
interactive_math_agent — adds an ask_user tool so the agent can pause for human input on the worker's stdin.
pipelined_math_agent — same add tool, but the provider is wrapped with PipelineBuilder → CacheLayer → FallbackLayer to demonstrate the composition pattern described above.
structured_output_agent — forces a JSON-schema-shaped final answer via AgentInput::output_schema.
tunable_memory_agent — demonstrates a tuned SlidingWindowMemory and a custom MemoryProvider impl; aggressive thresholds make continue_as_new compaction observable on a short conversation.

# Terminal 1: local Temporal dev server (install via `brew install temporal` or temporal.io)
temporal server start-dev

# Simple autonomous agent — single `add` tool, no human-in-the-loop.
# Terminal 2:
OPENAI_API_KEY=sk-... cargo run --example simple_math_agent -- worker
# Terminal 3:
cargo run --example simple_math_agent -- client

# Same workflow, but the agent can pause to ask the user for missing info.
# The worker terminal also accepts typed answers on stdin.
OPENAI_API_KEY=sk-... cargo run --example interactive_math_agent -- worker
cargo run --example interactive_math_agent -- client

# Same workflow with a cache+fallback pipeline around two OpenAI models.
# Override OPENAI_MODEL_PRIMARY / OPENAI_MODEL_FALLBACK to demo fallback;
# run the client twice with the same prompt to observe the cache layer.
OPENAI_API_KEY=sk-... cargo run --example pipelined_math_agent -- worker
cargo run --example pipelined_math_agent -- client

# Structured output — final answer constrained by a JSON schema.
OPENAI_API_KEY=sk-... cargo run --example structured_output_agent -- worker
cargo run --example structured_output_agent -- client

# Pluggable memory backends — aggressive SlidingWindowMemory so compaction
# fires mid-run. Use the `status` sub-command to watch history.len() and the
# "Prior conversation summary" marker appear in the system prompt.
OPENAI_API_KEY=sk-... cargo run --example tunable_memory_agent -- worker
cargo run --example tunable_memory_agent -- client
cargo run --example tunable_memory_agent -- status

The Temporal Web UI is at http://localhost:8233. Click into the workflow to see every llm_chat and execute_tool as a separate activity event.

To witness durability: kill the worker mid-loop (Ctrl-C in terminal 2), restart it, and the workflow picks up from the last completed activity.

Human-in-the-loop tools

The library treats every tool uniformly — there is no built-in "ask the user" primitive, no AskUser response variant, no awaiting_user flag baked into the workflow state. Pause-and-wait semantics are implemented inside the user's tool, not inside the agent loop.

Why this works without library special-casing

When the LLM emits a tool call, the workflow dispatches it as an execute_tool activity. If that activity's execute() blocks on a channel waiting for an external answer, Temporal happily keeps it in-flight up to the configured start_to_close_timeout (the library default is 1 hour; override per-deployment if you need longer). When the answer arrives, the tool returns it as a normal serde_json::Value. The LLM observes it on the next llm_chat turn as a standard tool result. No special workflow code needed; the diagram above already covers it.

The pattern

Define a ToolT whose execute() publishes the question to an out-of-band channel and awaits an answer. Three concrete delivery mechanisms, in order of increasing production-readiness:

Mechanism	When to use	Crash-durable?
Stdin → in-process channel (used in `examples/interactive_math_agent`)	Local dev, single-user demos	No — pending question lost on worker restart
HTTP / Unix socket sidecar	Multi-user UIs, multi-process clients	No — pending question lost unless persisted externally
Temporal async activity completion (task token + `client.complete_activity_…`)	Production	Yes — survives worker restarts

The example uses the stdin variant for brevity. Production deployments should use Temporal async activity completion: the tool persists (task_token, question) to a queue/UI, returns ActivityError::WillCompleteAsync, and an external client completes the activity with the answer later. (This requires the tool to access the ActivityContext, which today means writing the activity directly rather than going through our execute_tool dispatcher — a future library enhancement.)

Tool-side snippet (from the example)

use tokio::sync::broadcast;
use autoagents_derive::{tool, ToolInput};
use autoagents_core::tool::{ToolRuntime, ToolCallError};

#[derive(serde::Deserialize, ToolInput)]
struct AskUserArgs {
    #[input(description = "The question to put to the user")]
    question: String,
}

#[tool(
    name = "ask_user",
    description = "Ask the human user a follow-up question. The agent will \
                   pause until the user replies.",
    input = AskUserArgs
)]
#[derive(Clone)]
struct AskUserTool {
    answers: broadcast::Sender<String>,
}

#[async_trait::async_trait]
impl ToolRuntime for AskUserTool {
    async fn execute(&self, args: serde_json::Value)
        -> Result<serde_json::Value, ToolCallError>
    {
        let parsed: AskUserArgs = serde_json::from_value(args)?;
        println!(">>> AGENT ASKS: {}", parsed.question);
        let mut rx = self.answers.subscribe();
        let answer = rx.recv().await.map_err(|e| {
            ToolCallError::RuntimeError(Box::new(std::io::Error::new(
                std::io::ErrorKind::BrokenPipe,
                e.to_string(),
            )))
        })?;
        Ok(serde_json::json!({ "answer": answer }))
    }
}

Register it like any other tool:

let (answer_tx, _) = broadcast::channel::<String>(16);
// spawn a stdin reader (or HTTP listener, etc.) that publishes to answer_tx
AgentWorkerBuilder::new(client)
    .llm(llm)
    .tool(Arc::new(MyComputeTool))
    .tool(Arc::new(AskUserTool::new(answer_tx)))
    .queue("agents")
    .build_worker(&runtime)?;

Activity timeout

The default start_to_close_timeout for tool activities is set generously (1 hour) so that human-in-the-loop tools don't trip the timeout. Tools that complete quickly are unaffected. See src/workflow.rs (tool_opts) to tweak it.

Trade-off note

With the in-process answer mechanisms (stdin, local socket), if the worker process crashes while a question is pending, the answer channel state is lost. Temporal will retry the execute_tool activity on the new worker; the tool will reprint the question and ask again. For full crash durability, use the Temporal async activity completion approach.

Determinism contract for users

When you write tools and provider configs:

Tools must be side-effect-safe-on-retry by default. Tool errors are reported back to the LLM, not retried by Temporal, but infrastructure errors do retry up to 3 times.
The LLM provider must be Send + Sync + 'static. Arc<dyn LLMProvider> already satisfies this for AutoAgents' built-in providers.
Never call your LLMProvider or your ToolT from inside workflow code. The workflow holds tools by name; the only path to invocation is the execute_tool activity.
MemoryProvider impls must be pure, sync, and stateless (config only) — should_compact and compact run inside the workflow body and must return identical results on replay for the same AgentState. Keep conversation state in AgentState, never on the provider.

Version compatibility

Crate	Version
`temporalio-sdk`	`0.4.x` (prerelease)
`autoagents`	`0.3.x`
Rust edition	2024
MSRV	1.95

The Temporal Rust SDK is prerelease; API breaks are expected on minor version bumps. This crate pins to 0.4.x for now.

Contributing

See CONTRIBUTING.md for setup, build/test commands, and PR conventions. By participating you agree to abide by our Code of Conduct.

Changelog

See the Releases page for per-version notes auto-generated from merged PRs.

Security

Please report vulnerabilities privately — see SECURITY.md.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github		.github
examples		examples
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE.txt		LICENSE.txt
README.md		README.md
RELEASING.md		RELEASING.md
SECURITY.md		SECURITY.md
deny.toml		deny.toml
rust-toolchain.toml		rust-toolchain.toml
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

temporal-agent-rs

Architecture

Features

Prerequisites

Quick start

Production primitives: caching and fallback

Pluggable memory backends

Running the examples

Human-in-the-loop tools

Why this works without library special-casing

The pattern

Tool-side snippet (from the example)

Activity timeout

Trade-off note

Determinism contract for users

Version compatibility

Contributing

Changelog

Security

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

temporal-agent-rs

Architecture

Features

Prerequisites

Quick start

Production primitives: caching and fallback

Pluggable memory backends

Running the examples

Human-in-the-loop tools

Why this works without library special-casing

The pattern

Tool-side snippet (from the example)

Activity timeout

Trade-off note

Determinism contract for users

Version compatibility

Contributing

Changelog

Security

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages