
Enable prompt caching for agent calls (closes #611)#612

Open
kiranandcode wants to merge 3 commits into master from kg-prompt-caching

Conversation

@kiranandcode
Contributor

Updates completion.py so that for system messages (call_system), content is now a list annotated with cache_control: {"type": "ephemeral"}, which caches the system prompt across all turns.
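A minimal sketch of the message shape this produces (the helper name and exact dict layout are my assumption based on the description above, not the PR's actual code):

```python
# Sketch only: helper name and structure are assumptions, not the PR's code.
def cached_system_message(text):
    """Build a system message whose content is a list of blocks, carrying an
    ephemeral cache_control marker so Anthropic caches the system prompt."""
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
```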

Agent user messages (LiteLLMProvider._call): _add_cache_control_to_history() annotates the last user/tool message with cache_control before each call_assistant round, but only for templates with a history (i.e. Agent subclasses). Non-agent template calls are unaffected.

OpenAI calls are unaffected: OpenAI enables prompt caching by default, and litellm strips cache_control from OpenAI requests automatically.

Cost impact: Anthropic charges 25% more for cache writes, but cached reads cost 90% less than regular input tokens.
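A back-of-envelope check of those multipliers (relative input-token cost only, using just the 1.25x write and 0.10x read factors stated above):

```python
# Relative input-token cost of reading one shared prefix across n turns,
# using only the multipliers from the description: cache writes cost 1.25x
# base input, cache reads 0.1x (90% less).
def relative_cost(turns, cached):
    if cached:
        return 1.25 + 0.10 * (turns - 1)  # one cache write, then cheap reads
    return 1.0 * turns                    # full-price input every turn

assert relative_cost(1, cached=True) > relative_cost(1, cached=False)  # one-shot: slightly pricier
assert relative_cost(2, cached=True) < relative_cost(2, cached=False)  # pays off by the second turn
```

So caching is a small loss on a one-shot call and a growing win for multi-turn agents, which is consistent with gating it on templates that keep a history.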

@kiranandcode
Contributor Author

@naiimic can you try running the MARA agent with this branch? it should be a lot faster.

@kiranandcode kiranandcode linked an issue Mar 13, 2026 that may be closed by this pull request

@eb8680 left a comment


What happens if we add cache_control to each message at the time it is constructed? Is that incorrect vs the final-message-only behavior here?

```python
if msg["role"] not in ("user", "tool", "assistant"):
    continue
content = msg.get("content")
if isinstance(content, list) and content:
```
Contributor


Ah, I see this applies to all Messages. Would it be easier to have this live in _make_message, our Message constructor? Or maybe in completions?

Contributor Author


I think that makes more sense, will update.

Contributor Author


Ah, no, I guess the key difference is that we only apply this if the template has a history, so we don't cache every template, only agent ones.



Development

Successfully merging this pull request may close these issues.

Effectful/litellm does not enable prompt_caching for anthropic
