
Conversation

@monotykamary (Contributor) commented on Jun 1, 2025

This PR implements prompt caching for the Anthropic provider via a cache_mode setting, enabling more efficient API usage for repetitive tasks or prompts with consistent elements. This addresses issue #20 and incorporates feedback on the caching strategy.


Changes Made:

  • Added cache_mode: Literal["off", "prompt", "auto"] to AnthropicSettings in src/mcp_agent/config.py. Default is "off".
  • Removed the global prompt_caching flag from RequestParams and AugmentedLLM. Caching is now solely controlled by the provider-specific cache_mode.
  • Updated src/mcp_agent/llm/providers/augmented_llm_anthropic.py to implement caching based on cache_mode (see the sketch after this list):
    • "off": No caching is applied.
    • "prompt": Applies cache_control: {"type": "ephemeral"} to the last content block of the first user message in the API request. Ideal for caching large, static initial user prompts (e.g., from apply_prompt).
    • "auto": Applies cache_control: {"type": "ephemeral"} to the system prompt (if present) and to the last content block of the last message in the API request. This aims to cache the general instruction and the most recent turn.
  • Corrected API request format for system prompt caching to adhere to Anthropic's expected list structure.
  • Ensured caching logic respects Anthropic's limit of 4 cached blocks per request by making the "auto" mode more conservative (caching system prompt + last message only).
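
To make the two modes concrete, here is a minimal sketch of the idea. The helper name, signature, and plain-dict content blocks are illustrative assumptions for this description, not the PR's literal code:

```python
from typing import Any, Dict, List, Literal

CacheMode = Literal["off", "prompt", "auto"]
EPHEMERAL = {"type": "ephemeral"}


def apply_cache_mode(
    cache_mode: CacheMode,
    messages: List[Dict[str, Any]],
    system: str | None = None,
) -> Dict[str, Any]:
    """Attach cache_control markers to Messages API kwargs based on cache_mode."""
    kwargs: Dict[str, Any] = {"messages": messages}
    if system is not None:
        kwargs["system"] = system

    if cache_mode == "prompt" and messages:
        # "prompt": cache the last content block of the first user message.
        first_user = next((m for m in messages if m["role"] == "user"), None)
        if first_user and isinstance(first_user["content"], list):
            first_user["content"][-1]["cache_control"] = EPHEMERAL

    elif cache_mode == "auto":
        # "auto": cache the system prompt (sent as a list of text blocks) and
        # the last content block of the last message: two cache points,
        # comfortably within Anthropic's limit of 4 cached blocks per request.
        if system is not None:
            kwargs["system"] = [
                {"type": "text", "text": system, "cache_control": EPHEMERAL}
            ]
        if messages and isinstance(messages[-1]["content"], list):
            messages[-1]["content"][-1]["cache_control"] = EPHEMERAL

    return kwargs
```

The resulting kwargs would then be merged into the provider's messages.create call; "off" simply leaves the request untouched.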

Functionality:

The cache_mode setting in AnthropicSettings (configurable in fastagent.config.yaml; see the example after this list) now controls how caching is applied:

  • "off": Default. No caching.
  • "prompt": Caches the first user message. Useful for apply_prompt scenarios with large initial user inputs.
  • "auto": Caches the system prompt and the latest user/assistant interaction (specifically, the last message in the request).

This can lead to:

  • Reduced latency on subsequent calls that share the same cached prefix.
  • Lower token costs for the cached portions of the prompt.

This feature is particularly useful for:

  • Long conversations where the initial context (system prompt, history) remains the same.
  • RAG applications where large documents are part of the prompt.
  • Scenarios with extensive tool definitions or few-shot examples.

Closes #20.

@monotykamary changed the title from "feat(anthropic): correctly implement prompt caching for messages api" to "feat(anthropic): implement prompt caching for messages api" on Jun 1, 2025
@monotykamary (Contributor, Author) commented:

This PR has been updated based on feedback:

  • The global prompt_caching flag has been removed. Caching is now exclusively controlled by the cache_mode setting in AnthropicSettings (fastagent.config.yaml).
  • cache_mode: "auto" now caches the system prompt (if present) and the last content block of the last message in the request. This is a more conservative approach to stay within Anthropic's limit of 4 cached blocks.
  • cache_mode: "prompt" caches the last content block of the first user message in the request.
  • cache_mode: "off" disables caching.

Addressed API errors related to cache_control formatting and block limits. The latest changes ensure the system prompt is correctly formatted for caching and that the "auto" mode respects the 4-block limit.
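
For reference, a sketch of the corrected request shape: cache_control is only accepted on the system prompt when system is sent as a list of content blocks rather than a plain string (the field values below are illustrative):

```python
request_kwargs = {
    "system": [
        {
            "type": "text",
            "text": "You are a helpful coding agent.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarise the attached document.",
                    # In "auto" mode the last message's final block is also cached.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
}
```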

@njbbaer commented on Jun 10, 2025

Thanks for putting this together @monotykamary! I tried pulling this custom branch into my project, since caching is necessary for my use case to be cost-effective, and from some minimal testing it does work.

However, I notice cache_mode: "auto" only caches up to the last user message. This isn't so effective for me because a user message can trigger a string of tool calls, which won't be cached until the chain is complete. Why not cache at the last message of any type? Or perhaps cache at both, since we can use up to 4 cache blocks.

Another small note is that I needed to install the deprecated library to make your branch work. I think the main branch of fast-agent now requires that dependency, so perhaps you just need to pull that into your branch.

@monotykamary (Contributor, Author) commented:

I'll have a quick crack at it 👍

@monotykamary force-pushed the feat/anthropic-prompt-caching branch from e2b9f4c to 58a891f on June 11, 2025 at 03:26
@evalstate (Owner) commented:

@monotykamary -- I think we're OK with the current implementation in main now?
