
Conversation

@monotykamary (Contributor) commented on Jun 1, 2025

This PR implements prompt caching for the Anthropic provider via a cache_mode setting, enabling more efficient API usage for repetitive tasks or prompts with consistent elements. This addresses issue #20 and incorporates feedback on the caching strategy.


Changes Made:

  • Added cache_mode: Literal["off", "prompt", "auto"] to AnthropicSettings in src/mcp_agent/config.py. Default is "off".
  • Removed the global prompt_caching flag from RequestParams and AugmentedLLM. Caching is now solely controlled by the provider-specific cache_mode.
  • Updated src/mcp_agent/llm/providers/augmented_llm_anthropic.py to implement caching based on cache_mode (see the sketch after this list):
    • "off": No caching is applied.
    • "prompt": Applies cache_control: {"type": "ephemeral"} to the last content block of the first user message in the API request. Ideal for caching large, static initial user prompts (e.g., from apply_prompt).
    • "auto": Applies cache_control: {"type": "ephemeral"} to the system prompt (if present) and to the last content block of the last message in the API request. This aims to cache the general instruction and the most recent turn.
  • Corrected API request format for system prompt caching to adhere to Anthropic's expected list structure.
  • Ensured caching logic respects Anthropic's limit of 4 cached blocks per request by making the "auto" mode more conservative (caching system prompt + last message only).
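
To make the two modes concrete, here is a minimal sketch of the idea. The helper name, signature, and plain-dict content blocks are illustrative assumptions for this description, not the PR's literal code:

```python
from typing import Any, Dict, List, Literal

CacheMode = Literal["off", "prompt", "auto"]
EPHEMERAL = {"type": "ephemeral"}


def apply_cache_mode(
    cache_mode: CacheMode,
    messages: List[Dict[str, Any]],
    system: str | None = None,
) -> Dict[str, Any]:
    """Attach cache_control markers to Messages API kwargs based on cache_mode."""
    kwargs: Dict[str, Any] = {"messages": messages}
    if system is not None:
        kwargs["system"] = system

    if cache_mode == "prompt" and messages:
        # "prompt": cache the last content block of the first user message.
        first_user = next((m for m in messages if m["role"] == "user"), None)
        if first_user and isinstance(first_user["content"], list):
            first_user["content"][-1]["cache_control"] = EPHEMERAL

    elif cache_mode == "auto":
        # "auto": cache the system prompt (sent as a list of text blocks) and
        # the last content block of the last message: two cache points,
        # comfortably within Anthropic's limit of 4 cached blocks per request.
        if system is not None:
            kwargs["system"] = [
                {"type": "text", "text": system, "cache_control": EPHEMERAL}
            ]
        if messages and isinstance(messages[-1]["content"], list):
            messages[-1]["content"][-1]["cache_control"] = EPHEMERAL

    return kwargs
```

The resulting kwargs would then be merged into the provider's messages.create call; "off" simply leaves the request untouched.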

Functionality:

The cache_mode setting in AnthropicSettings (configurable in fastagent.config.yaml; see the example after this list) now controls how caching is applied:

  • "off": Default. No caching.
  • "prompt": Caches the first user message. Useful for apply_prompt scenarios with large initial user inputs.
  • "auto": Caches the system prompt and the latest user/assistant interaction (specifically, the last message in the request).

This can lead to:

  • Reduced latency on subsequent calls that share the same cached prefix.
  • Lower token costs for the cached portions of the prompt.

This feature is particularly useful for:

  • Long conversations where the initial context (system prompt, history) remains the same.
  • RAG applications where large documents are part of the prompt.
  • Scenarios with extensive tool definitions or few-shot examples.

Closes #20.

@monotykamary changed the title from "feat(anthropic): correctly implement prompt caching for messages api" to "feat(anthropic): implement prompt caching for messages api" on Jun 1, 2025
@monotykamary (Contributor, Author) commented:

This PR has been updated based on feedback:

  • The global prompt_caching flag has been removed. Caching is now exclusively controlled by the cache_mode setting in AnthropicSettings (fastagent.config.yaml).
  • cache_mode: "auto" now caches the system prompt (if present) and the last content block of the last message in the request. This is a more conservative approach to stay within Anthropic's limit of 4 cached blocks.
  • cache_mode: "prompt" caches the last content block of the first user message in the request.
  • cache_mode: "off" disables caching.

Addressed API errors related to cache_control formatting and block limits. The latest changes ensure the system prompt is correctly formatted for caching and that the "auto" mode respects the 4-block limit.
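
For reference, a sketch of the corrected request shape: cache_control is only accepted on the system prompt when system is sent as a list of content blocks rather than a plain string (the field values below are illustrative):

```python
request_kwargs = {
    "system": [
        {
            "type": "text",
            "text": "You are a helpful coding agent.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarise the attached document.",
                    # In "auto" mode the last message's final block is also cached.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
}
```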

@njbbaer commented on Jun 10, 2025

Thanks for putting this together @monotykamary! I tried pulling this custom branch into my project, since caching is necessary for my use case to be cost-effective, and from some minimal testing it does work.

However, I notice cache_mode: "auto" only caches up to the last user message. This isn't so effective for me because a user message can trigger a string of tool calls, which won't be cached until the chain is complete. Why not cache at the last message of any type? Or perhaps cache at both, since we can use up to 4 cache blocks.

Another small note is that I needed to install the deprecated library to make your branch work. I think the main branch of fast-agent now requires that dependency, so perhaps you just need to pull that into your branch.

@monotykamary (Contributor, Author) commented:

I'll have a quick crack at it 👍

@monotykamary force-pushed the feat/anthropic-prompt-caching branch from e2b9f4c to 58a891f on June 11, 2025 at 03:26
@evalstate (Owner) commented:

@monotykamary -- I think we're OK with the current implementation in main now?
