feat(anthropic): implement prompt caching for messages api #200
This PR implements prompt caching for the Anthropic provider via a `cache_mode` setting, enabling more efficient API usage for repetitive tasks or prompts with consistent elements. This addresses issue #20 and incorporates feedback on the caching strategy.

Changes Made:

* Added `cache_mode: Literal["off", "prompt", "auto"]` to `AnthropicSettings` in `src/mcp_agent/config.py`. Default is `"off"`.
* Removed the `prompt_caching` flag from `RequestParams` and `AugmentedLLM`. Caching is now solely controlled by the provider-specific `cache_mode`.
* Updated `src/mcp_agent/llm/providers/augmented_llm_anthropic.py` to implement caching based on `cache_mode` (see the sketch after this list):
  * `"off"`: No caching is applied.
  * `"prompt"`: Applies `cache_control: {"type": "ephemeral"}` to the last content block of the first user message in the API request. Ideal for caching large, static initial user prompts (e.g., from `apply_prompt`).
  * `"auto"`: Applies `cache_control: {"type": "ephemeral"}` to the system prompt (if present) and to the last content block of the last message in the API request. This aims to cache the general instructions and the most recent turn.
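For clarity, here is a minimal sketch of where the `cache_control` markers land for the `"prompt"` and `"auto"` modes. The helper name and the message/system structures are illustrative only; the actual implementation lives in `augmented_llm_anthropic.py` and may be organized differently.

```python
from typing import Any


def apply_cache_control(
    messages: list[dict[str, Any]],
    system: list[dict[str, Any]] | None,
    cache_mode: str,
) -> None:
    """Illustrative sketch: attach ephemeral cache_control markers in place."""
    marker = {"type": "ephemeral"}

    if cache_mode == "prompt":
        # Cache the last content block of the first user message
        # (e.g., a large static prompt loaded via apply_prompt).
        first_user = next((m for m in messages if m.get("role") == "user"), None)
        if first_user and isinstance(first_user.get("content"), list):
            first_user["content"][-1]["cache_control"] = marker

    elif cache_mode == "auto":
        # Cache the system prompt (when supplied as content blocks) ...
        if system:
            system[-1]["cache_control"] = marker
        # ... and the last content block of the last message in the request.
        if messages and isinstance(messages[-1].get("content"), list):
            messages[-1]["content"][-1]["cache_control"] = marker
```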
Functionality:

The `cache_mode` in `AnthropicSettings` (configurable in `fastagent.config.yaml`; see the configuration sketch at the end of this description) now controls how caching is applied:

* `"off"`: Default. No caching.
* `"prompt"`: Caches the first user message. Useful for `apply_prompt` scenarios with large initial user inputs.
* `"auto"`: Caches the system prompt and the latest user/assistant interaction (specifically, the last message in the request).

This can lead to:
* Reduced latency on subsequent calls with the same cached prefix.
* Lower token costs for cached portions of the prompt.
This feature is particularly useful for:
* Long conversations where the initial context (system prompt, history) remains the same.
* RAG applications where large documents are part of the prompt.
* Scenarios with extensive tool definitions or few-shot examples.
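To illustrate the configuration surface, a hedged sketch of the new setting follows. The field name, type, and default come from this PR; the surrounding Pydantic model details and the YAML layout shown in the comment are assumptions.

```python
from typing import Literal

from pydantic import BaseModel


class AnthropicSettings(BaseModel):
    # Other Anthropic provider settings (api_key, model, ...) omitted.
    # "off" keeps current behaviour; "prompt" and "auto" opt in to caching.
    cache_mode: Literal["off", "prompt", "auto"] = "off"


# Roughly equivalent to setting cache_mode under the anthropic section of
# fastagent.config.yaml (exact YAML layout is an assumption):
#
#   anthropic:
#     cache_mode: "auto"
settings = AnthropicSettings(cache_mode="auto")
```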
Closes #20.