Rate Limiting #5
1. Overview:
Implement mechanisms to control the frequency of operations performed by agents, particularly calls to external services like LLM providers and tools. This helps prevent exceeding API rate limits imposed by third-party services, ensures fair usage in multi-tenant environments, manages costs, and improves overall system stability by avoiding request throttling or blocking.
2. Goals:
- Provide configurable rate limiting strategies (e.g., requests per second/minute, token usage per minute).
- Allow defining rate limits at different granularities (e.g., per agent, per tool, per LLM provider, global).
- Integrate rate limiting checks seamlessly into the agent execution flow (before LLM calls and tool executions).
- Offer clear feedback or error handling when a rate limit is hit (e.g., delayed execution, specific error messages).
- Ensure the rate limiting mechanism is efficient and doesn't introduce significant overhead.
- Allow enabling/disabling rate limiting easily.
3. Proposed Architecture & Components:
- `RateLimiterInterface`/base class: Defines the core methods for rate limiting (e.g., `acquire()`, `check()`). Concrete implementations could include:
  - `TokenBucketLimiter`: Classic token bucket algorithm.
  - `LeakyBucketLimiter`: Leaky bucket algorithm.
  - `FixedWindowCounterLimiter`: Simple counter within fixed time windows.
- `RateLimitManager`: A central service (potentially part of `VoltAgent`, or configurable per `Agent`) responsible for:
  - Loading rate limit configurations.
  - Instantiating and managing `RateLimiter` instances based on configuration.
  - Providing methods for agents/tools to check and acquire permits before making calls.
- Configuration: A way to define rate limit rules (e.g., in the agent options or a separate configuration file). Each rule should specify:
  - The scope (agent ID, tool name, provider type, 'global').
  - The limit (e.g., 10 requests per minute).
  - The strategy (e.g., 'token_bucket').
- Integration Points: Modify core agent logic to consult the `RateLimitManager`:
  - LLM Calls: Before calling `llm.generateText`, `llm.streamText`, etc.
  - Tool Calls: Within the `ToolManager`, or before executing a tool's `_call` method.
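The components above could be sketched as follows. This is a minimal illustration only: none of these names are existing voltagent APIs, and the scope keys (`"global"`, `"llm:<provider>"`) are assumptions for the example.

```typescript
// Illustrative sketch of the proposed limiter interface and manager.
interface RateLimiter {
  /** True if a permit is currently available (does not consume one). */
  check(): boolean;
  /** Resolves once a permit has been consumed, waiting if necessary. */
  acquire(): Promise<void>;
}

/** Classic token bucket: refills `rate` tokens per second up to `capacity`. */
class TokenBucketLimiter implements RateLimiter {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private rate: number) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.rate,
    );
    this.lastRefill = now;
  }

  check(): boolean {
    this.refill();
    return this.tokens >= 1;
  }

  async acquire(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      // Wait roughly long enough for one token to accumulate, then recheck.
      await new Promise((r) => setTimeout(r, 1000 / this.rate));
      this.refill();
    }
    this.tokens -= 1;
  }
}

/** Caches one limiter per scope, falling back to a "global" rule. */
class RateLimitManager {
  private limiters = new Map<string, RateLimiter>();

  constructor(private factories: Map<string, () => RateLimiter>) {}

  private limiterFor(scope: string): RateLimiter | undefined {
    if (!this.limiters.has(scope)) {
      const make = this.factories.get(scope) ?? this.factories.get("global");
      if (make) this.limiters.set(scope, make());
    }
    return this.limiters.get(scope);
  }

  /** Called at the integration points, e.g. just before llm.generateText. */
  async acquire(scope: string): Promise<void> {
    await this.limiterFor(scope)?.acquire();
  }
}
```

Under this shape, the agent's LLM and tool call sites would each run `await manager.acquire(scope)` before the outbound request, so a hit limit surfaces as a delay at exactly one place in the flow.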
4. Affected Core Modules:
- `Agent`: Core execution logic needs modification to check limits before LLM calls.
- `ToolManager`/`AgentTool`: Tool execution logic needs modification to check limits.
- `LLMProvider`: Might need adjustments or hooks to integrate checks.
- `VoltAgent`/`Agent` options: Configuration needs to be handled.
- Potentially new utility modules for rate limiter implementations.
5. Acceptance Criteria (Initial MVP):
- Users can configure a simple global rate limit (e.g., max N requests per minute) for all LLM calls across all agents.
- The framework prevents exceeding this limit by introducing delays or throwing specific errors.
- A basic `FixedWindowCounterLimiter` is implemented.
- Configuration is possible via `Agent` options.
- Documentation explains how to enable and configure the global LLM rate limit.
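A sketch of what the MVP limiter could look like. The criteria leave open whether a hit limit delays or errors; this version throws, which is an illustrative choice, and the class and error names are not finalized API.

```typescript
// MVP sketch: simple counter that resets when a fixed time window elapses.
class RateLimitExceededError extends Error {}

class FixedWindowCounterLimiter {
  private windowStart = Date.now();
  private count = 0;

  constructor(private limit: number, private windowMs: number) {}

  /** Start a fresh window (and reset the counter) once the current one expires. */
  private roll(): void {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
  }

  check(): boolean {
    this.roll();
    return this.count < this.limit;
  }

  acquire(): void {
    this.roll();
    if (this.count >= this.limit) {
      throw new RateLimitExceededError(
        `Limit of ${this.limit} requests per ${this.windowMs}ms exceeded`,
      );
    }
    this.count += 1;
  }
}
```

A global "max N LLM requests per minute" rule would then be a single shared `new FixedWindowCounterLimiter(N, 60_000)` consulted before every LLM call, configured through `Agent` options.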
6. Potential Challenges & Considerations:
- Handling distributed rate limiting if `voltagent` is run across multiple instances.
- Accurately measuring token usage for token-based limits, especially with streaming.
- Choosing appropriate default limits and strategies.
- Balancing strictness of limits with agent responsiveness.
- Providing clear and actionable feedback to the developer/user when limits are hit.
- Performance impact of the rate limiting checks.
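On the streaming point, one possible approach is to charge a token-based limiter incrementally as chunks arrive rather than only after the response completes. The chunk shape and `charge` callback below are assumptions for illustration, not an existing provider contract.

```typescript
// Sketch: wrap a streaming response so token usage is charged to a
// limiter as it becomes known, chunk by chunk.
async function* meterStream(
  chunks: AsyncIterable<{ text: string; tokenCount: number }>,
  charge: (tokens: number) => void,
): AsyncIterable<{ text: string; tokenCount: number }> {
  for await (const chunk of chunks) {
    charge(chunk.tokenCount); // record usage before passing the chunk on
    yield chunk;
  }
}
```

This keeps token accounting close to real time, though providers that only report exact usage at stream end would still need a final reconciliation step.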