Rate Limiting #5

@omeraplak

Description

1. Overview:

Implement mechanisms to control the frequency of operations performed by agents, particularly calls to external services like LLM providers and tools. This helps prevent exceeding API rate limits imposed by third-party services, ensures fair usage in multi-tenant environments, manages costs, and improves overall system stability by avoiding request throttling or blocking.

2. Goals:

  • Provide configurable rate limiting strategies (e.g., requests per second/minute, token usage per minute).
  • Allow defining rate limits at different granularities (e.g., per agent, per tool, per LLM provider, global).
  • Integrate rate limiting checks seamlessly into the agent execution flow (before LLM calls and tool executions).
  • Offer clear feedback or error handling when a rate limit is hit (e.g., delayed execution, specific error messages).
  • Ensure the rate limiting mechanism is efficient and doesn't introduce significant overhead.
  • Allow enabling/disabling rate limiting easily.
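The configurable strategies and granularities above could be expressed as a declarative rule list. The following sketch is illustrative only; the field names (`scope`, `strategy`, `onExceeded`, etc.) are assumptions, not an existing VoltAgent API:

```typescript
// Hypothetical shape of a rate-limit rule set. Every name here is an
// assumption for illustration, not part of any shipped VoltAgent interface.
type RateLimitScope = "global" | "agent" | "tool" | "provider";

interface RateLimitRule {
  scope: RateLimitScope;
  target?: string;                          // agent ID, tool name, or provider type
  strategy: "token_bucket" | "leaky_bucket" | "fixed_window";
  limit: number;                            // e.g. 10 requests ...
  intervalMs: number;                       // ... per 60_000 ms ("per minute")
  onExceeded?: "delay" | "error";           // delayed execution vs. a specific error
}

const rules: RateLimitRule[] = [
  // Global cap on all LLM calls: 60 requests per minute, delay when exceeded.
  { scope: "global", strategy: "fixed_window", limit: 60, intervalMs: 60_000, onExceeded: "delay" },
  // Per-tool cap on a hypothetical "webSearch" tool: 5 requests per second, fail fast.
  { scope: "tool", target: "webSearch", strategy: "token_bucket", limit: 5, intervalMs: 1_000, onExceeded: "error" },
];
```

A rule list like this keeps enabling/disabling trivial (pass an empty array, or omit the option entirely) and lets the `onExceeded` field decide between delayed execution and explicit errors per rule.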

3. Proposed Architecture & Components:

  • RateLimiter Interface/Base Class: Defines the core methods for rate limiting (e.g., acquire(), check()). Concrete implementations could include:
    • TokenBucketLimiter: Classic token bucket algorithm.
    • LeakyBucketLimiter: Leaky bucket algorithm.
    • FixedWindowCounterLimiter: Simple counter within fixed time windows.
  • RateLimitManager: A central service (potentially part of VoltAgent or configurable per Agent) responsible for:
    • Loading rate limit configurations.
    • Instantiating and managing RateLimiter instances based on configuration.
    • Providing methods for agents/tools to check and acquire permits before making calls.
  • Configuration: A way to define rate limit rules (e.g., in the agent options or a separate configuration file). This should specify:
    • The scope (agent ID, tool name, provider type, 'global').
    • The limit (e.g., 10 requests per minute).
    • The strategy (e.g., 'token_bucket').
  • Integration Points: Modify core agent logic to consult the RateLimitManager:
    • LLM Calls: Before calling llm.generateText, llm.streamText, etc.
    • Tool Calls: Within the ToolManager or before executing a tool's _call method.
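The interface and one concrete strategy from the list above could look like the following sketch. The `RateLimiter`, `acquire()`, and `check()` names come from this proposal; everything else (constructor parameters, cost argument) is an assumption:

```typescript
// Sketch of the proposed RateLimiter interface with a token bucket
// implementation. Details beyond the method names are assumptions.
interface RateLimiter {
  /** Returns true if a permit is currently available, without consuming one. */
  check(cost?: number): boolean;
  /** Consumes a permit, waiting until one becomes available if necessary. */
  acquire(cost?: number): Promise<void>;
}

class TokenBucketLimiter implements RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,        // maximum burst size
    private readonly refillPerSecond: number, // sustained request rate
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Add tokens proportional to the time elapsed since the last refill,
  // never exceeding the bucket's capacity.
  private refill(): void {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
  }

  check(cost = 1): boolean {
    this.refill();
    return this.tokens >= cost;
  }

  async acquire(cost = 1): Promise<void> {
    this.refill();
    while (this.tokens < cost) {
      // Sleep roughly until enough tokens should have accumulated.
      const waitMs = ((cost - this.tokens) / this.refillPerSecond) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      this.refill();
    }
    this.tokens -= cost;
  }
}
```

The `cost` parameter is what would let the same interface serve token-usage limits (pass the token count of a request) as well as simple request counting (default cost of 1).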

4. Affected Core Modules:

  • Agent: Core execution logic needs modification to check limits before LLM calls.
  • ToolManager / AgentTool: Tool execution logic needs modification to check limits.
  • LLMProvider: Might need adjustments or hooks to integrate checks.
  • VoltAgent / Agent Options: Configuration needs to be handled.
  • Potentially new utility modules for rate limiter implementations.

5. Acceptance Criteria (Initial MVP):

  • Users can configure a simple global rate limit (e.g., max N requests per minute) for all LLM calls across all agents.
  • The framework prevents exceeding this limit by introducing delays or throwing specific errors.
  • A basic FixedWindowCounterLimiter is implemented.
  • Configuration is possible via Agent options.
  • Documentation explains how to enable and configure the global LLM rate limit.
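A minimal sketch of the MVP limiter and its "throw a specific error" behavior, assuming a hypothetical `RateLimitExceededError` (the error name and constructor arguments below are illustrative, not an existing VoltAgent API):

```typescript
// MVP sketch: a fixed-window counter that throws a dedicated error type
// when the window's budget is spent. Names are assumptions for illustration.
class RateLimitExceededError extends Error {
  constructor(public readonly retryAfterMs: number) {
    super(`Rate limit exceeded; retry in ${retryAfterMs} ms`);
    this.name = "RateLimitExceededError";
  }
}

class FixedWindowCounterLimiter {
  private count = 0;
  private windowStart = Date.now();

  constructor(
    private readonly maxRequests: number, // max N requests ...
    private readonly windowMs: number,    // ... per fixed window
  ) {}

  /** Counts one request; throws once the current window's budget is spent. */
  acquire(): void {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      // Start a fresh window and reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count >= this.maxRequests) {
      throw new RateLimitExceededError(this.windowStart + this.windowMs - now);
    }
    this.count++;
  }
}

// A single global instance consulted before every LLM call, e.g.:
const globalLlmLimiter = new FixedWindowCounterLimiter(60, 60_000);
// globalLlmLimiter.acquire(); // then proceed with llm.generateText(...)
```

The `retryAfterMs` field gives callers the actionable feedback mentioned in the goals: they can surface it directly or use it to schedule a delayed retry.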

6. Potential Challenges & Considerations:

  • Handling distributed rate limiting if VoltAgent is run across multiple instances.
  • Accurately measuring token usage for token-based limits, especially with streaming.
  • Choosing appropriate default limits and strategies.
  • Balancing strictness of limits with agent responsiveness.
  • Providing clear and actionable feedback to the developer/user when limits are hit.
  • Performance impact of the rate limiting checks.
