
Max available capacity limiter #1

Open
xmartis3 wants to merge 21 commits into main-cerit from max-available-capacity-limiter


@xmartis3
Collaborator

No description provided.

KrKOo and others added 21 commits April 20, 2026 13:46
  - tokens_to_add → requests_to_add
  - current_tokens → current_requests
- Remove TypedDict types for cache data
- Store cache values separately instead of nested objects
- Simplify budget management logic
- Add DEFAULT_TTL_TIME constant (300 seconds)
- Use async_increment_cache with TTL for all cache operations
- Fix refill logic to properly track last_refill timestamp
- Remove unused self._prev_load attribute
- Extract BASE_RATE as module-level constant
- Fix _fetch_tokens_used_in_window to use last_refill instead of timestamp
- Return int from _refill_user_budget after increment
- Added missing await to _get_model_limits() call which was causing
  'cannot unpack non-iterable coroutine object' error
- Fixed cache key usage to use constants instead of inline strings
- Added requests_used tracking alongside tokens_used for workload calculation
- Removed unused DEFAULT_REFILL_RATE and WORKLOAD_WINDOW_MINUTES variables

The error occurred because _get_model_limits is an async function but was
being called without await on line 189, returning a coroutine instead of
the expected tuple of (tpm_limit, rpm_limit).
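
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: the body of `_get_model_limits`, its `model` parameter, the returned limits, and the `broken_call` / `fixed_call` helpers are all hypothetical stand-ins. Only the function name and the error text come from the commit message.

```python
import asyncio
from typing import Tuple


async def _get_model_limits(model: str) -> Tuple[int, int]:
    # Hypothetical stand-in: the real function presumably looks the limits up;
    # here it just yields to the event loop and returns fixed numbers.
    await asyncio.sleep(0)
    return 1000, 10  # (tpm_limit, rpm_limit), made-up values


async def broken_call(model: str) -> str:
    # Calling an async function without await returns a coroutine object.
    coro = _get_model_limits(model)
    try:
        # Unpacking the coroutine itself raises TypeError.
        tpm_limit, rpm_limit = coro
    except TypeError as exc:
        return str(exc)
    finally:
        # Close the never-awaited coroutine to avoid a RuntimeWarning.
        coro.close()
    return "no error"


async def fixed_call(model: str) -> Tuple[int, int]:
    # With await, the coroutine runs and the tuple can be unpacked.
    tpm_limit, rpm_limit = await _get_model_limits(model)
    return tpm_limit, rpm_limit


if __name__ == "__main__":
    print(asyncio.run(broken_call("example-model")))
    print(asyncio.run(fixed_call("example-model")))
```

In the broken variant the right-hand side of the assignment is the coroutine object, so tuple unpacking fails before any limit lookup runs. Adding `await` is the whole fix.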
2 participants