feat(zai): adopt tier framework for plan-aware rate limiting #83
Societus wants to merge 6 commits into repowise-dev:main
Conversation
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
Add RATE_LIMIT_TIERS class attribute and resolve_rate_limiter() static method to BaseProvider.

Any provider with subscription tiers can define RATE_LIMIT_TIERS and pass tier + tiers to resolve_rate_limiter() to get automatic tier-aware rate limiter creation.

- Precedence: tier > explicit rate_limiter > None
- Tier matching is case-insensitive
- Invalid tiers raise ValueError

This is a provider-agnostic foundation: no provider-specific code. Providers adopt it by defining RATE_LIMIT_TIERS and calling resolve_rate_limiter() in their constructor.

Ref: repowise-dev#68
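The pattern the commit describes can be sketched roughly as follows. Class and method names come from the commit message; the RateLimiter stand-in, the dict fields, and the Pro/Max TPM figures are illustrative assumptions (only the Lite 10 RPM / 50K TPM default is stated elsewhere in this PR).

```python
# Minimal sketch of the tier-resolution hook described above.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimiter:
    rpm: int  # requests per minute
    tpm: int  # tokens per minute


class BaseProvider:
    # Providers with subscription tiers override this mapping.
    RATE_LIMIT_TIERS: dict = {}

    @staticmethod
    def resolve_rate_limiter(tier, rate_limiter, tiers) -> Optional[RateLimiter]:
        """Precedence: tier > explicit rate_limiter > None.

        Tier matching is case-insensitive; unknown tiers raise ValueError.
        """
        if tier is not None:
            key = tier.lower()
            if key not in tiers:
                raise ValueError(
                    f"Unknown tier {tier!r}; expected one of {sorted(tiers)}"
                )
            return RateLimiter(**tiers[key])
        return rate_limiter


class ZAIProvider(BaseProvider):
    # Lite matches the default quoted in this PR; Pro/Max TPM are assumed.
    RATE_LIMIT_TIERS = {
        "lite": {"rpm": 10, "tpm": 50_000},
        "pro": {"rpm": 30, "tpm": 150_000},
        "max": {"rpm": 60, "tpm": 300_000},
    }

    def __init__(self, tier=None, rate_limiter=None):
        self.rate_limiter = self.resolve_rate_limiter(
            tier, rate_limiter, self.RATE_LIMIT_TIERS
        )
```

With this shape, a new tiered provider only declares its table and delegates resolution, which is what makes the later MiniMax commit "trivial to implement".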
Wire Z.AI provider into the BaseProvider tier framework (from PR #NN).

Changes:
- Define RATE_LIMIT_TIERS on ZAIProvider with Lite/Pro/Max configs derived from Z.AI support guidance (April 2026)
- Use resolve_rate_limiter() in constructor (tier > explicit > none)
- Add ZAI_TIER env var support in CLI helpers
- Add ZAI_TIER_DEFAULTS to rate_limiter.py for reference
- Update PROVIDER_DEFAULTS['zai'] to conservative Lite-tier default
- Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI load-shedding tolerance
- Add tier parameter to constructor and docstring

Rate limit context:
- Z.AI concurrency limits are aggregate, dynamic, and load-dependent
- Advanced models (GLM-5 family) consume 2-3x quota per prompt
- Conservative defaults: Lite 10 RPM, Pro 30 RPM, Max 60 RPM
- Ref: https://docs.z.ai/devpack/usage-policy

Depends on: feat/generic-tier-framework
Supersedes: repowise-dev#80 (deprecates monolithic PR in favor of layered approach)
Ref: repowise-dev#68
Add MiniMax as a built-in provider using the generic tier framework (repowise-dev#82).

MiniMax is an OpenAI-compatible API provider with the M2.x model family (M2.7, M2.5, M2.1, M2) and published token plan rate tiers.

Changes:
- New MiniMaxProvider with RATE_LIMIT_TIERS (starter/plus/max/ultra) derived from published 5-hour rolling window limits
- Uses resolve_rate_limiter() from BaseProvider for tier resolution
- reasoning_split=True by default to separate thinking from content
- Bumped retry budget: 5 retries / 30s max for load-shedding tolerance
- Registered in provider registry with openai package dependency hint
- Conservative PROVIDER_DEFAULTS (Starter-tier: 5 RPM / 25K TPM)
- CLI env vars: MINIMAX_API_KEY, MINIMAX_BASE_URL, MINIMAX_REASONING_SPLIT, MINIMAX_TIER
- 30 unit tests (constructor, tiers, generate, stream_chat, registry)

Rate limit tiers (from https://platform.minimax.io/docs/token-plan/intro):
- Starter: 1,500 req/5hrs -> 5 RPM / 25K TPM
- Plus: 4,500 req/5hrs -> 15 RPM / 75K TPM
- Max: 15,000 req/5hrs -> 50 RPM / 250K TPM
- Ultra: 30,000 req/5hrs -> 100 RPM / 500K TPM

Highspeed variants (e.g., MiniMax-M2.7-highspeed) share the same rate limits as their base plan; the difference is faster inference, not quota.

This provider is structurally identical to Z.AI (repowise-dev#83) and was trivial to implement because both use the generic tier framework. The framework eliminated all per-provider boilerplate for tier resolution.

Depends on: repowise-dev#82 (generic tier framework)
Ref: repowise-dev#68
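The per-minute numbers in the tier table above are the 5-hour rolling-window quotas flattened into a steady rate. A quick sketch of that conversion (quota figures from the commit message; the helper name is ours):

```python
# 5-hour rolling window -> conservative steady per-minute request rate.
WINDOW_MINUTES = 5 * 60  # 300 minutes

# Requests allowed per 5-hour window, per plan (from the commit message).
TOKEN_PLANS = {
    "starter": 1_500,
    "plus": 4_500,
    "max": 15_000,
    "ultra": 30_000,
}


def requests_per_minute(requests_per_window: int) -> int:
    """Flatten a rolling-window quota into an even per-minute budget."""
    return requests_per_window // WINDOW_MINUTES


rpm = {plan: requests_per_minute(q) for plan, q in TOKEN_PLANS.items()}
# Matches the table: starter 5, plus 15, max 50, ultra 100
```

Spreading the quota evenly is deliberately conservative: a client could legally burst harder early in the window, but an even rate never trips the rolling limit.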
swati510
left a comment
Nice work on the generic tier framework, it's the right shape. Two things worth fixing before merge:

- ZAI_TIER_DEFAULTS in packages/core/src/repowise/core/rate_limiter.py duplicates the same values as ZAIProvider.RATE_LIMIT_TIERS. Nothing imports ZAI_TIER_DEFAULTS since resolve_rate_limiter reads from the class attribute. It's dead code waiting to drift. Drop it, or have the provider read from it: one source of truth.
- The base URL normalization in zai.py concerns me. _PLAN_BASE_URLS values end with /v4, then __init__ force-appends /v1, so the client hits https://api.z.ai/api/coding/paas/v4/v1/chat/completions. Has this been tested against the live API? Z.AI's OpenAI-compatible endpoint is /paas/v4 as-is; the SDK adds /chat/completions itself.
```python
# Normalize base URL for OpenAI SDK
effective_base_url = effective_base_url.rstrip("/")
if not effective_base_url.endswith("/v1"):
    effective_base_url += "/v1"
```
Confirm this works against the live Z.AI API. Their OpenAI-compatible endpoint is /v4 as-is, the SDK tacks on /chat/completions. Adding /v1 here produces /v4/v1/chat/completions which I'd expect to 404.
ZAI_TIER_DEFAULTS in rate_limiter.py duplicated the same values as ZAIProvider.RATE_LIMIT_TIERS and nothing imported it. Single source of truth lives on the provider class.

The /v1 suffix normalization produced /v4/v1/chat/completions, which 404s against Z.AI's live API. Their endpoint is /paas/v4 as-is; the OpenAI SDK appends /chat/completions itself.

Tested against live Z.AI API:
- /v4/chat/completions -> 200
- /v4/v1/chat/completions -> 404

Addresses review feedback from @swati510 on repowise-dev#83.
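The fix described above amounts to dropping the forced /v1 suffix and only trimming trailing slashes. A minimal sketch, not the actual patch; the helper name is ours:

```python
# Corrected normalization sketch: leave versioned Z.AI base URLs
# (/paas/v4) untouched. The OpenAI SDK appends /chat/completions
# itself, so forcing a /v1 suffix would double the version segment.
def normalize_base_url(base_url: str) -> str:
    """Strip trailing slashes; do NOT append /v1."""
    return base_url.rstrip("/")


url = normalize_base_url("https://api.z.ai/api/coding/paas/v4/")
# The SDK will then request f"{url}/chat/completions"
```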
Summary
Wire Z.AI provider into the generic tier framework from #82. Adds plan-aware rate limiting based on Z.AI subscription tier (Lite/Pro/Max) with environment variable configuration.
Depends on: #82 (generic tier framework -- merge that first)
Changes

Z.AI Provider (zai.py)
- RATE_LIMIT_TIERS with Lite/Pro/Max configs derived from Z.AI support guidance (April 2026)
- resolve_rate_limiter() from BaseProvider in constructor
- tier parameter to constructor and docstring

Rate Limiter (rate_limiter.py)
- PROVIDER_DEFAULTS["zai"] to conservative Lite-tier default (10 RPM / 50K TPM)
- ZAI_TIER_DEFAULTS dict for reference and documentation

CLI Helpers (helpers.py)
- ZAI_TIER env var reading in both explicit and auto-detect provider resolution paths

Tests (test_zai_provider.py)

Rate Limit Context
Z.AI support provided the following guidance (April 2026):
Key facts:
Configuration
Test Plan
uv run pytest tests/unit/test_providers/test_zai_provider.py -v
# 34 passed (21 existing + 13 new tier tests)

PR Stack
Related