All notable changes to blockrun-llm will be documented in this file.
- Smart router: AUTO/ECO `SIMPLE` primaries promoted from `moonshot/kimi-k2.5` → `moonshot/kimi-k2.6` (Moonshot's flagship: 256K context, vision + `reasoning_content`, $0.95 in / $4.00 out per 1M). The catalog now hides `kimi-k2.5` as superseded, so it no longer appears in `/v1/models`; the SDK could not resolve its pricing, and routing was silently falling through to the next fallback. `kimi-k2.5` is retained as the first fallback for clients explicitly pinned to its pricing.
- Doc refresh: README Smart Routing example output and the `SIMPLE` tier table now reference `moonshot/kimi-k2.6`.
- New flagship model: `openai/gpt-5.5` (released 2026-04-23, the first fully retrained base since GPT-4.5). 1M context, 128K output, native agent + computer use. Pricing: $5.00 / $30.00 per 1M tokens.
- Smart router: `PREMIUM_TIERS["MEDIUM"]` now points at `openai/gpt-5.5`; `gpt-5.4` is demoted to first fallback. The cost-savings baseline in `estimate_cost` was rebased from GPT-5.4 ($2.50 / $15) to GPT-5.5 ($5.00 / $30) so reported savings stay meaningful against the current flagship.
- Doc-example refresh: the `AnthropicClient` cross-provider example and the `examples/arbitrage_analyzer.py` frontier tier now reference `openai/gpt-5.5`.
- Reconciles `__version__` and `VERSION` (previously drifted at 0.16.1 vs 0.15.0); both are now 0.17.0.
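The rebased savings baseline can be sketched as plain arithmetic. This is a minimal illustration, not the SDK's real `estimate_cost` implementation; the helper name and signature are hypothetical, only the per-1M-token prices come from the changelog.

```python
# Hypothetical helper mirroring the rebased savings math.
# Baseline is the current flagship, openai/gpt-5.5 (was gpt-5.4 at $2.50/$15).
FLAGSHIP_BASELINE = {"input": 5.00, "output": 30.00}  # USD per 1M tokens

def estimate_savings(routed_input_price, routed_output_price,
                     input_tokens, output_tokens):
    """Return (routed_cost, flagship_cost, savings) in USD."""
    routed = (input_tokens * routed_input_price
              + output_tokens * routed_output_price) / 1_000_000
    flagship = (input_tokens * FLAGSHIP_BASELINE["input"]
                + output_tokens * FLAGSHIP_BASELINE["output"]) / 1_000_000
    return routed, flagship, flagship - routed

# Routing 100K in / 10K out to kimi-k2.6 ($0.95 / $4.00) instead of gpt-5.5:
routed, flagship, saved = estimate_savings(0.95, 4.00, 100_000, 10_000)
# routed = $0.135, flagship = $0.80, saved = $0.665
```

Rebasing matters because savings reported against a demoted flagship would overstate nothing and understate everything: the same routed call measured against GPT-5.4's $2.50/$15 would show roughly half the savings.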
- `ImageClient` default timeout raised from 120s to 200s. The gateway's per-call OpenAI timeout for `gpt-image-2` was bumped to 180s server-side (it routinely takes ~120-180s at 1536×1024 and larger), so the SDK's old 120s default was cutting the request off before the server had a chance to return. The new default leaves ~20s of buffer above the server cap. Existing users passing an explicit `timeout=` are unaffected.
- `VideoClient` switches to async submit+poll. Upstream `/v1/videos/generations` moved from sync to async on 2026-04-23 (submit returns a job id; the client polls until completion). The public signature of `VideoClient.generate(...)` is unchanged: it still blocks until the video is ready and returns `VideoResponse` with the MP4 URL and tx hash. Internally the client now signs once, submits, and replays the same signature on GET polls every 5s until upstream completes. Settlement only fires on the first completed poll, so an upstream failure or budget exhaustion means zero charge.
- Added a `budget_seconds` parameter to `generate()` (default 300s) to cap the polling window.
- Bumped the advertised `max_timeout_seconds` on video requests from 300s to 600s so the signed auth stays valid across the full polling window.
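The sign-once submit+poll flow can be sketched as a small loop. Everything here is illustrative: `submit_job`, `poll_job`, and the status strings are stand-ins for the client's internals (which sign an x402 payment and replay it on GET polls); only the 5s interval, the blocking behavior, and the `budget_seconds` cap come from the changelog.

```python
import time

def generate_blocking(submit_job, poll_job, budget_seconds=300, interval=5):
    """Submit once, then poll until the job completes or the budget is spent.

    Settlement happens server-side on the first completed poll, so raising
    here (timeout or upstream failure) means no charge was made.
    """
    job_id = submit_job()                      # one signed POST
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        status, result = poll_job(job_id)      # same signature replayed on GET
        if status == "completed":
            return result                      # first completed poll settles
        time.sleep(interval)
    raise TimeoutError(f"video job {job_id} exceeded {budget_seconds}s budget")
```

The single up-front signature is why `max_timeout_seconds` had to grow to 600s: the signed auth must outlive the entire polling window, not just the initial submit.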
- New image model: `openai/gpt-image-2` (ChatGPT Images 2.0). Reasoning-driven generation with multilingual text rendering + character consistency. Pricing: $0.06 for 1024² / $0.12 for 1536×1024 or 1024×1536. Supports both `client.generate()` and `client.edit()` via the `/v1/images/image2image` endpoint.
- New video models: three ByteDance Seedance variants on `VideoClient`. All support text-to-video and image-to-video; pass the model ID to `VideoClient.generate(..., model=...)`.
  - `bytedance/seedance-1.5-pro`: $0.03/sec, 720p, 5s default (up to 10s).
  - `bytedance/seedance-2.0-fast`: $0.15/sec, ~60-80s generation, sweet-spot price/quality.
  - `bytedance/seedance-2.0`: $0.30/sec, 720p Pro quality.
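A quick back-of-envelope check on the per-second Seedance pricing. The rates come from the changelog; the helper itself is illustrative, not an SDK function.

```python
# Per-second rates from the changelog (USD).
SEEDANCE_RATES = {
    "bytedance/seedance-1.5-pro": 0.03,
    "bytedance/seedance-2.0-fast": 0.15,
    "bytedance/seedance-2.0": 0.30,
}

def video_cost(model: str, seconds: int) -> float:
    """Cost of a clip at the listed per-second rate."""
    return round(SEEDANCE_RATES[model] * seconds, 4)

# A default 5s clip on seedance-1.5-pro costs $0.15;
# a max-length 10s clip on seedance-2.0 costs $3.00.
```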
- README Image/Video sections list the new models; the image-editing section notes `gpt-image-1` and `gpt-image-2` as supported.
- Also: the `pyproject.toml` version was stuck at 0.13.0 despite `__version__` saying 0.14.1 (which prevented PyPI publishes from shipping the NVIDIA refresh). Both are now aligned at 0.15.0.
- NVIDIA free-tier refresh (backend 2026-04-21). Router updated to point at the current survivors plus the two new models: `nvidia/qwen3-next-80b-a3b-thinking` (reasoning flagship, 116 tok/s) and `nvidia/mistral-small-4-119b` (fastest free chat, 114 tok/s).
- Retired IDs no longer referenced by `router.py`: `nvidia/nemotron-super-49b`, `nvidia/nemotron-ultra-253b`, `nvidia/mistral-large-3-675b`. The backend still redirects them, but offline routing now points at the canonical successors (`nvidia/qwen3-next-80b-a3b-thinking`, `nvidia/mistral-small-4-119b`, `nvidia/llama-4-maverick`, `nvidia/glm-4.7`).
- AUTO/ECO `SIMPLE` primaries switched from `nvidia/kimi-k2.5` (retired) to `moonshot/kimi-k2.5`. The backend redirect still works, but the router now references the canonical target.
- README NVIDIA table refreshed (8 visible models + `moonshot/kimi-k2.5`).
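The retirement handling amounts to a rewrite map applied before routing. A minimal sketch: the retired IDs, the successor list, and the `nvidia/kimi-k2.5` → `moonshot/kimi-k2.5` pair come from the changelog, but which successor replaces which retired Nemotron/Mistral ID is an illustrative guess, and `canonical_model` is a hypothetical helper, not `router.py`'s actual code.

```python
# Illustrative offline redirect map: retired IDs -> canonical successors.
# Pairings for the retired NVIDIA models are assumptions, not documented.
RETIRED_TO_CANONICAL = {
    "nvidia/nemotron-super-49b": "nvidia/qwen3-next-80b-a3b-thinking",
    "nvidia/nemotron-ultra-253b": "nvidia/qwen3-next-80b-a3b-thinking",
    "nvidia/mistral-large-3-675b": "nvidia/mistral-small-4-119b",
    "nvidia/kimi-k2.5": "moonshot/kimi-k2.5",  # documented redirect
}

def canonical_model(model_id: str) -> str:
    """Rewrite a retired ID to its successor; pass current IDs through."""
    return RETIRED_TO_CANONICAL.get(model_id, model_id)
```

Resolving locally rather than relying on the backend redirect keeps offline routing deterministic and avoids one extra hop per retired-ID request.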
- New `SearchClient`: wraps `POST /v1/search` (standalone Grok Live Search). $0.025 per source + margin, 1-50 sources per call.
- New `XClient`: 13 methods mapping the `/v1/x/*` endpoints (user lookup/info/followers/following/verified-followers/tweets/mentions, tweet lookup/replies/thread, search, trending, articles/rising). Replaces orphaned `X*` types that had no caller.
- New `PriceClient`: Pyth-backed market data with `.price()`, `.history()`, and `.list_symbols()`. Crypto, FX, and commodities are fully free (price + history + list); stocks across 12 markets (us/hk/jp/kr/gb/de/fr/nl/ie/lu/cn/ca) and the `usstock` legacy alias charge for price + history, while list stays free. The client handles both paths transparently.
- `ChatMessage` gains optional `reasoning_content` and `thinking` fields for reasoning-capable models (DeepSeek Reasoner, Grok 4 / 4.20 reasoning).
- `ChatUsage` gains optional `cache_read_input_tokens` / `cache_creation_input_tokens` for Anthropic prompt-caching telemetry.
- `Model` gains optional `billing_mode` (paid/flat/free), `flat_price`, `categories`, and `hidden` so `list_models()` can surface full backend metadata.
- New market-data types: `PricePoint`, `PriceBar`, `PriceHistoryResponse`, `SymbolListResponse`.
- `VERSION` file synced to match `__init__.py`.
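The free/paid matrix for `PriceClient` can be written out as a small predicate. This mirrors only what the changelog states; billing happens server-side, and the function name and asset-class strings are illustrative.

```python
# Asset classes that are free for every operation, per the changelog.
FREE_ASSET_CLASSES = {"crypto", "fx", "commodity"}
# The 12 stock markets plus the legacy alias, which charge for price/history.
STOCK_MARKETS = {"us", "hk", "jp", "kr", "gb", "de", "fr",
                 "nl", "ie", "lu", "cn", "ca", "usstock"}

def is_free(asset_class: str, operation: str) -> bool:
    """operation is one of 'price', 'history', 'list'."""
    if operation == "list":
        return True                     # symbol listing is always free
    if asset_class in FREE_ASSET_CLASSES:
        return True                     # crypto/FX/commodity fully free
    return asset_class not in STOCK_MARKETS  # stock price/history are charged
```

"Handles both paths transparently" then just means the client attaches payment only when a call falls on the charged side of this matrix.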
- New `VideoClient`: generate AI videos via `xai/grok-imagine-video` ($0.05/sec, 8s default). `VideoResponse`, `VideoClip`, and `VideoModel` types added.
- Text-to-video and image-to-video supported; the client blocks until polling completes (~30-120s).
- `ImageData` now exposes `source_url` and `backed_up` for gateway-mirrored assets.
- Grok Imagine image models (`xai/grok-imagine-image`, `-pro`) routable via `ImageClient`.
- Grok 4.20 chat models (`xai/grok-4.20-reasoning`, `-non-reasoning`, `-multi-agent`) routable via the chat API.
- 43+ models supported
- Base and Solana chain payments
- x402 v2 protocol
- Image generation support
- Anthropic-compatible client
- Smart model routing
- Response caching