
feat: add LiteLLM as unified AI gateway provider #1521

Open
RheagalFire wants to merge 3 commits into open-compass:main from RheagalFire:feat/add-litellm-provider

Conversation


@RheagalFire commented Apr 22, 2026

Summary

  • Adds LiteLLMAPI, a new API provider backed by LiteLLM, enabling access to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Together, Groq, etc.) through a single unified interface.
  • Follows the same pattern as TogetherAPI and BedrockAPI. Additive only, existing providers untouched.

Motivation

VLMEvalKit currently requires a separate provider file for each LLM backend. Users who want to evaluate models across Azure, Bedrock, or Vertex AI need provider-specific code. LiteLLM provides a unified completion() interface that handles auth, formatting, and provider-specific quirks, enabling cross-provider evaluation with a single configuration change.

Changes

  • vlmeval/api/litellm_api.py -- new LiteLLMAPI provider extending BaseAPI
  • vlmeval/api/__init__.py -- import + __all__ registration
  • vlmeval/config.py -- 8 model presets across 5 providers (OpenAI, Anthropic, Google, AWS Bedrock, Together AI, Groq)
  • requirements.txt -- added litellm>=1.55,<1.85
  • tests/test_litellm_api.py -- 22 unit tests covering init, content prep, message prep, generate_inner, error handling, registration

Key implementation details

  • Optional dependency: litellm>=1.55,<1.85, lazy-imported at module level with try/except (same pattern as bedrock.py with boto3). Base install unaffected.
  • drop_params=True by default, silently drops provider-unsupported kwargs (e.g. seed/strict on Anthropic, response_format on Bedrock). Prevents cross-provider evaluation failures.
  • Flexible auth: accepts key= param, LITELLM_API_KEY env var, or provider-specific env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, AZURE_API_KEY, etc.).
  • litellm_kwargs passthrough for advanced settings (seed, top_p, provider-specific params).
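The implementation details above can be sketched roughly as follows. This is a minimal illustration, not the actual code in vlmeval/api/litellm_api.py: the class and method names (`LiteLLMAPI`, `generate_inner`, the `(ret_code, answer, log)` return triple) follow the PR description, but the real provider extends BaseAPI and carries more logic.

```python
# Sketch of the lazy-import, auth, and drop_params behavior described above.
import os

try:
    import litellm  # optional dependency; base install works without it
except ImportError:
    litellm = None


class LiteLLMAPI:
    def __init__(self, model, key=None, temperature=0, max_tokens=2048,
                 drop_params=True, **litellm_kwargs):
        self.model = model
        # key= param wins over LITELLM_API_KEY; provider-specific env vars
        # (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...) are read by litellm itself.
        self.key = key or os.environ.get('LITELLM_API_KEY')
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.drop_params = drop_params
        self.litellm_kwargs = litellm_kwargs

    def generate_inner(self, messages, **kwargs):
        if litellm is None:
            return -1, '', 'litellm is not installed (pip install litellm)'
        call_kwargs = {
            **self.litellm_kwargs,  # advanced settings (seed, top_p, ...)
            'model': self.model,
            'messages': messages,
            'temperature': kwargs.get('temperature', self.temperature),
            'max_tokens': kwargs.get('max_tokens', self.max_tokens),
            # silently drop provider-unsupported kwargs instead of erroring
            'drop_params': self.drop_params,
        }
        if self.key:
            call_kwargs['api_key'] = self.key  # omitted when None
        try:
            resp = litellm.completion(**call_kwargs)
            return 0, resp.choices[0].message.content, 'Succeeded'
        except Exception as err:
            return -1, '', str(err)
```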

Usage and Testing

1. Unit tests (22/22 pass):
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_default_params PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_custom_params PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_key_from_env PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_key_param_overrides_env PASSED
tests/test_litellm_api.py::TestPrepareContent::test_text_only PASSED
tests/test_litellm_api.py::TestPrepareContent::test_image_and_text PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_flat_inputs PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_system_prompt PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_role_based_inputs PASSED
tests/test_litellm_api.py::TestGenerateInner::test_success PASSED
tests/test_litellm_api.py::TestGenerateInner::test_drop_params_default_true PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_key_forwarded PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_key_omitted_when_none PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_base_forwarded PASSED
tests/test_litellm_api.py::TestGenerateInner::test_error_returns_negative_one PASSED
tests/test_litellm_api.py::TestGenerateInner::test_litellm_not_installed PASSED
tests/test_litellm_api.py::TestGenerateInner::test_temperature_override PASSED
tests/test_litellm_api.py::TestGenerateInner::test_max_tokens_override PASSED
tests/test_litellm_api.py::TestGenerateInner::test_litellm_kwargs_passthrough PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_litellm_entries_in_config PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_litellm_in_init_all PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_version_pin_in_docstring PASSED
======================== 22 passed in 1.29s =========================

2. Live E2E against Azure GPT-4o-mini (text):
LIVE E2E TEST 1: Azure GPT-4o-mini (text)
Model: azure/gpt-4o-mini
Query: What is 2+2? Reply with just the number.
ret_code: 0
answer: 4
model: gpt-4o-mini-2024-07-18
usage: prompt=20, completion=2, total=22

LIVE E2E TEST 2: Azure GPT-4o-mini (multi-turn with system prompt)
Model: azure/gpt-4o-mini
System: You are a math tutor. Be concise.
Query: What is the square root of 144?
ret_code: 0
answer: The square root of 144 is 12.
model: gpt-4o-mini-2024-07-18
usage: prompt=29, completion=11, total=40

3. Live E2E vision test (Anthropic Claude Sonnet 4-6 via Azure):

LiteLLM Provider for VLMEvalKit -- Live Integration Tests     

[Test 1] Text-only completion
Model: anthropic/claude-sonnet-4-6
Prompt: 'What is 2+2? Reply with just the number.'
Answer: 4
Tokens: in=20 out=5
PASSED

[Test 2] Vision -- real image via _prepare_content pipeline
Model: anthropic/claude-sonnet-4-6
Image: BA50EF10-8F5D-4719-BA18-B20A80EF5A8F.png
Answer: I see a cute cartoon character (a round white figure) holding a
glass, sitting next to two bottles of Jack Daniel's whiskey.
Tokens: in=38 out=40
PASSED

[Test 3] VLMEvalKit input format -- dict list with type/value
Model: anthropic/claude-sonnet-4-6
Input: [{'type': 'image', 'value': ''}, {'type': 'text', 'value': '...'}]
Answer: cartoon character, glass, Jack Daniel's whiskey bottle (x2),
wooden box/crate, liquid
Tokens: in=38 out=63
PASSED

================================================================
All 3 tests passed -- text + vision + VLMEvalKit format

4. Lint: flake8 --max-line-length 99 reports no issues.

Example usage

from functools import partial
from vlmeval.api import LiteLLMAPI

# Use any LiteLLM model string
model = partial(LiteLLMAPI, model='azure/gpt-4o-mini', temperature=0, max_tokens=2048, retry=10)

# Or via config.py presets:
# python run.py --model LiteLLM_GPT4o --data MMBench_DEV_EN

…lm dep

- Move litellm_api import to correct alphabetical position in __init__.py
- Skip empty text values in _prepare_content when images are present (matches Bedrock/GPT pattern)
- Place litellm_kwargs spread before explicit params so runtime overrides take precedence
- Add litellm>=1.55,<1.85 to requirements.txt
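The precedence fix in the commit above can be sketched as follows: spreading `litellm_kwargs` first means explicit params placed after it win on key collisions. `build_call_kwargs` is a hypothetical name for illustration only.

```python
# Sketch: the litellm_kwargs spread comes first, so the explicit runtime
# params written after it override any colliding keys.
def build_call_kwargs(litellm_kwargs, model, messages, temperature, max_tokens):
    return {
        **litellm_kwargs,      # user-supplied advanced settings
        'model': model,        # explicit params last -> they take precedence
        'messages': messages,
        'temperature': temperature,
        'max_tokens': max_tokens,
    }
```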
@RheagalFire
Author

cc @kennymckormick, I would appreciate your review.

Add 4 more config entries to showcase LiteLLM's cross-provider vision
support: Gemini 2.5 Pro, Bedrock Claude 3.5 Sonnet, Llama 3.2 Vision
(Together AI), and Llama 4 Scout (Groq).
