
feat: add LiteLLM as unified AI gateway provider #1521

Open
RheagalFire wants to merge 3 commits into open-compass:main from RheagalFire:feat/add-litellm-provider

Conversation


@RheagalFire commented Apr 22, 2026

Summary

  • Adds LiteLLMAPI, a new API provider backed by LiteLLM, enabling access to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Together, Groq, etc.) through a single unified interface.
  • Follows the same pattern as TogetherAPI and BedrockAPI. Additive only, existing providers untouched.

Motivation

VLMEvalKit currently requires a separate provider file for each LLM backend. Users who want to evaluate models across Azure, Bedrock, or Vertex AI need provider-specific code. LiteLLM provides a unified completion() interface that handles auth, formatting, and provider-specific quirks, enabling cross-provider evaluation with a single configuration change.

Changes

  • vlmeval/api/litellm_api.py -- new LiteLLMAPI provider extending BaseAPI
  • vlmeval/api/__init__.py -- import + __all__ registration
  • vlmeval/config.py -- 8 model presets across 5 providers (OpenAI, Anthropic, Google, AWS Bedrock, Together AI, Groq)
  • requirements.txt -- added litellm>=1.55,<1.85
  • tests/test_litellm_api.py -- 22 unit tests covering init, content prep, message prep, generate_inner, error handling, registration

Key implementation details

  • Optional dependency: litellm>=1.55,<1.85, lazy-imported at module level with try/except (same pattern as bedrock.py with boto3). Base install unaffected.
  • drop_params=True by default, silently drops provider-unsupported kwargs (e.g. seed/strict on Anthropic, response_format on Bedrock). Prevents cross-provider evaluation failures.
  • Flexible auth: accepts key= param, LITELLM_API_KEY env var, or provider-specific env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, AZURE_API_KEY, etc.).
  • litellm_kwargs passthrough for advanced settings (seed, top_p, provider-specific params).
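The implementation details above can be sketched roughly as follows. This is a minimal illustration, not the actual code in vlmeval/api/litellm_api.py: the class and method names (`LiteLLMAPI`, `generate_inner`, the `(ret_code, answer, log)` return triple) follow the PR description, but the real provider extends BaseAPI and carries more logic.

```python
# Sketch of the lazy-import, auth, and drop_params behavior described above.
import os

try:
    import litellm  # optional dependency; base install works without it
except ImportError:
    litellm = None


class LiteLLMAPI:
    def __init__(self, model, key=None, temperature=0, max_tokens=2048,
                 drop_params=True, **litellm_kwargs):
        self.model = model
        # key= param wins over LITELLM_API_KEY; provider-specific env vars
        # (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...) are read by litellm itself.
        self.key = key or os.environ.get('LITELLM_API_KEY')
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.drop_params = drop_params
        self.litellm_kwargs = litellm_kwargs

    def generate_inner(self, messages, **kwargs):
        if litellm is None:
            return -1, '', 'litellm is not installed (pip install litellm)'
        call_kwargs = {
            **self.litellm_kwargs,  # advanced settings (seed, top_p, ...)
            'model': self.model,
            'messages': messages,
            'temperature': kwargs.get('temperature', self.temperature),
            'max_tokens': kwargs.get('max_tokens', self.max_tokens),
            # silently drop provider-unsupported kwargs instead of erroring
            'drop_params': self.drop_params,
        }
        if self.key:
            call_kwargs['api_key'] = self.key  # omitted when None
        try:
            resp = litellm.completion(**call_kwargs)
            return 0, resp.choices[0].message.content, 'Succeeded'
        except Exception as err:
            return -1, '', str(err)
```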

Usage and Testing

1. Unit tests (22/22 pass):
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_default_params PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_custom_params PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_key_from_env PASSED
tests/test_litellm_api.py::TestLiteLLMAPIInit::test_key_param_overrides_env PASSED
tests/test_litellm_api.py::TestPrepareContent::test_text_only PASSED
tests/test_litellm_api.py::TestPrepareContent::test_image_and_text PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_flat_inputs PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_system_prompt PASSED
tests/test_litellm_api.py::TestPrepareMessages::test_role_based_inputs PASSED
tests/test_litellm_api.py::TestGenerateInner::test_success PASSED
tests/test_litellm_api.py::TestGenerateInner::test_drop_params_default_true PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_key_forwarded PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_key_omitted_when_none PASSED
tests/test_litellm_api.py::TestGenerateInner::test_api_base_forwarded PASSED
tests/test_litellm_api.py::TestGenerateInner::test_error_returns_negative_one PASSED
tests/test_litellm_api.py::TestGenerateInner::test_litellm_not_installed PASSED
tests/test_litellm_api.py::TestGenerateInner::test_temperature_override PASSED
tests/test_litellm_api.py::TestGenerateInner::test_max_tokens_override PASSED
tests/test_litellm_api.py::TestGenerateInner::test_litellm_kwargs_passthrough PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_litellm_entries_in_config PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_litellm_in_init_all PASSED
tests/test_litellm_api.py::TestConfigRegistration::test_version_pin_in_docstring PASSED
======================== 22 passed in 1.29s =========================

2. Live E2E against Azure GPT-4o-mini (text):
LIVE E2E TEST 1: Azure GPT-4o-mini (text)
Model: azure/gpt-4o-mini
Query: What is 2+2? Reply with just the number.
ret_code: 0
answer: 4
model: gpt-4o-mini-2024-07-18
usage: prompt=20, completion=2, total=22

LIVE E2E TEST 2: Azure GPT-4o-mini (multi-turn with system prompt)
Model: azure/gpt-4o-mini
System: You are a math tutor. Be concise.
Query: What is the square root of 144?
ret_code: 0
answer: The square root of 144 is 12.
model: gpt-4o-mini-2024-07-18
usage: prompt=29, completion=11, total=40

3. Live E2E vision test (Anthropic Claude Sonnet 4-6 via Azure):

LiteLLM Provider for VLMEvalKit -- Live Integration Tests     

[Test 1] Text-only completion
Model: anthropic/claude-sonnet-4-6
Prompt: 'What is 2+2? Reply with just the number.'
Answer: 4
Tokens: in=20 out=5
PASSED

[Test 2] Vision -- real image via _prepare_content pipeline
Model: anthropic/claude-sonnet-4-6
Image: BA50EF10-8F5D-4719-BA18-B20A80EF5A8F.png
Answer: I see a cute cartoon character (a round white figure) holding a
glass, sitting next to two bottles of Jack Daniel's whiskey.
Tokens: in=38 out=40
PASSED

[Test 3] VLMEvalKit input format -- dict list with type/value
Model: anthropic/claude-sonnet-4-6
Input: [{'type': 'image', 'value': ''}, {'type': 'text', 'value': '...'}]
Answer: cartoon character, glass, Jack Daniel's whiskey bottle (x2),
wooden box/crate, liquid
Tokens: in=38 out=63
PASSED

================================================================
All 3 tests passed -- text + vision + VLMEvalKit format

4. Lint: flake8 --max-line-length 99 reports no issues.

Example usage

from functools import partial
from vlmeval.api import LiteLLMAPI

# Use any LiteLLM model string
model = partial(LiteLLMAPI, model='azure/gpt-4o-mini', temperature=0, max_tokens=2048, retry=10)

# Or via config.py presets:
# python run.py --model LiteLLM_GPT4o --data MMBench_DEV_EN

…lm dep

- Move litellm_api import to correct alphabetical position in __init__.py
- Skip empty text values in _prepare_content when images are present (matches Bedrock/GPT pattern)
- Place litellm_kwargs spread before explicit params so runtime overrides take precedence
- Add litellm>=1.55,<1.85 to requirements.txt
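The precedence fix in the commit above can be sketched as follows: spreading `litellm_kwargs` first means explicit params placed after it win on key collisions. `build_call_kwargs` is a hypothetical name for illustration only.

```python
# Sketch: the litellm_kwargs spread comes first, so the explicit runtime
# params written after it override any colliding keys.
def build_call_kwargs(litellm_kwargs, model, messages, temperature, max_tokens):
    return {
        **litellm_kwargs,      # user-supplied advanced settings
        'model': model,        # explicit params last -> they take precedence
        'messages': messages,
        'temperature': temperature,
        'max_tokens': max_tokens,
    }
```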
@RheagalFire
Author

cc @kennymckormick, I would appreciate your review.

Add 4 more config entries to showcase LiteLLM's cross-provider vision
support: Gemini 2.5 Pro, Bedrock Claude 3.5 Sonnet, Llama 3.2 Vision
(Together AI), and Llama 4 Scout (Groq).
