Awesome curated list of LLM, VLM and other Foundation Models
No. | Model | Year | Company | Size & Context Window | Best For / Strengths | Access & Cost |
---|---|---|---|---|---|---|
1 | GPT-4o (“Omni”) | 2024-05 | OpenAI | Multimodal / 128K tokens | Text+image+audio+voice; fast & free-tier use | Free-tier in ChatGPT; API: $2.50/1M in, $10/1M out |
2 | GPT-4o mini | 2024-07 | OpenAI | Undisclosed (est. ~8 B params) / 128K tokens | Cost-effective multimodal | Replaced GPT-3.5 Turbo in ChatGPT; API: $0.15/1M in, $0.60/1M out |
3 | o4-mini-high / o4-mini | 2025-04 | OpenAI | Compact reasoning, multimodal / 200K tokens | STEM, coding, fast reasoning with vision | API: $1.10/1M in, $4.40/1M out |
4 | o3-mini-high / o3-mini | 2025-01 | OpenAI | Small reasoning models | Technical/scientific reasoning on a budget | API: $1.10/1M in, $4.40/1M out |
5 | Llama 4 Maverick | 2025 | Meta AI | MoE (128 experts), 400B total / 17B active params, 1M context | Coding, reasoning; GPT-4o-level | $0.19-$0.49/1M in & out tokens |
6 | Llama 4 Scout | 2025 | Meta AI | MoE (16 experts), 109B total / 17B active params, 10M context; fits on a single A100/H100 | Generalist, long-context small model | Open weights |
7 | Llama 3.1 405B | 2024 | Meta AI | 405 B params / 128K tokens | Research, long-context, coding | Open source |
8 | Claude 3.7 Sonnet | 2025-02 | Anthropic | Undisclosed / 200K tokens | Extended reasoning & coding | API (paid via Anthropic) |
9 | Claude 4 | 2025-05 | Anthropic | Undisclosed / 200K tokens | Advanced coding, creative writing, multimodal tasks | API (paid via Anthropic); AWS Bedrock, Vertex AI |
10 | Claude 4.1 Opus | 2025-08 | Anthropic | Undisclosed / 200K tokens | Superior coding, creative writing, transparent reasoning | API (paid via Anthropic): $15/1M in, $75/1M out |
11 | Gemini 2.5 Pro | 2025-06 | Google DM | Undisclosed; multimodal / 1M tokens | Advanced reasoning, multimodal | API (paid via Google) |
12 | Gemini 2.5 Pro Preview | 2025-03 | Google DM | Undisclosed; multimodal / 1M tokens | Advanced reasoning, coding, multimodal tasks | Google AI Studio (free experimental access); API: $1.25/1M in, $10/1M out |
13 | Gemini 2.5 Flash | 2025-04 | Google DM | Undisclosed; multimodal / 1M tokens | Fast, cost-effective multimodal tasks | API: $0.3/1M tokens in, $2.5/1M out; Google AI Studio, Vertex AI |
14 | Gemini 2.5 Flash-Lite | 2025-07 | Google DM | Undisclosed; multimodal / 1M tokens | Cost-effective, low-latency tasks like classification, summarization | API: $0.10/1M in, $0.40/1M out; Google AI Studio, Vertex AI |
15 | Stable LM 2 12B | 2024-04 | Stability AI | 12 B params | Open model with good benchmarks | Open source |
16 | Qwen 2.5-VL 32B | 2025-03 | Alibaba | 32 B params; multimodal / 128K tokens | Vision+language tasks | Open source (Apache 2.0) |
17 | Mistral Small 3.1 | 2025-03 | Mistral AI | 24 B params; 128K tokens | Image & doc understanding | Open source (Apache 2.0) |
18 | Gemma 3 (27B) | 2025-03 | Google DM | 27 B params | One-GPU efficient model | Open source |
19 | EmbeddingGemma | 2025-09 | Google DM | Small, optimized for embeddings, on-device use cases / 308M params / 2K tokens | Text embeddings for semantic search, clustering | Open source |
20 | Fox-1 1.6B Instruct | 2024-11 | Fox-1 project | 1.6 B params | Instruction-following small LLM, conversational | Open source (Apache 2.0) |
21 | Grok 3 | 2025-02 | xAI (Elon Musk) | Unknown (Chat-focused) / 1M tokens | Conversational AI, Twitter/X integration | Proprietary (likely X Premium) |
22 | Grok-3 mini | 2025-02 | xAI (Elon Musk) | Small; reasoning-focused / 1M tokens | Cost-effective reasoning, coding, STEM tasks | Proprietary (likely X Premium) |
23 | Grok 4 | 2025-07 | xAI (Elon Musk) | Undisclosed / 256K tokens | Advanced reasoning, coding (Grok 4 Code), multimodal, real-time data | SuperGrok $30/mo, Heavy $300/mo; API: $3/1M in, $15/1M out; X Premium+ access |
24 | Grok-4 Heavy | 2025-07 | xAI (Elon Musk) | Unknown | Advanced reasoning, coding, real-time data, high-compute tasks | SuperGrok Heavy $300/mo; API: $3/1M in, $15/1M out |
25 | DeepSeek R1 | 2025-01 | DeepSeek AI | MoE, 671B total / 37B active; reasoning-focused / 128K tokens | Reasoning tasks, competitive with OpenAI o1 | Open weights |
26 | DeepSeek V3.1 | 2025-08 | DeepSeek AI | Undisclosed; reasoning-focused / 128K tokens | Advanced reasoning, coding, cost-efficiency | Open weights |
27 | Cerebras Qwen3-32B | 2025-05 | Alibaba (Qwen) / Cerebras | 32 B params | High-speed reasoning (Qwen3-32B served on Cerebras hardware) | Open source (Apache 2.0) |
28 | Kimi K2 | 2025-07 | Moonshot AI | 1T params (32B active) / 128K tokens | Mixture-of-experts (MoE). Agentic intelligence, coding, reasoning, tool use | Open source (Modified MIT); API: $0.15/1M in, $2.50/1M out |
29 | gpt-oss-20b | 2025-08 | OpenAI | 21B params (3.6B active) / 128K tokens | Reasoning, agentic tasks, local deployment, low latency | Open source (Apache 2.0), downloadable via Hugging Face, Ollama, GitHub |
30 | gpt-oss-120b | 2025-08 | OpenAI | 117B params (5.1B active) / 128K tokens | Deep reasoning, agentic tasks, enterprise-grade deployment | Open source (Apache 2.0), downloadable via Hugging Face, Ollama, GitHub |
31 | GPT-5 | 2025-08 | OpenAI | Undisclosed / 400K tokens | Advanced reasoning, coding, multimodal, scientific tasks | API: $1.25/1M in, $10/1M out; ChatGPT Plus/Pro/Team, Free-tier access |
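Most of the hosted models in this table are reachable through a provider SDK or an OpenAI-compatible endpoint. Below is a minimal sketch of a chat-completion call with the OpenAI Python SDK; it assumes the openai package and an OPENAI_API_KEY environment variable, the model id is just one example from the table, and pricing/availability can change.

```python
# Minimal chat-completion sketch using the OpenAI Python SDK
# (assumes the openai package and an OPENAI_API_KEY in the environment).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # one example from the table; swap in another model id
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what a context window is in one sentence."},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
```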
This is a curated list of new and up-to-date leaderboards for Large Language Models (LLMs), Vision-Language Models (VLMs), and multimodal models, published or updated in 2025. Each leaderboard provides performance metrics, rankings, and comparisons for state-of-the-art foundation models.
- LLM Leaderboard 2025 - llm-stats.com: Comprehensive leaderboard for LLMs with performance metrics and benchmark data. Includes interactive analysis tools to compare models like GPT-4o, Llama, o1, Gemini, and Claude based on context window, speed, and price.
- Open LLM Leaderboard - Hugging Face: Evaluates open-source LLMs using benchmarks like IFEval, BBH, and MATH. Features real-time filtering and analysis of models, with community voting and comprehensive results.
- LLM Leaderboard 2025 - Vellum: Compares capabilities, price, and context window for leading commercial and open-source LLMs. Features 2025 benchmark data from model providers and independent evaluations, focusing on non-saturated benchmarks (excluding MMLU).
- LLM Leaderboard - Artificial Analysis: Ranks over 100 LLMs across metrics like intelligence, price, performance, speed (tokens per second), and context window. Provides detailed comparisons for models from OpenAI, Google, DeepSeek, Alibaba Cloud, and others.
- SEAL LLM Leaderboards: Expert-driven, private evaluations of LLMs across domains like coding and instruction following. Uses curated datasets to prevent overfitting and ensure high-complexity evaluations.
- Open VLM Leaderboard - Hugging Face: Ranks open-source VLMs using 23 multimodal benchmarks (e.g., MMBench_V11, MathVista). Evaluates models like GPT-4V, Gemini, QwenVLPlus, and LLaVA on image-text tasks.
- Zero-Shot Video Question Answer on Video-MME: Presents zero-shot question-answering results on the TGIF-QA dataset for LLM-powered video conversational models.
This list highlights key frameworks, tools, and libraries for developing, deploying, and managing Large Language Models (LLMs), Vision-Language Models (VLMs), and foundation models.
- LangChain: A versatile framework for building LLM-powered applications. It simplifies prompt chaining, memory management, and integration with external data sources like vector databases and APIs. Used for chatbots, RAG systems, and agent-based workflows (see the sketch after this list).
- LlamaIndex: A data framework designed for connecting LLMs with custom data sources. It excels in data ingestion, indexing, and retrieval for RAG applications, enabling semantic search and context-aware querying. Ideal for document analysis and knowledge base systems.
- DSPy: A framework for programming foundation models by defining tasks rather than crafting prompts. It optimizes pipelines for LLMs using modular components, improving performance in tasks like reasoning and text generation. Suited for developers seeking maintainable codebases.
- Semantic Kernel: A Microsoft-developed SDK for integrating LLMs into applications. It supports orchestration of AI tasks, memory management, and plugins for connecting to external tools. Used for building scalable AI agents in Python, C#, and Java.
- AutoGen: A Python-based framework for creating multi-agent LLM systems. It enables agents to collaborate on tasks like data retrieval and code execution, enhancing complex workflows. Used for building autonomous AI agents and research.
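For a feel of how these orchestration frameworks are used, here is a minimal sketch of prompt chaining with LangChain's expression language (LCEL). It assumes the langchain-core and langchain-openai packages plus an OPENAI_API_KEY; exact imports can shift between LangChain releases.

```python
# Minimal LangChain prompt-chaining sketch (assumes langchain-core,
# langchain-openai, and an OPENAI_API_KEY in the environment).
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Prompt template with a single input variable.
prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in one sentence:\n\n{text}"
)

# Chat model; the model name is an illustrative choice.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# LCEL pipeline: prompt -> model -> plain-string output.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "LangChain simplifies building LLM applications."}))
```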
- Haystack: An open-source framework for building LLM-powered search and RAG applications. It supports semantic search, document retrieval, and question answering, with integrations for Hugging Face, OpenAI, and vector stores like Pinecone. Used for enterprise search systems.
- Chroma: An open-source embedding database optimized for managing and searching vector embeddings. Commonly used for semantic search and RAG pipelines with LangChain or LlamaIndex (see the sketch after this list).
- Jina: A scalable cloud-native framework for multimodal search and neural semantic retrieval. Supports building RAG pipelines with images, text, and more.
- Qdrant: An open-source vector search engine for storing and querying embeddings at scale. Built for semantic search, recommendation engines, and RAG applications.
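A minimal sketch of the vector-store pattern these tools implement, using Chroma's in-memory client; it assumes the chromadb package, and the default embedding function downloads a small local model on first use.

```python
# Minimal Chroma semantic-search sketch (assumes the chromadb package).
import chromadb

client = chromadb.Client()  # in-memory client; use PersistentClient for disk storage
collection = client.create_collection(name="docs")

# Add a few documents; Chroma embeds them with its default embedding function.
collection.add(
    documents=[
        "Qdrant is a vector search engine.",
        "Haystack builds RAG pipelines.",
        "Chroma stores and queries embeddings.",
    ],
    ids=["doc1", "doc2", "doc3"],
)

# Query by text; returns the closest documents by embedding similarity.
results = collection.query(query_texts=["Which tool stores embeddings?"], n_results=2)
print(results["documents"])
```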
- Ollama: A lightweight framework for running LLMs locally. It provides a simple API and supports models like Llama 3 and Gemma, enabling developers to build and test AI applications on personal hardware. Perfect for local AI development and prototyping (see the sketch after this list).
- OpenLLM: Runs any open-source LLM (Llama 3.3, Qwen2.5, Phi-3, and more) or custom models as OpenAI-compatible APIs with a single command.
- vLLM: An open-source library designed to serve LLMs efficiently and at scale, especially for inference. Uses PagedAttention to optimize memory usage, batching, and throughput.
- Text Generation Inference (TGI): Hugging Face's optimized inference server for deploying large Transformer models with low latency and high throughput.
- FastChat: A powerful open-source framework to serve and chat with LLMs interactively. Includes a web UI, REST API, and support for various model families like Vicuna and LLaMA.
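To illustrate local serving, here is a minimal sketch that queries an Ollama server over its REST API; it assumes Ollama is running locally and a model tag such as llama3 has already been pulled.

```python
# Minimal sketch of calling a locally running Ollama server over its REST API
# (assumes `ollama serve` is running and a model such as `llama3` has been pulled).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",                # any locally pulled model tag
        "prompt": "Explain RAG in one sentence.",
        "stream": False,                  # return one JSON object instead of a stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```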
- MLflow: An open-source platform for managing the machine learning lifecycle, including LLMs and VLMs. It supports experiment tracking, model versioning, and deployment, with integrations for LangChain, LlamaIndex, and DSPy. Ideal for reproducible AI workflows (see the sketch after this list).
- n8n: An open-source, low-code workflow automation platform. It integrates LLMs with external tools and APIs to automate tasks like data processing or chatbot responses. Used for building scalable AI-driven workflows with minimal coding.
- Flowise: An open-source, low-code platform for building LLM applications. It features a drag-and-drop interface and integrates with LangChain and LlamaIndex, making it accessible for non-coders to create chatbots and RAG systems.
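A minimal experiment-tracking sketch with MLflow; it assumes the mlflow package, logs to the local ./mlruns directory by default, and the parameter and metric values are illustrative.

```python
# Minimal MLflow experiment-tracking sketch (assumes the mlflow package).
import mlflow

mlflow.set_experiment("prompt-experiments")

with mlflow.start_run(run_name="baseline-prompt"):
    # Log the knobs of an LLM call as parameters...
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_param("temperature", 0.2)
    # ...and evaluation results as metrics (values here are illustrative).
    mlflow.log_metric("answer_relevancy", 0.87)
    mlflow.log_metric("latency_seconds", 1.4)

# Inspect logged runs in a browser with: mlflow ui
```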
- Hugging Face Transformers: A comprehensive library for training, fine-tuning, and deploying LLMs and VLMs. It supports models like BERT, GPT, and CLIP, with tools for NLP, computer vision, and multimodal tasks. Used for research and production-grade AI applications.
- PEFT (Parameter-Efficient Fine-Tuning): A library for efficient fine-tuning of large models using techniques like LoRA, prompt tuning, and adapters. Ideal for customizing LLMs on limited hardware (see the LoRA sketch after this list).
- bitsandbytes: A lightweight CUDA extension for quantization and low-bit inference/training of LLMs. Enables memory-efficient training of large models.
- LMFlow: A framework for easy and fast fine-tuning, instruction tuning, and deployment of LLMs. Includes support for model compression and evaluation.
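A minimal LoRA setup with PEFT and Transformers; GPT-2 is used as a small stand-in base model, and the target module names are architecture-specific assumptions that you would adjust for your own model.

```python
# Minimal PEFT/LoRA sketch (assumes the transformers and peft packages;
# the base model and target modules are illustrative choices).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # small stand-in; swap in any causal LM you can fit
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA adds small trainable low-rank matrices to the chosen projection modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # module names depend on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```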
- DeepEval: A testing framework for evaluating LLM applications. It offers over 14 research-backed metrics to assess RAG pipelines and safety risks, integrating with frameworks like LangChain and LlamaIndex. Used for quality assurance in AI development (see the sketch after this list).
- PromptTools: A Python library for debugging, comparing, and evaluating LLM prompts with visualizations and logging support.
- AlpacaEval: A community-driven evaluation toolkit for benchmarking LLMs' instruction-following ability using standardized prompts.
- OpenCompass: A comprehensive open-source framework for large-scale benchmarking of LLMs and VLMs using curated datasets and metrics.
- VLMEvalKit 🖼️: An open-source evaluation toolkit for large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks (220+ models and 80+ benchmarks supported).
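A minimal DeepEval sketch following its documented quick-start pattern; it assumes the deepeval package and an LLM judge configured through OPENAI_API_KEY, and the test case and metric choice are illustrative.

```python
# Minimal DeepEval sketch (assumes the deepeval package and an LLM judge
# configured via OPENAI_API_KEY; follows DeepEval's documented quick-start API).
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does PagedAttention optimize in vLLM?",
    actual_output="PagedAttention optimizes GPU memory usage for the KV cache.",
)

# Scores how relevant the answer is to the question using an LLM-as-judge.
metric = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=[test_case], metrics=[metric])
```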
- Gradio: An intuitive Python library for creating interactive web interfaces for ML models. Popular for prototyping and demonstrating LLM/VLM applications (see the sketch after this list).
- Open WebUI: An open-source web interface for interacting with local and hosted LLMs. Supports multiple backends and provides a sleek, extensible UI.
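A minimal Gradio sketch; the answer function is a stub standing in for a real model call (for example, the Ollama request shown earlier).

```python
# Minimal Gradio demo sketch (assumes the gradio package).
import gradio as gr

def answer(prompt: str) -> str:
    # Replace this stub with a call to your LLM of choice.
    return f"You asked: {prompt}"

demo = gr.Interface(fn=answer, inputs="text", outputs="text", title="LLM demo")
demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```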
- OpenMMLab Multimodal-GPT: Based on the open-source multimodal model OpenFlamingo; builds visual instruction data from open datasets covering VQA, image captioning, visual reasoning, text OCR, and visual dialogue.
- OpenMMLab MMagic: Multimodal Advanced, Generative, and Intelligent Creation (MMagic).
- Transformer Lens: A library for visualizing and interpreting transformer internals. Helps researchers understand model behavior neuron by neuron (see the sketch after this list).
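A minimal TransformerLens sketch that caches and inspects activations of GPT-2 small; it assumes the transformer_lens package, and the chosen hook name is just one example of what can be read from the cache.

```python
# Minimal TransformerLens sketch (assumes the transformer_lens package;
# GPT-2 small is used as a lightweight example model).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Run the model and cache every intermediate activation.
logits, cache = model.run_with_cache("The Eiffel Tower is in")

# Inspect one cached activation: the residual stream after block 0.
resid = cache["resid_post", 0]
print(resid.shape)  # (batch, sequence_position, d_model)

# Greedy next-token prediction from the final logits.
next_token_id = logits[0, -1].argmax().item()
print(model.tokenizer.decode([next_token_id]))
```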
- GLM-4.1V-9B-Thinking, 🤗 HF - MIT License 🚀: Open-source VLM from THUDM, excelling in multimodal reasoning with support for 64K context, 4K image processing, and bilingual (English/Chinese) capabilities. It outperforms many models of similar size and rivals larger models like Qwen2.5-VL-72B on 18/28 benchmarks, including STEM and long-document understanding.
- Qwen 2.5 VL (7B / 72B): Multimodal VLM from Alibaba with dynamic resolution, video input, object localization, and support for ~29 languages. Top open-source performer in OCR and agentic workflows.
- Gemma 3 (4B–27B): Google's open multimodal model with a SigLIP image encoder; excels in multilingual captioning and VQA, with strong 128K-context performance.
- PaliGemma: Compact Gemma-2B-based VLM combining a SigLIP visual encoder with strong captioning, segmentation, and VQA transferability.
- Llama 3.2 Vision (11B / 90B): Vision-adapted Llama model with excellent OCR, document understanding, VQA, and 128K-token context.
- Phi-4 Multimodal: Microsoft's VLM supporting vision-language tasks, MIT-licensed and edge-friendly.
- DeepSeek-VL: Open-source VLM optimized for scientific reasoning and compact deployment.
- CogVLM: Strong-performing model in VQA and vision-centric tasks.
- BakLLaVA: LMM from the LAION, Ontocord, and Skunkworks OSS AI group combining Mistral 7B with the LLaVA architecture for efficient VQA pipelines (a generic inference sketch follows this list).
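As a rough illustration of how these open VLMs are typically queried, here is a hedged sketch using Hugging Face Transformers with a LLaVA-1.5 checkpoint; the prompt template, processor behavior, and memory requirements vary by model and library version, and device_map="auto" additionally assumes the accelerate package.

```python
# Hedged VLM inference sketch (assumes transformers, torch, pillow, accelerate;
# llava-hf/llava-1.5-7b-hf is an illustrative checkpoint, roughly 14 GB in fp16).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("photo.jpg")  # any local image

# LLaVA-1.5 expects an <image> placeholder in its chat-style prompt.
prompt = "USER: <image>\nDescribe this image in one sentence.\nASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```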
- OCRFlux: A toolkit built on a multimodal large language model for converting PDFs and images into clean, readable, plain Markdown text. Its 3B-parameter model can run on a single NVIDIA 3090 GPU, making it accessible for local deployment.
- Llama-3.1-Nemotron-Nano-VL-8B-V1, 🤗 HF: A leading document-intelligence vision-language model from NVIDIA that enables querying and summarizing images and video from the physical or virtual world.
- Qwen 2.5 VL (32B / 72B): State-of-the-art open OCR performance (~75% accuracy), outperforming even Mistral-OCR; excels in document, video, and multilingual text extraction.
- Mistral-OCR: Purpose-trained OCR variant from Mistral, delivering ~72.2% accuracy on structured document benchmarks.
- Llama 3.2 Vision (11B / 90B): Strong OCR and document-understanding capabilities, among the top open VLMs.
- Gemma 3 27B: Offers competitive OCR performance through its vision-language architecture.
- DeepSeek-v3-03-24: Lightweight, open-source OCR-ready model evaluated in 2025 benchmarks.
- TextHawk 2: Bilingual OCR and grounding VLM with state-of-the-art results across OCRBench, DocVQA, and ChartQA while using 16× fewer image tokens.
- VISTA-OCR: New lightweight generative OCR model unifying detection and recognition with only 150M params; interactive and high-accuracy.
- PP-DocBee: Multimodal document-understanding model with superior performance on English/Chinese benchmarks (a generic OCR-prompting sketch follows this list).
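Many of the OCR-capable VLMs above can be exposed through OpenAI-compatible endpoints (for example via vLLM or OpenLLM from the tooling section). The sketch below shows the general pattern of sending a page image as a base64 data URL and asking for a Markdown transcription; the base URL and model id are placeholders, and multimodal message formats can differ between servers.

```python
# Hedged sketch: asking an OCR-capable VLM behind an OpenAI-compatible endpoint
# to transcribe a page image as Markdown. The base_url and model id are
# placeholders for whatever server/model you are actually running.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="my-ocr-vlm",  # placeholder model id exposed by the local server
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page as clean Markdown."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```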
Model | Year | Company/Org | Used for |
---|---|---|---|
MedGemma | 2025 | Google DeepMind | Medical image and text comprehension (4B multimodal and 27B variants) |
MedSigLIP | 2025 | Google DeepMind | Lightweight medical image encoder for classification, retrieval, and zero-shot labeling (~400M parameters) |
Med‑Gemini | 2024 | Google DeepMind | Multimodal medical applications |
LLaVA-Med | 2024 | Microsoft | Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities |
CONCH | 2024 | Mahmood Lab + Harvard Medical School | Vision-Language Pathology Foundation Model - Nature Medicine |
BioMistral‑7B | 2024 | CNRS + Mistral | Medical-domain fine-tuned LLM on PubMed (7B parameters) |
BioMedLM 2.7B | 2024 | Stanford CRFM+MosaicML | Medical-domain trained exclusively on biomedical abstracts and papers from The Pile |
Med‑PaLM M | 2023 | Google Research | Multimodal medical Q&A with image and text input |
No. | Title | Authors | Source | Year |
---|---|---|---|---|
1 | Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models | Yukang Yang, et al. | arXiv:2502.20332 | 2025 |
2 | Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models | James Chua, et al. | arXiv:2506.13206 | 2025 |
3 | Emergent Response Planning in Large Language Models | Zhichen Dong, et al. | arXiv:2502.06258 | 2025 |
4 | Emergent Abilities in Large Language Models: A Survey | Leonardo Berti, et al. | arXiv:2503.05788 | 2025 |
5 | LIMO: Less Is More for Reasoning | Yixin Ye, et al. | arXiv:2502.03387 | 2025 |
6 | An Introduction to Vision-Language Modeling | Florian Bordes, et al. | arXiv:2405.17247 | 2024 |
7 | What Matters When Building Vision-Language Models? | Hugo Laurençon, et al. | arXiv:2405.02246 | 2024 |
8 | Building and better understanding vision-language models: insights and future directions | Hugo Laurençon, et al. | arXiv:2408.12637 | 2024 |
9 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Zhiyu Wu, et al. | arXiv:2412.10302 | 2024 |
10 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang, et al. | arXiv:2409.12191 | 2024 |
11 | PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Junfei Xiao, et al. | arXiv:2402.10896 | 2024 |
12 | Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Akshay Gopalkrishnan, et al. | arXiv:2403.19838 | 2024 |
- CVPR - IEEE/CVF Conference on Computer Vision and Pattern Recognition
- NeurIPS - Conference on Neural Information Processing Systems
- ICLR - International Conference on Learning Representations
- ACL - Association for Computational Linguistics
If you need support with your AI project, or if you're simply an AI and new-technology enthusiast, don't hesitate to connect with me on LinkedIn 👍