A curated list of prompt/adapter learning methods for vision-language models (e.g., CLIP, ALIGN).
- If you know of papers published in top conferences (CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR) or journals (TPAMI, IJCV, TIP) that are not yet included in this list, please feel free to contact me at any time, either by sending an email (zhengli97[at]qq.com) or by submitting an issue.
- We would appreciate more people joining us in maintaining this list of papers.
- Note that papers without open-source code are not recommended.
- We sincerely thank the following people for contributing to this list: Lingfeng Yang, Ge Wu, Jiazuo Yu, Qiji Ma. [List]
- Use text-based prompts/adapters.
- Use image-based prompts/adapters.
- Use text- and image-based prompts/adapters.
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
- **CLIP**: Learning Transferable Visual Models From Natural Language Supervision. ICML 2021. [Paper] [Code]
- **ALIGN**: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. ICML 2021. [Paper]
- **LiT**: LiT: Zero-Shot Transfer with Locked-image text Tuning. CVPR 2022. [Paper] [Code]
- **EVA-CLIP**: EVA-CLIP: Improved Training Techniques for CLIP at Scale. 2023. [Paper] [Code]
- **SigLIP**: Sigmoid Loss for Language Image Pre-Training. ICCV 2023. [Paper] [Code]
- **AlphaCLIP**: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want. CVPR 2024. [Paper] [Code]
- **CLIP-KD**: CLIP-KD: An Empirical Study of CLIP Model Distillation. CVPR 2024. [Paper] [Code] [Explanation (Chinese)]
- **LongCLIP**: Long-CLIP: Unlocking the Long-Text Capability of CLIP. ECCV 2024. [Paper] [Code]
Base-to-Novel: ImageNet-1K, Caltech101, Oxford Pets, StanfordCars, Flowers102, Food101, FGVC Aircraft, SUN397, DTD, EuroSAT, UCF101.
Domain Generalization: ImageNet-V2, ImageNet-Sketch, ImageNet-Adversarial, ImageNet-Rendition.
Due to various factors, the links to some datasets may be outdated or invalid.
To make these datasets easy to download, we maintain a repository on HuggingFace that contains all of the datasets used above (except ImageNet). Each dataset also includes its corresponding split_zhou_xx.json split file. A minimal download sketch is shown below the link.
[Huggingface_Dataset_Download_Link]
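For reference, a single dataset folder from the HuggingFace repository above can typically be fetched with the `huggingface_hub` client. The snippet below is only a minimal sketch: the repository id and folder name are placeholders, since the actual repository is given by the link above.

```python
# Minimal sketch (placeholder repo id and folder name): download one benchmark
# dataset from the HuggingFace dataset repository linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-username/vlm-prompt-benchmarks",  # placeholder, see the link above
    repo_type="dataset",
    allow_patterns=["oxford_pets/*"],               # fetch only one dataset folder
)
print("Downloaded to:", local_dir)
# The folder is expected to contain the images together with the
# split_zhou_xx.json split file used by CoOp-style codebases.
```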
Base-to-Novel Generalization. (ViT-B/16 CLIP)
| Methods | Paper | Pub | Base | Novel | HM (main) | Code | Type |
|---|---|---|---|---|---|---|---|
| CLIP | Link | ICML 21 | 69.34 | 74.22 | 71.70 | Link | Model |
| CoOp | Link | IJCV 22 | 82.69 | 63.22 | 71.66 | Link | - |
| ATPrompt | Link | ICCV 25 | 82.68 | 68.04 | 74.65 | Link | - |
| ATPrompt+PromptKD | - | - | 87.05 | 81.82 | 84.35 | - | Plugin |
| CoCoOp | Link | CVPR 22 | 80.47 | 71.69 | 75.83 | Link | - |
| DPC | Link | CVPR 25 | 85.15 | 68.84 | 76.13 | Link | - |
| DPC+PromptKD | - | - | 87.55 | 80.55 | 83.91 | - | Plugin |
| ProDA | Link | CVPR 22 | 81.56 | 72.30 | 76.65 | Link | - |
| TextRefiner | Link | AAAI 25 | 79.74 | 74.32 | 76.94 | Link | - |
| TextRefiner+PromptKD | - | - | 85.22 | 79.64 | 82.33 | - | Plugin |
| KgCoOp | Link | CVPR 23 | 80.73 | 73.60 | 77.00 | Link | - |
| RPO | Link | ICCV 23 | 81.13 | 75.00 | 77.78 | Link | - |
| DePT | Link | CVPR 24 | 83.80 | 72.89 | 77.97 | Link | - |
| DePT+PromptSRC | - | - | 85.19 | 76.17 | 80.43 | - | Plugin |
| MaPLe | Link | CVPR 23 | 82.28 | 75.14 | 78.55 | Link | - |
| QNet | Link | ICLR 24 | 83.32 | 75.65 | 79.30 | Link | - |
| CasPL | Link | ECCV 24 | 84.78 | 74.49 | 79.30 | Link | - |
| CasPL+PromptSRC | - | - | 86.11 | 79.54 | 82.69 | - | Plugin |
| TCP | Link | CVPR 24 | 84.13 | 75.36 | 79.51 | Link | - |
| MMA | Link | CVPR 24 | 83.20 | 76.80 | 79.87 | Link | - |
| PromptSRC | Link | ICCV 23 | 84.26 | 76.10 | 79.97 | Link | - |
| 2SFS | Link | CVPR 25 | 85.55 | 75.48 | 80.20 | Link | - |
| HPT | Link | AAAI 24 | 84.32 | 76.86 | 80.23 | Link | - |
| CoPrompt | Link | ICLR 24 | 84.00 | 77.23 | 80.48 | Link | - |
| SkipT | Link | CVPR 25 | 85.04 | 77.53 | 81.11 | Link | - |
| MMRL | Link | CVPR 25 | 85.68 | 77.16 | 81.20 | Link | - |
| LLaMP | Link | CVPR 24 | 85.16 | 77.71 | 81.27 | Link | - |
| PromptKD | Link | CVPR 24 | 86.96 | 80.73 | 83.73 | Link | - |
Table 1. Average base-to-novel results (accuracy, %) over the 11 datasets. HM is the harmonic mean of the Base and Novel accuracies (see the sketch below). Only works with open-source code are listed.
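For clarity, HM in Table 1 is the harmonic mean of the averaged Base and Novel accuracies, the standard ranking metric in the base-to-novel protocol. A quick sanity check against the CLIP row:

```python
# Harmonic mean (HM) of base- and novel-class accuracy, as reported in Table 1.
def harmonic_mean(base: float, novel: float) -> float:
    return 2 * base * novel / (base + novel)

# CLIP row: Base = 69.34, Novel = 74.22
print(harmonic_mean(69.34, 74.22))  # ~71.70, matching the HM column
```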
- **CoOp**: Learning to Prompt for Vision-Language Models. IJCV 2022. [Paper] [Code]
- **CoCoOp**: Conditional Prompt Learning for Vision-Language Models. CVPR 2022. [Paper] [Code]
- **ProDA**: Prompt Distribution Learning. CVPR 2022. [Paper] [Code]
- **VPT**: Visual Prompt Tuning. ECCV 2022. [Paper] [Code]
- **VP**: Exploring Visual Prompts for Adapting Large-Scale Models. Arxiv 2022. [Paper] [Code]
- **MaPLe**: MaPLe: Multi-modal Prompt Learning. CVPR 2023. [Paper] [Code]
- **KgCoOp**: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023. [Paper] [Code]
- **LASP**: LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023. [Paper] [Code]
- **DAM-VP**: Diversity-Aware Meta Visual Prompting. CVPR 2023. [Paper] [Code]
- **TaskRes**: Task Residual for Tuning Vision-Language Models. CVPR 2023. [Paper] [Code]
- **RPO**: Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023. [Paper] [Code]
- **KAPT**: Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023. [Paper] [Code Not Found]
- **CuPL**: What does a platypus look like? Generating customized prompts for zero-shot image classification. ICCV 2023. [Paper] [Code]
- **ProGrad**: Prompt-aligned Gradient for Prompt Tuning. ICCV 2023. [Paper] [Code]
- **PromptSRC**: Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023. [Paper] [Code]
- **LFA**: Black Box Few-Shot Adaptation for Vision-Language Models. ICCV 2023. [Paper] [Code]
- **DeFo**: Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023. [Paper] [Code Not Found]
- **PLOT**: PLOT: Prompt Learning with Optimal Transport for Vision-Language Models. ICLR 2023. [Paper] [Code]
- **POMP**: Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023. [Paper] [Code]
- **MetaPrompt**: Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024. [Paper] [Code Not Found]
- **ProVP**: Progressive Visual Prompt Learning with Contrastive Feature Re-formation. IJCV 2024. [Paper] [Code]
- **CoPL**: CoPL: Contextual Prompt Learning for Vision-Language Understanding. AAAI 2024. [Paper] [Code Not Found]
- **SA2VP**: SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024. [Paper] [Code]
- **HPT**: Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024. [Paper] [Code]
- **LaViP**: LaViP: Language-Grounded Visual Prompts. AAAI 2024. [Paper] [Code Not Found]
- **CoPrompt**: Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024. [Paper] [Code]
- **PromptKD**: PromptKD: Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024. [Paper] [Code] [Chinese Version] [Explanation (Chinese)] [Video Explanation (Chinese)]
- **DePT**: DePT: Decoupled Prompt Tuning. CVPR 2024. [Paper] [Code]
- **ArGue**: ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024. [Paper] [Code Not Found]
- **TCP**: TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024. [Paper] [Code]
- **MMA**: MMA: Multi-Modal Adapter for Vision-Language Models. CVPR 2024. [Paper] [Code]
- **LLaMP**: Large Language Models are Good Prompt Learners for Low-Shot Image Classification. CVPR 2024. [Paper] [Code]
- **KDPL**: Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation. ECCV 2024. [Paper] [Code]
- **CoCoLe**: Conceptual Codebook Learning for Vision-Language Models. ECCV 2024. [Paper] [Code Not Found]
- **CasPL**: Cascade Prompt Learning for Vision-Language Model Adaptation. ECCV 2024. [Paper] [Code] [Explanation (Chinese)]
- **GalLoP**: GalLoP: Learning Global and Local Prompts for Vision-Language Models. ECCV 2024. [Paper] [Code Not Found]
- **AWT**: AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation. NeurIPS 2024. [Paper] [Code]
- **QNet**: Prompt Learning with Quaternion Networks. ICLR 2024. [Paper] [Code (Empty)]
- **QMaPLe**: Quantized Prompt for Efficient Generalization of Vision-Language Models. ECCV 2024. [Paper] [Code (Empty)]
- **TextRefiner**: TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning. AAAI 2025. [Paper] [Code] [Explanation (Chinese)]
- **ProText**: Learning to Prompt with Text Only Supervision for Vision-Language Models. AAAI 2025. [Paper] [Code]
- **FATE**: FATE: Feature-Adapted Parameter Tuning for Vision-Language Models. AAAI 2025. [Paper] [Code Not Found]
- **TAP**: Tree of Attributes Prompt Learning For Vision Language Models. ICLR 2025. [Paper] [Code]
- **DeKg**: Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning. ICLR 2025. [Paper] [Code]
- **MMRL**: MMRL: Multi-Modal Representation Learning for Vision-Language Models. CVPR 2025. [Paper] [Code]
- **DPC**: DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models. CVPR 2025. [Paper] [Code] [Explanation (Chinese)]
- **2SFS**: Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages. CVPR 2025. [Paper] [Code]
- **SkipT**: Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves. CVPR 2025. [Paper] [Code]
- **NLPrompt**: NLPrompt: Noise-Label Prompt Learning for Vision-Language Models. CVPR 2025. [Paper] [Code]
- **TAC**: Task-Aware Clustering for Prompting Vision-Language Models. CVPR 2025. [Paper] [Code]
- **OpenworldAUC**: OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning. ICML 2025. [Paper] [Code]
- **FM**: Enhancing Target-unspecific Tasks through a Features Matrix. ICML 2025. [Paper] [Code Not Found]
- **SurPL**: Surrogate Prompt Learning: Towards Efficient and Diverse Prompt Learning for Vision-Language Models. ICML 2025. [Paper] [Code]
- **ATPrompt**: Advancing Textual Prompt Learning with Anchored Attributes. ICCV 2025. [Paper] [Code] [Explanation (Chinese)] [Chinese Version]
- **HicroPL**: Hierarchical Cross-modal Prompt Learning for Vision-Language Models. ICCV 2025. [Paper] [Code]
- **CaPL**: Causality-guided Prompt Learning for Vision-language Models via Visual Granulation. ICCV 2025. [Paper] [Code (Empty)]
- **LwEIB**: Learning with Enriched Inductive Biases for Vision-Language Models. IJCV 2025. [Paper] [Code]
- **BIP**: Bi-modality Individual-aware Prompt tuning for Visual-Language Model. TPAMI 2025. [Paper] [Code]
- **DAPT**: Decouple before Align: Visual Disentanglement Enhances Prompt Tuning. TPAMI 2025. [Paper] [Code]
- **Spotlighter**: Spotlighter: Revisiting Prompt Tuning from a Representative Mining View. EMNLP 2025 Findings. [Paper] [Code]
- **VaMP**: VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models. NeurIPS 2025. [Paper] [Code Not Found]
- **KAID**: KAID: Knowledge-Aware Interactive Distillation for Vision-Language Models. ACM MM 2025. [Paper] [Code Not Found] [Explanation (Chinese)]
- **CPT**: CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models. Arxiv 2021. [Paper] [Code]
- **DetPro**: Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model. CVPR 2022. [Paper] [Code]
- **PromptDet**: PromptDet: Towards Open-vocabulary Detection using Uncurated Images. ECCV 2022. [Paper] [Code]
- Visual Prompting via Image Inpainting. NeurIPS 2022. [Paper]
- **OVSeg**: Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. CVPR 2023. [Paper] [Code]
- **LoGoPrompt**: LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models. ICCV 2023. [Paper]
- **RedCircle**: What does CLIP know about a red circle? Visual prompt engineering for VLMs. ICCV 2023. [Paper]
- **FGVP**: Fine-Grained Visual Prompting. NeurIPS 2023. [Paper] [Code]
- **SoM**: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V. Arxiv 2023. [Paper] [Code]
- **Alpha-CLIP**: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want. CVPR 2024. [Paper] [Code]
- **ViP-LLaVA**: Making Large Multimodal Models Understand Arbitrary Visual Prompts. CVPR 2024. [Paper] [Code]
- **SSC**: Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation. ECCV 2024. [Paper] [Code]
| Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. (main) | Code |
|---|---|---|---|---|---|---|---|---|
| CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
| CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
| DiffTPT | ICCV 23 | 70.30 | 55.68 | 65.10 | 75.00 | 46.80 | 60.65 | Link |
| TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
| TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
| PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
| TPS+CoOp | Arxiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 65.52 | Link |
| RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
| RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
| COSMIC | CVPR 25 | 78.19 | 73.32 | 69.62 | 85.60 | 62.79 | 72.83 | Link |
Table 2. Test-time prompt tuning methods on OOD data (accuracy, %). Avg. is the mean over the four ImageNet OOD variants (see the sketch below).
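The Avg. column appears to be the plain mean accuracy over the four OOD variants (-A, -V2, -R, -S), excluding the source ImageNet column. A quick check against the CoOp row:

```python
# OOD average as reported in Table 2: mean accuracy over the four ImageNet variants.
def ood_average(a: float, v2: float, r: float, s: float) -> float:
    return (a + v2 + r + s) / 4

# CoOp row: -A = 49.71, -V2 = 64.20, -R = 75.21, -S = 47.99
print(ood_average(49.71, 64.20, 75.21, 47.99))  # ~59.28, matching the Avg. column
```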
- **TPT**: Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022. [Paper] [Code]
- **SwapPrompt**: SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023. [Paper]
- **PromptAlign**: Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023. [Paper] [Code]
- **TPS**: Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. Arxiv 2024. [Paper] [Code]
- **RLCF**: Test-time Adaptation with CLIP Reward for Zero-shot Generalization in Vision-Language Models. ICLR 2024. [Paper] [Code]
- **InTTA**: Invariant Test-Time Adaptation for Vision-Language Model Generalization. Arxiv 2024. [Paper] [Code]
- **TDA**: Efficient Test-Time Adaptation of Vision-Language Models. CVPR 2024. [Paper] [Code]
- **DMN**: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models. CVPR 2024. [Paper] [Code]
- **C-TPT**: C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion. ICLR 2024. [Paper] [Code]
- **DynaPrompt**: DynaPrompt: Dynamic Test-Time Prompt Tuning. ICLR 2025. [Paper]
- **R-TPT**: R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning. CVPR 2025. [Paper] [Code]
- **StatA**: Realistic Test-Time Adaptation of Vision-Language Models. CVPR 2025. [Paper] [Code]
- **O-TPT**: O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models. CVPR 2025. [Paper] [Code]
- **COSMIC**: COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation. CVPR 2025. [Paper] [Code]
- Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models. ICCV 2025. [Paper] [Code](https://github.com/CenturyChen/ICCV25-MCP)
- **CLIP-Adapter**: CLIP-Adapter: Better Vision-Language Models with Feature Adapters. Arxiv 2021. [Paper] [Code]
- **Tip-Adapter**: Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification. ECCV 2022. [Paper] [Code]
- **APE**: Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement. ICCV 2023. [Paper] [Code]
- **CaFo**: Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners. CVPR 2023. [Paper] [Code]
- **Meta-Adapter**: Meta-Adapter: An Online Few-shot Learner for Vision-Language Model. NeurIPS 2023. [Paper] [Code]
- **ActionCLIP**: ActionCLIP: A New Paradigm for Video Action Recognition. Arxiv 2021. [Paper] [Code]
- **VideoPrompt**: Prompting Visual-Language Models for Efficient Video Understanding. ECCV 2022. [Paper] [Code]
- **X-CLIP**: Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022. [Paper] [Code]
- **RePro**: Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023. [Paper] [Code]
- **Vita-CLIP**: Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting. CVPR 2023. [Paper] [Code]
- **ViFi-CLIP**: Fine-tuned CLIP Models are Efficient Video Learners. CVPR 2023. [Paper] [Code]
- **OpenVCLIP**: Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization. ICML 2023. [Paper] [Code]
- **M2-CLIP**: M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition. AAAI 2024. [Paper] [Code]
- **ViLT-CLIP**: ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-Guided Optimization. AAAI 2024. [Paper] [Code (None)]
- **FROSTER**: FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition. ICLR 2024. [Paper] [Code]
- **BT-Adapter**: BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning. CVPR 2024. [Paper] [Code]
- **L2P**: Learning to Prompt for Continual Learning. CVPR 2022. [Paper] [Code]
- **DualPrompt**: DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. ECCV 2022. [Paper] [Code]
- **EvoPrompt**: Evolving Parameterized Prompt Memory for Continual Learning. AAAI 2024. [Paper]
- **CPrompt**: Consistent Prompting for Rehearsal-Free Continual Learning. CVPR 2024. [Paper] [Code]
- **DIKI**: Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models. ECCV 2024. [Paper] [Code]
- **MoE-Adapters4CL**: Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters. CVPR 2024. [Paper] [Code]
- **SSIAT**: Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer. CVPR 2024. [Paper] [Code]
- **LoCoOp**: LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning. NeurIPS 2023. [Paper] [Code]
- **DeCoOp**: DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection. ICML 2024. [Paper] [Code]
- **IDPT**: Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models. ICCV 2023. [Paper] [Code]
- **PPT**: Parameter-efficient Prompt Learning for 3D Point Cloud Understanding. ICRA 2024. [Paper] [Code]
- **Point-PRC**: Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis. NeurIPS 2024. [Paper] [Code]
- **BiomedCoOp**: BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models. CVPR 2025. [Paper] [Code]
- **PPL**: Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation. CVPR 2025. [Paper]
- **CLIP4Clip**: CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval and Captioning. Neurocomputing 2022. [Paper] [Code]
- **VoP**: VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval. CVPR 2023. [Paper] [Code]
- **DGL**: DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. AAAI 2024. [Paper] [Code]