Prompt-affinity Multi-modal Class Centroids for Unsupervised Domain Adaption

Highlights

Abstract: Recent advances in large vision-language models (VLMs) such as CLIP have renewed interest in using prompt learning to preserve semantic consistency between source and target domains in unsupervised domain adaption (UDA). While these approaches show promising results, they face fundamental limitations when quantifying the similarity between source and target domain data, primarily because their class centroids are redundant and missing a modality. To address these limitations, we propose Prompt-affinity Multi-modal Class Centroids for UDA (termed PMCC). First, we fuse the text class centroids (generated directly by the CLIP text encoder from manual prompts for each class) with the image class centroids (generated by the CLIP image encoder for each class from source domain images) to obtain multi-modal class centroids. Second, we apply cross-attention between each source or target domain image and these multi-modal class centroids. In this way, the class centroids, which carry rich semantic information about each class, serve as a bridge for measuring the semantic similarity between domains. Finally, we design a logit bias head and employ a multi-modal prompt learning mechanism to accurately predict the class of each image in both the source and target domains. We conduct extensive experiments on four popular UDA datasets: Office-31, Office-Home, VisDA-2017, and DomainNet. The experimental results show that PMCC achieves higher performance with lower model complexity than state-of-the-art (SOTA) UDA methods. The code for this project is available on GitHub: https://github.com/246dxw/PMCC.
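
The first two steps of the abstract (fusing text and image class centroids, then cross-attending images to them) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: random vectors stand in for CLIP features, the fusion is a plain average of the two normalized centroids (the abstract only says they are fused), and the cross-attention is single-head with no learned projections. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length, as is customary for CLIP features."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def image_class_centroids(feats, labels, num_classes):
    """Mean source-domain image feature per class -> (num_classes, dim)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])


def fuse_centroids(text_centroids, image_centroids):
    """Assumed fusion rule: average the two modalities, then re-normalize."""
    fused = 0.5 * (l2_normalize(text_centroids) + l2_normalize(image_centroids))
    return l2_normalize(fused)


def cross_attend(image_feats, centroids):
    """Single-head cross-attention: images are queries, centroids are keys and values."""
    scores = image_feats @ centroids.T / np.sqrt(centroids.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ centroids, weights


rng = np.random.default_rng(0)
num_classes, dim = 3, 8
text_c = rng.normal(size=(num_classes, dim))         # stand-in for text-encoder outputs
src_feats = rng.normal(size=(30, dim))               # stand-in for source image features
src_labels = np.arange(30) % num_classes             # 10 samples per class
tgt_feats = l2_normalize(rng.normal(size=(5, dim)))  # stand-in for target image features

centroids = fuse_centroids(text_c, image_class_centroids(src_feats, src_labels, num_classes))
attended, weights = cross_attend(tgt_feats, centroids)
print(attended.shape, weights.shape)
```

Each row of `weights` is a distribution over classes, so it can be read as a soft affinity between one image and every class centroid, regardless of which domain the image comes from.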

Main Contributions

New perspective: To the best of our knowledge, this is the first attempt to leverage both the visual and textual semantic information of each class from VLMs to preserve semantic consistency between source and target domains in UDA.
Novel method: We introduce PMCC, a novel UDA method that constructs multi-modal class centroids and, combined with a multi-modal prompt learning mechanism, uses them as a semantic bridge to measure the similarity between data across domains.
High performance: Extensive experiments on four popular UDA datasets (Office-31, Office-Home, VisDA-2017, and DomainNet) demonstrate that PMCC achieves higher performance with lower model complexity than state-of-the-art UDA methods. On Office-Home, for example, PMCC yields higher average accuracy while its learnable parameters, training time, and inference time are only 8.8%, 73.6%, and 54.5% of those of the strongest baseline, respectively.

Results

PMCC in comparison with existing prompt tuning methods

The results below report accuracy on three UDA datasets with a ViT-B/16 backbone. PMCC adopts the multi-modal prompt tuning paradigm.

Name	Office-Home Acc.	Office-31 Acc.	VisDA-2017 Acc.
CLIP	82.1	77.5	88.9
CoOp	83.9	89.4	82.7
CoCoOp	84.1	88.9	84.2
VPT-deep	83.9	89.4	86.2
MaPLe	84.2	89.6	83.5
DAPL	84.4	81.2	89.5
PDA	85.7	91.2	89.7
PMCC(Ours)	86.0	91.8	90.1

Method (DomainNet Acc.)	-> c	-> i	-> p	-> q	-> r	-> s	Avg
ViT	43.6	42.4	39.1	19.5	39.9	44.4	38.2
CLIP	70.1	46.4	61.7	13.7	82.9	62.6	56.2
DAMP (RN101)	69.7	51.0	67.5	14.7	82.5	61.5	57.8
PDA	74.3	49.6	69.9	14.9	84.4	66.0	59.9
DAPrompt	73.4	50.3	69.8	14.7	84.8	65.6	59.8
PMCC	74.7	49.8	70.1	15.4	84.2	66.5	60.1
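
The Avg column can be re-derived from the six per-domain accuracies. The short check below recomputes each row's mean from the numbers in the DomainNet table above and compares it with the reported value. The 0.06 tolerance is an assumption chosen to absorb rounding to one decimal place.

```python
# Per-domain DomainNet accuracies and reported averages, copied from the table above.
rows = {
    "ViT": ([43.6, 42.4, 39.1, 19.5, 39.9, 44.4], 38.2),
    "CLIP": ([70.1, 46.4, 61.7, 13.7, 82.9, 62.6], 56.2),
    "DAMP (RN101)": ([69.7, 51.0, 67.5, 14.7, 82.5, 61.5], 57.8),
    "PDA": ([74.3, 49.6, 69.9, 14.9, 84.4, 66.0], 59.9),
    "DAPrompt": ([73.4, 50.3, 69.8, 14.7, 84.8, 65.6], 59.8),
    "PMCC": ([74.7, 49.8, 70.1, 15.4, 84.2, 66.5], 60.1),
}

TOLERANCE = 0.06  # assumed slack for one-decimal rounding


def mean_accuracy(values):
    """Arithmetic mean of the per-domain accuracies."""
    return sum(values) / len(values)


# Collect rows whose recomputed mean disagrees with the reported average.
mismatches = {
    name: mean_accuracy(accs)
    for name, (accs, reported) in rows.items()
    if abs(mean_accuracy(accs) - reported) > TOLERANCE
}

for name, (accs, reported) in rows.items():
    print(f"{name}: recomputed {mean_accuracy(accs):.2f}, reported {reported}")
print("rows outside tolerance:", sorted(mismatches))
```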

Installation

Follow the steps below to create the environment and install the dependencies. This codebase is tested on Ubuntu 18.04 LTS with Python 3.7.

Set up a conda environment.

# Create a conda environment
conda create -y -n pmcc python=3.7

# Activate the environment
conda activate pmcc

# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/get-started/previous-versions/ if your CUDA version is different
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
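
To confirm that the environment satisfies the version requirement stated in the comment above (torch >= 1.8.1), a small check along these lines may help. It is only a sketch: `parse_version`, `meets_minimum`, and `installed_torch_version` are hypothetical helper names, and the parser handles only plain numeric versions such as `1.12.0` or `1.12.0+cu113`.

```python
import re


def parse_version(text):
    """Turn '1.12.0+cu113' into (1, 12, 0), ignoring any local build suffix."""
    core = str(text).split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def meets_minimum(installed, minimum="1.8.1"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_torch_version():
    """Return torch's version string, or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)


version = installed_torch_version()
if version is None:
    print("torch is not installed in this environment")
else:
    print(f"torch {version}: requirement met = {meets_minimum(version)}")
```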

Install the Dassl library.

# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation

# Clone this repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch

# Install dependencies
pip install -r requirements.txt

# Install this library (no need to re-build if the source code is modified)
python setup.py develop
cd ..

Clone the PMCC code repository and install its requirements.

# Clone PMCC code base
git clone https://github.com/246dxw/PMCC.git
cd PMCC

# Install requirements
pip install -r requirements.txt

Data preparation

Please follow the instructions below to prepare all datasets. Dataset list:

Training and Evaluation

Follow the instructions below to train, evaluate, and reproduce the results. First, change the data directory to match your own setup.

Training

# Example: train on the Office-Home dataset with Art as the source domain and Clipart as the target domain (a-c)
bash scripts/pmcc/main_pmcc.sh officehome b32_ep10_officehome PMCC ViT-B/16 2 a-c 0
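
To run every Office-Home transfer task rather than a single pair, the command above can be generated in a loop. The sketch below only builds the argument lists and prints them; it does not execute anything. It reuses the positional arguments exactly as they appear in the example, without assuming what each one means. The domain letters `p` and `r` (for Product and Real World) are an assumption extrapolated from the `a-c` naming, so check them against the scripts before use.

```python
from itertools import permutations

# Positional arguments copied verbatim from the example command above.
SCRIPT = "scripts/pmcc/main_pmcc.sh"
ARGS_BEFORE_TASK = ["officehome", "b32_ep10_officehome", "PMCC", "ViT-B/16", "2"]
ARGS_AFTER_TASK = ["0"]

# 'a' and 'c' come from the example; 'p' and 'r' are assumed abbreviations.
DOMAINS = ("a", "c", "p", "r")


def build_commands(domains=DOMAINS):
    """Return one argument list per ordered (source, target) domain pair."""
    commands = []
    for source, target in permutations(domains, 2):
        task = f"{source}-{target}"
        commands.append(["bash", SCRIPT, *ARGS_BEFORE_TASK, task, *ARGS_AFTER_TASK])
    return commands


for command in build_commands():
    # To actually launch a run, pass `command` to subprocess.run(command, check=True).
    print(" ".join(command))
```

Building argument lists instead of one shell string keeps `ViT-B/16` and the task name as separate, unquoted arguments if the list is later handed to `subprocess.run`.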

Evaluation

# Example: evaluate on the Office-Home dataset with Art as the source domain and Clipart as the target domain (a-c)
bash scripts/pmcc/eval_pmcc.sh officehome b32_ep10_officehome PMCC ViT-B/16 2 a-c 0

Details for each method are in its own folder under the scripts folder (PMCC/scripts on the main branch of 246dxw/PMCC on GitHub).

Acknowledgements

The style of this README follows PDA. Our code is based on the CoOp and CoCoOp, DAPL, MaPLe, and PDA repositories, among others. We thank the authors for releasing their code. If you use their models and code, please consider citing these works as well. Supported methods are as follows:

Method	Paper	Code
CoOp	IJCV 2022	link
CoCoOp	CVPR 2022	link
VPT	ECCV 2022	link
IVLP & MaPLe	CVPR 2023	link
DAPL	TNNLS 2023	link
PDA	AAAI 2024	link
