Multimodal large language models (MLLMs) have shown remarkable performance in vision-language tasks. However, existing MLLMs are primarily trained on generic datasets, limiting their ability to reason about domain-specific visual cues such as those in facial images. In particular, tasks that require detailed understanding of facial structure, expression, emotion, and demographic features remain underexplored by MLLMs due to the lack of large-scale annotated face image-text datasets. In this work, we introduce FaceLLM, a multimodal large language model trained specifically for facial image understanding. Our experiments demonstrate that FaceLLM achieves state-of-the-art performance among MLLMs on various face-centric tasks. Project page: https://www.idiap.ch/paper/facellm
We use the LLaMA-Factory, FaceXBench, and VLMEvalKit repositories in our implementation. You can create the `facellm` conda environment and install the required dependencies using the following commands:
```bash
# Create and activate the conda environment
conda create --name facellm python=3.11
conda activate facellm

# Install LLaMA-Factory
git clone https://github.com/hiyouga/LLaMA-Factory
cd LLaMA-Factory
pip install -r requirements.txt
pip install -e ".[torch,metrics]"

# Install FaceXBench and VLMEvalKit
cd ..
git clone https://github.com/Kartik-3004/facexbench
git clone https://github.com/open-compass/VLMEvalKit.git
cp facexbench/evaluate.py VLMEvalKit/
cp facexbench/aggregate_results.py VLMEvalKit/
cd VLMEvalKit
pip install -r requirements.txt
pip install -e .
pip install flash-attn --no-build-isolation
```
We use FairFaceGPT to train FaceLLM and provide instructions to download and preprocess the FairFaceGPT dataset.
FairFaceGPT is a dataset of question-answer pairs generated from the validation set of the FairFace dataset using ChatGPT. It is designed to enhance the understanding of facial images in multimodal large language models (MLLMs). You can generate the dataset with the OpenAI API using the following command:
```bash
python fairfacegpt.py
```
NOTE: You need to have the OpenAI Python package installed (`pip install openai`). You also need to set your OpenAI API key in the `fairfacegpt.py` file.
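For reference, the kind of API call used to generate question-answer pairs might look roughly like the sketch below. This is a minimal illustration, not the exact logic of `fairfacegpt.py`; the model name, prompt, image path, and output handling are placeholders.

```python
# Hypothetical sketch of generating QA pairs for one FairFace image with the
# OpenAI API. Prompt, model name, and parsing are illustrative only; the actual
# logic lives in fairfacegpt.py.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa(image_path: str) -> str:
    # Encode the face image as a base64 data URL so it can be sent inline.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Generate question-answer pairs describing this face "
                             "(expression, pose, accessories, image quality)."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content

print(generate_qa("FairFace/fairface_img_margin025/val/1.jpg"))  # example path
```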
The dataset is available on the project page. We use the `dataset.json` file to train FaceLLM.
You need to download the FairFaceGPT dataset and use the provided `dataset.json` file. Then, you need to update the `data/dataset_info.json` file in the LLaMA-Factory repository to include the FairFaceGPT dataset. The `dataset_info.json` file in the LLaMA-Factory repository should look like the following:
```json
{
  // Add FairFaceGPT dataset info here
  "fairfacegpt_dataset": {
    "formatting": "sharegpt",
    "file_name": "<path_to_fairfacegpt_dataset_json_file>",
    "columns": {
      "messages": "conversations",
      "image": "image"
    },
    "split": "train",
    "tags": {
      "role_tag": "from",
      "content_tag": "value",
      "user_tag": "human",
      "assistant_tag": "assistant"
    },
    "image_root": "<path_to_fairface_image_directory>"
  },
  // Other datasets
  ...
}
```
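For context, the `"formatting": "sharegpt"` entry above means each record in `dataset.json` stores a conversation plus an image path. The snippet below builds one hypothetical record using the role and content tags declared above; the question and answer text are invented for illustration and the real records are provided in the released `dataset.json`.

```python
# Hypothetical FairFaceGPT record in LLaMA-Factory's ShareGPT layout, using the
# "from"/"value"/"human"/"assistant" tags from dataset_info.json above.
import json

record = {
    "conversations": [
        {"from": "human", "value": "<image>What expression does this person show?"},
        {"from": "assistant", "value": "The person appears to be smiling slightly, "
                                       "suggesting a mildly happy expression."},
    ],
    # Path relative to the "image_root" directory configured in dataset_info.json.
    "image": "val/1.jpg",
}

print(json.dumps([record], indent=2, ensure_ascii=False))
```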
You can also use the `data/dataset_info.json` file provided in this repository to replace the `data/dataset_info.json` file in the LLaMA-Factory repository:
```bash
cp data/dataset_info.json LLaMA-Factory/data/dataset_info.json
```
We use `FairFace/fairface_img_margin025` as the FairFace image directory.
We use LLaMA-Factory to train FaceLLM. The training script is provided in the `train.sh` file. You can run the training script using the following command:
```bash
bash train.sh
```
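The actual hyperparameters live in `train.sh`; as a rough illustration of the kind of LoRA SFT configuration LLaMA-Factory expects, a sketch is shown below. All values (cutoff length, batch size, learning rate, epochs, output path) are placeholders, and `train.sh` in this repository remains the authoritative recipe.

```python
# Rough sketch of a LLaMA-Factory LoRA SFT run for FaceLLM. Hyperparameter
# values are placeholders; train.sh in this repository is the actual recipe.
import subprocess
import yaml

config = {
    "model_name_or_path": "OpenGVLab/InternVL3-38B-hf",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    "dataset": "fairfacegpt_dataset",   # name registered in data/dataset_info.json
    "template": "intern_vl",
    "cutoff_len": 2048,                 # placeholder
    "output_dir": "./saves/lora/InternVL3-38B-hf",
    "per_device_train_batch_size": 1,   # placeholder
    "gradient_accumulation_steps": 8,   # placeholder
    "learning_rate": 1.0e-4,            # placeholder
    "num_train_epochs": 1.0,            # placeholder
    "bf16": True,
}

with open("facellm_lora_sft.yaml", "w") as f:
    yaml.safe_dump(config, f)

# Equivalent to: llamafactory-cli train facellm_lora_sft.yaml
subprocess.run(["llamafactory-cli", "train", "facellm_lora_sft.yaml"], check=True)
```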
After training, you can export the model using the `llamafactory-cli export` command. This will create a directory with the exported model files that can be used for inference or evaluation.
```bash
# Export the trained model
llamafactory-cli export \
    --model_name_or_path OpenGVLab/InternVL3-38B-hf \
    --template intern_vl \
    --adapter_name_or_path ./saves/lora/InternVL3-38B-hf/checkpoint-7303 \
    --export_dir ./saves/export/FaceLLM-38B \
    --export_size 1  # Optional: split the model into shards of 1GB each
```
You can also run the `export.sh` script provided in this repository to export the model.
You can run FaceLLM using the `llamafactory-cli webui` command, which starts a web interface where you can interact with the model. You can also run the `inference.py` script to generate responses from the model:
```bash
python inference.py --path_image <path_to_face_image> --prompt "<your_prompt>"
```
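For orientation, the exported checkpoint could also be loaded directly with Hugging Face transformers, roughly as sketched below. This is an illustration, not the content of `inference.py`; it assumes the export keeps the InternVL3 HF architecture and a recent transformers version, and the image path, prompt, and generation settings are placeholders.

```python
# Rough sketch of loading the exported FaceLLM checkpoint with transformers.
# inference.py in this repository is the actual inference script.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_path = "./saves/export/FaceLLM-38B"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("face.jpg")  # placeholder path to a face image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this person's facial expression."},
    ]}
]

# Build the chat prompt, then tokenize text and image together.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)

output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```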
We use FaceXBench for evaluation of FaceLLM on various face understanding tasks. FaceXBench also uses VLMEvalKit to evaluate MLLMs. Therefore, we need to integrate FaceLLM into VLMEvalKit and then run FaceXBench.
You need to create a new model configuration file for FaceLLM in the VLMEvalKit repository. To this end, you need to make the following changes:
- First, you need to copy the `vlmeval/vlm/facellm.py` file from the current repository to the `vlmeval/vlm/` directory in the VLMEvalKit repository. This file provides the model definition and execution script for FaceLLM and is used by VLMEvalKit to load the model for evaluation.
  ```bash
  cp vlmeval/vlm/facellm.py VLMEvalKit/vlmeval/vlm/
  ```
- Second, you need to update the `vlmeval/config.py` file in the VLMEvalKit repository to include the FaceLLM model. You can replace that file with the `vlmeval/config.py` provided in this repository (a sketch of what such a registry entry looks like follows this list).
  ```bash
  cp vlmeval/config.py VLMEvalKit/vlmeval/config.py
  ```
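Registering a model in VLMEvalKit's `vlmeval/config.py` amounts to adding an entry to its model registry that maps a model name to a constructor. The sketch below is illustrative only: it assumes a `FaceLLM` class is defined in `vlmeval/vlm/facellm.py`, that the registry dict is `supported_VLM`, and the constructor argument is a placeholder. The `vlmeval/config.py` provided in this repository already contains the real entry.

```python
# Illustrative addition to VLMEvalKit's vlmeval/config.py. It assumes a FaceLLM
# class is exported from vlmeval/vlm/facellm.py; the keyword argument is a placeholder.
from functools import partial

from vlmeval.vlm.facellm import FaceLLM  # hypothetical import path

facellm_series = {
    # Maps the model name passed to evaluate.py (--model "FaceLLM-38B")
    # to a constructor for the exported checkpoint.
    "FaceLLM-38B": partial(FaceLLM, model_path="./saves/export/FaceLLM-38B"),
}

# In config.py, such a series dict is merged into the global supported_VLM registry:
# supported_VLM.update(facellm_series)
```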
After integrating FaceLLM into VLMEvalKit, you can run the evaluation script provided in the VLMEvalKit repository, which runs FaceXBench on the FaceLLM model and generates the evaluation results:
```bash
python VLMEvalKit/evaluate.py --model "FaceLLM-38B"
```
After running the evaluation script, you will get the evaluation results as JSON files in the `facexbench/results/` directory. You can also run the following script to get accuracies for the different tasks:
```bash
python VLMEvalKit/aggregate_results.py --model FaceLLM-38B --results_dir facexbench/results/FaceLLM-38B
```
This will generate a `results.txt` file in the `facexbench/results/FaceLLM-38B` directory, which contains the accuracies for the different tasks and subtasks.
If you use this code, FaceLLM models, or the FairFaceGPT dataset, please cite our paper:
```bibtex
@article{facellm2025,
  author  = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
  title   = {FaceLLM: A Multimodal Large Language Model for Face Understanding},
  journal = {arXiv preprint arXiv:2507.10300},
  year    = {2025}
}
```