Generalizing to Unseen Speakers: Multimodal Emotion Recognition in Conversations with Speaker Generalization
Paper: Generalizing to Unseen Speakers: Multimodal Emotion Recognition in Conversations with Speaker Generalization, IEEE Transactions on Affective Computing (TAFFC), 2025.
- Python 3.8.5
- torch 1.7.1
- CUDA 11.3
- torch-geometric 1.7.2
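As a quick sanity check, the minimal Python sketch below (not part of this release) prints the installed versions so they can be compared against the list above; it only relies on the standard `torch` and `torch_geometric` imports.

```python
# Quick environment check: compare installed versions against the
# requirements listed above (Python 3.8.5, torch 1.7.1, CUDA 11.3,
# torch-geometric 1.7.2).
import sys

import torch
import torch_geometric

print(f"Python:          {sys.version.split()[0]}")
print(f"torch:           {torch.__version__}")
print(f"CUDA (torch):    {torch.version.cuda}")
print(f"CUDA available:  {torch.cuda.is_available()}")
print(f"torch-geometric: {torch_geometric.__version__}")
```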
The raw data can be found at IEMOCAP and MELD.
In our paper, we use pre-extracted features. The multimodal features (including RoBERTa-based and GloVe-based textual features) are available here.
Download MELD and place it at "./MELD/raw-texts" to obtain the speaker information.
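For orientation only, the sketch below shows one way to inspect a downloaded feature file before wiring it into training. The path `./features/iemocap_features.pkl` is a placeholder, not the actual file name in the release, and the pickle layout depends on how the features were packaged.

```python
# Hypothetical sketch: inspect a downloaded pre-extracted feature file.
# The path below is a placeholder; substitute the actual file name from
# the feature download.
import pickle

feature_path = "./features/iemocap_features.pkl"  # placeholder path

with open(feature_path, "rb") as f:
    data = pickle.load(f)

# Print the top-level structure to see how utterances, speakers, and
# modalities are organized.
if isinstance(data, dict):
    for key, value in data.items():
        print(key, type(value))
else:
    print(type(data), len(data) if hasattr(data, "__len__") else "")
```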
Results may vary across training machines and random seeds. We suggest trying different random seeds for better results.
To train on IEMOCAP:
bash train_IEMOCAP.sh
To train on MELD:
bash train_MELD.sh
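Because results depend on the random seed, the sketch below shows the usual way of fixing seeds in a PyTorch pipeline when comparing runs. The `set_seed` helper is illustrative and is not a function provided by this repository.

```python
# Illustrative helper (not part of this repo) for fixing random seeds so
# that runs are repeatable on a given machine.
import os
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Trade speed for determinism in cuDNN-backed ops.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)  # try several seeds and report the best/average results
```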
If you find this code useful, please cite the following paper:
@ARTICLE{10981602,
author={Tu, Geng and Jing, Ran and Liang, Bin and Yu, Yue and Yang, Min and Qin, Bing and Xu, Ruifeng},
journal={IEEE Transactions on Affective Computing},
title={Generalizing to Unseen Speakers: Multimodal Emotion Recognition in Conversations With Speaker Generalization},
year={2025},
volume={},
number={},
pages={1-12},
keywords={Uncertainty;Prototypes;Emotion recognition;Context modeling;Electronic mail;Visualization;Training;Oral communication;Graph neural networks;Data models;Contrastive learning;multimodal conversational emotion recognition;prototype graph;speaker generalization},
doi={10.1109/TAFFC.2025.3566059}
}