This is the source code for the NeurIPS 2024 paper "Perplexity-aware Correction for Robust Alignment with Noisy Preferences" by Keyi Kong* (SDU), Xilie Xu* (NUS), Di Wang (KAUST), Jingfeng Zhang (University of Auckland/RIKEN-AIP), and Mohan Kankanhalli (NUS).
PerpCorrect corrects noisy preferences using PPLDiff, which is calculated through an iteratively trained surrogate LLM.
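As a rough illustration of the idea, the sketch below computes a PPLDiff-style score from per-token log-probabilities. It assumes PPLDiff is the log-perplexity of the chosen response minus that of the rejected response under the surrogate LLM, and that a pair is flipped when the score exceeds a threshold. The function names, the `threshold` parameter, and the flipping rule are hypothetical and are not taken from this repository; the token log-probabilities are assumed to come from a separate scoring pass.

```python
from typing import Sequence, Tuple


def log_perplexity(token_logprobs: Sequence[float]) -> float:
    """Log-perplexity: the negative mean of the per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def ppl_diff(chosen_logprobs: Sequence[float],
             rejected_logprobs: Sequence[float]) -> float:
    """Assumed PPLDiff: log PPL(chosen) - log PPL(rejected).

    A large positive value means the surrogate model finds the chosen
    response less likely than the rejected one, which hints at a noisy label.
    """
    return log_perplexity(chosen_logprobs) - log_perplexity(rejected_logprobs)


def correct_pair(chosen: str, rejected: str, diff: float,
                 threshold: float = 0.0) -> Tuple[str, str]:
    """Swap the pair when diff exceeds the (hypothetical) threshold."""
    if diff > threshold:
        return rejected, chosen
    return chosen, rejected
```

A pair whose chosen response has the lower perplexity gets a negative score and is left unchanged; a pair with a score above the threshold is returned with its labels swapped.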
This code mainly uses Hugging Face's trl library. You can use the following command to configure the environment.

pip install -r requirements.txt

# preprocess the preference dataset first
python src/preprocessing.py
# supervised fine-tune
bash bash/sft.sh

For the DPO series of experiments, use the following script.

bash bash/dpo.sh

For the PPO series of experiments, use the following script.

bash bash/ppo.sh

To run rDPO experiments, modify the trl library as follows:

if self.loss_type == "sigmoid":
    if self.label_smoothing >= 0:
        # cDPO: non-negative label_smoothing
        losses = (
            -F.logsigmoid(self.beta * logits) * (1 - self.label_smoothing)
            - F.logsigmoid(-self.beta * logits) * self.label_smoothing
        )
    else:
        # rDPO: selected by a negative label_smoothing
        losses = (
            -F.logsigmoid(self.beta * logits) * (1 + self.label_smoothing)
            - F.logsigmoid(-self.beta * logits) * self.label_smoothing
        ) / (1 + 2 * self.label_smoothing)

The project is built upon trl.

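The cDPO and rDPO branches of the trl modification above can be checked outside trl with a scalar, pure-Python sketch. It assumes, as the branch condition suggests, that rDPO with noise rate `epsilon` is selected by passing `label_smoothing = -epsilon`. The helper names `log_sigmoid` and `sigmoid_dpo_loss` are hypothetical and are not part of trl.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_dpo_loss(logits: float, beta: float, label_smoothing: float) -> float:
    """Scalar version of the modified "sigmoid" loss branch.

    label_smoothing >= 0 gives the cDPO loss; label_smoothing < 0 gives the
    rDPO loss, with the noise rate assumed to be -label_smoothing.
    """
    if label_smoothing >= 0:
        # cDPO
        return (
            -log_sigmoid(beta * logits) * (1 - label_smoothing)
            - log_sigmoid(-beta * logits) * label_smoothing
        )
    # rDPO
    return (
        -log_sigmoid(beta * logits) * (1 + label_smoothing)
        - log_sigmoid(-beta * logits) * label_smoothing
    ) / (1 + 2 * label_smoothing)


if __name__ == "__main__":
    # Plain DPO, cDPO with smoothing 0.1, and rDPO with noise rate 0.1.
    for ls in (0.0, 0.1, -0.1):
        print(ls, sigmoid_dpo_loss(logits=1.5, beta=0.1, label_smoothing=ls))
```

Under this sign convention a noise rate of 0.5 (`label_smoothing = -0.5`) makes the rDPO denominator zero, so the noise rate should stay below 0.5.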
@inproceedings{kong2024perplexityaware,
    title={Perplexity-aware Correction for Robust Alignment with Noisy Preferences},
    author={Keyi Kong and Xilie Xu and Di Wang and Jingfeng Zhang and Mohan Kankanhalli},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
    url={https://openreview.net/forum?id=OUXnnPJzXJ}
}

Please drop an e-mail to luxinyayaya@mail.sdu.edu.cn if you have any enquiry.
