
FCP (Feedback Conditional Policy)

This is the official repository for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards".

It provides a training framework that implements Feedback Conditional Policy (FCP) for aligning large language models with verbal feedback.

πŸš€ Quick Start

Prerequisites

  • The verl framework
  • An OPENAI_API_KEY environment variable, set before training

πŸ‹οΈ Training

Offline FCP Training

Use LLaMA-Factory's built-in SFT training code with the SFT datasets mentioned below.
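As a sketch, the offline SFT stage can be launched with LLaMA-Factory's `llamafactory-cli`. The config path below is a placeholder, not a file shipped in this repository; point it at a LLaMA-Factory training config that registers the FCP SFT datasets:

```shell
# Hypothetical offline FCP launch via LLaMA-Factory's standard SFT entry point.
CONFIG=examples/fcp_sft.yaml   # placeholder config path, not part of this repo

if command -v llamafactory-cli >/dev/null 2>&1; then
    # Standard LLaMA-Factory invocation: train with the given YAML config.
    llamafactory-cli train "$CONFIG"
else
    echo "llamafactory-cli not found; install LLaMA-Factory first" >&2
fi
```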

FCP Bootstrapping (Online) Training

Run the VERL training script:

```shell
./verl/recipe/fcp/run_fcp.sh
```

Configuration details can be found in verl/recipe/fcp/config/fcp_trainer.yaml.
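Putting the prerequisites together, a minimal launch sequence might look like the following. The key value is a placeholder, and the script path is the one from this repository:

```shell
# Minimal end-to-end launch sketch; run from the repository root.
# The README requires OPENAI_API_KEY to be set before training starts.
export OPENAI_API_KEY="your-key-here"   # placeholder; substitute a real key

SCRIPT=./verl/recipe/fcp/run_fcp.sh     # training script shipped in this repo
if [ -x "$SCRIPT" ]; then
    bash "$SCRIPT"
else
    echo "run_fcp.sh not found at $SCRIPT; run from the repository root" >&2
fi
```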

πŸ“Š Datasets & Frameworks

We use different frameworks and datasets for different training stages:

Offline FCP Training

Framework: LLaMA-Factory
Datasets:

FCP Bootstrapping (Online) Training

Framework: verl
Datasets:

πŸ“– Citation

If you find this code useful, please consider citing our paper:

```bibtex
@article{luo2025languagemodelslearnverbal,
  title={Language Models Can Learn from Verbal Feedback Without Scalar Rewards},
  author={Renjie Luo and Zichen Liu and Xiangyan Liu and Chao Du and Min Lin and Wenhu Chen and Wei Lu and Tianyu Pang},
  journal={arXiv preprint arXiv:2509.22638},
  year={2025}
}
```
