🚧 Repository under development. 🚧
This is the official implementation of the paper "Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning".
Please install the required packages listed in requirements.txt:
pip install -r requirements.txt
To run our RL fine-tuning experiments, you first need to download the pre-trained models.
All models are available on the Hugging Face Hub at https://huggingface.co/lecraquito.
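Besides the git-lfs commands below, a model can also be fetched from Python with the huggingface_hub library. This is a minimal sketch and not part of the repo: it assumes huggingface_hub is installed (it may not be listed in requirements.txt) and that the experiment scripts expect the model in the same folder that git clone would create at the root of the repo. The helper names repo_id and download_model are hypothetical.

```python
"""Hypothetical helper to fetch a pre-trained model from the lecraquito Hub account."""
from pathlib import Path

HF_USER = "lecraquito"  # Hub account that hosts the pre-trained models


def repo_id(model_name: str) -> str:
    """Build the Hub repository id, e.g. 'lecraquito/<model_name>'."""
    return f"{HF_USER}/{model_name}"


def download_model(model_name: str, root: str = ".") -> Path:
    """Download every file of the model repo into <root>/<model_name>.

    Assumption: the experiment scripts look for the model in the same folder
    that `git clone` would create at the root of the repo.
    """
    # Imported here so that repo_id() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(root) / model_name
    snapshot_download(repo_id=repo_id(model_name), local_dir=str(target))
    return target
```

For example, calling download_model("gpt2_reduced_vocab_FT_9digits_20k") from the root of the repo should place the model files in ./gpt2_reduced_vocab_FT_9digits_20k, mirroring the git clone route.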
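The module load git-lfs line assumes a cluster that uses environment modules; elsewhere, install Git LFS through your package manager and skip that line. To download a pre-trained model, run these commands from the root of the repo (this example clones gpt2_reduced_vocab_FT_9digits_20k; change the model name to fetch another one):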
git config --global credential.helper store
module load git-lfs
git lfs install
git clone https://huggingface.co/lecraquito/gpt2_reduced_vocab_FT_9digits_20k
To run the RL fine-tuning experiments that compare the pre-trained models, run the following command from the root of the repo:
python -m src.rl_compare_pretrain
To run RL fine-tuning, run the following command from the root of the repo:
python -m src.rl_train