Fine-tune a language model with reinforcement learning on an arithmetic task

🚧 Repository under development. 🚧

This is the official implementation of the paper "Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning".

Installation

Install the required packages listed in requirements.txt, for example with pip install -r requirements.txt.

Reproduce experiments

Prerequisite: download the pre-trained models

To run our RL fine-tuning experiments, you first need to download the pre-trained models. All models are available on Hugging Face at https://huggingface.co/lecraquito. To download a pre-trained model, run the following commands from the root of the repo (shown here for gpt2_reduced_vocab_FT_9digits_20k):

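# Store git credentials on disk so you are not prompted repeatedly
git config --global credential.helper store
# Only on clusters that use environment modules; elsewhere, install git-lfs with your package manager
module load git-lfs
# Enable Git LFS, then clone the model repository (weights are stored with LFS)
git lfs install
git clone https://huggingface.co/lecraquito/gpt2_reduced_vocab_FT_9digits_20k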
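
If git-lfs is not available, the same files can probably be fetched from Python instead. The sketch below is an alternative, not part of this repository: it assumes the huggingface_hub package is installed (it may not be listed in requirements.txt), and the helper names local_dir_for and download_model are hypothetical. It places the model in the same folder that git clone would create, on the assumption that the training scripts look for it there.

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str = ".") -> Path:
    """Return the folder that `git clone` would create for this model under `root`."""
    # git clone names the folder after the last component of the repo URL.
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str, root: str = ".") -> Path:
    """Download all files of a Hugging Face model repo into `root/<model name>`."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling download_model("lecraquito/gpt2_reduced_vocab_FT_9digits_20k") from the root of the repo should then leave the model in a gpt2_reduced_vocab_FT_9digits_20k folder, as the git clone command does.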

Comparison of varying levels of pre-training (section 5.2)

Please run the following command from the root of the repo:

python -m src.rl_compare_pretrain

Influence of the prioritized KL divergence (section 5.3)

Please run the following command from the root of the repo:

python -m src.rl_train

Repository: jvasso/llm-rl-arithmetic
