Built on top of verl, this fork adds support for Process Reward Models (PRM), LLM-as-a-Judge, and output visualization.
- Python >= 3.11.0
- PyTorch >= 2.0.0
- CUDA >= 12.4
For conda users:
conda create -n verl_plus python=3.11
conda activate verl_plus
git clone https://github.com/BiNLP/verl
cd verl
pip install -e .
pip install vllm==0.8.5
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install -r requirements.txt
python3 examples/data_preprocess/gsm8k.py --local_dir data/gsm8k
python3 examples/data_preprocess/logiqa.py --local_dir data/logiqa
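After preprocessing, a quick sanity check can confirm the environment and data are in place before launching training. This is a minimal sketch; it assumes the preprocessing scripts above write parquet files under `data/gsm8k` and `data/logiqa`.

```bash
# Optional sanity check (assumes the install and data layout described above).
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn, vllm; print(flash_attn.__version__, vllm.__version__)"
ls data/gsm8k data/logiqa   # the preprocessed parquet files should appear here
```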
ray start --head --port 6379
sh bash_scripts/*.sh
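The scripts in `bash_scripts/` wrap verl's Hydra-style trainer entry point. A stripped-down launch looks roughly like the sketch below; the override names follow verl's standard PPO config and the model path is a placeholder, so adjust everything to match the actual scripts.

```bash
# Minimal launch sketch, assuming verl's standard PPO entry point and config keys.
# The real bash_scripts/*.sh set many more overrides.
python3 -m verl.trainer.main_ppo \
    data.train_files=data/gsm8k/train.parquet \
    data.val_files=data/gsm8k/test.parquet \
    actor_rollout_ref.model.path=/path/to/base/model \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1
```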
Evaluation normally uses my own evaluation scripts, LogiEval.
To use lighteval:
sh bash_scripts/eval.sh
sh bash_scripts/VllmBackend/start_vllm_server.sh
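Once the server script is running, you can check that it is reachable. This assumes `start_vllm_server.sh` exposes vLLM's OpenAI-compatible API on the default port 8000 (both assumptions; check the script for the actual port).

```bash
# Quick reachability check for the vLLM backend (port is an assumption).
curl http://localhost:8000/v1/models
```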
PRM_PATH="/path/to/prm/model"
reward_model.worker_type = 'prm'
reward_model.worker_type = 'judge'
reward_model.worker_type = 'async_judge'
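For reference, below is a sketch of how these settings might be passed to the trainer. `PRM_PATH` and `reward_model.worker_type` come from this README; `reward_model.enable` and `reward_model.model.path` are assumptions based on verl's standard reward-model config, and the remaining overrides (data, model, trainer) are as in `bash_scripts/*.sh`.

```bash
# Sketch: switching the reward worker to the Process Reward Model.
# Replace 'prm' with 'judge' or 'async_judge' to use LLM-as-a-Judge instead.
PRM_PATH="/path/to/prm/model"

python3 -m verl.trainer.main_ppo \
    reward_model.enable=True \
    reward_model.worker_type=prm \
    reward_model.model.path=$PRM_PATH
```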