This repository consolidates my research and academic projects in one place.

A curated, regularly updated paper list on the roles and mechanisms of SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning) in LLM reasoning training. Topics covered include:
- Comparison of the mechanisms of RLVR (Reinforcement Learning with Verifiable Rewards) and SFT
- The entropy mechanism in RLVR
- GRPO (Group Relative Policy Optimization) in RLVR: flaws and corrections
- Exploration-exploitation optimization in GRPO
- Hybrid SFT-RL training

A personal academic portfolio website built with Jekyll and GitHub Pages. Contains publication records, talks, teaching materials, and other academic information.

