v0.9.1
What's Changed
🚀 Features
- feature: enable tool_call and reasoning_content parsing for qwen3 by @ywx217 in #3615
- Support Mooncake migration backend for PD disaggregation by @Risc-lt in #3620
- Support load fused moe weights by @RunningLeon in #3672
- Separate api_server and pytorch engine into different processors by @grimoire in #3627
- add reward model api by @CUHKSZzxy in #3665
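The reward model API added above scores a conversation rather than generating text. As a minimal sketch, the payload might look like the following; the model name and field layout (OpenAI-style `messages`) are assumptions for illustration, not the documented interface — see the reward model documents added in #3706 for the actual usage.

```python
import json

# Hypothetical reward-scoring request body. The model name below and the
# OpenAI-style "messages" layout are assumptions, not the documented schema.
payload = {
    "model": "internlm2-reward",  # assumed model name
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
    ],
}

# Serialize as it would be sent in an HTTP POST to the api_server.
body = json.dumps(payload)
print(body)
```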
💥 Improvements
- [ascend] import patch at initializing time by @JackWeiw in #3662
- [ascend] use custom transdata in python kernel by @JackWeiw in #3671
- move import transformers in patch by @grimoire in #3660
- set ray envs by @grimoire in #3643
- raise ImportError when ep is enabled and dlblas is not installed by @zhaochaoxing in #3636
- Reduce sampling memory usage by @lzhangzz in #3666
🐞 Bug fixes
- fix dockerfile by @lvhan028 in #3657
- Fix top-p only sampling with padded vocab size by @lzhangzz in #3661
- fix pt engine stop & cancel by @irexyc in #3681
- Fix convert bf16 to numpy by @RunningLeon in #3686
- disable torch.compile in cuda graph runner by @grimoire in #3691
- fix reward model api by @CUHKSZzxy in #3703
📚 Documentations
- add reward model documents by @CUHKSZzxy in #3706
🌐 Other
- upgrade torch and triton by @grimoire in #3677
- support do_preprocess=False for chat.completions by @irexyc in #3645
- [ci] change flash_attn installation in pr test by @zhulinJulia24 in #3688
- fix profile_throughput.py by @irexyc in #3692
- fix profile_generation.py by @irexyc in #3707
- update dlblas version in dockerfile by @CUHKSZzxy in #3711
- bump version to v0.9.1 by @lvhan028 in #3685
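The `do_preprocess=False` option from #3645 lets a chat.completions request bypass server-side chat-template preprocessing, so the prompt is passed through as-is. A minimal sketch of such a request body follows; the model name is an assumption, and whether `do_preprocess` sits at the top level of the JSON body is inferred from the PR title rather than documented here.

```python
import json

# Sketch of a /v1/chat/completions request body with server-side
# chat-template preprocessing disabled. "do_preprocess" is an extra,
# non-OpenAI field; its exact placement here is an assumption.
payload = {
    "model": "internlm2_5-7b-chat",  # assumed model name
    "messages": [{"role": "user", "content": "Hello"}],
    "do_preprocess": False,  # send the prompt through without templating
}

# This JSON string would be POSTed to the api_server's chat.completions route.
body = json.dumps(payload)
print(body)
```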
Full Changelog: v0.9.0...v0.9.1