v2.1.1
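Documentation
- Added a deployment guide for multi-node tensor parallelism
- Updated the ERNIE series model best-practices documentation to reflect current usage
- Updated the CUDA Graph usage instructions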
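New Features
- Responses now include completion_tokens and prompt_tokens, which allow the raw input and the raw model output text to be returned
- The completion API now supports the echo parameter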
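As a rough illustration of the echo parameter, the sketch below builds a request body for a completion call and separates an echoed prompt from the generated continuation. It assumes an OpenAI-style request shape and the OpenAI convention that echo prepends the prompt to the returned text; the model name "default" and both helper functions are hypothetical and are not part of the release itself.

```python
import json


def build_completion_request(prompt, max_tokens=32, echo=True, model="default"):
    """Build a JSON-serializable body for an OpenAI-style completion request.

    The model name "default" is a placeholder, not a documented value.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "echo": echo,  # ask the server to return the prompt with the output
    }


def split_echoed_text(prompt, text):
    """Split returned text into (echoed prompt, generated continuation).

    Assumes the echoed prompt, if present, appears verbatim at the start.
    """
    if text.startswith(prompt):
        return prompt, text[len(prompt):]
    return "", text


body = build_completion_request("Hello, ")
print(json.dumps(body, ensure_ascii=False))
```

The body can be sent with any HTTP client to the completion endpoint of a running server. Keeping the split in a separate helper lets client code handle responses the same way whether or not echo was requested.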
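Bug Fixes
- Fixed LogProb not being returned under V1 KVCache scheduling
- Fixed the chat_template_kwargs parameter not taking effect
- Fixed an EP parallelism issue in mixed-architecture deployments
- Fixed incorrect output token counts in completion API responses
- Fixed the aggregation of logprobs in returned results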
What's Changed
- [Docs] Add Multinode deployment document by @ltd0924 in #3416
- [docs] cherry-pick update docs by @zoooo0820 in #3422
- [Docs]update installation readme by @yongqiangma in #3435
- [Docs] release 2.1 by @ming1753 in #3441
- [Docs] Update docs of graph opt backend by @gongshaotian in #3443
- [Feature] Support logprob in scheduler v1 for release/2.1 by @rainyfly in #3446
- [Bugfix]fix config bug in dynamic_weight_manager by @gzy19990617 in #3432
- [Feature] Pass through the chat_template_kwargs to the data processing module by @luukunn in #3469
- [CI] fix run_ci error in release/2.1 by @EmmonsCurse in #3499
- [BugFix] fix ep real_bsz by @lizexu123 in #3396
- [Feature] add prompt_tokens and completion_tokens by @memoryCoderC in #3505
- [fix] setting disable_chat_template while passing prompt_token_ids led to response error by @liyonghua0910 in #3511
- [Excutor] Fixed the issue of CUDA graph execution failure caused by d… by @gongshaotian in #3512
- [Feature] add tool parser by @luukunn in #3518
- [BUGFIX] fix ep mixed bug by @ltd0924 in #3513
- [BugFix] Api server bugs by @ltd0924 in #3530
- [Feature] Support limit thinking len for text models by @K11OntheBoat in #3527
- [Bug Fix] Close get think_end_id for XPU for now. by @K11OntheBoat in #3563
- [Feature] Support mixed deployment with yiyan adapter by @rainyfly in #3533
- [Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process by @zeroRains in #3558
- [BugFix] Support echo in the completion API by @AuferGachet in #3477
- [fix] fix completion stream api output_tokens not in usage by @liyonghua0910 in #3588
- [fix] fix ZmqIpcClient.close() error by @liyonghua0910 in #3600
- [Bugfix] Correct logprobs aggregation for multiple prompts in /completions endpoint by @sunlei1024 in #3620
- [BugFix] ep mixed mode offline exit failed by @ltd0924 in #3623
- [Bugfix] Fix a severe performance drop of the 0.3B model on the 2.1 branch by @AuferGachet in #3624
- [CI] add cleanup logic in release/2.1 workflows by @EmmonsCurse in #3655
- [BugFix] fix parameter is 0 by @ltd0924 in #3663
- [fix] qwen output inconsistency when top_p=0 (#3634) by @liyonghua0910 in #3662
- Revert "[BugFix] fix parameter is 0" by @Jiang-Jia-Jun in #3681
- [feat] add metrics for yiyan adapter by @liyonghua0910 in #3615
- [bugfix]PR3663 parameter is 0 by @ltd0924 in #3679
- [BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. by @lizexu123 in #3670
- Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER." by @Jiang-Jia-Jun in #3719
- [Cherry-Pick] fix the bug when num_key_value_heads < tensor_parallel_size by @zeroRains in #3722
- [Optimize] Increase zmq buffer size to prevent apiserver too slowly t… by @gongshaotian in #3728
- [Fix] Do not drop result when request result slowly by @rainyfly in #3704
- [Bug fix] Fix prefix cache in v1 by @rainyfly in #3710
- [Bug fix] Fix mix deployment perf with yiyan adapter in release21 by @rainyfly in #3703
Full Changelog: v2.1.0...v2.1.1