Release v2.1.1 · PaddlePaddle/FastDeploy

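Documentation

Added a multi-node tensor-parallel deployment document
Updated the best-practice documents for the ERNIE (文心) series models to the latest usage
Updated the CUDA Graph usage instructions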

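New Features

Responses now include completion_tokens and prompt_tokens, which allow returning the original input and the model's raw output text
The completion interface supports the echo parameter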
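
The two features above can be exercised with a small client-side sketch. It assumes the server exposes an OpenAI-style completions route; the base URL, the /v1/completions path, the model name, and the helper names (build_completion_payload, find_field, send) are illustrative assumptions, not taken from this release. Because the notes do not say where completion_tokens and prompt_tokens sit in the response, the helper simply searches the decoded JSON for a key.

```python
import json
from urllib import request


def build_completion_payload(prompt, echo=True, max_tokens=32, model="default"):
    """Build a JSON body for a completion request with the echo parameter set.

    The model name "default" is a placeholder assumption.
    """
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens, "echo": echo}


def find_field(obj, key):
    """Return the first non-None value stored under `key` anywhere in a decoded
    JSON object, or None if absent. Used because the exact location of
    completion_tokens / prompt_tokens in the response is not stated here.
    """
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_field(child, key)
        if found is not None:
            return found
    return None


def send(payload, base_url="http://localhost:8000"):
    """POST the payload to an assumed /v1/completions route and decode the reply."""
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running server, `find_field(send(build_completion_payload("Hello")), "completion_tokens")` would look up the new field in the reply; adjust the URL and model name to match the actual deployment.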

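Bug Fixes

Fixed LogProb not being returned under V1 KVCache scheduling
Fixed the chat_template_kwargs parameter not taking effect
Fixed an EP parallelism issue in mixed-architecture deployment
Fixed incorrect output token counts in completion interface responses
Fixed aggregation of logprobs results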

What's Changed

[Docs] Add Multinode deployment document by @ltd0924 in #3416
[docs] cherry-pick update docs by @zoooo0820 in #3422
[Docs]update installation readme by @yongqiangma in #3435
[Docs] release 2.1 by @ming1753 in #3441
[Docs] Update docs of graph opt backend by @gongshaotian in #3443
[Feature] Support logprob in scheduler v1 for release/2.1 by @rainyfly in #3446
[Bugfix]fix config bug in dynamic_weight_manager by @gzy19990617 in #3432
[Feature] Pass through the chat_template_kwargs to the data processing module by @luukunn in #3469
[CI] fix run_ci error in release/2.1 by @EmmonsCurse in #3499
[BugFix] fix ep real_bsz by @lizexu123 in #3396
[Feature] add prompt_tokens and completion_tokens by @memoryCoderC in #3505
[fix] setting disable_chat_template while passing prompt_token_ids led to response error by @liyonghua0910 in #3511
[Excutor] Fixed the issue of CUDA graph execution failure caused by d… by @gongshaotian in #3512
[Feature] add tool parser by @luukunn in #3518
[BUGFIX] fix ep mixed bug by @ltd0924 in #3513
[BugFix] Api server bugs by @ltd0924 in #3530
[Feature] Support limit thinking len for text models by @K11OntheBoat in #3527
[Bug Fix] Close get think_end_id for XPU for now. by @K11OntheBoat in #3563
[Feature] Support mixed deployment with yiyan adapter by @rainyfly in #3533
[Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process by @zeroRains in #3558
[BugFix] Support echo in the completion interface by @AuferGachet in #3477
[fix] fix completion stream api output_tokens not in usage by @liyonghua0910 in #3588
[fix] fix ZmqIpcClient.close() error by @liyonghua0910 in #3600
[Bugfix] Correct logprobs aggregation for multiple prompts in /completions endpoint by @sunlei1024 in #3620
[BugFix] ep mixed mode offline exit failed by @ltd0924 in #3623
[Bugfix] Fix a severe performance drop of the 0.3B model on the 2.1 branch by @AuferGachet in #3624
[CI] add cleanup logic in release/2.1 workflows by @EmmonsCurse in #3655
[BugFix] fix parameter is 0 by @ltd0924 in #3663
[fix] qwen output inconsistency when top_p=0 (#3634) by @liyonghua0910 in #3662
Revert "[BugFix] fix parameter is 0" by @Jiang-Jia-Jun in #3681
[feat] add metrics for yiyan adapter by @liyonghua0910 in #3615
[bugfix]PR3663 parameter is 0 by @ltd0924 in #3679
[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. by @lizexu123 in #3670
Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER." by @Jiang-Jia-Jun in #3719
[Cherry-Pick] fix the bug when num_key_value_heads < tensor_parallel_size by @zeroRains in #3722
[Optimize] Increase zmq buffer size to prevent apiserver too slowly t… by @gongshaotian in #3728
[Fix] Do not drop result when request result slowly by @rainyfly in #3704
[Bug fix] Fix prefix cache in v1 by @rainyfly in #3710
[Bug fix] Fix mix deployment perf with yiyan adapter in release21 by @rainyfly in #3703

Full Changelog: v2.1.0...v2.1.1
