[BugFix][APIServer] Support max_completion_tokens in CompletionRequest for OpenAI API compat #7459
Conversation
Thanks for your contribution!
Force-pushed f66cdf4 to 3d7bb7b
…PI compat

Fix PaddlePaddle#2697: `ChatCompletionRequest` already supports `max_completion_tokens`, but `CompletionRequest` was missing the field. Add `max_completion_tokens` with a deprecated `max_tokens` annotation, consistent with `ChatCompletionRequest`. Ensure `max_completion_tokens` takes priority in `to_dict_for_infer()`.
Force-pushed 3d7bb7b to 80a5d92
PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-17 13:43 CST
📋 Review Summary

PR overview: Adds `max_completion_tokens` field support to `CompletionRequest`, keeping it consistent with the OpenAI API specification, and fixes a bug in `ChatCompletionRequest` where `max_completion_tokens=0` incorrectly fell back to `max_tokens`.

Scope of change: entrypoints/openai/protocol.py, plus a new test file

Impact tag: APIServer

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | tests/entrypoints/test_completion_max_completion_tokens.py:40 | The test uses a local mock class instead of importing the production code, so it cannot truly verify production behavior |

Overall assessment

The change logic is correct and consistent with `ChatCompletionRequest`. In `CompletionRequest.to_dict_for_infer()`, the priority logic is placed after the `self.dict()` loop, ensuring it correctly overrides the copied value; the `or` → `is not None` fix in `ChatCompletionRequest` is also correct (it handles the `max_completion_tokens=0` edge case). The main suggestion is that the tests should import the production class directly to improve reliability.
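The `or` vs `is not None` distinction matters because `0` is falsy in Python, so `max_completion_tokens or max_tokens` silently discards an explicit `0`. A minimal illustration of the fixed resolution (a standalone sketch, not the production code):

```python
def resolve_max_tokens(max_completion_tokens, max_tokens):
    """Pick the effective token limit, preferring max_completion_tokens."""
    # Buggy variant: `max_completion_tokens or max_tokens` would treat an
    # explicit 0 as "unset", because 0 is falsy.
    if max_completion_tokens is not None:
        return max_completion_tokens
    return max_tokens

print(resolve_max_tokens(0, 100))     # 0   (explicit 0 is kept)
print(resolve_max_tokens(None, 100))  # 100 (unset falls back to max_tokens)
print(0 or 100)                       # 100 (the `or` form loses the 0)
```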
class CompletionRequest(BaseModel):
🟡 Suggestion: the test uses a locally defined `CompletionRequest` instead of importing the production class

The current test re-defines a minimal `CompletionRequest` class inside the test file to verify the logic, which means the test could pass even if the production code has a bug. Meanwhile, `TestSourceCodeVerification` uses string matching against the source, which is brittle and cannot verify runtime behavior.

Import the production class directly instead:

from fastdeploy.entrypoints.openai.protocol import CompletionRequest

This verifies the production code's behavior directly, and the tests will notice when the production code changes. The `TestSourceCodeVerification` class can then be removed, since string matching is no longer necessary once the production class is imported.
Reopening to re-trigger CI checks after updating PR body.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## develop #7459 +/- ##
==========================================
Coverage ? 73.86%
==========================================
Files ? 398
Lines ? 54977
Branches ? 8613
==========================================
Hits ? 40608
Misses ? 11653
Partials ? 2716
Fix #2697: OpenAI deprecated `max_tokens` in favor of `max_completion_tokens` for both chat and completion endpoints. `ChatCompletionRequest` already supports `max_completion_tokens` (protocol.py line 688), but `CompletionRequest` was missing the field entirely. This causes OpenAI SDK clients that use `max_completion_tokens` on `/v1/completions` to either ignore the parameter or fail silently.

Motivation

The OpenAI API specification requires `max_completion_tokens` as the preferred parameter for controlling output length on both the `/v1/chat/completions` and `/v1/completions` endpoints. FastDeploy's `/v1/completions` endpoint only accepts `max_tokens`, breaking compatibility with clients that follow the current OpenAI API spec (e.g., the official OpenAI Python SDK v1.x+).

Related issues: #2697, #2815, #2816 (previous fixes for `max_completion_tokens` in the `finish_reason` logic were merged to `release/2.0.2`, but the `CompletionRequest` field was never added to `develop`).

Modifications
fastdeploy/entrypoints/openai/protocol.py:

- Added a `max_completion_tokens: Optional[int] = None` field to `CompletionRequest`
- Marked `max_tokens` as deprecated with `Field(deprecated=...)`, consistent with `ChatCompletionRequest`
- In `to_dict_for_infer()`: `max_completion_tokens if max_completion_tokens is not None else max_tokens` (handles the edge case where `max_completion_tokens=0` should not fall back to `max_tokens`)
- Applied the same `is not None` fix to `ChatCompletionRequest.to_dict_for_infer()` for consistency

tests/entrypoints/test_completion_max_completion_tokens.py: Added 12 unit tests covering:

- existing `max_tokens` behavior
- `max_completion_tokens` priority in `to_dict_for_infer()`
- `max_completion_tokens=0` does not fall back to `max_tokens`

Usage or Command
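The shape of the change can be sketched with a trimmed-down pydantic model (pydantic v2 assumed; the real class defines many more request fields, and `max_tokens` additionally carries the `Field(deprecated=...)` marker described above):

```python
from typing import Optional
from pydantic import BaseModel

class CompletionRequest(BaseModel):
    # Trimmed sketch; the production model has many more fields.
    prompt: Optional[str] = None
    max_tokens: Optional[int] = None            # deprecated alias
    max_completion_tokens: Optional[int] = None  # preferred parameter

    def to_dict_for_infer(self) -> dict:
        req = self.model_dump()
        # Resolution runs after the field dump so it overrides the copied
        # value; `is not None` preserves an explicit 0.
        req["max_tokens"] = (
            self.max_completion_tokens
            if self.max_completion_tokens is not None
            else self.max_tokens
        )
        return req

req = CompletionRequest(prompt="hi", max_tokens=100, max_completion_tokens=0)
print(req.to_dict_for_infer()["max_tokens"])  # 0, not 100
```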
Run tests:
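The exact command did not survive the page export; a typical invocation for the new test file, assuming a standard pytest setup at the repository root, would be:

```shell
# Hypothetical invocation; adjust to match the repo's actual test runner.
python -m pytest tests/entrypoints/test_completion_max_completion_tokens.py -v
```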
Accuracy Tests
N/A — this is an API parameter change that does not affect model inference logic.
Checklist

- PR title carries the `[BugFix][APIServer]` tags.
- Ran `pre-commit` before commit.
- If the PR targets a `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag. (N/A — submitting to `develop`.)