[BugFix][APIServer] Support max_completion_tokens in CompletionRequest for OpenAI API compat#7459

Open
ZhijunLStudio wants to merge 1 commit into PaddlePaddle:develop from ZhijunLStudio:fix/issue-2697-max-completion-tokens
Conversation


@ZhijunLStudio ZhijunLStudio commented Apr 17, 2026

Fix #2697: OpenAI deprecated max_tokens in favor of max_completion_tokens for both the chat and completion endpoints. ChatCompletionRequest already supports max_completion_tokens (protocol.py line 688), but CompletionRequest was missing the field entirely. As a result, requests from OpenAI SDK clients that pass max_completion_tokens to /v1/completions have the parameter silently ignored.

Motivation

The OpenAI API specification requires max_completion_tokens as the preferred parameter for controlling output length on both /v1/chat/completions and /v1/completions endpoints. FastDeploy's /v1/completions endpoint only accepts max_tokens, breaking compatibility with clients that follow the current OpenAI API spec (e.g., official OpenAI Python SDK v1.x+).

Related issues: #2697, #2815, #2816 (previous fixes for max_completion_tokens in finish_reason logic were merged to release/2.0.2 but the CompletionRequest field was never added to develop).

Modifications

  • fastdeploy/entrypoints/openai/protocol.py:
    • Added max_completion_tokens: Optional[int] = None field to CompletionRequest
    • Marked max_tokens as deprecated with Field(deprecated=...), consistent with ChatCompletionRequest
    • Added priority logic in to_dict_for_infer(): max_completion_tokens if max_completion_tokens is not None else max_tokens (handles edge case where max_completion_tokens=0 should not fall back to max_tokens)
    • Applied same is not None fix to ChatCompletionRequest.to_dict_for_infer() for consistency
  • tests/entrypoints/test_completion_max_completion_tokens.py: Added 12 unit tests covering:
    • Field existence and default values
    • Backward compatibility with max_tokens
    • max_completion_tokens priority in to_dict_for_infer()
    • Edge case: max_completion_tokens=0 does not fall back to max_tokens
    • Source code verification
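The priority rule described above can be sketched with a dependency-free stand-in (the real CompletionRequest in protocol.py is a pydantic model with many more fields; the class name here is a hypothetical simplification):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CompletionRequestSketch:
    # Stand-in for the pydantic CompletionRequest in protocol.py;
    # only the fields relevant to the priority rule are shown.
    prompt: str = ""
    max_tokens: Optional[int] = None            # deprecated alias
    max_completion_tokens: Optional[int] = None

    def to_dict_for_infer(self) -> dict:
        # Collect the set fields, then apply the override after the loop
        # so it always wins.
        req = {k: v for k, v in asdict(self).items() if v is not None}
        req.pop("max_completion_tokens", None)
        # "is not None" rather than truthiness, so max_completion_tokens=0
        # does not silently fall back to max_tokens.
        if self.max_completion_tokens is not None:
            req["max_tokens"] = self.max_completion_tokens
        return req
```

With both parameters set, max_completion_tokens wins even when it is 0; with only the deprecated max_tokens set, behavior is unchanged.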

Usage or Command

# Before: max_completion_tokens was silently ignored on /v1/completions
import requests
requests.post("http://localhost:8000/v1/completions", json={
    "model": "default",
    "prompt": "Hello",
    "max_completion_tokens": 100  # This was ignored
})

# After: max_completion_tokens is properly handled
# It maps to internal max_tokens, taking priority over max_tokens if both set

Run tests:

python -m pytest tests/entrypoints/test_completion_max_completion_tokens.py -v

Accuracy Tests

N/A — this is an API parameter change that does not affect model inference logic.

Checklist

  • Add at least a tag in the PR title: [BugFix][APIServer]
  • Format your code, run pre-commit before commit.
  • Add unit tests (12 tests, all passing).
  • Provide accuracy results. (N/A for this fix — does not affect model inference.)
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag. (N/A — submitting to develop.)

@paddle-bot

paddle-bot bot commented Apr 17, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 17, 2026
@luotao1 luotao1 self-assigned this Apr 17, 2026
@luotao1 luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Apr 17, 2026
@ZhijunLStudio ZhijunLStudio force-pushed the fix/issue-2697-max-completion-tokens branch from f66cdf4 to 3d7bb7b on April 17, 2026 at 04:33
…PI compat

Fix PaddlePaddle#2697: ChatCompletionRequest already supports max_completion_tokens
but CompletionRequest was missing the field. Add max_completion_tokens
with deprecated max_tokens annotation, consistent with ChatCompletionRequest.
Ensure max_completion_tokens takes priority in to_dict_for_infer().
@ZhijunLStudio ZhijunLStudio force-pushed the fix/issue-2697-max-completion-tokens branch from 3d7bb7b to 80a5d92 on April 17, 2026 at 05:39

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-17 13:43 CST

📋 Review Summary

PR overview: Adds max_completion_tokens field support to CompletionRequest, aligning with the OpenAI API specification, and fixes a bug in ChatCompletionRequest where max_completion_tokens=0 was incorrectly falling back to max_tokens.
Scope of change: entrypoints/openai/protocol.py, plus a new test file
Impact tag: APIServer

Issues

Level | File | Summary
🟡 Suggestion | tests/entrypoints/test_completion_max_completion_tokens.py:40 | The test uses a local mock class instead of importing the production code, so it cannot truly verify the production code's behavior

Overall assessment

The change logic is correct and consistent with ChatCompletionRequest. In CompletionRequest.to_dict_for_infer(), the priority logic is placed after the self.dict() loop, ensuring it correctly overrides the collected fields; the fix in ChatCompletionRequest replacing or with is not None is also correct (it handles the max_completion_tokens=0 edge case). The main suggestion is that the tests should import the production class directly to improve their reliability.

# ---------------------------------------------------------------------------


class CompletionRequest(BaseModel):

🟡 Suggestion: the test uses a locally defined CompletionRequest instead of importing the production class

The current test verifies the logic by redefining a minimal CompletionRequest class inside the test file, which means the test could pass even if the production code has a bug. Meanwhile, TestSourceCodeVerification verifies the source via string matching, which is fragile and cannot verify runtime behavior.

It is recommended to import the production class directly:

from fastdeploy.entrypoints.openai.protocol import CompletionRequest

This verifies the behavior of the production code itself, and the tests will notice when the production code changes. The TestSourceCodeVerification class can then be removed, since string-matching verification is no longer necessary once the production class is imported directly.

@ZhijunLStudio
Author

Reopening to re-trigger CI checks after updating PR body.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@91b8bf2). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7459   +/-   ##
==========================================
  Coverage           ?   73.86%           
==========================================
  Files              ?      398           
  Lines              ?    54977           
  Branches           ?     8613           
==========================================
  Hits               ?    40608           
  Misses             ?    11653           
  Partials           ?     2716           
Flag Coverage Δ
GPU 73.86% <100.00%> (?)


Development

Successfully merging this pull request may close these issues.

Feature Request: Add Support for max_completion_tokens Parameter (OpenAI API Deprecation)

4 participants