
[FDConfig] Add is_bidirectional property to ModelConfig for EB5 models #7427

Open

kevincheng2 wants to merge 2 commits into PaddlePaddle:develop from kevincheng2:feature/bidirectional-mask-config

Conversation

@kevincheng2 (Collaborator) commented Apr 16, 2026

Motivation

  1. [FDConfig] EB5 (ERNIE5) models use a bidirectional (non-causal) attention mask. Previously this behavior was detected through hardcoded `ErnieArchitectures.is_ernie5_arch` checks scattered across the codebase, with no unified abstraction, making it hard to extend.

  2. [DataProcessor] `qwen_vl_processor`, `qwen3_vl_processor`, and `ernie4_5_vl_processor` lacked an implementation of the `get_mm_max_tokens_per_item` interface, so upstream modules (scheduler, engine) could not estimate the multimodal encoder token budget without running a dummy input.

Modifications

FDConfig

  • Add an `is_bidirectional` property to `ModelConfig` as the single place to decide whether a model uses bidirectional mask attention:
    • Prefer the `is_causal` field from the model's `config.json` (a generic extension point)
    • Otherwise fall back to `ErnieArchitectures.is_ernie5_arch` to detect EB5 models (`Ernie5ForCausalLM`, `Ernie5MoeForCausalLM`, `Ernie5MoEForRewardModel`)
  • Explicitly set `self.causal = not self.is_bidirectional` at the end of `_post_init`, so that every attention backend's `getattr(model_config, "causal", True)` reads the correct value
  • Replace the hardcoded `disable_chunked_mm_input` check in `FDConfig` with `model_config.is_bidirectional`
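The property described above can be sketched roughly as follows. This is a hypothetical illustration based on the bullet points, not FastDeploy's actual `ModelConfig`: the class shape, constructor arguments, and the `startswith("Ernie5")` stand-in for `ErnieArchitectures.is_ernie5_arch` are all assumptions.

```python
# Illustrative sketch only -- not the real FastDeploy ModelConfig.
class ModelConfig:
    def __init__(self, model_json, architectures):
        self.model_json = model_json        # parsed contents of config.json
        self.architectures = architectures  # e.g. ["Ernie5MoeForCausalLM"]

    @property
    def is_bidirectional(self):
        # Prefer the generic `is_causal` field from config.json.
        if "is_causal" in self.model_json:
            return not self.model_json["is_causal"]
        # Otherwise fall back to the EB5 architecture check
        # (stand-in for ErnieArchitectures.is_ernie5_arch).
        return any(arch.startswith("Ernie5") for arch in self.architectures)

    def _post_init(self):
        # Runs last, so every attention backend's
        # getattr(model_config, "causal", True) reads the right value.
        self.causal = not self.is_bidirectional
```

With this shape, a model whose `config.json` sets `is_causal: false` is treated as bidirectional even if its architecture name is not an ERNIE5 one, which is the "generic extension point" the PR mentions.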

DataProcessor

  • `qwen_vl_processor/process.py`: implement `get_mm_max_tokens_per_item`, aligned with vllm's `get_image_size_with_most_features` algorithm, supporting both the image and video modalities
  • `qwen3_vl_processor/process.py`: same as above; add `get_max_image_tokens` / `get_max_video_tokens` / `get_mm_max_tokens_per_item`, with the video token count accounting for temporal padding
  • `ernie4_5_vl_processor/process.py`: fix the video token count in `get_mm_max_tokens_per_item`, which was missing the `max_frames` multiplier
  • `qwen3_vl_processor/qwen3_vl_processor.py` and `qwen_vl_processor/qwen_vl_processor.py`: expose `get_mm_max_tokens_per_item` at the Processor level, delegating to the internal DataProcessor
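The shape of such a budget estimate can be sketched as below, assuming a ViT-style patch grid with spatial merging. All parameter names and default values here are illustrative, not the processors' real configuration; the point is only that the video budget multiplies the per-frame budget by `max_frames`, which is the ERNIE4.5-VL fix described above.

```python
# Illustrative sketch of a per-item multimodal token budget; the defaults
# (max_pixels, patch_size, merge_size, max_frames) are assumptions, not
# the actual processor settings.
def get_mm_max_tokens_per_item(max_pixels=1280 * 28 * 28,
                               patch_size=14,
                               merge_size=2,
                               max_frames=16):
    # Largest number of ViT patches a single image can produce.
    max_patches = max_pixels // (patch_size * patch_size)
    # Spatial merging collapses merge_size x merge_size patches per token.
    image_tokens = max_patches // (merge_size * merge_size)
    # A video budget must multiply the per-frame budget by max_frames.
    video_tokens = image_tokens * max_frames
    return {"image": image_tokens, "video": video_tokens}
```

The scheduler can then reserve `max(budget.values())` tokens per multimodal item without ever running a dummy input through the encoder.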

Usage or Command

# FDConfig: EB5 models
assert model_config.is_bidirectional == True
assert model_config.causal == False

# FDConfig: ordinary generative models (Qwen, LLaMA, etc.)
assert model_config.is_bidirectional == False
assert model_config.causal == True

# DataProcessor: get the maximum multimodal token count per item
processor.get_mm_max_tokens_per_item(seq_len=4096)
# => {"image": 1280, "video": 8192}

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.


paddle-bot bot commented Apr 16, 2026

Thanks for your contribution!


CLAassistant commented Apr 16, 2026

CLA assistant check
All committers have signed the CLA.


kevincheng2 and others added 2 commits April 16, 2026 12:07
## Motivation

EB5 (ERNIE5) models use a bidirectional (non-causal) attention mask, but this behavior was previously
detected through hardcoded `ErnieArchitectures.is_ernie5_arch` checks scattered across the codebase, with no unified abstraction, making it hard to add new bidirectional-mask models.

## Modifications

- Add an `is_bidirectional` property to `ModelConfig` as the single place to decide whether a model uses bidirectional mask attention:
  - Prefer the `is_causal` field from the model's `config.json` (a generic extension point)
  - Otherwise fall back to `ErnieArchitectures.is_ernie5_arch` to detect EB5 models
- Explicitly set `self.causal = not self.is_bidirectional` at the end of `_post_init`,
  so that every attention backend's `getattr(model_config, "causal", True)` reads the correct value
- Replace the hardcoded `disable_chunked_mm_input` check in `FDConfig` with `model_config.is_bidirectional`

## Usage or Command

```python
# EB5 models
assert model_config.is_bidirectional == True
assert model_config.causal == False

# Ordinary generative models
assert model_config.is_bidirectional == False
assert model_config.causal == True
```
Remove the hardcoded `max_extra_num_batched_tokens=16384` from
SchedulerConfig and replace its only usage in ProposerInputBatch with
the existing `get_max_chunk_tokens()` call, which already accounts for
multimodal token budgets dynamically.

Changes:
- Delete `SchedulerConfig.max_extra_num_batched_tokens`
- Add `max_chunk_tokens` to `ProposerInputBatch.__init__` via
  `fd_config.get_max_chunk_tokens(model_config.mm_max_tokens_per_item)`
- Use `self.max_chunk_tokens` when allocating `target_hidden_states`
  instead of `max_num_batched_tokens + max_extra_num_batched_tokens`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
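The allocation change in this commit can be sketched as follows. `get_max_chunk_tokens` and `mm_max_tokens_per_item` are taken from the commit description; the stub config classes and the plain-list buffer are purely illustrative stand-ins for FastDeploy's actual types.

```python
# Illustrative sketch of the ProposerInputBatch change; all class shapes
# here are assumptions, not the real FastDeploy code.
class ProposerInputBatch:
    def __init__(self, fd_config, hidden_size):
        # Replaces max_num_batched_tokens + max_extra_num_batched_tokens
        # with the dynamic, multimodal-aware budget.
        self.max_chunk_tokens = fd_config.get_max_chunk_tokens(
            fd_config.model_config.mm_max_tokens_per_item
        )
        # Size target_hidden_states by that budget instead of the old
        # hardcoded 16384-token headroom.
        self.target_hidden_states = [[0.0] * hidden_size
                                     for _ in range(self.max_chunk_tokens)]
```

Because the budget now comes from one place, a change to the multimodal token accounting automatically propagates to the proposer's buffer size.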
@kevincheng2 kevincheng2 force-pushed the feature/bidirectional-mask-config branch from 1bb4cb8 to 021ebe9 Compare April 16, 2026 04:07


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-16 12:26 CST

📋 Review summary

PR overview: add an `is_bidirectional` config property for EB5 models and implement the `get_mm_max_tokens_per_item` interface for the multimodal processors
Changed files: `fastdeploy/config.py`, `fastdeploy/scheduler/config.py`, `fastdeploy/worker/input_batch.py`, the multimodal processors, and tests
Impact tags: [FDConfig] [Scheduler] [DataProcessor]

Issues

Severity | File | Summary
--- | --- | ---
🟡 Suggestion | tests/scheduler/test_scheduler_config.py | The tests still use the removed `max_extra_num_batched_tokens` attribute

Overall assessment

The design of this PR is sound. The `is_bidirectional` property provides a unified abstraction for bidirectional-mask models: it first reads the `is_causal` field from the model config and then falls back to the ERNIE5 architecture check, which leaves good room for extension. The DataProcessor's `get_mm_max_tokens_per_item` implementation is aligned with vllm, and the ERNIE4.5-VL video token calculation fix is correct.

However, the tests were not updated in sync, which is a blocking issue.

@@ -269,7 +269,6 @@ def __init__(self, args):
"""

The `max_extra_num_batched_tokens` attribute was removed, but the test code in tests/scheduler/test_scheduler_config.py still uses it (lines 138 and 157), which will cause the tests to fail.

Please update the tests accordingly, removing or replacing the affected assertions.

@codecov-commenter

Codecov Report

❌ Patch coverage is 84.41558% with 12 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@13b9fe7). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
--- | --- | ---
fastdeploy/input/qwen3_vl_processor/process.py | 84.37% | 2 Missing and 3 partials ⚠️
fastdeploy/input/qwen_vl_processor/process.py | 84.37% | 2 Missing and 3 partials ⚠️
fastdeploy/config.py | 71.42% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7427   +/-   ##
==========================================
  Coverage           ?   72.29%           
==========================================
  Files              ?      394           
  Lines              ?    54835           
  Branches           ?     8591           
==========================================
  Hits               ?    39643           
  Misses             ?    12417           
  Partials           ?     2775           
Flag | Coverage Δ
--- | ---
GPU | 72.29% <84.41%> (?)
