[FDConfig] Add is_bidirectional property to ModelConfig for EB5 models#7427
kevincheng2 wants to merge 2 commits into `PaddlePaddle:develop`
Conversation
Thanks for your contribution!
## Motivation

EB5 (ERNIE5) models use bidirectional (non-causal) mask attention. Previously, this was detected through hardcoded `ErnieArchitectures.is_ernie5_arch` checks scattered across the codebase, with no unified abstraction, making it hard to extend to new bidirectional-mask models.

## Modifications

- Add an `is_bidirectional` property to `ModelConfig` as the single place to decide whether a model uses bidirectional mask attention:
  - Prefer the `is_causal` field in the model's `config.json` (a generic extension point)
  - Otherwise fall back to `ErnieArchitectures.is_ernie5_arch` to detect EB5 models
- Explicitly set `self.causal = not self.is_bidirectional` at the end of `_post_init`, so that every attention backend's `getattr(model_config, "causal", True)` reads the correct value
- Replace the hardcoded check behind `disable_chunked_mm_input` in `FDConfig` with `model_config.is_bidirectional`

## Usage or Command

```python
# EB5 model
assert model_config.is_bidirectional == True
assert model_config.causal == False

# Ordinary generative model
assert model_config.is_bidirectional == False
assert model_config.causal == True
```
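The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the actual `fastdeploy/config.py` implementation: `ModelConfig` here takes a parsed `config.json` dict directly, and the real class carries many more fields.

```python
# Minimal sketch of the is_bidirectional resolution order described above.
class ErnieArchitectures:
    # ERNIE5 architecture names listed in the PR description.
    ERNIE5_ARCHS = {"Ernie5ForCausalLM", "Ernie5MoeForCausalLM", "Ernie5MoEForRewardModel"}

    @classmethod
    def is_ernie5_arch(cls, architectures):
        return any(arch in cls.ERNIE5_ARCHS for arch in architectures)


class ModelConfig:
    def __init__(self, model_json):
        self.model_json = model_json  # parsed config.json contents
        self.architectures = model_json.get("architectures", [])
        self._post_init()

    @property
    def is_bidirectional(self):
        # 1) Prefer an explicit `is_causal` field in config.json.
        if "is_causal" in self.model_json:
            return not self.model_json["is_causal"]
        # 2) Otherwise fall back to the ERNIE5 architecture check.
        return ErnieArchitectures.is_ernie5_arch(self.architectures)

    def _post_init(self):
        # Attention backends read getattr(model_config, "causal", True),
        # so derive `causal` explicitly from the property.
        self.causal = not self.is_bidirectional
```

Note that an explicit `is_causal` in `config.json` wins even for an ERNIE5 architecture, which is what makes the field a generic extension point.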
Remove the hardcoded `max_extra_num_batched_tokens=16384` from `SchedulerConfig` and replace its only usage in `ProposerInputBatch` with the existing `get_max_chunk_tokens()` call, which already accounts for multimodal token budgets dynamically.

Changes:

- Delete `SchedulerConfig.max_extra_num_batched_tokens`
- Add `max_chunk_tokens` to `ProposerInputBatch.__init__` via `fd_config.get_max_chunk_tokens(model_config.mm_max_tokens_per_item)`
- Use `self.max_chunk_tokens` when allocating `target_hidden_states` instead of `max_num_batched_tokens + max_extra_num_batched_tokens`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
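The shape of this change can be sketched as below. `ProposerInputBatch` and `get_max_chunk_tokens` follow the commit message, but the stub config, its fields, and the plain-list buffer are illustrative assumptions, not FastDeploy's actual signatures.

```python
# Hypothetical sketch: size the buffer from get_max_chunk_tokens() instead
# of a fixed max_extra_num_batched_tokens=16384 pad.
class FDConfigStub:
    def __init__(self, max_num_batched_tokens, max_mm_items):
        self.max_num_batched_tokens = max_num_batched_tokens
        self.max_mm_items = max_mm_items  # illustrative multimodal budget knob

    def get_max_chunk_tokens(self, mm_max_tokens_per_item):
        # Accounts for the multimodal token budget dynamically.
        return self.max_num_batched_tokens + self.max_mm_items * mm_max_tokens_per_item


class ProposerInputBatch:
    def __init__(self, fd_config, mm_max_tokens_per_item, hidden_size):
        self.max_chunk_tokens = fd_config.get_max_chunk_tokens(mm_max_tokens_per_item)
        # Before: max_num_batched_tokens + max_extra_num_batched_tokens rows.
        self.target_hidden_states = [
            [0.0] * hidden_size for _ in range(self.max_chunk_tokens)
        ]
```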
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-16 12:26 CST
📋 Review Summary

PR overview: adds an `is_bidirectional` config property for EB5 models and implements the `get_mm_max_tokens_per_item` interface for the multimodal processors.
Scope: `fastdeploy/config.py`, `fastdeploy/scheduler/config.py`, `fastdeploy/worker/input_batch.py`, multimodal processors, tests.
Impact tags: [FDConfig] [Scheduler] [DataProcessor]
Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | tests/scheduler/test_scheduler_config.py | Test code still uses the removed `max_extra_num_batched_tokens` attribute |
Overall

The design of this PR is sound. The `is_bidirectional` property gives bidirectional-mask models a unified abstraction: it reads the `is_causal` field from the model config first and falls back to the ERNIE5 architecture check, which extends well to new models. The DataProcessor's `get_mm_max_tokens_per_item` implementation is aligned with vllm, and the ERNIE4.5-VL video token calculation fix is correct.

However, the test code was not updated in step, which is a blocking issue.
@@ -269,7 +269,6 @@ def __init__(self, args):
    """
The `max_extra_num_batched_tokens` attribute was removed, but the test code in tests/scheduler/test_scheduler_config.py still uses it (lines 138 and 157), which will cause the tests to fail.
Please update the test code accordingly, deleting or replacing the affected assertions.
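One way to update those assertions is sketched below. `SchedulerConfig` here is a stub (the real class lives in `fastdeploy/scheduler/config.py`, and its surviving fields differ); the point is to assert the attribute is gone rather than assert its old value.

```python
# Hypothetical sketch of the suggested test fix.
class SchedulerConfig:
    def __init__(self):
        self.max_num_batched_tokens = 8192  # illustrative surviving budget field

config = SchedulerConfig()
# Old assertion (now failing, to be deleted):
#   assert config.max_extra_num_batched_tokens == 16384
# New assertion: the attribute no longer exists.
assert not hasattr(config, "max_extra_num_batched_tokens")
```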
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #7427   +/-   ##
==========================================
  Coverage         ?   72.29%
==========================================
  Files            ?      394
  Lines            ?    54835
  Branches         ?     8591
==========================================
  Hits             ?    39643
  Misses           ?    12417
  Partials         ?     2775
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Motivation

- [FDConfig] EB5 (ERNIE5) models use bidirectional (non-causal) mask attention. Previously, this was detected through hardcoded `ErnieArchitectures.is_ernie5_arch` checks scattered across the codebase, with no unified abstraction and poor extensibility.
- [DataProcessor] `qwen_vl_processor`, `qwen3_vl_processor`, and `ernie4_5_vl_processor` lacked an implementation of the `get_mm_max_tokens_per_item` interface, so upstream modules (scheduler, engine) could not estimate the multimodal encoding budget without running a dummy input.

Modifications
FDConfig

- Add an `is_bidirectional` property to `ModelConfig` as the single place to decide whether a model uses bidirectional mask attention:
  - Prefer the `is_causal` field in the model's `config.json` (a generic extension point)
  - Otherwise fall back to `ErnieArchitectures.is_ernie5_arch` to detect EB5 models (`Ernie5ForCausalLM`, `Ernie5MoeForCausalLM`, `Ernie5MoEForRewardModel`)
- Explicitly set `self.causal = not self.is_bidirectional` at the end of `_post_init`, so that every attention backend's `getattr(model_config, "causal", True)` reads the correct value
- Replace the hardcoded check behind `disable_chunked_mm_input` in `FDConfig` with `model_config.is_bidirectional`

DataProcessor

- `qwen_vl_processor/process.py`: implement `get_mm_max_tokens_per_item`, aligned with vllm's `get_image_size_with_most_features` algorithm, supporting both image and video modalities
- `qwen3_vl_processor/process.py`: likewise, adding `get_max_image_tokens` / `get_max_video_tokens` / `get_mm_max_tokens_per_item`; the video token count accounts for temporal padding
- `ernie4_5_vl_processor/process.py`: fix the video token count in `get_mm_max_tokens_per_item`, which was missing the `max_frames` multiplier
- `qwen3_vl_processor/qwen3_vl_processor.py`, `qwen_vl_processor/qwen_vl_processor.py`: expose `get_mm_max_tokens_per_item` at the Processor level, delegating to the inner DataProcessor

Usage or Command
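The general shape of such a budget function can be illustrated as follows. This is only a sketch: the real processors derive token counts from each model's vision config, and the `max_pixels` / `patch_size` / `merge_size` / `max_frames` defaults here are assumed values, not FastDeploy's actual ones.

```python
# Illustrative sketch of a per-modality token budget, mirroring the idea
# behind vllm's get_image_size_with_most_features.
def get_mm_max_tokens_per_item(max_pixels=2048 * 2048, patch_size=28,
                               merge_size=2, max_frames=16):
    # Image budget: token count for the largest image the processor accepts.
    patches_per_side = int(max_pixels ** 0.5) // patch_size
    image_tokens = (patches_per_side // merge_size) ** 2
    # Video budget: per-frame tokens times the frame cap (the missing
    # max_frames multiplier that the ernie4_5_vl fix adds).
    video_tokens = image_tokens * max_frames
    return {"image": image_tokens, "video": video_tokens}
```

This lets the scheduler reserve a worst-case encoder budget per multimodal item without ever constructing a dummy input.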
Checklist

- Run `pre-commit` before commit.