refactor(hpu_model_runner): restructure multimodal-related code #2066

Jing1Ling · 2025-10-22T06:56:39Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as a before/after comparison of results, or e2e results.

Purpose

The current multimodal model warmup logic is tightly coupled with HPUModelRunner, which makes it difficult to extend when adding new models.
Ideally, HPUModelRunner should handle only the abstract warmup flow (e.g., wrap_multimodal_module_in_hpu(), init_multimodal_buckets(), create_dummy_inputs()) and contain no model-specific conditions.

This PR introduces an extensible MultimodalHandler class to unify the key steps of multimodal model warmup.
By subclassing MultimodalHandler and overriding specific methods, new multimodal models can be easily integrated.

The default implementation targets QwenVL-series models, and a separate Gemma3MultimodalHandler is provided to maintain backward compatibility with Gemma3.
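The intended split can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the three hook names come from the description above, but their signatures, the placeholder return values, and the warmup_multimodal() driver are assumptions made for the example.

```python
from typing import Any, Dict, List


class MultimodalHandler:
    """Hypothetical base class holding the model-specific warmup hooks."""

    def wrap_multimodal_module_in_hpu(self, model: Any) -> Any:
        # Placeholder: a real handler would wrap the model's vision module.
        return model

    def init_multimodal_buckets(self) -> List[int]:
        # Placeholder bucket sizes; real values would come from configuration.
        return [1, 2, 4]

    def create_dummy_inputs(self, bucket: int) -> Dict[str, Any]:
        # Default behaviour stands in for the QwenVL-style path.
        return {"family": "qwen_vl", "num_items": bucket}


class Gemma3MultimodalHandler(MultimodalHandler):
    """Overrides only the step that differs for Gemma3 (assumed here)."""

    def create_dummy_inputs(self, bucket: int) -> Dict[str, Any]:
        return {"family": "gemma3", "num_items": bucket}


def warmup_multimodal(handler: MultimodalHandler, model: Any) -> List[Dict[str, Any]]:
    """Hypothetical runner-side flow: no model-specific branches."""
    handler.wrap_multimodal_module_in_hpu(model)
    return [
        handler.create_dummy_inputs(bucket)
        for bucket in handler.init_multimodal_buckets()
    ]
```

With this shape, supporting a new model means adding a subclass rather than another condition in the runner; calling warmup_multimodal(Gemma3MultimodalHandler(), model) exercises the same flow with the overridden step.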

Test Plan

Test Result

wenbinc-Bin · 2025-10-22T06:59:44Z

vllm/worker/hpu_model_runner.py

+            [[1, image_h, int(mm_len / image_h)]])
+        pixel_values = torch.randn(
+            image_grid_thw[0].prod(),
+            1176)  # TODO: figure out the variable name


We should find a way to replace 1176 with "channel * temporal_patch_size * patch_size * patch_size" to avoid the magic number.
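One way the suggestion might look, as a sketch only: the VisionConfig dataclass and its field names are hypothetical stand-ins, and the default values are chosen so that the product reproduces 1176. Whether they match the fields of the model's actual vision config is an assumption that would need checking.

```python
from dataclasses import dataclass


@dataclass
class VisionConfig:
    """Hypothetical stand-in for the model's vision config."""
    in_channels: int = 3          # assumed value
    temporal_patch_size: int = 2  # assumed value
    patch_size: int = 14          # assumed value


def patch_feature_dim(cfg: VisionConfig) -> int:
    """Flattened size of one patch: channel * temporal_patch_size * patch_size**2."""
    return cfg.in_channels * cfg.temporal_patch_size * cfg.patch_size * cfg.patch_size


# With the assumed defaults: 3 * 2 * 14 * 14 = 1176
feature_dim = patch_feature_dim(VisionConfig())
```

The dummy tensor could then be created with feature_dim as its second dimension in place of the literal.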

wenbinc-Bin · 2025-10-22T07:01:28Z

vllm/worker/hpu_model_runner.py

+
+    def compute_input_embedding(self, model, **kwargs):
+        input_ids = kwargs['input_ids']
+        if model.config.model_type == 'qwen2_5_omni_thinker':


Since the code is being refactored, this check is no longer needed here. We can override this function in qwen2_5_omni_thinker.py.
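A rough sketch of the override this comment asks for. The class names, the FakeModel stand-in, and the embedding logic are all hypothetical and exist only to show a model_type branch being replaced by a subclass method; only the compute_input_embedding name and the input_ids keyword come from the diff above.

```python
class FakeModel:
    """Hypothetical stand-in for a model exposing an embedding lookup."""

    def embed(self, input_ids):
        return [float(token_id) for token_id in input_ids]


class BaseEmbeddingHandler:
    """Generic path: no check on model.config.model_type."""

    def compute_input_embedding(self, model, **kwargs):
        input_ids = kwargs["input_ids"]
        return model.embed(input_ids)


class OmniThinkerEmbeddingHandler(BaseEmbeddingHandler):
    """Model-specific behaviour lives in the override, not in an if-branch."""

    def compute_input_embedding(self, model, **kwargs):
        base = super().compute_input_embedding(model, **kwargs)
        # Placeholder for whatever extra handling this model needs.
        return [value + 0.5 for value in base]
```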

refactor(hpu_model_runner): restructure multimodal-related code

84319bb

wenbinc-Bin reviewed Oct 22, 2025
