Conversation

@you-and-you
Contributor

PR Category

Inference

PR Types

User Experience

PR Description

  1. Successfully supported Hunyuan Video (screenshot attached).
  2. Achieves better performance than before (benchmark screenshot attached).

@you-and-you requested review from a team and zhaoyinglia as code owners on November 6, 2025 02:48
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Hunyuan Video model. The changes include new configuration files for inference and modifications to the inference engine.

My review has identified a few critical issues:

  • There are debug print statements that should be removed.
  • The typo cup_offload should be cpu_offload; it needs to be fixed in both the configuration and the code.
  • There is an unsafe dictionary access that could lead to a KeyError.
  • A hardcoded path in an example configuration file makes it non-portable.

Please address these points to improve the code quality and robustness.

class: diffusers.HunyuanVideoPipeline
from_pretrained:
  torch_dtype: bfloat16
  cup_offload: True

critical

There appears to be a typo in cup_offload. This should likely be cpu_offload to match pipeline.enable_model_cpu_offload() used in flagscale/inference/inference_engine.py. Please correct this typo here and in the corresponding logic in inference_engine.py.

      cpu_offload: True
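For reference, a minimal sketch of what the corrected flag should enable (the model id below is illustrative, not taken from this PR):

import torch
from diffusers import HunyuanVideoPipeline

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
)
# Keeps submodules on CPU and moves each to the GPU only while it runs,
# trading some speed for a much smaller peak VRAM footprint.
pipeline.enable_model_cpu_offload()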

Comment on lines 339 to 344
if ("cup_offload" in kwargs["pipeline"]["from_pretrained"] and kwargs["pipeline"]["from_pretrained"]["cup_offload"]):
pipeline.enable_model_cpu_offload()
else:
device = kwargs.get("device", None)
if device:
pipeline.to(device)

critical

This block has two issues:

  1. Typo: cup_offload should be cpu_offload; as written, pipeline.enable_model_cpu_offload() is never triggered.
  2. Unsafe Dictionary Access: Accessing kwargs["pipeline"]["from_pretrained"] directly is unsafe and will raise a KeyError if the pipeline or from_pretrained key is missing from the configuration. Use .get() for safe access.

Please correct the typo and refactor for safe dictionary access.

Suggested change
if ("cup_offload" in kwargs["pipeline"]["from_pretrained"] and kwargs["pipeline"]["from_pretrained"]["cup_offload"]):
pipeline.enable_model_cpu_offload()
else:
device = kwargs.get("device", None)
if device:
pipeline.to(device)
from_pretrained_cfg = kwargs.get("pipeline", {}).get("from_pretrained", {})
if from_pretrained_cfg.get("cpu_offload"):
pipeline.enable_model_cpu_offload()
else:
device = kwargs.get("device", None)
if device:
pipeline.to(device)
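For illustration, a quick sketch (with hypothetical inputs) of why the chained .get() calls are safe:

# No "pipeline" key at all: each .get() falls back to {}, so no KeyError.
kwargs = {}
from_pretrained_cfg = kwargs.get("pipeline", {}).get("from_pretrained", {})
assert from_pretrained_cfg.get("cpu_offload") is None  # falsy, so the else branch runs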

runner:
  hostfile: null
  cmds:
    before_start: source /root/miniconda3/bin/activate flagscale-inference

high

The path /root/miniconda3/bin/activate is hardcoded, which makes the example non-portable: it will fail for users with different environment setups. Please consider a more generic approach, or add a comment explaining that this path needs to be configured by the user.
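One portable option, sketched here on the assumption that conda is already on the user's PATH (not part of this PR):

runner:
  hostfile: null
  cmds:
    # Derives the conda base path instead of hardcoding /root/miniconda3;
    # adjust the environment name to match your local setup.
    before_start: source "$(conda info --base)/etc/profile.d/conda.sh" && conda activate flagscale-inference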

if v is not None and k not in ("model", "loader")
}

print(f"self.vconfig:{self.vconfig}")

high

This print statement appears to be for debugging. It should be removed before merging to keep the logs clean.
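If this value is genuinely useful at runtime, a hedged alternative using the logger already imported in this file (per the comment below) would be:

logger.debug("self.vconfig: %s", self.vconfig)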

Comment on lines +233 to +236
start_time = time.time()
outputs = self.model_or_pipeline(**kwargs)
gen_time = time.time() - start_time
print(f"gen_time: {gen_time:.2f}s")

high

These lines for timing and printing the generation time appear to be for debugging. Such print statements should be removed from the final code. If performance measurement is a desired feature, it should be implemented using a proper logging framework (like the logger already imported in this file) instead of printing to stdout.

Suggested change
start_time = time.time()
outputs = self.model_or_pipeline(**kwargs)
gen_time = time.time() - start_time
print(f"gen_time: {gen_time:.2f}s")
outputs = self.model_or_pipeline(**kwargs)
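If the timing is worth keeping as a feature, a minimal sketch using a standard logger instead of print (assuming the logger the review mentions is in scope):

import logging
import time

logger = logging.getLogger(__name__)  # the file reportedly already defines one

start_time = time.time()
outputs = self.model_or_pipeline(**kwargs)
logger.info("Pipeline generation took %.2fs", time.time() - start_time)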

@CLAassistant

CLAassistant commented Nov 18, 2025

CLA assistant check
All committers have signed the CLA.

@legitnull changed the title Flagscale(diffusers) Support The Model Of Hunyuan Video → [Diffusion] support Taylorseer on Hunyuan Video on Nov 21, 2025