Conversation

e1ijah1 commented Dec 11, 2025

Purpose

Add support for LongCat-Image #220

Test Plan

python3 examples/offline_inference/text_to_image/text_to_image.py --model meituan-longcat/LongCat-Image/ --prompt '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。' --width 1024 --height 1024 --num_inference_steps 50 --num_images_per_prompt 1 --output longcat_image_output.png

(The Chinese prompt describes a young Asian woman in a yellow knit sweater with a white necklace, hands resting on her knees with a serene expression, seated in front of a rough brick wall in warm afternoon sunlight; a medium-distance shot with soft light emphasizing her features and the texture of her accessories.)

Test Result

(Generated image attached: longcat_image_output.png)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 332 to 336
def rewire_prompt(self, prompt, device):
    language = get_prompt_language(prompt)
    if language == 'zh':
        question = SYSTEM_PROMPT_ZH + f"\n用户输入为:{prompt}\n改写后的prompt为:"
    else:

P0: Expose forward/encode methods instead of nesting them in __init__

Lines 332-456 define rewire_prompt, encode_prompt, prepare_latents, and even forward as nested functions inside __init__ without assigning them to self. As a result the class is constructed with only __init__; calls such as LongCatImagePipeline.forward(...) or self.rewire_prompt(...) from the loader will raise AttributeError, so the new pipeline cannot be executed at all. These functions need to be defined as normal methods on the class (or bound to self) so Omni can call the pipeline.
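
A minimal sketch of the suggested structure, with the methods defined on the class body instead of nested inside __init__. The constructor arguments and method bodies below are placeholders, not the PR's actual code:

class LongCatImagePipeline:
    def __init__(self, text_encoder, transformer, vae, scheduler):
        # __init__ only wires up components; no nested function definitions
        self.text_encoder = text_encoder
        self.transformer = transformer
        self.vae = vae
        self.scheduler = scheduler

    def rewire_prompt(self, prompt, device):
        # reachable as self.rewire_prompt(...) from the loader
        ...

    def encode_prompt(self, prompts):
        ...

    def prepare_latents(self, batch_size, num_channels_latents, height, width):
        ...

    def forward(self, req):
        # entry point Omni calls; previously unreachable as a nested function
        ...

Binding the nested functions explicitly (self.forward = forward, etc.) inside __init__ would also work, but plain methods keep the class easier to inspect and test.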

e1ijah1 changed the title from "[Model] Add LongCat-Image support" to "[WIP][Model] Add LongCat-Image support" on Dec 12, 2025
e1ijah1 marked this pull request as draft on December 12, 2025 at 06:24
e1ijah1 (Author) commented Dec 14, 2025

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 509 to 513
height = req.height or height or self.default_sample_size * self.vae_scale_factor
width = req.width or width or self.default_sample_size * self.vae_scale_factor
num_inference_steps = req.num_inference_steps or num_inference_steps
generator = req.generator or generator
guidance_scale = req.guidance_scale or guidance_scale

P1: Respect an explicit guidance_scale of 0.0

The call parameter plumbing treats falsy values as missing, so a request with guidance_scale=0.0 is replaced by the default 4.5 (guidance_scale = req.guidance_scale or guidance_scale). This means users cannot disable classifier-free guidance or run pure unconditional sampling: the pipeline will always enter the CFG path, duplicate latents, and apply guidance even when the caller explicitly asked for zero guidance (e.g., the test in tests/single_stage/test_diffusion_model.py passes guidance_scale=0.0).
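
A sketch of the suggested fix, keeping the variable names from the excerpt above: fall back to the default only when the request field is None, so an explicit 0.0 (or any other falsy-but-valid value) survives.

if req.guidance_scale is not None:
    guidance_scale = req.guidance_scale
if req.num_inference_steps is not None:
    num_inference_steps = req.num_inference_steps
if req.generator is not None:
    generator = req.generator

# With the None check, guidance_scale=0.0 reaches the sampling loop, where the
# classifier-free-guidance branch can be skipped (e.g. only duplicate latents
# when guidance_scale > 1.0).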

Comment on lines 543 to 561
negative_prompt = '' if negative_prompt is None else negative_prompt
negative_prompt = [negative_prompt]*num_images_per_prompt
prompt = [prompt]*num_images_per_prompt

prompt_embeds, text_ids = self.encode_prompt(prompt)
negative_prompt_embeds, negative_text_ids = self.encode_prompt(negative_prompt)

# 4. Prepare latent variables
num_channels_latents = 16

P1: Expand prompt embeddings to match batch size

Prompt embeddings are generated only once and never duplicated to match the requested batch: prompt/negative_prompt are wrapped in a list, but encode_prompt consumes only prompts[0] and returns a single embedding. Immediately after, latents are created for batch_size * num_images_per_prompt items, so when num_outputs_per_prompt > 1 or multiple prompts are passed, the transformer receives hidden states for N samples but encoder hidden states for only 1, which will fail at the first attention block due to mismatched batch dimensions. The default test case requesting two images triggers this mismatch.
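
Two possible fixes: make encode_prompt encode every entry of the list it receives, or expand the single embedding to the requested batch afterwards. A sketch of the second option, keeping the names from the excerpt and assuming the embeddings are torch tensors with a leading batch dimension:

prompt_embeds, text_ids = self.encode_prompt([prompt])
negative_prompt_embeds, negative_text_ids = self.encode_prompt([negative_prompt or ""])

# Duplicate the single embedding along dim 0 so the encoder hidden states match
# the batch_size * num_images_per_prompt latents prepared below.
prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
negative_prompt_embeds = negative_prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
# text_ids / negative_text_ids need the same expansion if they carry a batch dimension.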

e1ijah1 changed the title from "[WIP][Model] Add LongCat-Image support" to "[Model] Add LongCat-Image support" on Dec 14, 2025
e1ijah1 marked this pull request as ready for review on December 14, 2025 at 13:28
e1ijah1 force-pushed the feat/longcat-image branch 2 times, most recently from 22d9a98 to 1c96e88 on December 14, 2025 at 14:09
e1ijah1 (Author) commented Dec 14, 2025

cc @ZJY0516 @hsliuustc0106 PTAL

ZJY0516 (Collaborator) left a comment

Please fix the pre-commit and doc errors.

e1ijah1 requested a review from ZJY0516 on December 15, 2025 at 16:55