[Model] Add LongCat-Image support #291
base: main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
def rewire_prompt(self, prompt, device):
    language = get_prompt_language(prompt)
    if language == 'zh':
        question = SYSTEM_PROMPT_ZH + f"\n用户输入为:{prompt}\n改写后的prompt为:"
    else:
```
Expose forward/encode methods instead of nesting in init
Lines 332-456 define rewire_prompt, encode_prompt, prepare_latents, and even forward as nested functions inside __init__ without assigning them to self. As a result the class is constructed with only __init__; calls such as LongCatImagePipeline.forward(...) or self.rewire_prompt(...) from the loader will raise AttributeError, so the new pipeline cannot be executed at all. These functions need to be defined as normal methods on the class (or bound to self) so Omni can call the pipeline.
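The fix can be sketched with a minimal toy class (the class name, method bodies, and parameters below are illustrative, not the actual pipeline code): functions defined at class level are bound as instance methods automatically, so both `self.rewire_prompt(...)` and lookups from a loader resolve, whereas functions nested inside `__init__` without an assignment to `self` never become attributes.

```python
# Minimal sketch (hypothetical names): define pipeline steps as class-level
# methods so they are reachable on instances and via the class.
class LongCatImagePipelineSketch:
    def __init__(self, default_steps=50):
        self.default_steps = default_steps
        # No nested `def`s here: anything defined inside __init__ without
        # being assigned to self is a local and vanishes after construction.

    def rewire_prompt(self, prompt):
        # Placeholder for the real prompt-rewriting logic.
        return f"rewritten: {prompt}"

    def forward(self, prompt):
        # Methods can call each other through self because both are
        # attributes of the class, not locals of __init__.
        return self.rewire_prompt(prompt)


pipe = LongCatImagePipelineSketch()
assert hasattr(pipe, "forward")  # resolves: defined on the class
print(pipe.forward("a cat"))     # → rewritten: a cat
```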
vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
height = req.height or height or self.default_sample_size * self.vae_scale_factor
width = req.width or width or self.default_sample_size * self.vae_scale_factor
num_inference_steps = req.num_inference_steps or num_inference_steps
generator = req.generator or generator
guidance_scale = req.guidance_scale or guidance_scale
```
Respect zero guidance_scale parameter
The call parameter plumbing treats falsy values as missing, so a request with guidance_scale=0.0 is replaced by the default 4.5 (guidance_scale = req.guidance_scale or guidance_scale). This means users cannot disable classifier-free guidance or run pure unconditional sampling: the pipeline will always enter the CFG path, duplicate latents, and apply guidance even when the caller explicitly asked for zero guidance (e.g., the test in tests/single_stage/test_diffusion_model.py passes guidance_scale=0.0).
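The pitfall can be shown in a standalone sketch (the function name and default are illustrative): `or` maps a caller-supplied `0.0` to the default, while an explicit `None` check preserves it, which is the usual fix for optional numeric parameters.

```python
# Hypothetical sketch of falsy-vs-missing handling for guidance_scale.
def resolve_guidance(req_guidance_scale, default=4.5):
    # Buggy pattern: `or` treats 0.0 as "missing" and substitutes the default.
    buggy = req_guidance_scale or default
    # Correct pattern: fall back only when the value is truly absent.
    fixed = default if req_guidance_scale is None else req_guidance_scale
    return buggy, fixed


print(resolve_guidance(0.0))   # → (4.5, 0.0): `or` discards the 0.0, the
                               #   None check keeps CFG disabled as requested
print(resolve_guidance(None))  # → (4.5, 4.5): both fall back when absent
```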
```python
negative_prompt = '' if negative_prompt is None else negative_prompt
negative_prompt = [negative_prompt] * num_images_per_prompt
prompt = [prompt] * num_images_per_prompt

prompt_embeds, text_ids = self.encode_prompt(prompt)
negative_prompt_embeds, negative_text_ids = self.encode_prompt(negative_prompt)

# 4. Prepare latent variables
num_channels_latents = 16
```
Expand prompt embeddings to match batch size
Prompt embeddings are generated only once and never duplicated to match the requested batch: prompt/negative_prompt are wrapped in a list, but encode_prompt consumes only prompts[0] and returns a single embedding. Immediately after, latents are created for batch_size * num_images_per_prompt items, so when num_outputs_per_prompt > 1 or multiple prompts are passed, the transformer receives hidden states for N samples but encoder hidden states for only 1, which will fail at the first attention block due to mismatched batch dimensions. The default test case requesting two images triggers this mismatch.
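The required duplication can be sketched with a toy, list-based stand-in for the embedding tensor (the function name and data are illustrative only): each prompt's embedding is repeated `num_images_per_prompt` times so the encoder-hidden-state batch lines up with the latent batch created downstream.

```python
# Hypothetical sketch: repeat each per-prompt embedding so its batch
# dimension matches batch_size * num_images_per_prompt.
def expand_embeds(prompt_embeds, num_images_per_prompt):
    # prompt_embeds: one "embedding" (here a plain list) per prompt.
    # The real pipeline would do the analogous repeat on a tensor's
    # batch dimension (e.g. a repeat/repeat_interleave) before the
    # transformer consumes encoder hidden states.
    return [emb for emb in prompt_embeds for _ in range(num_images_per_prompt)]


embeds = [[0.1, 0.2]]  # embeddings for a single prompt
expanded = expand_embeds(embeds, num_images_per_prompt=2)
print(len(expanded))   # → 2, matching a two-image latent batch
```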
Force-pushed 22d9a98 to 1c96e88
Signed-off-by: elijah <[email protected]>
Force-pushed 1c96e88 to 385d56f
Signed-off-by: elijah <[email protected]>
cc @ZJY0516 @hsliuustc0106 PTAL
Please fix the pre-commit and doc errors.
vllm_omni/diffusion/models/longcat_image/longcat_image_transformer.py
Signed-off-by: elijah <[email protected]>
Purpose
Add support for LongCat-Image #220
Test Plan
```shell
python3 examples/offline_inference/text_to_image/text_to_image.py --model meituan-longcat/LongCat-Image/ --prompt '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。' --width 1024 --height 1024 --num_inference_steps 50 --num_images_per_prompt 1 --output longcat_image_output.png
```
(The Chinese prompt describes a young Asian woman in a yellow knit sweater with a white necklace, hands on her knees, against a rough brick wall in warm afternoon light, framed at mid-range with soft lighting.)

Test Result
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)