

@dougbtv
Contributor

@dougbtv dougbtv commented Dec 11, 2025

quick overview.

NOTE: This depends on, and builds on, the diffusion online serving PR #259.

cc: @fake0fan (thanks for getting the work off to a great start in #259!)

Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/

review tips.

Again, this relies on the work in #259; until that lands, this branch carries commits from it, with the changes for this PR on top.

When reviewing, I recommend going commit by commit; the changes are broken into:

  • [docs]
  • [testing]
  • [feature]

so you can isolate just the changes.

The other commits are placeholders for #259.

design thoughts.

The idea here is to build on the async API endpoint work that fake0fan did using the OpenAI completions endpoint, and to add a dedicated diffusion endpoint.

The plan is to add the endpoint along with a per-model mapping, so that support for new models can be added to the endpoint and their parameters tuned.

Parameters on a dedicated API endpoint are easier to validate with Pydantic than parameters inlined in a completions request string. While I think it's reasonable to expect image generation from a completions endpoint when serving multi-modal models, it would be nice to also have an API endpoint whose parameters can be validated.

...and I want to use it!
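To make the validation argument concrete, here is a minimal sketch of what a typed request for /v1/images/generations might look like. The PR uses Pydantic (protocol/images.py); to keep this self-contained, the sketch uses a stdlib dataclass as a stand-in, and the class name, field defaults, and rules are assumptions modeled on the OpenAI images API, not the PR's actual model.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the PR's Pydantic request model. Field names follow
# the OpenAI images API; the validation rules are illustrative assumptions.
@dataclass
class ImageGenerationRequest:
    prompt: str
    model: Optional[str] = None
    n: int = 1
    size: str = "1024x1024"
    response_format: str = "b64_json"

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only prompts before they reach the engine.
        if not self.prompt or not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.n < 1:
            raise ValueError("n must be >= 1")
        # Mirrors the "b64_json only" response format restriction.
        if self.response_format != "b64_json":
            raise ValueError("response_format must be 'b64_json'")
```

With Pydantic, the same rules would typically be expressed as field constraints or validators, so a malformed request is rejected before any generation work starts.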


overview.

[Feature] Add OpenAI DALL-E compatible image generation API

Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.

This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.

OpenAI DALL-E API Compatibility:

  • /v1/images/generations - Text-to-image generation
  • Full compatibility with OpenAI Python SDK
  • Request/response formats match DALL-E specification
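Because the endpoint follows the DALL-E request/response shape, a client only needs to POST JSON and base64-decode each data[].b64_json field. The stdlib-only sketch below builds the request without sending it; the localhost:8000 address and the helper names are assumptions for illustration.

```python
import base64
import json
import urllib.request

# Assumed address; adjust to wherever `vllm serve <model> --omni` is listening.
BASE_URL = "http://localhost:8000"


def build_request(prompt: str, model: str, size: str = "1024x1024", n: int = 1) -> urllib.request.Request:
    """Build (but do not send) a DALL-E style POST to /v1/images/generations."""
    body = {
        "model": model,
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_json: dict) -> list[bytes]:
    """Decode every base64 image in an OpenAI-style {"data": [{"b64_json": ...}]} body."""
    return [base64.b64decode(item["b64_json"]) for item in response_json["data"]]


# Sending it against a running server would look like:
#   with urllib.request.urlopen(build_request("a red fox", "Qwen/Qwen-Image")) as resp:
#       images = decode_images(json.load(resp))
```

The OpenAI Python SDK should work the same way by pointing its base_url at the server's /v1 prefix and calling images.generate with response_format="b64_json".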

Unified Async Server:

  • Single vllm serve <model> --omni command for all diffusion models
  • AsyncOmniDiffusion engine with thread-pool execution
  • Exposes both /v1/images/generations and /v1/chat/completions
  • Automatic model type detection (diffusion vs chat)

Model Support (via Model Profiles):

  • Qwen/Qwen-Image (text-to-image with true CFG, 50 steps default)
  • Tongyi-MAI/Z-Image-Turbo (fast generation, 9 steps default)
  • Model profiles encapsulate per-model defaults and constraints
  • Easy to add new models without changing API code
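A rough sketch of how per-model profiles might resolve parameters: profile defaults fill in whatever the request omits, explicit request values override defaults, and forced values (such as Z-Image-Turbo's CFG=0 constraint listed under Features) are applied last. The step counts and the CFG constraint come from this description; the ModelProfile class, the resolve_params helper, and the parameter key names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ModelProfile:
    """Hypothetical profile: defaults used when the request omits a value,
    and forced values that override whatever the request sends."""
    defaults: dict = field(default_factory=dict)
    forced: dict = field(default_factory=dict)


# Step counts and the CFG constraint are from the PR description;
# the key names "num_inference_steps" and "guidance_scale" are assumptions.
PROFILES = {
    "Qwen/Qwen-Image": ModelProfile(defaults={"num_inference_steps": 50}),
    "Tongyi-MAI/Z-Image-Turbo": ModelProfile(
        defaults={"num_inference_steps": 9},
        forced={"guidance_scale": 0.0},
    ),
}


def resolve_params(model: str, requested: dict[str, Any]) -> dict[str, Any]:
    profile = PROFILES.get(model, ModelProfile())
    params = dict(profile.defaults)
    # Explicit (non-None) request values override profile defaults.
    params.update({k: v for k, v in requested.items() if v is not None})
    # Forced constraints always win, regardless of the request.
    params.update(profile.forced)
    return params
```

Keeping this table outside the handler is what lets a new model be added without touching the API code.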

Features:

  • Pydantic validation for all request parameters
  • Comprehensive error handling with proper HTTP status codes
  • Model field validation and empty prompt validation
  • Response format validation (b64_json only)
  • Prompts logged only at debug level (security-conscious)
  • Model-specific parameter enforcement (e.g., Z-Image forces CFG=0)
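One plausible shape for the error handling: map exceptions to an HTTP status plus an OpenAI-style error envelope. This is an assumption for illustration; the exact status codes and error body used in api_server.py may differ.

```python
from http import HTTPStatus
from typing import Any


def to_error_response(exc: Exception) -> tuple[int, dict[str, Any]]:
    """Map an exception to (HTTP status code, OpenAI-style error body).

    Hypothetical policy: ValueError (empty prompt, unsupported size or
    response_format, wrong model name) is a client error; anything else
    is treated as a server-side failure.
    """
    if isinstance(exc, ValueError):
        status, err_type = HTTPStatus.BAD_REQUEST, "invalid_request_error"
    else:
        status, err_type = HTTPStatus.INTERNAL_SERVER_ERROR, "server_error"
    body = {
        "error": {
            "message": str(exc),
            "type": err_type,
            "param": None,
            "code": None,
        }
    }
    return int(status), body
```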

Implementation Files:

  • image_api_utils.py - Shared helper functions (parse_size, encode_image, etc.)
  • image_model_profiles.py - Model-specific configurations and constraints
  • protocol/images.py - Pydantic models for requests/responses
  • api_server.py - DALL-E endpoint implementation (/v1/images/generations)
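The shared helpers in image_api_utils.py are only named here, so the following is a hypothetical stdlib-only take on parse_size and encode_image. The real encode_image may well accept an image object and serialize it to PNG first; this sketch assumes the image bytes are already encoded.

```python
import base64
import re

_SIZE_RE = re.compile(r"^(\d+)x(\d+)$")


def parse_size(size: str) -> tuple[int, int]:
    """Parse an OpenAI-style size string such as "1024x1024" into (width, height)."""
    match = _SIZE_RE.match(size.strip())
    if match is None:
        raise ValueError(f"invalid size {size!r}; expected WIDTHxHEIGHT")
    width, height = int(match.group(1)), int(match.group(2))
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode already-serialized image bytes for the b64_json response field."""
    return base64.b64encode(image_bytes).decode("ascii")
```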

Modified:

  • api_server.py - Integrated DALL-E endpoint with async support
  • async_diffusion.py - Import ordering fix

Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer with full validation, error
handling, and production-ready features while keeping the scope focused
on generation to facilitate review.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +220 to +200
    return OmniRequestOutput.from_diffusion(
        request_id=request_id,
        images=images,
        prompt=prompt,
        metrics={


P1: Propagate diffusion errors instead of returning empty success

If DiffusionEngine.step() hits an exception (e.g., during preprocessing/postprocessing), it returns None, but AsyncOmniDiffusion.generate() doesn't treat that as a failure: it always falls through to build an OmniRequestOutput, and the HTTP handlers then return 200 with an empty data array. That masks generation failures and gives clients a successful response even when no images were produced. This path should detect a None/empty result and surface an error instead of returning success.
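A minimal sketch of the check being asked for, assuming the engine hands back None on failure and a list of images on success. GenerationResult is a simplified stand-in for OmniRequestOutput, and DiffusionGenerationError and build_result are hypothetical names; the HTTP handler would then translate the exception into a 5xx error rather than a 200 with an empty data array.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class DiffusionGenerationError(RuntimeError):
    """Hypothetical error raised when the engine produced no images."""


@dataclass
class GenerationResult:
    # Simplified stand-in for OmniRequestOutput with only the fields needed here.
    request_id: str
    images: list


def build_result(request_id: str, engine_output: Optional[Sequence]) -> GenerationResult:
    # None means step() swallowed an exception; an empty sequence means nothing
    # was generated. Either way, fail loudly instead of reporting success.
    if not engine_output:
        raise DiffusionGenerationError(f"request {request_id}: diffusion produced no images")
    return GenerationResult(request_id=request_id, images=list(engine_output))
```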


@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 48fee5a to 65ab272 December 11, 2025 22:41
@dougbtv
Contributor Author

dougbtv commented Dec 11, 2025

We decided in the maintainers' call, with helpful input from Roger Wang (thank you!), to start with a single endpoint, /v1/images/generations. I'll put that together as the next iteration.

@dougbtv dougbtv marked this pull request as draft December 12, 2025 17:46
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 6 times, most recently from 3cf6521 to c981e03 December 12, 2025 21:25
@dougbtv dougbtv changed the title from "DALL-E compatible image generation (and editing) endpoints" to "DALL-E compatible image generation endpoint" Dec 12, 2025
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from c981e03 to f540b9e December 12, 2025 21:35
@dougbtv dougbtv marked this pull request as ready for review December 12, 2025 21:37
@dougbtv
Contributor Author

dougbtv commented Dec 12, 2025

Alright, I've refactored this PR to address comments from Thursday's maintainers' call.

The gist is that I reduced this to just the /v1/images/generations endpoint and removed the image edit endpoint. Even the single endpoint needs a fair amount of groundwork, and there's also a lot of testing and docs.

So I broke it into three commits, with commit messages tagged [docs], [tests], and [feature], to make it a little easier to review.

appreciate the input!

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


please also align with #274

@hsliuustc0106
Collaborator

@gcanlin I think this is related to #197, PTAL

fake0fan and others added 3 commits December 15, 2025 08:45
This commit addresses review comments and fixes the Read the Docs build:

* Fix diffusion image output handling (PR comment from chatgpt-codex-connector)
* Remove --num-inference-steps from server start examples
* Remove unnecessary try/except block around get_hf_file_to_dict import
* Add async_diffusion to mkdocs exclude list to prevent vllm import during doc build

Signed-off-by: dougbtv <[email protected]>
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from f540b9e to ffe73eb December 15, 2025 14:42
Add comprehensive documentation for the OpenAI DALL-E compatible image
generation API with inline examples and model profiles.

Signed-off-by: dougbtv <[email protected]>
Add 29 comprehensive tests covering generation endpoints, model profiles,
request validation, and error handling.

Signed-off-by: dougbtv <[email protected]>
Implement /v1/images/generations endpoint with:
- AsyncOmniDiffusion integration for text-to-image generation
- Model profile system for per-model defaults and constraints
- Request/response protocol matching OpenAI DALL-E API
- Support for Qwen-Image and Z-Image-Turbo models

Signed-off-by: dougbtv <[email protected]>
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from ffe73eb to a434a65 December 15, 2025 19:46
@dougbtv
Contributor Author

dougbtv commented Dec 15, 2025

I've rebased the branch on main and adopted the documentation style from #274 in my docs update. Thanks for letting me know!


Labels

None yet

Projects

None yet


3 participants