Support Iluvatar CoreX #8585

Open · honglyua-il wants to merge 2 commits into master from iluvatar_support

Conversation

@honglyua-il commented on Jun 19, 2025

Closes #8584.

This PR was validated on Iluvatar CoreX GPUs. The Iluvatar CoreX Toolkit must be installed first; then run:

# Install dependencies
pip install -r requirements.txt
# run
python3 main.py --disable-cuda-malloc

We used the sd_xl_base_1.0 model; the default workflow's results are shown below:

root@848fa421ea4c:~/ComfyUI# python3 main.py --disable-cuda-malloc --listen 0.0.0.0
Checkpoint files will always be loaded safely.
Total VRAM 32716 MB, total RAM 515630 MB
pytorch version: 2.4.1
/usr/local/corex-4.2.0/lib64/python3/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/corex-4.2.0/lib64/python3/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(cls, ctx, dx5):
xformers version: 0.0.26.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 Iluvatar BI-V150 : native
Using pytorch attention
Python version: 3.10.12 (main, Nov 29 2024, 18:13:52) [GCC 9.4.0]
ComfyUI version: 0.3.41
ComfyUI frontend version: 1.22.2
[Prompt Server] web root: /usr/local/lib/python3.10/site-packages/comfyui_frontend_package/static
/usr/local/corex-4.2.0/lib64/python3/dist-packages/flash_attn/ops/fused_dense.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(
/usr/local/corex-4.2.0/lib64/python3/dist-packages/flash_attn/ops/fused_dense.py:71: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output, *args):

Import times for custom nodes:
   0.0 seconds: /root/ComfyUI/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
/usr/local/lib/python3.10/site-packages/alembic/config.py:564: DeprecationWarning: No path_separator found in configuration; falling back to legacy splitting on spaces, commas, and colons for prepend_sys_path.  Consider adding path_separator=os to Alembic config.
  util.warn_deprecated(
No target revision found.
/usr/local/corex-4.2.0/lib64/python3/dist-packages/aiohttp/web_urldispatcher.py:202: DeprecationWarning: Bare functions are deprecated, use async ones
  warnings.warn(
Starting server

To see the GUI go to: http://0.0.0.0:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SDXLClipModel
loaded completely 31430.74140625 1560.802734375 True
/root/ComfyUI/comfy/ldm/modules/attention.py:451: UserWarning: Optional attn_mask_ param is not recommended to use. For better performance,1.Assuming causal attention masking, 'is_causal' parameter can be selected.2.Assuming alibi attention masking, 'PT_SDPA_USE_ALIBI_MASK' env can be selected. (Triggered internally at /home/corex/sw_home/apps/pytorch/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:1769.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
loaded completely 29856.74970703125 4897.0483474731445 True
100%|███████████████████████████████████████████| 20/20 [00:02<00:00,  7.18it/s]
Requested to load AutoencoderKL
loaded completely 24619.5859375 159.55708122253418 True
Prompt executed in 5.58 seconds

[image: output of the default SDXL workflow]

@honglyua-il force-pushed the iluvatar_support branch 2 times, most recently from 46d9466 to da50a8e on June 23, 2025 05:57
@honglyua-il force-pushed the iluvatar_support branch 4 times, most recently from fa8a063 to c795d23 on June 30, 2025 01:56
@honglyua-il (Author) commented on Jun 30, 2025

Hello @comfyanonymous, is this PR still under review? Let me know if there is anything else I need to do.

Based on community feedback, many users want to run ComfyUI on Iluvatar CoreX GPUs. I submitted this PR and hope it can be merged as soon as possible. Thanks!

I have rebased onto the latest master and tested it again; the results show it works well. The test logs and images are shown below.

If you would rather we not modify cuda_malloc.py to adapt it to Iluvatar CoreX GPUs, we can also revert c795d23 and launch ComfyUI by running python main.py --disable-cuda-malloc.

PTAL, Thanks!

root@666c5f0762e9:~/ComfyUI# python3 main.py --listen 0.0.0.0
Checkpoint files will always be loaded safely.
Total VRAM 32716 MB, total RAM 515630 MB
pytorch version: 2.4.1
Set vram state to: NORMAL_VRAM
Device: cuda:0 Iluvatar BI-V150 : native
Using pytorch attention
Python version: 3.10.18 (main, Jun 11 2025, 16:28:51) [GCC 9.4.0]
ComfyUI version: 0.3.43
ComfyUI frontend version: 1.23.4
[Prompt Server] web root: /usr/local/lib/python3.10/site-packages/comfyui_frontend_package/static

Import times for custom nodes:
   0.0 seconds: /root/ComfyUI/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
/usr/local/lib/python3.10/site-packages/alembic/config.py:577: DeprecationWarning: No path_separator found in configuration; falling back to legacy splitting on spaces, commas, and colons for prepend_sys_path.  Consider adding path_separator=os to Alembic config.
  util.warn_deprecated(
No target revision found.
/usr/local/lib/python3.10/site-packages/aiohttp/web_urldispatcher.py:204: DeprecationWarning: Bare functions are deprecated, use async ones
  warnings.warn(
Starting server

To see the GUI go to: http://0.0.0.0:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SDXLClipModel
loaded completely 31476.7765625 1560.802734375 True
/root/ComfyUI/comfy/ldm/modules/attention.py:451: UserWarning: Optional attn_mask_ param is not recommended to use. For better performance,1.Assuming causal attention masking, 'is_causal' parameter can be selected.2.Assuming alibi attention masking, 'PT_SDPA_USE_ALIBI_MASK' env can be selected. (Triggered internally at /home/corex/sw_home/apps/pytorch/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:1599.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
loaded completely 29902.75751953125 4897.0483474731445 True
100%|███████████████████████████████████████████| 20/20 [00:02<00:00,  6.79it/s]
Requested to load AutoencoderKL
loaded completely 24661.50390625 159.55708122253418 True
Prompt executed in 5.98 seconds

[image: output of the default SDXL workflow]

cuda_malloc.py Outdated
@@ -50,7 +50,33 @@ def enum_display_devices():
"GeForce GTX 1650", "GeForce GTX 1630", "Tesla M4", "Tesla M6", "Tesla M10", "Tesla M40", "Tesla M60"
}

def _load_torch_submodule(filename):
"""Helper to load and check a submodule from torch's installation"""
@comfyanonymous (Owner) commented:
Instead of doing this can't you just check if the computer has an iluvatar device?

@honglyua-il (Author) replied:

Hi @comfyanonymous, thank you for your time and support.

We’ve looked into the issue and would like to propose two possible approaches to address it:

  1. Modify cuda_malloc.py to detect Iluvatar GPU names by using subprocess.check_output(['ixsmi', '-L']), following a pattern similar to how NVIDIA GPUs are currently detected (see the sketch after this list).
  2. As an alternative, we could leave cuda_malloc.py unchanged and instead update the README to include instructions for launching ComfyUI with the command: python main.py --disable-cuda-malloc.
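
For concreteness, a minimal sketch of option 1 follows. This is only an illustration of the idea, not code from this PR: it assumes the ixsmi CLI from the CoreX Toolkit is on PATH and that ixsmi -L prints one line per device containing the product name (e.g. "Iluvatar BI-V150"), analogous to nvidia-smi -L.

import subprocess

def get_iluvatar_gpu_names():
    """Return the device lines reported by `ixsmi -L`, or an empty list when the
    tool is missing or fails (i.e. no Iluvatar GPU is usable on this machine)."""
    try:
        out = subprocess.check_output(["ixsmi", "-L"], text=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]

def has_iluvatar_device():
    # Assumed line format: "GPU 0: Iluvatar BI-V150 (UUID: ...)"
    return any("Iluvatar" in name for name in get_iluvatar_gpu_names())

cuda_malloc.py could then fall back to the default allocator whenever has_iluvatar_device() returns True, similar in spirit to how the set of GPU names shown in the hunk above is used.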

Both options seem viable, but we’d really appreciate your thoughts on which direction would be more suitable for the project.

Thanks again for your guidance.

@honglyua-il force-pushed the iluvatar_support branch 2 times, most recently from 4b6d9a5 to dda8034 on July 11, 2025 03:00
@honglyua-il (Author) commented:
Hello @comfyanonymous, we have updated the README to include instructions for launching ComfyUI with the command python main.py --disable-cuda-malloc. PTAL, thanks!

@honglyua-il force-pushed the iluvatar_support branch 2 times, most recently from 46ebfd4 to b0ac606 on July 18, 2025 02:42
@honglyua-il (Author) commented:
@comfyanonymous @ltdrdata PTAL.

@honglyua-il (Author) commented:
@comfyanonymous we have rebased onto the new master, and thanks to #9031 we no longer need to add --disable-cuda-malloc to start ComfyUI.

PTAL, thanks!

Successfully merging this pull request may close these issues:
[Feature Request] Request for Iluvatar Corex GPU support (#8584)