Replies: 5 comments 26 replies
-
The aotriton flag flip landed this morning on Windows, after the release build was triggered. It should be in the next nightly.
-
I wonder whether this slowdown comes from the VAE decode step?
On Saturday, September 13, 2025, Scott Todd wrote:
Oh wait, yes. That is a "dev" release I manually triggered for testing.
The rocm.nightlies URL is for "nightly" releases. So yes, what you
installed from there should be similar to what appears in the next nightly
build, and that build has aotriton, from the logs:
2025-09-12T00:19:41.8410373Z -- USE_ROCM : ON
2025-09-12T00:19:41.8410547Z -- ROCM_VERSION :
2025-09-12T00:19:41.8410731Z -- USE_FLASH_ATTENTION : 1
2025-09-12T00:19:41.8410913Z -- USE_MEM_EFF_ATTENTION : 1
2025-09-12T00:19:41.8411107Z -- USE_ROCM_CK_SDPA : OFF
2025-09-12T00:19:41.8411308Z -- USE_ROCM_CK_GEMM : OFF
Compare shows like 2.5x faster with the AOTriton.
The only one difference that I have is on WAN2.2. Seems that this is still
3 times slower than on linux. On linux the default WAN t2v takes 15 min, on
Windows it takes 45min.
Huh... well, that's good to know. cc @jammm <https://github.com/jammm>
@xinyazhang <https://github.com/xinyazhang> . Maybe the code paths used
for WAN t2v are not as optimized via aotriton on the 7900XTX, specifically
for the prebuilt kernels used for Windows...?
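For anyone comparing builds, the attention-related flags in a configure summary like the one quoted above can be pulled out mechanically. This is a minimal sketch, assuming log lines in the `timestamp -- NAME : VALUE` shape shown in the quote; `parse_build_flags` is a hypothetical helper, not part of PyTorch or ROCm tooling.

```python
import re

# Matches lines such as "2025-09-12T00:19:41.8410373Z -- USE_ROCM : ON".
# The value may be empty, as with ROCM_VERSION in the quoted log.
FLAG_LINE = re.compile(r"--\s+(?P<name>[A-Z0-9_]+)\s*:\s*(?P<value>.*)$")


def parse_build_flags(log_text):
    """Return a dict of flag name -> value string for every matching log line."""
    flags = {}
    for line in log_text.splitlines():
        match = FLAG_LINE.search(line)
        if match:
            flags[match.group("name")] = match.group("value").strip()
    return flags


# The lines quoted in the thread, copied verbatim.
SAMPLE_LOG = """\
2025-09-12T00:19:41.8410373Z -- USE_ROCM : ON
2025-09-12T00:19:41.8410547Z -- ROCM_VERSION :
2025-09-12T00:19:41.8410731Z -- USE_FLASH_ATTENTION : 1
2025-09-12T00:19:41.8410913Z -- USE_MEM_EFF_ATTENTION : 1
2025-09-12T00:19:41.8411107Z -- USE_ROCM_CK_SDPA : OFF
2025-09-12T00:19:41.8411308Z -- USE_ROCM_CK_GEMM : OFF
"""

if __name__ == "__main__":
    for name, value in parse_build_flags(SAMPLE_LOG).items():
        print(f"{name} = {value!r}")
```

Running the same function over the logs of two builds and diffing the resulting dicts would show which flags changed between them.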
-
I am testing image generation on a 7900XTX right now. How long does it take to generate an image in your case? Is it weird that the first generation took me 6-10 minutes, but subsequent ones took only 10-15 seconds? I installed torch-2.10.0a0+rocmsdk20250911-cp312-cp312-win_amd64.whl and I'm not sure if AOTriton is working correctly, because I got this warning:
And I got some MIOpen(HIP) errors too.
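A gap between the first run and later runs can be measured directly by timing each run separately. The following is a minimal sketch, assuming a hypothetical zero-argument `generate` callable that stands in for one image generation. It does not diagnose the cause; it only separates the first-run time from the later runs.

```python
import time
from statistics import mean


def time_runs(generate, runs=4):
    """Call generate() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Split the first run from the rest and report how much slower it was."""
    first, rest = durations[0], durations[1:]
    steady = mean(rest) if rest else first
    ratio = first / steady if steady > 0 else float("inf")
    return {"first_s": first, "steady_s": steady, "ratio": ratio}


if __name__ == "__main__":
    # Stand-in workload (an assumption for illustration): replace with a real
    # generation call.
    def generate():
        time.sleep(0.01)

    print(summarize(time_runs(generate)))
```

If `generate` launches GPU work asynchronously, it should block until the result is ready (for example by calling `torch.cuda.synchronize()` inside it), otherwise the measured times may be too short.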
-
@jammm, on Monday would you mind emailing Adam Tran a summary of this issue? I will then include my docs team. This is a very important finding, and we need to make sure it is well documented in an FAQ.
-
Uploading logs for the same
-
Hi,
I have a Linux and a Windows installation on:
CPU: 9900X
GPU: 7900XTX
RAM: 64GB
Today I installed the Windows nightly build, which contains AOTriton.
My comparison shows roughly a 2.5x speedup with AOTriton.
The one exception is WAN2.2,
which still seems to be 3 times slower than on Linux.
On Linux the default WAN t2v takes 15 min; on Windows it takes 45 min.
Does anyone have an idea why?
M.
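One way to narrow down a gap like this is to time each pipeline stage on both systems and compare the ratios. This is a sketch under stated assumptions: `StageTimer` and `slowdown_by_stage` are hypothetical helpers, and the stage names in the demo are placeholders, since the actual WAN t2v workflow stages are not listed in this thread.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed


def slowdown_by_stage(baseline, other):
    """Return other/baseline for every stage present in both timing dicts."""
    return {
        name: other[name] / baseline[name]
        for name in baseline
        if name in other and baseline[name] > 0
    }


if __name__ == "__main__":
    timer = StageTimer()
    # Placeholder stages (assumptions); wrap the real calls instead of sleep().
    with timer.stage("sampling"):
        time.sleep(0.02)
    with timer.stage("vae_decode"):
        time.sleep(0.01)
    print(timer.seconds)
```

Collecting `timer.seconds` once on Linux and once on Windows and passing both to `slowdown_by_stage` would show whether the slowdown is spread evenly or concentrated in one stage.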