@akhilg-nv
Collaborator

@akhilg-nv akhilg-nv commented Aug 14, 2024

No description provided.

@akhilg-nv akhilg-nv marked this pull request as draft August 14, 2024 17:16
@pranavm-nvidia pranavm-nvidia added the tripy Pull request for the tripy project label Aug 14, 2024
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch 3 times, most recently from 0d3c2cd to c7c81bd Compare August 29, 2024 22:02
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from c7c81bd to 5142a42 Compare September 12, 2024 02:14
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from 5142a42 to d9dd478 Compare October 15, 2024 00:38
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from d9dd478 to 75efd7d Compare December 13, 2024 19:27
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from b5a9c9c to 6847a95 Compare April 22, 2025 23:38
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from 6847a95 to c9bcab1 Compare July 3, 2025 00:37
@akhilg-nv akhilg-nv marked this pull request as ready for review July 3, 2025 00:38
@akhilg-nv
Collaborator Author

Tested for accuracy with pytest tests/test_diffusion.py and by visually comparing outputs against references for a fixed seed and prompt. I think the current test setup may need some tweaks before it can be enabled in CI, and the test-specific dependencies could possibly be split out from the example requirements.

Perf now roughly matches oss/demo/Diffusion on my RTX 4070 Ti. The e2e latency could be improved marginally by removing the stream synchronize calls without affecting correctness, though this would make the per-component timings less accurate.
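
The trade-off around stream synchronization can be sketched as follows. This is a minimal illustration, not the demo's actual timing code: `ComponentTimer` and its `sync` callback are hypothetical names, and `sync` stands in for whatever call blocks until the work queued on the stream has finished. Without such a call, a host-side timer may only capture the time needed to enqueue the work, which is why dropping the synchronization would make the per-component numbers less accurate.

```python
import time
from contextlib import contextmanager


class ComponentTimer:
    """Accumulates wall-clock latency per named pipeline component."""

    def __init__(self, sync=None):
        # `sync` is a hypothetical callback that blocks until queued device
        # work has finished. None means no synchronization is performed.
        self.sync = sync
        self.latencies_ms = {}

    @contextmanager
    def measure(self, name):
        if self.sync is not None:
            self.sync()  # Do not bill earlier pending work to this component.
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()  # Wait for this component's work before stopping the clock.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.latencies_ms[name] = self.latencies_ms.get(name, 0.0) + elapsed_ms


# Stand-in for a real stream synchronize call: it only counts invocations.
sync_calls = []
timer = ComponentTimer(sync=lambda: sync_calls.append(1))

with timer.measure("clip"):
    time.sleep(0.001)  # Placeholder for launching the CLIP forward pass.

for _ in range(3):
    with timer.measure("unet"):
        time.sleep(0.001)  # Placeholder for launching one denoising step.
```

Constructing the timer with `sync=None` corresponds to the lower-overhead option mentioned above: the end-to-end number stays meaningful as long as there is one synchronization at the very end, but the split between components no longer is.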

Tripy:
|-----------------|--------------|
|     Module      |   Latency    |
|-----------------|--------------|
|      CLIP       |      2.82 ms |
|    UNet x 50    |   1169.86 ms |
|     VAE-Dec     |     39.18 ms |
|-----------------|--------------|
|    Pipeline     |   1211.85 ms |
|-----------------|--------------|
Throughput: 0.86 image/s

OSS demoDiffusion:
|-----------------|--------------|
|     Module      |   Latency    |
|-----------------|--------------|
|      CLIP       |      1.80 ms |
|    UNet x 50    |   1106.67 ms |
|     VAE-Dec     |     29.57 ms |
|-----------------|--------------|
|    Pipeline     |   1138.24 ms |
|-----------------|--------------|
Throughput: 0.88 image/s
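
To put "roughly matches" into numbers, the snippet below derives a few ratios from the two tables. It is only arithmetic on the reported latencies; the variable names and the `relative_gap` helper are introduced here for illustration, and nothing in it was measured independently. With these figures, the Tripy pipeline latency comes out between 6% and 7% above the OSS demo, and the UNet loop accounts for the bulk of the pipeline time.

```python
# Latencies in milliseconds, copied from the two tables above.
tripy = {"CLIP": 2.82, "UNet x 50": 1169.86, "VAE-Dec": 39.18, "Pipeline": 1211.85}
oss = {"CLIP": 1.80, "UNet x 50": 1106.67, "VAE-Dec": 29.57, "Pipeline": 1138.24}

DENOISING_STEPS = 50  # Taken from the "UNet x 50" row.


def relative_gap(a_ms, b_ms):
    """Fractional amount by which a_ms exceeds b_ms."""
    return a_ms / b_ms - 1.0


# End-to-end gap between the two pipelines.
pipeline_gap = relative_gap(tripy["Pipeline"], oss["Pipeline"])

# Average cost of a single denoising step in each implementation.
per_step_ms = {
    "tripy": tripy["UNet x 50"] / DENOISING_STEPS,
    "oss": oss["UNet x 50"] / DENOISING_STEPS,
}

# Share of the Tripy pipeline time spent in the UNet loop.
unet_share = tripy["UNet x 50"] / tripy["Pipeline"]

print(f"Pipeline gap: {pipeline_gap:.1%}")
print(f"Per-step UNet latency: tripy {per_step_ms['tripy']:.2f} ms, oss {per_step_ms['oss']:.2f} ms")
print(f"UNet share of Tripy pipeline: {unet_share:.1%}")
```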

@akhilg-nv akhilg-nv self-assigned this Jul 3, 2025
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from c9bcab1 to 6b2d3e6 Compare July 4, 2025 03:01
akhilg-nv and others added 27 commits July 29, 2025 17:08
Signed-off-by: Akhil Goel <[email protected]>
Signed-off-by: Akhil Goel <[email protected]>
Root cause: The indices for the denoising timesteps were reversed
during refactoring.

Signed-off-by: Akhil Goel <[email protected]>
Signed-off-by: Akhil Goel <[email protected]>
Signed-off-by: Akhil Goel <[email protected]>
Remove lazy mode evaluation in the denoising loop.
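
The root-cause note in the commit list above concerns the order in which the denoising loop visits timesteps. As a rough illustration only, the sketch below shows how indexing a schedule from the wrong end flips that order. The schedule is made up, and `make_timesteps` and `denoise_order` are hypothetical helpers, not code from this PR; the actual form of the bug is not shown in this conversation.

```python
def make_timesteps(num_train_steps=1000, num_inference_steps=50):
    """Evenly spaced timesteps in descending order (most noise first)."""
    stride = num_train_steps // num_inference_steps
    return list(range(num_train_steps - 1, -1, -stride))


def denoise_order(timesteps, reversed_index=False):
    """Return the timesteps in the order a loop over step indices visits them."""
    n = len(timesteps)
    order = []
    for step in range(n):
        # Indexing from the end visits the schedule back to front.
        idx = n - 1 - step if reversed_index else step
        order.append(timesteps[idx])
    return order


timesteps = make_timesteps()
correct = denoise_order(timesteps)
buggy = denoise_order(timesteps, reversed_index=True)

print("intended order starts at", correct[0], "and ends at", correct[-1])
print("reversed order starts at", buggy[0], "and ends at", buggy[-1])
```

Both loops run the same number of steps over the same values, so a mistake of this kind does not raise an error; it would more plausibly surface as degraded images, which is consistent with checking outputs against references for a fixed seed and prompt.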
@akhilg-nv akhilg-nv force-pushed the dev-akhilg-demo-diffusion branch from 6632d00 to 16e0912 Compare July 30, 2025 00:11
@akhilg-nv akhilg-nv merged commit a23ddfd into main Jul 30, 2025
1 check passed
@akhilg-nv akhilg-nv deleted the dev-akhilg-demo-diffusion branch July 30, 2025 00:22