Add Stable Diffusion demo #100
0d3c2cd to c7c81bd
c7c81bd to 5142a42
5142a42 to d9dd478
d9dd478 to 75efd7d
b5a9c9c to 6847a95
6847a95 to c9bcab1
Tested for accuracy. Performance now roughly matches oss/demo/Diffusion on my RTX-4070 Ti. Note that the end-to-end latency can be improved marginally by removing the stream synchronize calls without affecting correctness, though doing so would make the per-component timings less accurate.
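To make the trade-off above concrete, here is a minimal sketch in plain Python that models a host thread enqueueing work on a single asynchronous device stream. It is a toy simulation, not code from this PR: the stage names, the stage costs, and the 50 µs launch overhead are all assumed values chosen for illustration, and `run_pipeline` is a hypothetical helper.

```python
def run_pipeline(stages, launch_overhead_us, sync_each_stage):
    """Simulate host-side timing of stages enqueued on one async stream.

    stages: list of (name, device_cost_us) pairs (hypothetical values).
    Returns (end_to_end_us, per_stage_host_measurements).
    """
    host_us = 0          # simulated host clock
    device_free_us = 0   # time at which the device finishes queued work
    measured = {}
    for name, cost_us in stages:
        start = host_us
        host_us += launch_overhead_us  # host enqueues the work and returns
        # The device starts once it is free and the launch has been issued.
        device_free_us = max(device_free_us, host_us) + cost_us
        if sync_each_stage:
            # Stream synchronize: host blocks until the device is idle.
            host_us = max(host_us, device_free_us)
        measured[name] = host_us - start
    # A final synchronize is always needed before reading the result.
    host_us = max(host_us, device_free_us)
    return host_us, measured


# Assumed stage costs in microseconds, for illustration only.
STAGES = [("clip", 2000), ("denoise", 90000), ("vae", 8000)]

synced_total, synced_times = run_pipeline(STAGES, 50, sync_each_stage=True)
async_total, async_times = run_pipeline(STAGES, 50, sync_each_stage=False)

print("with per-stage sync:", synced_total, synced_times)
print("without per-stage sync:", async_total, async_times)
```

In this model, dropping the per-stage synchronize hides the launch overhead of every stage after the first, so the end-to-end time improves only slightly, while the host-side per-stage measurements collapse to the launch overhead and no longer reflect device time.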
c9bcab1 to 6b2d3e6
Signed-off-by: Akhil Goel <[email protected]>
Root cause: the indices for the denoising timesteps were reversed during refactoring.
Signed-off-by: Akhil Goel <[email protected]>
Remove lazy mode evaluation in the denoising loop.
…erf, add profiling calls
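For readers unfamiliar with the class of bug named in the root-cause note (reversed timestep indices), the sketch below shows how such a reversal changes the order in which a denoising loop visits its timesteps. It is a hypothetical illustration, not the code from this PR: `make_timesteps`, `denoise_order`, and the `reversed_index_bug` flag are invented names, and the evenly spaced schedule is an assumption.

```python
def make_timesteps(num_train_steps, num_inference_steps):
    """Evenly spaced timesteps in descending order (noisiest first)."""
    stride = num_train_steps // num_inference_steps
    ascending = [i * stride for i in range(num_inference_steps)]
    return ascending[::-1]


def denoise_order(timesteps, reversed_index_bug=False):
    """Return the timesteps in the order the loop would visit them."""
    n = len(timesteps)
    visited = []
    for step in range(n):
        # The buggy variant indexes from the wrong end of the schedule.
        idx = n - 1 - step if reversed_index_bug else step
        visited.append(timesteps[idx])
    return visited


timesteps = make_timesteps(1000, 4)
print(denoise_order(timesteps))                           # high noise to low noise
print(denoise_order(timesteps, reversed_index_bug=True))  # low noise to high noise
```

A loop that visits the schedule from low noise to high noise applies each denoising step at the wrong noise level, which is the kind of error that shows up as degraded output accuracy rather than as a crash.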
6632d00 to 16e0912
No description provided.