[XPU] Implemented 8bit optimizers in triton #1692
@matthewdouglas This PR is ready for review. The 8-bit optimizer interface was merged in #1706.
Benchmark results: "Torch" is the 32-bit optimizer from torch, "BNB" is the 8-bit optimizer:
For small shapes the difference is smaller (1024*9):
The benchmark is based on the optimizer tests:
@matthewdouglas Could you please review the PR?
Hi @Egor-Krivov
Done, the PR is ready for review. All tests passed for me.
LGTM, thank you!
Merged commit 404e277 into bitsandbytes-foundation:main
Implemented 8-bit optimizers in Triton for use on XPU devices.
Depends on the interface from #1706.
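The core idea behind an 8-bit optimizer is that its state tensors (momentum, second moment) are stored as one byte per element plus one float scale per block, and expanded back to floats only inside the update. The pure-Python sketch below shows a blockwise round trip under stated assumptions: it uses a simple linear absmax mapping to int8 codes, whereas bitsandbytes' 8-bit optimizers use a non-linear (dynamic) code map, and `BLOCK_SIZE`, `quantize_blockwise` and `dequantize_blockwise` are hypothetical names chosen for illustration, not the Triton kernels from this PR.

```python
import math

BLOCK_SIZE = 256  # assumption: illustrative block size, not taken from the PR

def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Hypothetical helper: map floats to int8 codes in [-127, 127] with one absmax scale per block.

    Linear absmax mapping is a simplification; bitsandbytes uses a dynamic code map.
    """
    codes, absmax = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # all-zero block: any nonzero scale works
        absmax.append(scale)
        # |v / scale| <= 1, so every code lands in [-127, 127]
        codes.extend(round(v / scale * 127) for v in block)
    return codes, absmax

def dequantize_blockwise(codes, absmax, block_size=BLOCK_SIZE):
    """Hypothetical helper: invert the mapping as code / 127 * scale of the element's block."""
    return [c / 127 * absmax[i // block_size] for i, c in enumerate(codes)]

# Usage: round-trip a fake optimizer state of 1000 elements (4 blocks of at most 256).
state = [0.01 * math.sin(i) for i in range(1000)]
codes, absmax = quantize_blockwise(state)
restored = dequantize_blockwise(codes, absmax)
worst = max(abs(a - b) for a, b in zip(state, restored))
print(f"{len(absmax)} blocks, worst round-trip error {worst:.2e}")
```

Because rounding moves each code by at most half a step, the round-trip error of any element is bounded by its block's scale divided by 254; per-block scales keep one large outlier from wrecking the precision of the whole tensor.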
Tested with
BNB_TEST_DEVICE="xpu" pytest --show-capture=no -q tests/test_optim.py::test_optimizer8bit
Benchmarked on essentially the same test, with better performance than the 32-bit torch optimizer.
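Both the test and the benchmark compare an 8-bit optimizer against a 32-bit reference. The sketch below illustrates that kind of comparison in pure Python, under loud assumptions: it uses SGD with momentum for brevity, a linear absmax int8 code (not bitsandbytes' dynamic map), and a hypothetical block size of 64; `quantize`, `momentum_step` and `momentum_step_8bit` are illustrative names, not functions from bitsandbytes or from this PR. It says nothing about speed on XPU; it only shows that keeping the momentum buffer in 8-bit form between steps keeps parameters close to the fp32 trajectory.

```python
import math

BLOCK = 64  # hypothetical block size, chosen only for this illustration

def quantize(values, block=BLOCK):
    """Linear absmax int8 codes with one float scale per block (a simplification)."""
    codes, scales = [], []
    for s in range(0, len(values), block):
        chunk = values[s:s + block]
        scale = max(abs(x) for x in chunk) or 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(round(x / scale * 127) for x in chunk)
    return codes, scales

def dequantize(codes, scales, block=BLOCK):
    return [c / 127 * scales[i // block] for i, c in enumerate(codes)]

def momentum_step(p, g, m, lr=0.01, beta=0.9):
    """fp32-style SGD with momentum: m <- beta*m + g; p <- p - lr*m."""
    m = [beta * mi + gi for mi, gi in zip(m, g)]
    p = [pi - lr * mi for pi, mi in zip(p, m)]
    return p, m

def momentum_step_8bit(p, g, m_q, lr=0.01, beta=0.9):
    """Same update, but the momentum buffer is kept in 8-bit form between steps."""
    m = dequantize(*m_q)
    p, m = momentum_step(p, g, m, lr, beta)
    return p, quantize(m)

n = 200
p_ref, m_ref = [0.0] * n, [0.0] * n
p_8bit, m_8bit = [0.0] * n, quantize([0.0] * n)
for t in range(20):
    g = [math.cos(0.1 * i + t) for i in range(n)]  # deterministic stand-in gradients
    p_ref, m_ref = momentum_step(p_ref, g, m_ref)
    p_8bit, m_8bit = momentum_step_8bit(p_8bit, g, m_8bit)

max_diff = max(abs(a - b) for a, b in zip(p_ref, p_8bit))
print(f"max parameter difference after 20 steps: {max_diff:.2e}")
```

With gradients bounded by 1, each requantization perturbs the momentum by at most its block scale divided by 254, so the drift from the fp32 parameters stays small; a real test would compare against the torch optimizer with tolerances chosen for the actual code map.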
This PR contains 3 implementations: