Enable TF32 precision acceleration for H100 #4767


Open · wants to merge 2 commits into main

Conversation

RandySheriff

Summary:
As indicated by a previous MTS benchmark:

```
TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.
Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
```

This diff enables the fp32 "high" matmul precision. End-to-end, it pushes CMF500x QPS from 25319.47 to 25737.61.

Differential Revision: D80908603
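The change the benchmark warning asks for is a one-line PyTorch setting. A minimal sketch (the `set_float32_matmul_precision` call is the documented PyTorch API named in the warning; the backend flags are the equivalent lower-level switches):

```python
import torch

# Allow TF32 tensor cores for float32 matmuls on Ampere/Hopper GPUs
# (e.g. H100). "high" permits TF32 for fp32 matmuls; the default,
# "highest", keeps full fp32 precision.
torch.set_float32_matmul_precision("high")

# Equivalent lower-level backend switches:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```

TF32 trades a shorter mantissa (10 bits) for much higher tensor-core throughput, which is where the QPS gain above comes from.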

Randy Shuai added 2 commits August 24, 2025 14:40
Summary:

X-link: facebookresearch/FBGEMM#1788

For H100, add a new option to the persistent Triton fp8 GEMM to boost performance for the CMF 500x GEMM shapes:
- M=512, N=1024, K=19712: flops improve from 504 to 559 (~11%).
- M=512, N=1024, K=171712: flops improve from 437 to 481 (~10%).

E2E - CMF 500x QPS was [25077.38](https://www.internalfb.com/intern/paste/P1915458454/), now it is [25319.47](https://www.internalfb.com/intern/paste/P1916364403/).

Differential Revision: D80881599
Summary:
As indicated by previous MTS [benchmark](https://www.internalfb.com/intern/everpaste/?handle=GEHWGSBdCZC6Q5QFALLQJHKASQUnbsIXAAAB&phabricator_paste_number=1916364403):
```
TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.
Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
```
This diff enables the fp32 "high" matmul precision. End-to-end, it pushes CMF500x QPS from [25319.47](https://www.internalfb.com/intern/everpaste/?handle=GEHWGSBdCZC6Q5QFALLQJHKASQUnbsIXAAAB&phabricator_paste_number=1916364403) to [25737.61](https://www.internalfb.com/intern/everpaste/?handle=GAsZ7R9ASEeoe9sIABHwORv6SG89bsIXAAAB&phabricator_paste_number=1916396963).

Differential Revision: D80908603

netlify bot commented Aug 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 03bd8bb
Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ab86e9f34d4d000800a381
Deploy Preview: https://deploy-preview-4767--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Aug 24, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D80908603
