Enable TF32 precision acceleration for H100 #4767


Open · wants to merge 2 commits into main

Conversation

RandySheriff

Summary:
As indicated by a previous MTS benchmark:

```
TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.
Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
```

This diff enables the fp32 "high" matmul precision. End-to-end, it pushes CMF500x QPS from 25319.47 to 25737.61.

Differential Revision: D80908603
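The change the benchmark warning asks for is a one-line PyTorch setting. A minimal sketch (the `set_float32_matmul_precision` call is the documented PyTorch API named in the warning; the backend flags are the equivalent lower-level switches):

```python
import torch

# Allow TF32 tensor cores for float32 matmuls on Ampere/Hopper GPUs
# (e.g. H100). "high" permits TF32 for fp32 matmuls; the default,
# "highest", keeps full fp32 precision.
torch.set_float32_matmul_precision("high")

# Equivalent lower-level backend switches:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```

TF32 trades a shorter mantissa (10 bits) for much higher tensor-core throughput, which is where the QPS gain above comes from.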

Randy Shuai added 2 commits August 24, 2025 14:40
Summary:

X-link: facebookresearch/FBGEMM#1788

For H100, add a new option to the persistent Triton fp8 GEMM to boost performance for the CMF 500x GEMM shapes:
- M=512, N=1024, K=19712: flops improve from 504 to 559 (~11%).
- M=512, N=1024, K=171712: flops improve from 437 to 481 (~10%).

E2E - CMF 500x QPS was [25077.38](https://www.internalfb.com/intern/paste/P1915458454/), now it is [25319.47](https://www.internalfb.com/intern/paste/P1916364403/).

Differential Revision: D80881599
Summary:
As indicated by previous MTS [benchmark](https://www.internalfb.com/intern/everpaste/?handle=GEHWGSBdCZC6Q5QFALLQJHKASQUnbsIXAAAB&phabricator_paste_number=1916364403):
```
TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.
Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
```
This diff enables the fp32 "high" matmul precision. End-to-end, it pushes CMF500x QPS from [25319.47](https://www.internalfb.com/intern/everpaste/?handle=GEHWGSBdCZC6Q5QFALLQJHKASQUnbsIXAAAB&phabricator_paste_number=1916364403) to [25737.61](https://www.internalfb.com/intern/everpaste/?handle=GAsZ7R9ASEeoe9sIABHwORv6SG89bsIXAAAB&phabricator_paste_number=1916396963).

Differential Revision: D80908603

netlify bot commented Aug 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 03bd8bb
Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ab86e9f34d4d000800a381
Deploy Preview: https://deploy-preview-4767--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Aug 24, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D80908603
