
Fine tune CMF 500x large&medium K shapes to boost flops #4766


Open
wants to merge 1 commit into
base: main
from

Conversation

RandySheriff

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1788

For H100, add a new option to the persistent Triton FP8 GEMM to boost performance for the CMF 500x GEMM shapes:

  • M=512, N=1024, K=19712: flops increase from 504 to 559 (11%).
  • M=512, N=1024, K=171712: flops increase from 437 to 481 (10%).
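
As a sanity check on the numbers above, here is a minimal sketch that recomputes the relative gains from the quoted before/after figures and shows the work per GEMM call using the standard 2·M·N·K operation count. The helper names `gemm_flop_count` and `relative_gain` are hypothetical and not part of FBGEMM; the summary does not state the unit of the throughput figures, so they are treated as plain numbers here.

```python
# Hypothetical helpers for illustration only; not FBGEMM or Triton APIs.
# Each entry: (M, N, K, throughput_before, throughput_after) as quoted in the summary.
SHAPES = [
    (512, 1024, 19712, 504, 559),
    (512, 1024, 171712, 437, 481),
]


def gemm_flop_count(m: int, n: int, k: int) -> int:
    """Operations in one (M x K) @ (K x N) matmul: one multiply and one add per MAC."""
    return 2 * m * n * k


def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


for m, n, k, before, after in SHAPES:
    gain = relative_gain(before, after)
    gflop = gemm_flop_count(m, n, k) / 1e9
    print(f"M={m} N={n} K={k}: {gflop:.2f} GFLOP per call, {before} -> {after} ({gain:.1%})")
```

The unrounded gains come out at roughly 10.9% and 10.1%, which matches the 11% and 10% quoted in the summary.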

Baseline CMF 500x E2E QPS was about 25k.

Differential Revision: D80881599

@meta-cla meta-cla bot added the cla signed label Aug 24, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80881599


netlify bot commented Aug 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 3ac9cbb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ab5cf04a684d000808f54f |
| 😎 Deploy Preview | https://deploy-preview-4766--pytorch-fbgemm-docs.netlify.app |

RandySheriff pushed a commit to RandySheriff/FBGEMM-1 that referenced this pull request Aug 24, 2025
Summary:

X-link: facebookresearch/FBGEMM#1788

For H100, add a new option to the persistent Triton FP8 GEMM to boost performance for the CMF 500x GEMM shapes:
- M=512, N=1024, K=19712: flops increase from 504 to 559 (11%).
- M=512, N=1024, K=171712: flops increase from 437 to 481 (10%).

E2E: CMF 500x QPS was [25077.38](https://www.internalfb.com/intern/paste/P1915458454/) and is now [25319.47](https://www.internalfb.com/intern/paste/P1916364403/).
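
For context on the end-to-end numbers, a minimal sketch of the relative QPS change computed from the two figures quoted above. The variable and function names are hypothetical and used only for illustration.

```python
# Illustration only: E2E QPS figures as quoted in the commit message.
QPS_BEFORE = 25077.38
QPS_AFTER = 25319.47


def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


e2e_gain = relative_gain(QPS_BEFORE, QPS_AFTER)
print(f"E2E QPS gain: {e2e_gain:.2%}")
```

This works out to a little under 1% end to end, compared with the 10-11% gains on the two individual GEMM shapes.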

Differential Revision: D80881599