Conversation

@Fridah-nv (Contributor)

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Fridah-nv self-assigned this Nov 27, 2025
copy-pr-bot commented Nov 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.65%. Comparing base (f06c3f9) to head (c025df7).
⚠️ Report is 5 commits behind head on main.

Files with missing lines                               | Patch % | Lines
.../torch/quantization/nn/modules/tensor_quantizer.py | 60.00%  | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #613      +/-   ##
==========================================
+ Coverage   74.57%   74.65%   +0.08%     
==========================================
  Files         183      183              
  Lines       18412    18546     +134     
==========================================
+ Hits        13730    13845     +115     
- Misses       4682     4701      +19     



@triton.jit
def blockwise_fp4_fake_quant_kernel(

Contributor suggested change:
- def blockwise_fp4_fake_quant_kernel(
+ def static_blockwise_fp4_fake_quant_kernel(
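For context, blockwise FP4 fake quantization round-trips each block of values through the FP4 (E2M1) grid using a per-block scale. The sketch below is a hypothetical PyTorch reference, not the PR's Triton kernel; the block size of 16 and round-to-nearest behavior are assumptions.

    import torch

    _E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def blockwise_fp4_fake_quant_ref(x: torch.Tensor, block_size: int = 16) -> torch.Tensor:
        # Hypothetical reference of FP4 (E2M1) fake quantization with per-block scales.
        orig_shape = x.shape
        blocks = x.float().reshape(-1, block_size)
        # Per-block scale maps the block amax onto the largest E2M1 magnitude (6.0).
        amax = blocks.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
        scale = amax / 6.0
        scaled = blocks / scale
        grid = _E2M1_GRID.to(blocks.device)
        # Snap each scaled magnitude to the nearest representable E2M1 value.
        idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
        deq = scaled.sign() * grid[idx] * scale
        return deq.reshape(orig_shape).to(x.dtype)

With static per-block scaling, the per-block amax would come from calibration rather than being recomputed on the fly, which appears to be what the suggested rename signals.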

disable_calib(quantizer),
enable_fake_quant(quantizer),
):
quantizer._keep_shape = True

@realAsma (Contributor) commented on Dec 3, 2025:
line 233:

if is_nvfp4_static:  # static per-block
    amax = scaled_e4m3_impl(amax, amax.amax())  # FP8 quantization

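For readers unfamiliar with scaled_e4m3_impl, the sketch below shows one plausible reading of FP8 (E4M3) quantization of the per-block amax: scale so the global amax maps onto the E4M3 maximum (448), round-trip through torch.float8_e4m3fn, and rescale. The constant 448 and the exact scaling are assumptions, not the library's actual implementation.

    import torch

    def scaled_e4m3_ref(amax: torch.Tensor, global_amax: torch.Tensor) -> torch.Tensor:
        # Hypothetical reference: quantize-dequantize per-block amax through FP8 E4M3,
        # scaling so that global_amax maps to the E4M3 maximum (448.0).
        scale = (global_amax / 448.0).clamp_min(1e-12)
        q = (amax / scale).to(torch.float8_e4m3fn)  # quantize to E4M3
        return q.to(amax.dtype) * scale             # dequantize back

This round-trip makes the per-block amax representable as an FP8 scale, which is what the static per-block branch quoted above appears to do before the FP4 fake-quant step.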