Conversation

inisis commented Oct 28, 2025

What does this PR do?

Type of change:

Add onnxslim support

Overview: Onnxslim is under active development and committed to long-term support; it is easy to use and depends on very few packages.

Usage

# Add a code snippet demonstrating how to use this
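A minimal, hedged sketch of the Python-level flow (assuming onnxslim's `slim` API, which is also used in the repro script later in this thread; `model.onnx` is a placeholder path, and the imports are guarded so the snippet degrades gracefully when the packages are absent):

```python
# Hedged usage sketch: simplify an ONNX model with onnxslim.
# "model.onnx" is a placeholder path, not a file shipped with this PR.
try:
    import onnx
    import onnxslim

    model = onnx.load("model.onnx")
    slim_model = onnxslim.slim(model)  # graph simplification
    onnx.save(slim_model, "model_slim.onnx")
    status = "simplified"
except (ImportError, OSError):
    status = "skipped"  # packages or model file not available

print(status)
```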

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@inisis inisis requested review from a team as code owners October 28, 2025 11:49
copy-pr-bot bot commented Oct 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

setup.py Outdated
"onnxruntime-gpu~=1.22.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin' and platform_system != 'Windows'", # noqa: E501
"onnxruntime-directml==1.20.0; platform_system == 'Windows'",
"onnxscript", # For test_onnx_dynamo_export unit test
"onnxsim ; python_version < '3.12' and platform_machine != 'aarch64'",
Contributor

Please remove onnxsim installation if it's no longer being used, thanks.

Author

done

inisis commented Nov 1, 2025

@gcunhase Hi, any update here? Thanks.

gcunhase commented Nov 3, 2025

@inisis Thank you for your contribution. I'm doing some investigation on any onnxsim vs onnxslim gaps. Will get back to you as soon as possible.

gcunhase commented Nov 6, 2025

@inisis I'm still validating onnxslim on our end, but in the meantime, could you please check that switching to onnxslim doesn't break quantization of https://github.com/NVIDIA/DL4AGX/tree/master/AV-Solutions/bevformer-int8-eq?

Specifically, please check that the following CLI is still functional and performant:

$ python -m modelopt.onnx.quantization --onnx_path=/mnt/models/bevformer_tiny_epoch_24_cp2_op13.onnx \
      --trt_plugins=$PLUGIN_PATH \
      --op_types_to_exclude MatMul \
      --calibration_data_path=/workspace/BEVFormer_tensorrt/data/nuscenes/calib_data.npz \
      --simplify

Thanks!

inisis commented Nov 11, 2025

Hi @gcunhase, it took me some time to run bevformer-int8-eq, but everything is working fine. Here are the results:

Env

device: NVIDIA GeForce RTX 5090
pytorch-quantization      2.2.1
torch                     2.9.0+cu128
torchvision               0.24.0+cu128
onnx                      1.17.0
onnx_graphsurgeon         0.5.8
onnx-ir                   0.1.12
onnxconverter-common      1.16.0
onnxruntime-gpu           1.20.2
onnxscript                0.5.6
onnxsim                   0.4.36
onnxslim                  0.1.74

Without simplify

(benchmark screenshot)

With onnxsim

(benchmark screenshot)

With onnxslim

(benchmark screenshot)

To conclude:

| Method | FPS | Acceleration Ratio |
| --- | --- | --- |
| Without Simplify | 354 | 1.00× |
| With onnxsim | 371 | 1.05× |
| With onnxslim | 381 | 1.08× |

In terms of GPU Compute Time (median, ms), onnxsim is slightly faster. I compared the two models using:

onnxslim --inspect /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_sim.onnx /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_slim.onnx
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Name          | bevformer_tiny_epoch_24_cp2_op13.quant_s | bevformer_tiny_epoch_24_cp2_op13.quant_s |
|                              |                 im.onnx                  |                 lim.onnx                 |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Info          |       Op Set: 13 / IR Version: 10        |       Op Set: 13 / IR Version: 10        |
+------------------------------+------------------------------------------+------------------------------------------+
|          IN: image           |       float32: (1, 6, 3, 480, 800)       |       float32: (1, 6, 3, 480, 800)       |
|         IN: prev_bev         |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|       IN: use_prev_bev       |              float32: (1,)               |              float32: (1,)               |
|         IN: can_bus          |              float32: (18,)              |              float32: (18,)              |
|        IN: lidar2img         |          float32: (1, 6, 4, 4)           |          float32: (1, 6, 4, 4)           |
|        OUT: bev_embed        |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|     OUT: outputs_classes     |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
|     OUT: outputs_coords      |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
+------------------------------+------------------------------------------+------------------------------------------+
|             Add              |                   318                    |                   185                    |
|             Atan             |                    1                     |                    1                     |
|             Clip             |                    26                    |                    26                    |
|            Concat            |                    16                    |                    16                    |
|             Conv             |                    55                    |                    55                    |
|             Cos              |                    1                     |                    1                     |
|       DequantizeLinear       |                   175                    |                   393                    |
|             Div              |                    67                    |                    67                    |
|            Gather            |                    14                    |                    14                    |
|             Gemm             |                    7                     |                   140                    |
|           Greater            |                    3                     |                    3                     |
|             Less             |                    2                     |                    2                     |
|             Log              |                    15                    |                    15                    |
|            MatMul            |                   142                    |                    11                    |
|             Max              |                    1                     |                    1                     |
|           MaxPool            |                    1                     |                    1                     |
|             Mul              |                    81                    |                    81                    |
| MultiScaleDeformableAttnTRT2 |                    12                    |                    12                    |
|             Pow              |                    41                    |                    41                    |
|        QuantizeLinear        |                   175                    |                   393                    |
|          ReduceMean          |                    81                    |                    81                    |
|          ReduceProd          |                    1                     |                    1                     |
|          ReduceSum           |                    4                     |                    4                     |
|             Relu             |                    96                    |                    96                    |
|           Reshape            |                   105                    |                   269                    |
|          RotateTRT2          |                    1                     |                    1                     |
|          ScatterND           |                    58                    |                    58                    |
|           Sigmoid            |                    18                    |                    18                    |
|             Sign             |                    2                     |                    2                     |
|             Sin              |                    1                     |                    1                     |
|            Slice             |                    84                    |                    84                    |
|           Softmax            |                    5                     |                    5                     |
|            Split             |                    1                     |                    0                     |
|             Sqrt             |                    40                    |                    40                    |
|           Squeeze            |                    1                     |                    1                     |
|             Sub              |                    59                    |                    59                    |
|             Tile             |                    6                     |                    6                     |
|          Transpose           |                    36                    |                    36                    |
|          Unsqueeze           |                    30                    |                    30                    |
|            Where             |                    5                     |                    5                     |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Size          |                158.77 MB                 |                158.90 MB                 |
+------------------------------+------------------------------------------+------------------------------------------+

Onnxslim merges MatMul + Add into Gemm; this is unfavorable when using --op_types_to_exclude MatMul.
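For context, the fusion itself is numerically equivalent; a small numpy sketch of why only the pattern matching changes (shapes here are illustrative, not taken from the model):

```python
import numpy as np

# MatMul + Add vs. Gemm: Gemm(A, B, C) computes A @ B + C, so fusing the
# two nodes preserves the math, but the resulting node type is "Gemm",
# which --op_types_to_exclude MatMul no longer matches.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3)).astype(np.float32)
B = rng.standard_normal((3, 4)).astype(np.float32)
C = rng.standard_normal(4).astype(np.float32)

out_matmul_add = np.matmul(A, B) + C  # original two-node pattern
out_gemm = A @ B + C                  # what the fused Gemm computes

assert np.array_equal(out_matmul_add, out_gemm)
```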

@gcunhase
Contributor


Hi @inisis thanks for validating this functionality. Were you also able to validate the numerical accuracy for the onnxslim simplified model?

I will also do some investigation on the MatMul+Add vs Gemm substitution on my end in the meanwhile.

Thanks!

inisis commented Nov 11, 2025


@gcunhase I didn't use the full dataset from nuscenes, it's too big, I used the mini one to do the calibration. If this counts, I can verify it on the mini one.

@gcunhase
Contributor


No problem, let me try to verify the accuracy on my end. Thank you!

inisis commented Nov 18, 2025

Hi @gcunhase, is there any update? Thanks!

@gcunhase
Contributor

@inisis we appreciate your contribution and wanted to make sure that there are no regressions before merging this PR. We've investigated potential risks in ~150 models and compiled a list of issues, divided into 3 categories, that would need to be solved before merging.

All mentioned models and scripts are in the zip file: repro.zip

1. Functional failures

Error logs

Error 1: repro_io_tensors_shape_dtype.onnx

Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (NMS): (shape=None, dtype=None)) 

Error 2: repro_mode_error_mobilenetv1.onnx

Fail - onnxSLIM (onnxSLIM: 'mode') 

How to repro

import onnx
import onnxslim

model = onnx.load(input_model_path)  # set input_model_path to one of the repro models
simplified_model = onnxslim.slim(model)

2. ORT inference failures

Error logs

Error 1: repro_mul_incompatible_dimensions.onnx

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from repro_mul_incompatible_dimensions.onnx failed:Node (/stages.1/stages.1.0/Mul) Op (Mul) [ShapeInferenceError] Incompatible dimensions 

Error 2: repro_gemm_invalid_shape.onnx

Fail: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gemm node. Name:'/transformer/decoder/layers.0/attentions.1/attn/Gemm' Status Message: Gemm: Invalid bias shape for broadcast 

How to repro

Run the check_ort_failures.py python script (update input_model_path as needed).

3. ORT numerical accuracy failures

Error logs

The simplified versions of the following models do not produce the same outputs as the original model for the same input data:

  • issue3_repro_conv_bn_fusion.onnx
    • WAR: skip_fusion_patterns=["FusionConvBN"]
  • issue3_repro_conv_resize_issue.onnx
    • WAR: none found.

How to repro

Run the check_ort_failures.py python script (update input_model_path as needed).

--
Please let us know if there are any additional questions on any of the items.
Thanks!

try:
    model_simp, check = onnxsim.simplify(onnx_model)
    if check:
        model_simp = onnxslim.slim(onnx_model)
Contributor

I was able to verify that BEVFormer is compatible with onnxSLIM as long as we skip Gemm Fusion optimizations (skip_fusion_patterns=["FusionGemm"]). Otherwise, perf and accuracy degradation is observed.

Please update this line accordingly, thanks.
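A hedged sketch of what that change might look like (assuming `slim` accepts the `skip_fusion_patterns` keyword named above; `model.onnx` is a placeholder, and the call is guarded in case the packages are absent):

```python
# Hedged sketch: skip the Gemm fusion pass when simplifying, as suggested
# above. "model.onnx" is a placeholder path; the try/except lets the
# snippet run even without onnx/onnxslim installed.
try:
    import onnx
    import onnxslim

    onnx_model = onnx.load("model.onnx")
    model_simp = onnxslim.slim(onnx_model, skip_fusion_patterns=["FusionGemm"])
    gemm_fusion_skipped = True
except (ImportError, OSError):
    gemm_fusion_skipped = False
```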

Author

done

inisis commented Nov 22, 2025

@gcunhase So much appreciation for your comprehensive testing, which has helped us improve onnxslim. All the issues you mentioned have been resolved in version 0.1.75 of onnxslim, and these models have also been added to onnxslim’s daily CI. Many thanks again.

Here are some details when solving the issues:

1. Functional failures

If a model ends with a custom operator as an output, onnxslim cannot run symbolic shape inference for it, so the output would lose its dtype and shape; we improved this by reusing the info already stored in the original model.
Users can also provide custom shape inference logic for their own functions; onnxslim supports this and provides a template for it.

2. ORT inference failures

In onnxslim, shape inference for the outputs of a Resize node was aligned with the official ONNX documentation:
https://onnx.ai/onnx/operators/onnx__Resize.html#summary
In the official doc, the output size is floored:

output_dimension = floor(input_dimension * (roi_end - roi_start) * scale)

whereas in onnxruntime, the output size is rounded:
https://github.com/microsoft/onnxruntime/blob/977efe4788b2ee24371523b5fa14dd02efcd4942/onnxruntime/core/providers/cpu/tensor/upsample.cc#L70

So there was a mismatch, and in some cases it produced an incompatible-dimensions issue; onnxslim is now aligned with ORT.
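A tiny sketch of the mismatch (the values are hypothetical, chosen to expose the one-element difference; ORT's rounding is approximated here with Python's `round`, which agrees for this value):

```python
import math

# ONNX spec floors the Resize output size; ORT rounds it. With
# input_dimension=7, scale=0.5, and an roi spanning the full axis,
# the two conventions disagree by one element.
input_dimension, scale = 7, 0.5
roi_start, roi_end = 0.0, 1.0

spec_size = math.floor(input_dimension * (roi_end - roi_start) * scale)  # floor(3.5) = 3
ort_size = round(input_dimension * (roi_end - roi_start) * scale)        # round(3.5) = 4

print(spec_size, ort_size)  # 3 4
```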

3. ORT numerical accuracy failures

There is a precision issue with issue3_repro_conv_resize_issue.onnx.
check_ort_failures.py uses np.array_equal, which is very strict; I checked the maximum diff, which is 3.5762787e-07. If tested with

opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

the np.array_equal check passes, so I suspect some ORT optimization is responsible for this numerical diff.
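To illustrate why np.array_equal flags a diff of that size (float32's spacing near 1.0 is about 1.19e-07, so a 3.58e-07 difference is only a few ulps; the arrays here are synthetic, not the model outputs):

```python
import numpy as np

# A ~3.58e-07 difference (the max diff reported above) fails the strict
# bit-exact check but passes a tolerance-based comparison.
a = np.array([1.0], dtype=np.float32)
b = a + np.float32(3.5762787e-07)  # 3 ulps above 1.0 in float32

print(np.array_equal(a, b))          # False
print(np.allclose(a, b, atol=1e-6))  # True
```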
