Conversation

inisis commented Oct 28, 2025

What does this PR do?

Type of change:

Add onnxslim support

Overview: Onnxslim is under active development and committed to long-term support; it is easy to use and depends on very few packages.

Usage

# Add a code snippet demonstrating how to use this
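A minimal, hedged sketch of the Python-level flow (assuming onnxslim's `slim` API, which is also used in the repro script later in this thread; `model.onnx` is a placeholder path, and the imports are guarded so the snippet degrades gracefully when the packages are absent):

```python
# Hedged usage sketch: simplify an ONNX model with onnxslim.
# "model.onnx" is a placeholder path, not a file shipped with this PR.
try:
    import onnx
    import onnxslim

    model = onnx.load("model.onnx")
    slim_model = onnxslim.slim(model)  # graph simplification
    onnx.save(slim_model, "model_slim.onnx")
    status = "simplified"
except (ImportError, OSError):
    status = "skipped"  # packages or model file not available

print(status)
```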

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@inisis inisis requested review from a team as code owners October 28, 2025 11:49
copy-pr-bot bot commented Oct 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

setup.py Outdated
"onnxruntime-gpu~=1.22.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin' and platform_system != 'Windows'", # noqa: E501
"onnxruntime-directml==1.20.0; platform_system == 'Windows'",
"onnxscript", # For test_onnx_dynamo_export unit test
"onnxsim ; python_version < '3.12' and platform_machine != 'aarch64'",
Contributor

Please remove onnxsim installation if it's no longer being used, thanks.

Author

done

inisis commented Nov 1, 2025

@gcunhase Hi, any update here? Thanks.

gcunhase commented Nov 3, 2025

@inisis Thank you for your contribution. I'm doing some investigation on any onnxsim vs onnxslim gaps. Will get back to you as soon as possible.

gcunhase commented Nov 6, 2025

@inisis I'm still validating onnxslim on our end, but in the meantime, could you please check that switching to onnxslim doesn't break quantization of https://github.com/NVIDIA/DL4AGX/tree/master/AV-Solutions/bevformer-int8-eq?

Specifically, please check that the following CLI is still functional and performant:

$ python -m modelopt.onnx.quantization --onnx_path=/mnt/models/bevformer_tiny_epoch_24_cp2_op13.onnx \
      --trt_plugins=$PLUGIN_PATH \
      --op_types_to_exclude MatMul \
      --calibration_data_path=/workspace/BEVFormer_tensorrt/data/nuscenes/calib_data.npz \
      --simplify

Thanks!

inisis commented Nov 11, 2025

Hi @gcunhase, it took me some time to run bevformer-int8-eq, but everything is working fine. Here are the results:

Env

device: NVIDIA GeForce RTX 5090
pytorch-quantization      2.2.1
torch                     2.9.0+cu128
torchvision               0.24.0+cu128
onnx                      1.17.0
onnx_graphsurgeon         0.5.8
onnx-ir                   0.1.12
onnxconverter-common      1.16.0
onnxruntime-gpu           1.20.2
onnxscript                0.5.6
onnxsim                   0.4.36
onnxslim                  0.1.74

Without simplify

(benchmark screenshot)

With onnxsim

(benchmark screenshot)

With onnxslim

(benchmark screenshot)

To conclude:

| Method | FPS | Acceleration Ratio |
| --- | --- | --- |
| Without Simplify | 354 | 1.00× |
| With onnxsim | 371 | 1.05× |
| With onnxslim | 381 | 1.08× |

In terms of GPU Compute Time (median, ms), onnxsim is slightly faster. I compared the two models using:

onnxslim --inspect /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_sim.onnx /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_slim.onnx
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Name          | bevformer_tiny_epoch_24_cp2_op13.quant_s | bevformer_tiny_epoch_24_cp2_op13.quant_s |
|                              |                 im.onnx                  |                 lim.onnx                 |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Info          |       Op Set: 13 / IR Version: 10        |       Op Set: 13 / IR Version: 10        |
+------------------------------+------------------------------------------+------------------------------------------+
|          IN: image           |       float32: (1, 6, 3, 480, 800)       |       float32: (1, 6, 3, 480, 800)       |
|         IN: prev_bev         |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|       IN: use_prev_bev       |              float32: (1,)               |              float32: (1,)               |
|         IN: can_bus          |              float32: (18,)              |              float32: (18,)              |
|        IN: lidar2img         |          float32: (1, 6, 4, 4)           |          float32: (1, 6, 4, 4)           |
|        OUT: bev_embed        |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|     OUT: outputs_classes     |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
|     OUT: outputs_coords      |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
+------------------------------+------------------------------------------+------------------------------------------+
|             Add              |                   318                    |                   185                    |
|             Atan             |                    1                     |                    1                     |
|             Clip             |                    26                    |                    26                    |
|            Concat            |                    16                    |                    16                    |
|             Conv             |                    55                    |                    55                    |
|             Cos              |                    1                     |                    1                     |
|       DequantizeLinear       |                   175                    |                   393                    |
|             Div              |                    67                    |                    67                    |
|            Gather            |                    14                    |                    14                    |
|             Gemm             |                    7                     |                   140                    |
|           Greater            |                    3                     |                    3                     |
|             Less             |                    2                     |                    2                     |
|             Log              |                    15                    |                    15                    |
|            MatMul            |                   142                    |                    11                    |
|             Max              |                    1                     |                    1                     |
|           MaxPool            |                    1                     |                    1                     |
|             Mul              |                    81                    |                    81                    |
| MultiScaleDeformableAttnTRT2 |                    12                    |                    12                    |
|             Pow              |                    41                    |                    41                    |
|        QuantizeLinear        |                   175                    |                   393                    |
|          ReduceMean          |                    81                    |                    81                    |
|          ReduceProd          |                    1                     |                    1                     |
|          ReduceSum           |                    4                     |                    4                     |
|             Relu             |                    96                    |                    96                    |
|           Reshape            |                   105                    |                   269                    |
|          RotateTRT2          |                    1                     |                    1                     |
|          ScatterND           |                    58                    |                    58                    |
|           Sigmoid            |                    18                    |                    18                    |
|             Sign             |                    2                     |                    2                     |
|             Sin              |                    1                     |                    1                     |
|            Slice             |                    84                    |                    84                    |
|           Softmax            |                    5                     |                    5                     |
|            Split             |                    1                     |                    0                     |
|             Sqrt             |                    40                    |                    40                    |
|           Squeeze            |                    1                     |                    1                     |
|             Sub              |                    59                    |                    59                    |
|             Tile             |                    6                     |                    6                     |
|          Transpose           |                    36                    |                    36                    |
|          Unsqueeze           |                    30                    |                    30                    |
|            Where             |                    5                     |                    5                     |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Size          |                158.77 MB                 |                158.90 MB                 |
+------------------------------+------------------------------------------+------------------------------------------+

Onnxslim merges MatMul + Add into Gemm; this is unfavorable when using --op_types_to_exclude MatMul.
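For context, the fusion itself is numerically equivalent; a small numpy sketch of why only the pattern matching changes (shapes here are illustrative, not taken from the model):

```python
import numpy as np

# MatMul + Add vs. Gemm: Gemm(A, B, C) computes A @ B + C, so fusing the
# two nodes preserves the math, but the resulting node type is "Gemm",
# which --op_types_to_exclude MatMul no longer matches.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3)).astype(np.float32)
B = rng.standard_normal((3, 4)).astype(np.float32)
C = rng.standard_normal(4).astype(np.float32)

out_matmul_add = np.matmul(A, B) + C  # original two-node pattern
out_gemm = A @ B + C                  # what the fused Gemm computes

assert np.array_equal(out_matmul_add, out_gemm)
```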

@gcunhase
Contributor


Hi @inisis thanks for validating this functionality. Were you also able to validate the numerical accuracy for the onnxslim simplified model?

I will also do some investigation on the MatMul+Add vs Gemm substitution on my end in the meanwhile.

Thanks!

inisis commented Nov 11, 2025


@gcunhase I didn't use the full dataset from nuscenes, it's too big, I used the mini one to do the calibration. If this counts, I can verify it on the mini one.

@gcunhase
Contributor


No problem, let me try to verify the accuracy on my end. Thank you!

inisis commented Nov 18, 2025

Hi @gcunhase, is there any update? Thanks!

@gcunhase
Contributor

@inisis we appreciate your contribution and wanted to make sure that there are no regressions before merging this PR. We've investigated potential risks in ~150 models and compiled a list of issues, divided into 3 categories, that would need to be solved before merging.

All mentioned models and scripts are in the zip file: repro.zip

1. Functional failures

Error logs

Error 1: repro_io_tensors_shape_dtype.onnx

Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (NMS): (shape=None, dtype=None)) 

Error 2: repro_mode_error_mobilenetv1.onnx

Fail - onnxSLIM (onnxSLIM: 'mode') 

How to repro

import onnx
import onnxslim

model = onnx.load(input_model_path)  # set input_model_path to one of the repro models
simplified_model = onnxslim.slim(model)

2. ORT inference failures

Error logs

Error 1: repro_mul_incompatible_dimensions.onnx

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from repro_mul_incompatible_dimensions.onnx failed:Node (/stages.1/stages.1.0/Mul) Op (Mul) [ShapeInferenceError] Incompatible dimensions 

Error 2: repro_gemm_invalid_shape.onnx

Fail: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gemm node. Name:'/transformer/decoder/layers.0/attentions.1/attn/Gemm' Status Message: Gemm: Invalid bias shape for broadcast 

How to repro

Run the check_ort_failures.py python script (update input_model_path as needed).

3. ORT numerical accuracy failures

Error logs

The simplified versions of the following models do not produce the same outputs as the original model for the same input data:

  • issue3_repro_conv_bn_fusion.onnx
    • WAR: skip_fusion_patterns=["FusionConvBN"]
  • issue3_repro_conv_resize_issue.onnx
    • WAR: none found.

How to repro

Run the check_ort_failures.py python script (update input_model_path as needed).

--
Please let us know if there are any additional questions on any of the items.
Thanks!

try:
    model_simp, check = onnxsim.simplify(onnx_model)
    if check:
        model_simp = onnxslim.slim(onnx_model)
Contributor

I was able to verify that BEVFormer is compatible with onnxSLIM as long as we skip Gemm Fusion optimizations (skip_fusion_patterns=["FusionGemm"]). Otherwise, perf and accuracy degradation is observed.

Please update this line accordingly, thanks.
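A hedged sketch of what that change might look like (assuming `slim` accepts the `skip_fusion_patterns` keyword named above; `model.onnx` is a placeholder, and the call is guarded in case the packages are absent):

```python
# Hedged sketch: skip the Gemm fusion pass when simplifying, as suggested
# above. "model.onnx" is a placeholder path; the try/except lets the
# snippet run even without onnx/onnxslim installed.
try:
    import onnx
    import onnxslim

    onnx_model = onnx.load("model.onnx")
    model_simp = onnxslim.slim(onnx_model, skip_fusion_patterns=["FusionGemm"])
    gemm_fusion_skipped = True
except (ImportError, OSError):
    gemm_fusion_skipped = False
```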

Author

done

inisis commented Nov 22, 2025

@gcunhase So much appreciation for your comprehensive testing, which has helped us improve onnxslim. All the issues you mentioned have been resolved in version 0.1.75 of onnxslim, and these models have also been added to onnxslim’s daily CI. Many thanks again.

Here are some details when solving the issues:

1. Functional failures

If a model ends with a custom operator as an output, onnxslim cannot run symbolic shape inference for it, so the output would lose its dtype and shape; we improved this by reusing the info already stored in the original model.
Users can also provide custom shape inference logic for their own functions; onnxslim supports this and provides a template for it.

2. ORT inference failures

In onnxslim, shape inference for the outputs of a Resize node was aligned with the official ONNX documentation:
https://onnx.ai/onnx/operators/onnx__Resize.html#summary
In the official doc, the output size is floored:

output_dimension = floor(input_dimension * (roi_end - roi_start) * scale)

whereas in onnxruntime, the output size is rounded:
https://github.com/microsoft/onnxruntime/blob/977efe4788b2ee24371523b5fa14dd02efcd4942/onnxruntime/core/providers/cpu/tensor/upsample.cc#L70

So there was a mismatch, and in some cases it produced an incompatible-dimensions issue; onnxslim is now aligned with ORT.
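A tiny sketch of the mismatch (the values are hypothetical, chosen to expose the one-element difference; ORT's rounding is approximated here with Python's `round`, which agrees for this value):

```python
import math

# ONNX spec floors the Resize output size; ORT rounds it. With
# input_dimension=7, scale=0.5, and an roi spanning the full axis,
# the two conventions disagree by one element.
input_dimension, scale = 7, 0.5
roi_start, roi_end = 0.0, 1.0

spec_size = math.floor(input_dimension * (roi_end - roi_start) * scale)  # floor(3.5) = 3
ort_size = round(input_dimension * (roi_end - roi_start) * scale)        # round(3.5) = 4

print(spec_size, ort_size)  # 3 4
```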

3. ORT numerical accuracy failures

There is a precision issue with issue3_repro_conv_resize_issue.onnx.
check_ort_failures.py uses np.array_equal, which is very strict; I checked the maximum diff, which is 3.5762787e-07. If tested with

opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

the np.array_equal check passes, so I suspect some ORT optimization is responsible for this numerical diff.
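To illustrate why np.array_equal flags a diff of that size (float32's spacing near 1.0 is about 1.19e-07, so a 3.58e-07 difference is only a few ulps; the arrays here are synthetic, not the model outputs):

```python
import numpy as np

# A ~3.58e-07 difference (the max diff reported above) fails the strict
# bit-exact check but passes a tolerance-based comparison.
a = np.array([1.0], dtype=np.float32)
b = a + np.float32(3.5762787e-07)  # 3 ulps above 1.0 in float32

print(np.array_equal(a, b))          # False
print(np.allclose(a, b, atol=1e-6))  # True
```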
