Fix onnx Attention and torch SDPA quantization handling #3751
Changes
- Added `ONNXAttentionMetatype` for the opset 23 `Attention` ONNX node (a minimal sketch of such a graph follows this list).
- Fixed `scaled_dot_product_attention` quantization in torch2 for the case when `Q`, `K` and `V` are parallel edges coming from the same input node (see the PyTorch sketch after the test list below).
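For context, here is a minimal sketch of the kind of graph the new metatype is meant to match: a single opset 23 `Attention` node with plain `Q`/`K`/`V` inputs. The shapes are illustrative, and building/checking this model assumes an onnx release that defines opset 23 (onnx 1.18+):

```python
import onnx
from onnx import TensorProto, helper

# Illustrative shapes: (batch, num_heads, seq_len, head_size).
q = helper.make_tensor_value_info("Q", TensorProto.FLOAT, [1, 2, 4, 8])
k = helper.make_tensor_value_info("K", TensorProto.FLOAT, [1, 2, 4, 8])
v = helper.make_tensor_value_info("V", TensorProto.FLOAT, [1, 2, 4, 8])
y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 2, 4, 8])

# Single opset 23 `Attention` node that the new metatype should recognize.
attention = helper.make_node("Attention", inputs=["Q", "K", "V"], outputs=["Y"])

graph = helper.make_graph([attention], "attention_model", [q, k, v], [y])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 23)])
onnx.checker.check_model(model)
```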
Reason for changes

See #3750
Related tickets
Fixes #3750
Tests
- `tests/onnx/quantization/test_graphs.py::test_synthetic_models_graph[AttentionModel]`
- `tests/torch2/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[unbind_scaled_dot_product_attention_model]`
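The torch2 test name suggests the problematic pattern: an `unbind` whose outputs feed `Q`, `K` and `V` of `scaled_dot_product_attention` as three parallel edges from the same producer node. A minimal illustrative module of that shape (not the test's actual model) might look like:

```python
import torch
import torch.nn.functional as F


class UnbindSDPAModel(torch.nn.Module):
    """One node (`unbind`) fans out into the Q, K and V inputs of SDPA,
    so all three edges originate from the same source node."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (3, batch, num_heads, seq_len, head_size); unbind produces
        # three parallel edges leaving a single node.
        q, k, v = x.unbind(dim=0)
        return F.scaled_dot_product_attention(q, k, v)


model = UnbindSDPAModel()
out = model(torch.randn(3, 1, 2, 4, 8))  # -> (1, 2, 4, 8)
```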