[WebGPU] Fix SkipSimplifiedLayerNormalization bias #28427
Pull request overview
Fixes a WebGPU-specific input indexing bug in com.microsoft.SkipSimplifiedLayerNormalization where the optional 1-D bias was read from the wrong input slot (and could lead to incorrect results/NaNs), bringing WebGPU behavior in line with the operator schema and other EPs.
Changes:
- Correct WebGPU `SkipSimplifiedLayerNormalization` optional input handling so `bias` is read from slot 3 (and `beta` is absent for the simplified variant).
- Strengthen WebGPU shader program caching by including `hasBeta`/`hasBias` in the program cache hint to avoid cache collisions across optional-input combinations.
- Add a regression unit test that exercises SkipSimplifiedLayerNormalization with `bias` in FP16.
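
The cache-collision point can be illustrated with a minimal sketch. `MakeCacheHint` is a hypothetical helper written for this illustration, not the ONNX Runtime cache-hint API; it only shows why the optional-input flags need to be part of the key, assuming the generated shader differs depending on which optional inputs are bound.

```cpp
#include <string>

// Hypothetical cache-key builder (an assumption for illustration only; the
// actual cache-hint mechanism in ONNX Runtime may look different).
// Each flag that changes the generated shader contributes to the key, so two
// optional-input combinations never share a cached program.
std::string MakeCacheHint(bool simplified, bool has_beta, bool has_bias) {
  std::string hint;
  hint += simplified ? "simplified" : "full";
  hint += has_beta ? "|beta" : "|nobeta";
  hint += has_bias ? "|bias" : "|nobias";
  return hint;
}
```

If the key were built from `simplified` alone, a program compiled without `bias` could be reused for a later call that supplies `bias`, which is the kind of collision the change guards against.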
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/webgpu/bert/skip_layer_norm.cc | Fixes simplified-vs-non-simplified beta/bias input slot selection and updates shader cache hint to account for optional inputs. |
| onnxruntime/test/contrib_ops/skiplayernorm_op_test.cc | Adds an FP16 regression test validating correct bias handling for SkipSimplifiedLayerNormalization. |
DirectML EP is not happy with the new unit test.

Any pointers on how to disable it only for DirectML? 👀

yeap, copy this line:

thanks 👍 936cafb should hopefully fix that.

Description
The op spec for `com.microsoft.SkipSimplifiedLayerNormalization` puts the optional 1-D `bias` at input slot 3 (the simplified variant has no `beta`), but `onnxruntime/contrib_ops/webgpu/bert/skip_layer_norm.cc` hardcodes `beta = Input(3); bias = Input(4)` regardless of the `simplified` template parameter. As a result, the user-supplied bias is silently consumed as `beta`, and the kernel reads slot 4 (out of range for this op) for the actual bias. The CPU EP handles the slot correctly; only WebGPU diverges (and produces NaN in larger graphs where slot 4's memory is non-zero).
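
A minimal sketch of the intended slot selection follows. `OptionalSlots` and `SelectOptionalSlots` are hypothetical names introduced for illustration and are not taken from `skip_layer_norm.cc`; the sketch also assumes optional inputs are only ever omitted from the end, so the input count alone tells which ones are present (a simplification for illustration).

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper type (not an ONNX Runtime type): records which input
// slot, if any, holds each optional tensor.
struct OptionalSlots {
  std::optional<std::size_t> beta;  // slot of beta, if the variant defines one
  std::optional<std::size_t> bias;  // slot of bias, if one was supplied
};

// Slots 0..2 are input, skip, gamma in both variants.
//   non-simplified variant: slot 3 = beta, slot 4 = bias
//   simplified variant:     slot 3 = bias (there is no beta)
// Assumption: trailing optional inputs are simply absent, so input_count
// is enough to decide presence.
template <bool simplified>
OptionalSlots SelectOptionalSlots(std::size_t input_count) {
  OptionalSlots slots;
  if constexpr (simplified) {
    if (input_count > 3) slots.bias = 3;
  } else {
    if (input_count > 3) slots.beta = 3;
    if (input_count > 4) slots.bias = 4;
  }
  return slots;
}
```

With four inputs, `SelectOptionalSlots<true>(4)` reports `bias` at slot 3 and no `beta`, whereas the hardcoded mapping described above would have treated slot 3 as `beta` and looked for `bias` at the nonexistent slot 4.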
Motivation and Context
Closes #28424