[`Attn Masks`] Lift bidirectional mask restriction on eager #42325

vasqu · 2025-11-21T14:01:16Z

Since `torch.compile` automatically fuses our eager attention into SDPA, we should take more advantage of that and allow skipping the mask entirely for bidirectional masks without padding, which matches the FA backend's behavior. The restriction is also unnecessary, and I'm not sure why I added it in the first place.

The fusion patterns can be seen here:

https://github.com/pytorch/pytorch/blob/main/torch/_inductor/fx_passes/fuse_attention.py
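
To make the idea concrete, here is a minimal pure-Python sketch of the skip decision. It is not the actual `masking_utils.py` code: the helper name `can_skip_bidirectional_mask` and the list-of-lists padding mask format are assumptions made for illustration. With bidirectional attention every query may attend to every key, so when no position is padded the mask is all ones and carries no information.

```python
from typing import Optional, Sequence


def can_skip_bidirectional_mask(
    padding_mask: Optional[Sequence[Sequence[int]]],
) -> bool:
    """Hypothetical helper: True if a bidirectional mask would be a no-op.

    `padding_mask` has shape (batch, kv_length); 1 marks a real token and
    0 marks padding. This layout is an assumption of the sketch.
    """
    # No padding mask at all: every token is real, so nothing needs masking.
    if padding_mask is None:
        return True
    # A mask is only needed if at least one position is padded.
    return all(all(tok == 1 for tok in row) for row in padding_mask)
```

When the helper returns `True`, the caller could pass no mask at all to the attention function, regardless of whether the backend is eager, SDPA, or FA.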

vasqu · 2025-11-21T14:02:01Z

src/transformers/masking_utils.py

-        allow_torch_fix (`bool`, optional):
-            Whether to update the mask in case a query is not attending to any tokens, to solve a bug in torch's older
-            versions. We need an arg to skip it when using eager. By default `True`.


Duplicated docstring entry. It is unrelated to this PR but should be removed nonetheless; it slipped through earlier.

HuggingFaceDocBuilderDev · 2025-11-21T14:11:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Is there a test we can add for this? Otherwise LGTM!

vasqu · 2025-11-21T14:47:23Z

Good point, let me check if we properly skip the mask in a test here.
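
One way such a check could look, as a pure-Python sketch rather than the test that was added in this PR: a toy single-head eager attention is run once with an all-`True` mask and once with no mask, and the outputs are compared. The function names `eager_attention` and `test_full_mask_matches_no_mask`, and the boolean mask layout, are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def eager_attention(
    q: Matrix,
    k: Matrix,
    v: Matrix,
    mask: Optional[List[List[bool]]] = None,
) -> Matrix:
    """Toy single-head softmax(QK^T / sqrt(d)) V.

    mask[i][j] == False blocks key j for query i (assumed layout).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_row in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if mask is not None:
            # Blocked keys get -inf so they receive zero softmax weight.
            scores = [s if keep else float("-inf") for s, keep in zip(scores, mask[i])]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
             for c in range(len(v[0]))]
        )
    return out


def test_full_mask_matches_no_mask() -> None:
    q = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
    k = [[0.2, 0.1], [-0.3, 0.4], [0.0, 0.7]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    full_mask = [[True] * len(k) for _ in q]  # bidirectional, no padding
    with_mask = eager_attention(q, k, v, full_mask)
    without_mask = eager_attention(q, k, v, None)
    for row_a, row_b in zip(with_mask, without_mask):
        for a, b in zip(row_a, row_b):
            assert math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
```

A real test would presumably also assert that the mask-creation path returns no mask in this case, instead of only checking numerical equivalence.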

vasqu · 2025-11-21T17:12:24Z

run-slow: bart,bert,llama

github-actions · 2025-11-21T17:13:29Z

This comment contains run-slow, running the specified jobs:

models: ["models/bart", "models/bert", "models/llama"]
quantizations: []

github-actions · 2025-11-21T17:27:24Z

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

remove restriction

526d03c

vasqu commented Nov 21, 2025

Merge branch 'main' into remove-eager-bidirectional-mask-restriction

8a5a75a

vasqu requested review from ArthurZucker and Cyrilvallez November 21, 2025 14:06

fix

ee52fbf

ArthurZucker approved these changes Nov 21, 2025

vasqu and others added 4 commits November 21, 2025 17:26

Merge branch 'main' into remove-eager-bidirectional-mask-restriction

fa30fdf

add test and refactor tests

ec14b70

style

4a59cfa

add docstring

f5ba28d

vasqu merged commit bdee088 into main Nov 21, 2025
25 checks passed

vasqu deleted the remove-eager-bidirectional-mask-restriction branch November 21, 2025 17:51

4 participants
