Question about dimension in Disparity Transformer

Hi @wenbowen123 , thanks for the great work!

I have a question about the Disparity Transformer. In FlashMultiheadAttention.forward, after projecting Q/K/V: 
https://github.com/NVlabs/FoundationStereo/blob/6e8806816b533e4d13ddbb95ffa907b797060a62/core/submodule.py#L216

<img width="3023" height="1497" alt="Image" src="https://github.com/user-attachments/assets/36173753-f35b-48b5-bd4f-6413bf455c4b" />

Since `F.scaled_dot_product_attention` expects inputs shaped `(..., num_heads, seq_len, head_dim)` and attends over the second-to-last dimension, it seems like attention is being computed over the heads (4) as the sequence dimension, rather than over the disparity candidates.
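
To illustrate the concern, here is a minimal standalone sketch. It does not reproduce the FoundationStereo code; the tensor sizes and the names `B`, `L`, `H`, `D` and `manual_attention` are hypothetical, chosen only for the demo. It checks which axis `F.scaled_dot_product_attention` mixes when the tensors are laid out as `(B, L, H, D)` versus `(B, H, L, D)`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical sizes: batch, tokens (e.g. disparity candidates), heads, head dim.
B, L, H, D = 2, 6, 4, 8

q = torch.randn(B, L, H, D)
k = torch.randn(B, L, H, D)
v = torch.randn(B, L, H, D)


def manual_attention(q, k, v):
    """Reference softmax(QK^T / sqrt(D)) V over the second-to-last axis."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v


# Perturb the values of token 0 only, then see which outputs change.
v_perturbed = v.clone()
v_perturbed[:, 0] += 1.0

# Layout (B, L, H, D): the second-to-last axis is H, so each token only
# attends across its own heads and tokens never exchange information.
out_no_t = F.scaled_dot_product_attention(q, k, v)
out_no_t_p = F.scaled_dot_product_attention(q, k, v_perturbed)
matches_manual = torch.allclose(out_no_t, manual_attention(q, k, v), atol=1e-5)
tokens_mixed_without_transpose = not torch.allclose(
    out_no_t[:, 1:], out_no_t_p[:, 1:], atol=1e-6
)

# Layout (B, H, L, D): the second-to-last axis is L, so attention runs over
# tokens and a change in token 0 reaches the other tokens.
qt, kt, vt = (t.transpose(1, 2) for t in (q, k, v))
out_t = F.scaled_dot_product_attention(qt, kt, vt)
out_t_p = F.scaled_dot_product_attention(qt, kt, v_perturbed.transpose(1, 2))
tokens_mixed_with_transpose = not torch.allclose(
    out_t[:, :, 1:], out_t_p[:, :, 1:], atol=1e-6
)

print("without transpose, tokens mixed:", tokens_mixed_without_transpose)
print("with transpose, tokens mixed:", tokens_mixed_with_transpose)
```

If the tensors in the linked line really are in `(B, L, H, D)` order when they reach the attention call, the first case is what I would expect to happen.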

Am I missing something here? 
