Question about dimension in Disparity Transformer #238

@Jiaxin0630

Hi @wenbowen123, thanks for the great work!

I have a question about the Disparity Transformer. In FlashMultiheadAttention.forward, after projecting Q/K/V:

Q = Q.view(Q.size(0), Q.size(1), self.num_heads, self.head_dim)

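The view above leaves Q shaped (batch, seq_len, num_heads, head_dim). Since F.scaled_dot_product_attention expects (..., num_heads, seq_len, head_dim) and attends over the second-to-last dimension, it seems like attention is being computed with the heads (4) as the sequence dimension, rather than over the disparity candidates.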
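To make the concern concrete, here is a minimal sketch of what I mean. It is not the repository's code: the sizes `B`, `L`, `H`, `D` and the `project` helper are made up for illustration, with `L` standing in for the number of disparity candidates. It compares calling F.scaled_dot_product_attention directly on the (batch, seq_len, num_heads, head_dim) layout against the same call after moving the heads axis in front of the sequence axis.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: batch, sequence length (disparity candidates), heads, head_dim.
B, L, H, D = 2, 6, 4, 8

def project():
    # Stand-in for a projected Q/K/V tensor, reshaped as in the line quoted above.
    return torch.randn(B, L, H * D).view(B, L, H, D)

Q, K, V = project(), project(), project()

# No transpose: SDPA treats dim -2 (here H) as the sequence axis.
out_no_t = F.scaled_dot_product_attention(Q, K, V)  # (B, L, H, D)

# Manual reference that explicitly attends across the heads axis.
w_heads = torch.softmax(Q @ K.transpose(-2, -1) / D ** 0.5, dim=-1)  # (B, L, H, H)
ref_heads = w_heads @ V

# With transpose: (B, H, L, D), so attention runs over the L positions.
Qt, Kt, Vt = (t.transpose(1, 2) for t in (Q, K, V))
out_t = F.scaled_dot_product_attention(Qt, Kt, Vt).transpose(1, 2)  # back to (B, L, H, D)

mixes_heads = torch.allclose(out_no_t, ref_heads, atol=1e-4)
same_as_transposed = torch.allclose(out_no_t, out_t, atol=1e-4)
print("untransposed call matches attention over heads:", mixes_heads)
print("untransposed call matches attention over sequence:", same_as_transposed)
```

In this sketch the attention weights of the untransposed call have shape (B, L, H, H), so each position only mixes information across its own heads. If the actual forward pass transposes Q/K/V before the attention call, then this does not apply.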

Am I missing something here?
