
Could you provide the code for visualizing attention in Figure 2, or help us identify if there are any issues with our approach? #90

@yuhkalhic


Thank you for your excellent work. We have a question regarding the attention visualization in Figure 2 of your paper.

We attempted to reproduce the visualization using the following approach:

  • Taking the last transformer layer
  • Summing across all attention heads

An example of our method is shown below. However, our results differ significantly from yours: even when I restrict the code to the first head of the first transformer layer, I cannot reproduce attention scores as high, or a distribution pattern as pronounced, as the ones in your figure.
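For reference, here is a minimal sketch of the two steps above. It assumes a HuggingFace-style transformers model loaded with `output_attentions=True`; the model name and input sentence are placeholders, not necessarily the ones used in your paper:

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

# Placeholder model; substitute whichever checkpoint the paper used.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "an example sentence"  # placeholder input
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq)
# tensor per layer.
last_layer = outputs.attentions[-1]        # take the last transformer layer
summed = last_layer.sum(dim=1).squeeze(0)  # sum across all attention heads

# Plot the resulting (seq_len, seq_len) attention map as a heatmap.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(summed.numpy(), cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar()
plt.show()
```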

Could you help us understand whether there are any specific preprocessing or normalization steps we might be missing?

To help us better understand and reproduce your results, would it be possible to share the visualization code you used? This would be incredibly helpful for our research.

Thank you for your time and assistance.

[Screenshot: 2024-12-11 145016]
