Attention loss #37

@20IM30007

Description

Ground Truth Cross-Attention

1) We define the cross-attention ground truth for tokens as the L2-normalized vector, where:
       a) A value of 1 indicates that the word is active according to the word-level ground truth timestamp.
       b) A value of 0 indicates that no attention should be paid.
2) To account for small inaccuracies in the ground truth timestamps, we apply a linear interpolation of 4 steps (8 milliseconds) on both sides of the ground truth vector, transitioning smoothly from 0 to 1.
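If it helps to make the description concrete, the two rules plus the ramp can be sketched roughly as below. The function name and the distance-based ramp construction are my own; the description doesn't say how the interpolation is actually implemented:

```python
import numpy as np

def attention_target(active, ramp_steps=4):
    """Sketch of the cross-attention ground truth for one token.

    active: binary 0/1 array over encoder frames (1 = word is active
    per the word-level ground-truth timestamp).
    Returns an L2-normalized vector with a linear 0 -> 1 ramp of
    `ramp_steps` frames on each side of every active region.
    """
    active = np.asarray(active, dtype=float)
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return active  # no attention anywhere for this token
    frames = np.arange(len(active))
    # Distance (in frames) from each frame to the nearest active frame.
    dist = np.abs(frames[:, None] - idx[None, :]).min(axis=1)
    # Linear ramp: active frames get 1, frames more than `ramp_steps`
    # away get 0, with a straight-line transition in between.
    target = np.clip(1.0 - dist / (ramp_steps + 1), 0.0, 1.0)
    return target / np.linalg.norm(target)
```

With `ramp_steps=4`, frames at distance 1..4 from an active region get values 0.8, 0.6, 0.4, 0.2 before normalization, matching the "transitioning smoothly from 0 to 1" wording.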

Here, 4 steps corresponds to 80 milliseconds, right? And one encoder frame corresponds to 20 milliseconds, right?

30 * 1000 / 1500 = 20 ms per frame?
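For reference, the arithmetic behind the question, assuming the standard Whisper setup of 1500 encoder output frames per 30-second window:

```python
# Assuming 30 s of audio maps to 1500 encoder frames (standard Whisper).
frame_ms = 30 * 1000 / 1500   # duration of one encoder frame, in ms
ramp_ms = 4 * frame_ms        # width of the 4-step linear interpolation

# frame_ms == 20.0, ramp_ms == 80.0
```

Under that assumption the 4-step ramp spans 80 ms, not the 8 ms stated in the description.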
