
@AlexanderLavelle

A small modification to decrease the number of embeddings to gather.

This probably makes more of a difference when the lookup runs on another device (a TPU embedding lookup, or under SPMD, or something similar), or when the number of masked positions is small compared to the sequence dimension max_len.
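For anyone skimming, here is a rough sketch of the general idea, not the actual diff. It assumes the change amounts to selecting the masked token ids first and looking up only those embeddings, instead of embedding every position and gathering afterwards. All names and shapes (`embedding_table`, `token_ids`, `masked_positions`, and the two helper functions) are hypothetical and use NumPy purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, hidden_dim = 50, 8
batch, max_len, num_masked = 2, 16, 3

# Hypothetical embedding table and inputs (not taken from the PR).
embedding_table = rng.normal(size=(vocab_size, hidden_dim))
token_ids = rng.integers(0, vocab_size, size=(batch, max_len))
masked_positions = np.stack(
    [rng.choice(max_len, size=num_masked, replace=False) for _ in range(batch)]
)


def embed_then_gather(table, ids, positions):
    # Looks up batch * max_len embeddings, then keeps only the masked ones.
    all_embeddings = table[ids]  # (batch, max_len, hidden_dim)
    return np.take_along_axis(all_embeddings, positions[..., None], axis=1)


def gather_then_embed(table, ids, positions):
    # Selects the masked token ids first, so only batch * num_masked
    # embeddings are looked up.
    masked_ids = np.take_along_axis(ids, positions, axis=1)  # (batch, num_masked)
    return table[masked_ids]  # (batch, num_masked, hidden_dim)


full = embed_then_gather(embedding_table, token_ids, masked_positions)
small = gather_then_embed(embedding_table, token_ids, masked_positions)
```

Both paths should return the same `(batch, num_masked, hidden_dim)` array; the second one just touches fewer rows of the table, which is where any saving would come from when `num_masked` is much smaller than `max_len`.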

Might not be that helpful, but I figured "see something, say something."

a small modification to decrease the number of embeddings to gather