[Parallelism] Implement vocabulary parallelism #680

@casper-hansen

The paper "Balancing Pipeline Parallelism with Vocabulary Parallelism" introduces a way to handle growing vocabulary sizes together with pipeline parallelism (PP). While context parallelism splits the sequence dimension across devices, vocabulary parallelism splits the vocabulary dimension of the embedding layers.
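To make the idea of splitting the vocabulary dimension concrete, here is a minimal single-process sketch in plain Python. It is not the paper's implementation and not an API of this repository: the helper names (`shard_bounds`, `sharded_embedding_lookup`, `sharded_cross_entropy`), the even contiguous split, and the use of ordinary `max`/`sum` in place of all-reduce collectives are all assumptions made for illustration.

```python
import math
import random


def shard_bounds(vocab_size, num_shards, rank):
    """Return the [start, end) vocabulary range owned by `rank`.

    Assumes an even, contiguous split (hypothetical layout).
    """
    per_shard = math.ceil(vocab_size / num_shards)
    start = min(rank * per_shard, vocab_size)
    end = min(start + per_shard, vocab_size)
    return start, end


def sharded_embedding_lookup(token_ids, shards, vocab_size):
    """Simulate an embedding lookup with the table split along the vocabulary.

    Each shard contributes rows only for the tokens it owns; adding the
    partial results stands in for an all-reduce(sum) across devices.
    """
    num_shards = len(shards)
    dim = len(shards[0][0])
    out = [[0.0] * dim for _ in token_ids]
    for rank, table in enumerate(shards):
        start, end = shard_bounds(vocab_size, num_shards, rank)
        for i, tok in enumerate(token_ids):
            if start <= tok < end:
                row = table[tok - start]
                out[i] = [a + b for a, b in zip(out[i], row)]
    return out


def sharded_cross_entropy(logit_shards, target, vocab_size):
    """Cross-entropy for one token whose logits are split along the vocabulary.

    Every shard supplies a local max and a local sum of exponentials; the
    Python max/sum below stand in for all-reduce(max) and all-reduce(sum).
    """
    num_shards = len(logit_shards)
    global_max = max(max(s) for s in logit_shards if s)
    sum_exp = sum(math.exp(x - global_max) for s in logit_shards for x in s)
    target_logit = 0.0
    for rank, s in enumerate(logit_shards):
        start, end = shard_bounds(vocab_size, num_shards, rank)
        if start <= target < end:
            target_logit = s[target - start]  # only the owning shard has it
    return math.log(sum_exp) + global_max - target_logit


# Toy setup: a 10-token vocabulary split over 3 simulated devices.
random.seed(0)
VOCAB, DIM, SHARDS = 10, 4, 3
full_table = [[random.random() for _ in range(DIM)] for _ in range(VOCAB)]
shards = [full_table[slice(*shard_bounds(VOCAB, SHARDS, r))] for r in range(SHARDS)]

full_logits = [random.uniform(-2.0, 2.0) for _ in range(VOCAB)]
logit_shards = [full_logits[slice(*shard_bounds(VOCAB, SHARDS, r))] for r in range(SHARDS)]

embeddings = sharded_embedding_lookup([1, 5, 9], shards, VOCAB)
loss = sharded_cross_entropy(logit_shards, target=7, vocab_size=VOCAB)
```

In a real multi-device setup the reductions would be collective operations, and each device would hold only its own shard of the table and logits; the sketch only shows that the sharded computation reproduces the unsharded result.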

image

Results:

image


Labels: enhancement
