
Llama 4 issue tracking #1118

@tianyu-l


High priority

Not high priority for now

  • for-loop implementation of MoE (see the sketch after this list)
    • with DTensor TP: sharding-propagation overhead due to dynamic shapes
      • need to relax the cache-hit criteria in DTensor sharding propagation
      • may also be needed by Loss Parallel for per-sequence loss
    • with torch.compile: branching on “unbacked” symbolic ints
      • static padding of DTensor may solve this
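A minimal sketch of a for-loop MoE forward pass, showing where the dynamic shapes come from. The class and parameter names (`ForLoopMoE`, `dim`, `hidden_dim`, `num_experts`, `top_k`) are illustrative only, not torchtitan's actual implementation:

```python
# Hypothetical for-loop MoE sketch; names and structure are assumptions,
# not torchtitan's real module.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ForLoopMoE(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, hidden_dim, bias=False),
                nn.SiLU(),
                nn.Linear(hidden_dim, dim, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, selected = scores.topk(self.top_k, dim=-1)  # (num_tokens, top_k)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # token_idx has a data-dependent length, so each expert call sees a
            # different shape every step: this is what causes repeated DTensor
            # sharding propagation (cache misses) and unbacked symbolic ints
            # under torch.compile.
            token_idx, k_idx = (selected == i).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            expert_out = expert(x[token_idx])
            out.index_add_(0, token_idx, weights[token_idx, k_idx].unsqueeze(-1) * expert_out)
        return out


# Usage: per-expert token counts change every step, hence the dynamic shapes.
moe = ForLoopMoE(dim=256, hidden_dim=1024, num_experts=8, top_k=2)
y = moe(torch.randn(32, 256))
```

Because `token_idx.numel()` is data-dependent, padding each expert's token batch to a fixed capacity (the static-padding idea above) is one way to keep shapes constant for DTensor's sharding-propagation cache and for torch.compile.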

Not Llama 4 specific
