Slow dataloader: should use num_workers > 1 #2073

@hypnopump

Description

I am trying to use torchtitan with procedurally generated data (data augmentation). The generation is CPU-intensive and I would strongly prefer not to precompute and store every sample. Under this setup, torchtitan trains very slowly: my MFU drops by 4-5x compared to an unbottlenecked dataloader (no data augmentation).

I have seen a related problem reported here, with some caveats on how to do multiprocess dataloading effectively. It would be great to have an official implementation of a multiprocess dataloader with num_workers > 1.
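In the meantime, the pattern can be approximated outside torchtitan. Below is a minimal sketch, assuming the augmentation is a pure, picklable top-level function. It uses Python's standard `multiprocessing` pool rather than torchtitan's actual dataloader machinery, purely to illustrate moving CPU-intensive sample generation off the training process; `augment` and `sample_stream` are hypothetical names, and the toy transform stands in for a real, expensive one:

```python
import multiprocessing as mp
import random


def augment(seed: int) -> list[float]:
    # Hypothetical stand-in for a CPU-intensive augmentation;
    # a real transform would be far more expensive per sample.
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]


def sample_stream(num_workers: int = 4, num_samples: int = 8):
    # Run augmentation in a pool of worker processes so the
    # training loop is not blocked waiting on sample generation.
    with mp.Pool(num_workers) as pool:
        # imap streams results back in order as workers finish them.
        yield from pool.imap(augment, range(num_samples))


if __name__ == "__main__":
    samples = list(sample_stream())
    print(len(samples))  # 8
```

With the stock PyTorch `DataLoader`, the analogous knob is passing `num_workers > 0` (note the plural spelling, `num_workers`); the main caveat in a distributed setup is sharding the stream so that ranks and workers do not produce duplicate samples.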
