I am trying to use torchtitan with procedurally generated data (data augmentation). Sample generation is CPU-intensive, and I would strongly prefer not to pre-generate and store every sample on disk. Under this setup, training is heavily bottlenecked by the dataloader: my MFU drops by 4-5x compared to a run with an unbottlenecked dataloader (no data augmentation).
I have seen a related problem reported here, along with some caveats about how to do multiprocess data loading effectively. It would be great to have an official implementation of a multiprocess dataloader with num_workers > 1.
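For context, here is a minimal sketch of the kind of setup I have in mind. This is not torchtitan's actual dataloader; `generate_augmented_sample`, `ProceduralDataset`, and the shapes are placeholders standing in for my augmentation pipeline. The point is just that with `num_workers > 0`, the CPU-heavy generation moves into worker processes instead of blocking the training loop.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def generate_augmented_sample(seed: int) -> torch.Tensor:
    # Placeholder for the CPU-intensive procedural augmentation.
    g = torch.Generator().manual_seed(seed)
    return torch.randn(2048, generator=g)


class ProceduralDataset(IterableDataset):
    def __iter__(self):
        info = get_worker_info()
        worker_id = info.id if info is not None else 0
        num_workers = info.num_workers if info is not None else 1
        seed = worker_id
        while True:
            # Each worker yields a disjoint stream of procedurally generated samples.
            yield generate_augmented_sample(seed)
            seed += num_workers


loader = DataLoader(
    ProceduralDataset(),
    batch_size=8,
    num_workers=4,            # augmentation runs in worker processes, off the training process
    prefetch_factor=2,        # keep a few batches ready ahead of the GPU
    persistent_workers=True,
    pin_memory=True,
)
```

Something along these lines, but integrated with torchtitan's own (stateful, checkpointable) dataloader, is what I am hoping could be supported officially.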