namedtuple is incompatible with num_workers>0 in PyTorch DataLoader #32

@KonstantinWilleke

Description

The datasets' __getitem__ cannot return a namedtuple when the DataLoader uses multiple parallel workers, and those workers are needed to reach high data-loading throughput on powerful compute nodes.
This is a fundamental PyTorch limitation: each worker process instantiates its own Dataset object, and the namedtuple class is created inside each instance. Because the class is defined per-process rather than at module level, the workers cannot collate their batches into the "custom" namedtuple.

Potential workarounds:

  • write a custom collate_fn for the DataLoader that converts the entire batch into a namedtuple.
  • return each batch as a dictionary, e.g. {device_name: value_tensor}
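A minimal sketch of the first workaround. The field names and the DataPoint type are hypothetical stand-ins for the datasets' actual namedtuple; the key point is that both the namedtuple class and the collate_fn are defined at module level, so worker processes can pickle them:

```python
from collections import namedtuple

# Hypothetical sample type standing in for the dataset's per-device namedtuple.
# Defining it at module level (not inside Dataset.__init__) is what makes it
# picklable across DataLoader worker processes.
DataPoint = namedtuple("DataPoint", ["images", "responses"])

def namedtuple_collate(batch):
    """Collate a list of DataPoint samples into one DataPoint of batched fields.

    Here the fields are gathered into plain lists; with real tensor data you
    would stack them instead (e.g. with torch.stack or
    torch.utils.data.default_collate applied per field).
    """
    return DataPoint(*(list(field) for field in zip(*batch)))

# Usage (assuming a dataset whose __getitem__ returns a DataPoint):
#   loader = DataLoader(dataset, batch_size=64, num_workers=4,
#                       collate_fn=namedtuple_collate)
```

The dictionary workaround sidesteps the pickling problem entirely, since PyTorch's default collate already handles dicts of tensors, at the cost of losing attribute-style access to the batch fields.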

Metadata

Labels

needs-to-be-checked (Is this issue still relevant and aligned with our current goals?)
