🐛[BUG]: ERA5 DALI datapipe hangs indefinitely in multi-GPU/multi-Node setting if the datapipe size is not selected correctly.  #102

@ktangsali

Version

0.2.0

On which installation method(s) does this occur?

Docker

Describe the issue

The hang can usually be avoided by setting the number of samples in the datapipe (for example here) to a value that is divisible by the number of processors/GPUs.

A long-term fix would be for the datapipe to automatically handle the failure cases where the number of samples is not exactly divisible by the number of GPUs.
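
As a rough illustration of the workaround, the sample count can be rounded down to the nearest multiple of the number of GPUs before the datapipe is constructed. This is only a sketch: `usable_num_samples`, `total`, and `ranks` are hypothetical names and values, not part of the ERA5 datapipe API, and the resulting number would still need to be passed to whatever argument controls the datapipe size in your setup.

```python
def usable_num_samples(num_samples: int, world_size: int) -> int:
    """Return the largest sample count <= num_samples divisible by world_size.

    Hypothetical helper for illustration; not part of the datapipe itself.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if num_samples < world_size:
        # At least one rank would receive no samples at all.
        raise ValueError("num_samples must be >= world_size")
    # Integer division drops the remainder that cannot be split evenly.
    return (num_samples // world_size) * world_size


# Assumed example values: 1460 samples spread over 8 GPUs.
total = 1460
ranks = 8
trimmed = usable_num_samples(total, ranks)
print(trimmed, trimmed // ranks)  # 1456 182
```

Rounding down discards up to `world_size - 1` samples; padding up to the next multiple by repeating samples would be an alternative if dropping data is not acceptable.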

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

Metadata

Labels

0 - Backlog, bug
