🐛[BUG]: ERA5 DALI datapipe hangs indefinitely in multi-GPU/multi-node settings if the datapipe size is not chosen correctly #102

### Version

0.2.0

### On which installation method(s) does this occur?

Docker

### Describe the issue

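This can usually be worked around by setting the number of samples in the datapipe (for example [here](https://github.com/NVIDIA/modulus-launch/blob/main/examples/weather/fcn_afno/train_era5.py#L119)) to a value divisible by the number of processes/GPUs. The hang is likely caused by ranks receiving unequal numbers of batches, so that some ranks block in a collective operation that the others never reach.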
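A minimal sketch of this workaround, assuming the sample count and the world size (for example from `torch.distributed.get_world_size()`) are available as plain integers. The helper name `divisible_num_samples` is hypothetical and not part of Modulus; it simply rounds the sample count down to the nearest multiple of the number of ranks.

```python
def divisible_num_samples(num_samples: int, world_size: int) -> int:
    """Round num_samples down to the nearest multiple of world_size.

    Hypothetical helper (not a Modulus API): the result can be passed as the
    datapipe's sample count so every rank sees the same number of samples.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    usable = (num_samples // world_size) * world_size
    if usable == 0:
        raise ValueError("fewer samples than ranks; cannot shard evenly")
    return usable


if __name__ == "__main__":
    # 1000 samples on 3 GPUs: drop one sample so each rank gets 333.
    print(divisible_num_samples(1000, 3))
```

The trade-off is that up to `world_size - 1` samples are skipped each epoch, which is usually negligible for ERA5-sized datasets.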

A long-term fix would be for the datapipe to handle this automatically, so that a sample count that is not exactly divisible by the number of GPUs no longer causes a hang.
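One possible shape for such an automatic fix, sketched under the assumption that each rank selects its own sample indices: always give every rank the same number of samples, either by dropping the remainder or by padding with wrapped-around indices. This is similar in spirit to the `drop_last` behaviour of PyTorch's `torch.utils.data.DistributedSampler`; the function `shard_indices` is hypothetical and not how the Modulus ERA5 datapipe is actually implemented.

```python
import math
from typing import List


def shard_indices(
    num_samples: int, world_size: int, rank: int, drop_last: bool = False
) -> List[int]:
    """Return the sample indices for `rank`, with equal counts on every rank.

    Hypothetical sketch: equal per-rank counts mean no rank runs out of
    batches early and leaves the others waiting in a collective.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    if drop_last:
        # Discard the tail so the total is a multiple of world_size.
        per_rank = num_samples // world_size
        indices = list(range(per_rank * world_size))
    else:
        # Pad by wrapping around to the start so the total is a multiple of world_size.
        per_rank = math.ceil(num_samples / world_size)
        total = per_rank * world_size
        indices = [i % num_samples for i in range(total)]
    # Strided split: rank r gets indices r, r + world_size, r + 2 * world_size, ...
    return indices[rank::world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(10, 4, r), shard_indices(10, 4, r, drop_last=True))
```

With 10 samples on 4 ranks, padding gives every rank 3 samples (samples 0 and 1 are seen twice per epoch), while `drop_last=True` gives every rank 2 samples and skips samples 8 and 9.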

### Minimum reproducible example

_No response_

### Relevant log output

_No response_

### Environment details

_No response_
