Numerical Discrepancies Between CUDA-Aware MPI and Standard MPI on Multi-GPU Runs #1

@ia267

Description

We're observing numerical discrepancies when running PyFR with CUDA-aware MPI compared to standard MPI. These differences persist across different machines and configurations, raising concerns about the consistency of numerical results when using CUDA-aware MPI.

Expected Behavior:

Numerical results should be consistent between CUDA-aware MPI and standard MPI, with differences limited to floating-point round-off error (e.g., last few digits).
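
To quantify the discrepancy before digging into causes, the following is a minimal sketch, assuming both runs wrote HDF5-based .pyfrs solution files for the same mesh, partitioning and output time; the file names are placeholders.

```python
# Minimal sketch: quantify the difference between two PyFR solution files
# (.pyfrs, HDF5 format), one produced with standard MPI and one with
# CUDA-aware MPI. Assumes the same mesh, partitioning and output time, so
# that datasets match one-to-one; the file names below are hypothetical.
import h5py
import numpy as np

def max_rel_diff(file_a, file_b):
    worst = 0.0
    with h5py.File(file_a, 'r') as fa, h5py.File(file_b, 'r') as fb:
        for name, dset in fa.items():
            # Skip non-dataset entries and anything missing from the second file
            if not isinstance(dset, h5py.Dataset) or name not in fb:
                continue
            a = np.asarray(dset)
            b = np.asarray(fb[name])
            # Only compare floating-point arrays of matching shape
            if a.dtype.kind != 'f' or a.shape != b.shape:
                continue
            denom = np.maximum(np.abs(a), np.abs(b))
            denom[denom == 0] = 1.0
            worst = max(worst, float(np.max(np.abs(a - b) / denom)))
    return worst

if __name__ == '__main__':
    d = max_rel_diff('run_standard.pyfrs', 'run_cuda_aware.pyfrs')
    print(f'max relative difference: {d:.3e}')
```

On a double-precision run, a maximum relative difference on the order of 1e-15 to 1e-13 would be consistent with round-off; anything substantially larger matches the behaviour reported below.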

Actual Behavior:

  • Differences are observed beyond typical floating-point round-off tolerance.
  • Discrepancies are more pronounced in multi-GPU runs.
  • The differences persist across multiple machines with different hardware configurations.

Suggested Next Steps:

  • Investigate whether different code paths are taken in PyFR when using CUDA-aware MPI vs. standard MPI.
  • Check if asynchronous communication or stream handling differs between the two MPI modes.
  • Confirm whether the issue is related to the order of operations in numerical reductions performed across GPUs.
  • Provide guidance on recommended MPI configurations for reproducible numerical results with CUDA (see the diagnostic sketch after this list).
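
As a starting point for the last two items, here is a minimal sketch, independent of PyFR, that compares an Allreduce on device buffers (exercising the CUDA-aware path) with the same reduction staged through host memory. It assumes mpi4py and CuPy are available; the script name, problem size and value pattern are illustrative. (In PyFR itself, if I recall the config correctly, the transport is selected via the mpi-type option of the [backend-cuda] section.)

```python
# Minimal sketch: does a device-buffer Allreduce give bitwise-identical
# results to the same reduction staged through the host?
# Run with, e.g.:  mpirun -n 4 python allreduce_check.py
import cupy as cp
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20
# Rank-dependent, non-trivial values so that summation order matters
x_dev = (cp.arange(n, dtype=cp.float64) + 1.0) * (1.0 + 1e-8 * rank)

# 1) Reduction on GPU buffers (requires a CUDA-aware MPI build)
r_dev = cp.empty_like(x_dev)
comm.Allreduce(x_dev, r_dev, op=MPI.SUM)

# 2) Same reduction staged through host memory
x_host = cp.asnumpy(x_dev)
r_host = np.empty_like(x_host)
comm.Allreduce(x_host, r_host, op=MPI.SUM)

# Bitwise comparison of the two results
mismatches = int(cp.count_nonzero(r_dev != cp.asarray(r_host)))
if rank == 0:
    print(f'elementwise mismatches: {mismatches} of {n}')
```

A non-zero mismatch count would point at differing reduction orderings between the two transports rather than a bug as such, since MPI does not guarantee a fixed evaluation order for SUM; but it would explain the drift and argue for pinning down which configuration PyFR recommends for reproducibility.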
