Numerical Discrepancies Between CUDA-Aware MPI and Standard MPI on Multi-GPU Runs #1

@ia267

Description

We're observing numerical discrepancies when running PyFR with CUDA-aware MPI compared to standard MPI. These differences persist across different machines and configurations, raising concerns about the consistency of numerical results when using CUDA-aware MPI.

Expected Behavior:

Numerical results should be consistent between CUDA-aware MPI and standard MPI, with differences limited to floating-point round-off error (e.g., last few digits).
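
To quantify the discrepancy before digging into causes, the following is a minimal sketch, assuming both runs wrote HDF5-based .pyfrs solution files for the same mesh, partitioning and output time; the file names are placeholders.

```python
# Minimal sketch: quantify the difference between two PyFR solution files
# (.pyfrs, HDF5 format), one produced with standard MPI and one with
# CUDA-aware MPI. Assumes the same mesh, partitioning and output time, so
# that datasets match one-to-one; the file names below are hypothetical.
import h5py
import numpy as np

def max_rel_diff(file_a, file_b):
    worst = 0.0
    with h5py.File(file_a, 'r') as fa, h5py.File(file_b, 'r') as fb:
        for name, dset in fa.items():
            # Skip non-dataset entries and anything missing from the second file
            if not isinstance(dset, h5py.Dataset) or name not in fb:
                continue
            a = np.asarray(dset)
            b = np.asarray(fb[name])
            # Only compare floating-point arrays of matching shape
            if a.dtype.kind != 'f' or a.shape != b.shape:
                continue
            denom = np.maximum(np.abs(a), np.abs(b))
            denom[denom == 0] = 1.0
            worst = max(worst, float(np.max(np.abs(a - b) / denom)))
    return worst

if __name__ == '__main__':
    d = max_rel_diff('run_standard.pyfrs', 'run_cuda_aware.pyfrs')
    print(f'max relative difference: {d:.3e}')
```

On a double-precision run, a maximum relative difference on the order of 1e-15 to 1e-13 would be consistent with round-off; anything substantially larger matches the behaviour reported below.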

Actual Behavior:

  • Differences are observed beyond typical floating-point round-off tolerance.
  • Discrepancies are more pronounced in multi-GPU runs.
  • The differences persist across multiple machines with different hardware configurations.

Suggested Next Steps:

  • Investigate whether different code paths are taken in PyFR when using CUDA-aware MPI vs. standard MPI.
  • Check if asynchronous communication or stream handling differs between the two MPI modes.
  • Confirm whether the issue is related to the order of operations in numerical reductions performed across GPUs.
  • Provide guidance on recommended MPI configurations for reproducible numerical results with CUDA (see the diagnostic sketch after this list).
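
As a starting point for the last two items, here is a minimal sketch, independent of PyFR, that compares an Allreduce on device buffers (exercising the CUDA-aware path) with the same reduction staged through host memory. It assumes mpi4py and CuPy are available; the script name, problem size and value pattern are illustrative. (In PyFR itself, if I recall the config correctly, the transport is selected via the mpi-type option of the [backend-cuda] section.)

```python
# Minimal sketch: does a device-buffer Allreduce give bitwise-identical
# results to the same reduction staged through the host?
# Run with, e.g.:  mpirun -n 4 python allreduce_check.py
import cupy as cp
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20
# Rank-dependent, non-trivial values so that summation order matters
x_dev = (cp.arange(n, dtype=cp.float64) + 1.0) * (1.0 + 1e-8 * rank)

# 1) Reduction on GPU buffers (requires a CUDA-aware MPI build)
r_dev = cp.empty_like(x_dev)
comm.Allreduce(x_dev, r_dev, op=MPI.SUM)

# 2) Same reduction staged through host memory
x_host = cp.asnumpy(x_dev)
r_host = np.empty_like(x_host)
comm.Allreduce(x_host, r_host, op=MPI.SUM)

# Bitwise comparison of the two results
mismatches = int(cp.count_nonzero(r_dev != cp.asarray(r_host)))
if rank == 0:
    print(f'elementwise mismatches: {mismatches} of {n}')
```

A non-zero mismatch count would point at differing reduction orderings between the two transports rather than a bug as such, since MPI does not guarantee a fixed evaluation order for SUM; but it would explain the drift and argue for pinning down which configuration PyFR recommends for reproducibility.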
