Description
RapidsMPF needs two kinds of reduction operations:
- Fixed-size reduction
- Dynamic-size reduction over packed data
These two cases have different constraints and therefore belong at different layers of the stack.
Fixed-size reduction
Fixed-size reductions operate on buffers with a known size and layout (scalars, POD structs, fixed arrays).
Both MPI and UCX+UCC provide native reduce / allreduce for this case.
Because of this, fixed-size reduce / all_reduce should be part of the Communicator API:
- MPI backend → `MPI_Allreduce` / `MPI_Reduce`
- UCX/UCC backend → UCC `ALLREDUCE` collectives
This gives us efficient, backend-optimized collectives for simple fixed-size data.
However, UCXX does not currently expose UCC support, so enabling UCC-based collectives may require significant work (how much, @pentschev?). As an intermediate step, the UCXX communicator in RapidsMPF could implement reduce on top of the existing send/recv primitives.
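The send/recv-based intermediate step could look like the following sketch. This is illustrative only: `Comm`, `send`, and `recv` are stand-ins for the real Communicator primitives, not the RapidsMPF API, and the "network" is simulated with in-memory queues and threads.

```python
# Hedged sketch: allreduce built only on point-to-point send/recv,
# as the suggested intermediate step for the UCXX communicator.
# `Comm` and its methods are hypothetical names, not RapidsMPF API.
import operator
import threading
from queue import Queue


class Comm:
    """Toy communicator: one inbox queue per rank simulates the network."""

    def __init__(self, rank, inboxes):
        self.rank = rank
        self.nranks = len(inboxes)
        self._inboxes = inboxes

    def send(self, dest, value):
        self._inboxes[dest].put(value)

    def recv(self):
        # Arrival order is arbitrary, which is fine as long as the
        # reduction operator is commutative and associative.
        return self._inboxes[self.rank].get()


def allreduce(comm, value, op=operator.add):
    """Reduce-to-root (rank 0) then broadcast, using only send/recv."""
    if comm.rank == 0:
        acc = value
        for _ in range(comm.nranks - 1):
            acc = op(acc, comm.recv())
        for dest in range(1, comm.nranks):
            comm.send(dest, acc)
        return acc
    comm.send(0, value)
    return comm.recv()


# Demo: four "ranks" on threads; each contributes rank + 1.
inboxes = [Queue() for _ in range(4)]
results = {}


def worker(rank):
    results[rank] = allreduce(Comm(rank, inboxes), rank + 1)


threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The reduce-to-root-then-broadcast pattern is the simplest correct scheme; a real implementation would likely prefer a tree or recursive-doubling layout to avoid serializing on rank 0.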
Dynamic-size reduction (packed data)
Packed data varies in both size and structure, and no backend (MPI or UCX/UCC) supports all-reduce for variable-length payloads.
Supporting this case requires a custom protocol implemented above the communicator:
- exchange sizes
- reserve/allocate buffers (with spilling if needed)
- apply a user-provided reduction operator
For this reason, dynamic-size reductions should not be part of the low-level Communicator, but instead implemented in higher-level RapidsMPF logic.
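The three protocol steps above can be sketched in a single-process stand-in. Everything here is hypothetical: `packed_allreduce` and `concat_tables` are illustrative names, the packed payloads are stand-in JSON bytes rather than serialized table buffers, and the size exchange / buffer reservation are simulated in memory.

```python
# Hedged sketch of the protocol layered above the communicator:
# (1) exchange sizes, (2) reserve a receive buffer (real code would
# spill if it does not fit in device memory), (3) apply a
# user-provided reduction operator over variable-length packed data.
import json


def concat_tables(a: bytes, b: bytes) -> bytes:
    """User-provided operator: merge two packed payloads.

    Stand-in: JSON lists play the role of serialized table buffers.
    """
    return json.dumps(json.loads(a) + json.loads(b)).encode()


def packed_allreduce(payloads, op):
    """Single-process stand-in for all-reduce over packed data.

    payloads: each rank's packed bytes (the simulated "network").
    """
    sizes = [len(p) for p in payloads]            # step 1: size exchange
    recv_buf = bytearray(max(sizes))              # step 2: reserve buffer
    acc = payloads[0]
    for p in payloads[1:]:
        recv_buf[: len(p)] = p                    # simulated receive
        acc = op(acc, bytes(recv_buf[: len(p)]))  # step 3: user operator
    return acc


# Demo: three "ranks", each contributing a one-element packed list.
payloads = [json.dumps([r]).encode() for r in range(3)]
result = packed_allreduce(payloads, concat_tables)
```

Because the output size is not known in advance, each step of the real protocol would need a fresh size exchange before the next receive, which is exactly why this belongs above the communicator rather than inside it.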
Questions
- Do we agree that fixed-size reduction is the highest priority?
- Do we need user-defined reduction operators for fixed-size reduction, or is it sufficient to provide basic built-in operators (ADD, MUL, MIN, MAX) on fundamental datatypes such as `int` and `float`?
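If only built-in operators are provided, the surface could be as small as the following sketch. The names (`REDUCE_OPS`, `reduce_fixed`) are illustrative, not a proposed API.

```python
# Hedged sketch: built-in operators only, restricted to fundamental
# datatypes, as one possible answer to the question above.
import operator

REDUCE_OPS = {
    "ADD": operator.add,
    "MUL": operator.mul,
    "MIN": min,
    "MAX": max,
}


def reduce_fixed(values, op_name):
    """Apply a built-in operator to fixed-size scalar values."""
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("built-in ops are limited to int/float")
    op = REDUCE_OPS[op_name]
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc
```

Note that MPI takes the same position by default: `MPI_Allreduce` ships predefined operators (`MPI_SUM`, `MPI_MIN`, ...) and only adds user-defined ones through the separate `MPI_Op_create` mechanism.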