Introducing reduce / all_reduce in RapidsMPF #671

@madsbk

RapidsMPF needs two kinds of reduction operations:

  1. Fixed-size reduction
  2. Dynamic-size reduction over packed data

These two cases have different constraints and therefore belong at different layers of the stack.

Fixed-size reduction

Fixed-size reductions operate on buffers with a known size and layout (scalars, POD structs, fixed arrays).
Both MPI and UCX+UCC provide native reduce / allreduce for this case.

Because of this, fixed-size reduce / all_reduce should be part of the Communicator API:

  • MPI backend → MPI_Allreduce / MPI_Reduce
  • UCX/UCC backend → UCC ALLREDUCE collectives

This gives us efficient, backend-optimized collectives for simple fixed-size data.
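To make the intended semantics concrete, here is a minimal in-process sketch of a fixed-size all_reduce with built-in operators. It is not the RapidsMPF Communicator API: `ReduceOp`, `all_reduce_fixed`, and the list-of-lists representation of per-rank buffers are hypothetical names chosen for illustration, and a real backend would delegate to MPI or UCC rather than loop in Python.

```python
from enum import Enum, auto
from functools import reduce
import operator


class ReduceOp(Enum):
    """Hypothetical built-in operators (illustrative names only)."""
    ADD = auto()
    MUL = auto()
    MIN = auto()
    MAX = auto()


# Map each operator to a binary function over fundamental datatypes.
_COMBINE = {
    ReduceOp.ADD: operator.add,
    ReduceOp.MUL: operator.mul,
    ReduceOp.MIN: min,
    ReduceOp.MAX: max,
}


def all_reduce_fixed(per_rank_buffers, op):
    """Element-wise all-reduce over equally sized buffers, one per rank.

    Returns one result buffer per rank; every rank gets the same values.
    """
    if not per_rank_buffers:
        raise ValueError("need at least one rank")
    length = len(per_rank_buffers[0])
    if any(len(buf) != length for buf in per_rank_buffers):
        raise ValueError("fixed-size reduction requires equal buffer sizes")
    combine = _COMBINE[op]
    # zip(*buffers) groups the i-th element of every rank together.
    result = [reduce(combine, column) for column in zip(*per_rank_buffers)]
    return [list(result) for _ in per_rank_buffers]


if __name__ == "__main__":
    print(all_reduce_fixed([[1, 2], [3, 4], [5, 6]], ReduceOp.ADD))
```

The size check is the point of the sketch: because every rank contributes the same layout, the result buffer can be sized up front, which is what lets the backend collectives handle this case.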

However, UCXX does not currently expose UCC support, so enabling UCC-based collectives may require significant work (how much, @pentschev?). As an intermediate step, the UCXX communicator in RapidsMPF could implement reduce on top of the existing send/recv primitives.
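One possible shape for that intermediate step is a binomial-tree reduce built only from point-to-point messages, followed by a broadcast from the root. The sketch below simulates all ranks in a single process. `Mailbox`, `reduce_to_root`, and `all_reduce` are hypothetical stand-ins and not the UCXX or RapidsMPF send/recv API, and the tree schedule is just one option, not necessarily what the implementation would use.

```python
from collections import defaultdict, deque


class Mailbox:
    """In-process stand-in for point-to-point send/recv (hypothetical)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, src, dst, payload):
        self._queues[(src, dst)].append(payload)

    def recv(self, src, dst):
        return self._queues[(src, dst)].popleft()


def reduce_to_root(values, combine):
    """Binomial-tree reduce of one value per rank onto rank 0."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one rank")
    partial = list(values)
    box = Mailbox()
    step = 1
    while step < n:
        # Ranks at odd multiples of `step` send their partial result down.
        for rank in range(step, n, 2 * step):
            box.send(rank, rank - step, partial[rank])
        # Ranks at even multiples receive and combine, keeping rank order.
        for rank in range(0, n - step, 2 * step):
            partial[rank] = combine(partial[rank], box.recv(rank + step, rank))
        step *= 2
    return partial[0]


def all_reduce(values, combine):
    """Reduce onto rank 0, then send the result to every other rank."""
    total = reduce_to_root(values, combine)
    box = Mailbox()
    for rank in range(1, len(values)):
        box.send(0, rank, total)
    return [total] + [box.recv(0, rank) for rank in range(1, len(values))]


if __name__ == "__main__":
    print(all_reduce([1, 2, 3, 4, 5], lambda a, b: a + b))
```

The tree needs about log2(n) rounds instead of the n - 1 sequential receives a naive gather-to-root would take, and it only requires the operator to be associative.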

Dynamic-size reduction (packed data)

Packed data varies in both size and structure, and no backend (MPI or UCX/UCC) supports all-reduce for variable-length payloads.

Supporting this case requires a custom protocol implemented above the communicator:

  • exchange sizes
  • reserve/allocate buffers (with spilling if needed)
  • apply a user-provided reduction operator

For this reason, dynamic-size reductions should not be part of the low-level Communicator; they should instead be implemented in higher-level RapidsMPF logic.
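The three protocol steps listed above can be sketched as follows. This is a single-process simulation under stated assumptions: `all_reduce_packed`, `pack_ints`, `unpack_ints`, and `union_op` are hypothetical names, packed data is modeled as plain `bytes`, the data transfer is a local copy, and spilling is not modeled at all.

```python
import struct


def pack_ints(values):
    """Pack a variable number of int32 values (stand-in for packed data)."""
    return struct.pack(f"<{len(values)}i", *values)


def unpack_ints(payload):
    return list(struct.unpack(f"<{len(payload) // 4}i", payload))


def union_op(a, b):
    """Example user-provided operator: sorted set union of two payloads."""
    return pack_ints(sorted(set(unpack_ints(a)) | set(unpack_ints(b))))


def all_reduce_packed(payloads, reduce_op):
    """All-reduce over one variable-length payload per rank."""
    n = len(payloads)
    if n == 0:
        raise ValueError("need at least one rank")
    # Step 1: exchange sizes so every rank knows what it will receive.
    sizes = [len(p) for p in payloads]
    results = []
    for _rank in range(n):
        # Step 2: allocate receive buffers from the exchanged sizes.
        # (A real implementation would reserve memory and may spill.)
        recv = [bytearray(size) for size in sizes]
        for peer in range(n):
            recv[peer][:] = payloads[peer]  # local copy stands in for transfer
        # Step 3: fold the payloads with the user-provided operator.
        acc = bytes(recv[0])
        for peer in range(1, n):
            acc = reduce_op(acc, bytes(recv[peer]))
        results.append(acc)
    return results


if __name__ == "__main__":
    inputs = [pack_ints([1, 2]), pack_ints([2, 3, 4]), pack_ints([])]
    print([unpack_ints(r) for r in all_reduce_packed(inputs, union_op)])
```

Note that the result is 16 bytes while the inputs are 8, 12, and 0 bytes: the output size depends on the operator and the data, which a fixed-size backend collective cannot express.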

Questions

  • Do we agree that fixed-size reduction is the highest priority?
  • Do we need user-defined reduction operators for fixed-size reduction, or is it sufficient to provide basic built-in operators (ADD, MUL, MIN, MAX) on fundamental datatypes such as int and float?
