
Inconsistent map of hardware threads from v3 onwards #63

@fgava90

Hardware threads are mapped inconsistently from PRRTE v3 onwards.

My system has 2 sockets, 64 cores per socket, and 2 threads per core (256 hardware threads in total).

With the Open MPI v4 options:
mpirun -np 4 --verbose --bind-to cpu-list:ordered --cpu-list 0,1,128,129 --report-bindings hostname
I obtain:

MCW rank 0 bound to socket 0[core 0[hwt 0]]
MCW rank 1 bound to socket 0[core 1[hwt 0]]
MCW rank 2 bound to socket 0[core 0[hwt 1]]
MCW rank 3 bound to socket 0[core 1[hwt 1]]

I consider this correct, since other tools (hwloc, htop, ...) also treat CPUs 0-127 as hwt 0 and CPUs 128-255 as hwt 1.
/proc/cpuinfo agrees:

...
...
processor   : 128
...
core id     : 0
...
...
processor   : 129
...
core id     : 1
....
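
The /proc/cpuinfo check above can be automated. The following is a minimal sketch, not part of the original report: `parse_cpuinfo` and `SAMPLE` are hypothetical names, and the sample text contains only the two fields quoted above. Note that `core id` is reported per package, so on a multi-socket machine a complete check would also need the `physical id` field.

```python
def parse_cpuinfo(text):
    """Map each 'processor' number to its 'core id' in /proc/cpuinfo-style text."""
    mapping = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key : value" pair
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "core id" and current is not None:
            mapping[current] = int(value)
    return mapping


# Hypothetical sample holding only the fields quoted in this issue.
SAMPLE = """\
processor   : 128
core id     : 0

processor   : 129
core id     : 1
"""

if __name__ == "__main__":
    print(parse_cpuinfo(SAMPLE))
```

On a real system the same function could be fed the contents of /proc/cpuinfo, for example `parse_cpuinfo(open("/proc/cpuinfo").read())`.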

When attempting the same with Open MPI v5, I get this:
mpirun -np 4 --verbose --bind-to hwt --map-by pe-list=0,1,128,129:ordered:hwtcpus --report-bindings --mca mca_base_verbose debug --mca rmaps_base_verbose 100 hostname

Rank 0 bound to package[0][hwt:0]
Rank 1 bound to package[0][hwt:128]
Rank 2 bound to package[1][hwt:64]
Rank 3 bound to package[1][hwt:192]

To me this looks like the mapping logic assumes:

0 = core 0, hwt 0
1 = core 0, hwt 1
128 = core 64, hwt 0
129 = core 64, hwt 1

instead of the correct mapping:

0 = core 0, hwt 0
1 = core 1, hwt 0
128 = core 0, hwt 1
129 = core 1, hwt 1
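
The two interpretations can be compared with a small arithmetic sketch. This is only a model of the hypothesis stated above, not PRRTE's actual mapping code. It assumes the layout described in this issue (2 sockets, 64 cores per socket, 2 threads per core, OS CPUs 0-127 being hwt 0 and 128-255 being hwt 1), and all function names are hypothetical.

```python
CORES_PER_SOCKET = 64
SOCKETS = 2
THREADS_PER_CORE = 2
TOTAL_CORES = CORES_PER_SOCKET * SOCKETS  # 128 cores on this machine


def os_cpu_to_core_hwt(cpu):
    """OS numbering as described in the issue: 0-127 are hwt 0, 128-255 are hwt 1."""
    return cpu % TOTAL_CORES, cpu // TOTAL_CORES


def consecutive_to_core_hwt(index):
    """Alternative numbering in which the sibling threads of a core are adjacent."""
    return index // THREADS_PER_CORE, index % THREADS_PER_CORE


def core_hwt_to_os_cpu(core, hwt):
    """Inverse of os_cpu_to_core_hwt."""
    return core + hwt * TOTAL_CORES


def package_of(core):
    """Package (socket) that a core belongs to, assuming cores 0-63 sit on package 0."""
    return core // CORES_PER_SOCKET


pe_list = [0, 1, 128, 129]

# What the user expects: the list entries are OS CPU numbers.
expected = [os_cpu_to_core_hwt(cpu) for cpu in pe_list]

# What the v5 output suggests: the entries are read as consecutive indexes,
# then translated back to OS CPU numbers for the binding report.
misread = [core_hwt_to_os_cpu(*consecutive_to_core_hwt(i)) for i in pe_list]
misread_packages = [package_of(consecutive_to_core_hwt(i)[0]) for i in pe_list]

if __name__ == "__main__":
    print("expected (core, hwt):", expected)
    print("misread OS cpus:     ", misread)
    print("misread packages:    ", misread_packages)
```

Under these assumptions the misread list evaluates to OS CPUs 0, 128, 64, 192 on packages 0, 0, 1, 1, which matches the bindings reported by v5 above and is consistent with the suspected interpretation.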

I've tested this with the current Open MPI main branch (the latest release is 5.0.8) and:

  1. internal PMIx + PRRTE v3.0.11
  2. PMIx master + PRRTE v4.0.0

Below is the version info for the second case, since it uses the current commit/release of every component.

ompi_info
                Open MPI: 5.1.0a1
  Open MPI repo revision: cfb5505
   Open MPI release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 5.1.0a1
pmix_info
                    PMIX: 7.0.0a1
      PMIX repo revision: 5ee562a
       PMIX release date: Unreleased developer copy
           PMIX Standard: 4.1
       PMIX Standard ABI: Stable (0.0), Provisional (0.0)
prte_info
                    PRTE: 4.0.0a1v4.0.0
      PRTE repo revision: v4.0.0
       PRTE release date: @PMIX_RELEASE_DATE@
                    PMIx: OpenPMIx 7.0.0a1, repo rev: 5ee562a (PMIx Standard: 4.1, Stable ABI: 0.0, Provisional ABI: 0.0)
