Description
There is an inconsistency in mapping hardware threads from prrte v3 onwards.
My system has 2 sockets, 64 cores per socket, and 2 hardware threads per core.
With the Open MPI v4 options:
mpirun -np 4 --verbose --bind-to cpu-list:ordered --cpu-list 0,1,128,129 --report-bindings hostname
I obtain:
MCW rank 0 bound to socket 0[core 0[hwt 0]]
MCW rank 1 bound to socket 0[core 1[hwt 0]]
MCW rank 2 bound to socket 0[core 0[hwt 1]]
MCW rank 3 bound to socket 0[core 1[hwt 1]]
I consider this correct, since other tools (hwloc, htop, ...) also treat CPUs 0-127 as hwt 0 and CPUs 128-255 as hwt 1.
/proc/cpuinfo agrees:
...
...
processor : 128
...
core id : 0
...
...
processor : 129
...
core id : 1
....
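The same check can be scripted rather than read by eye. Below is a minimal Python sketch, assuming the x86 Linux /proc/cpuinfo layout (blank-line-separated records with "processor", "physical id" and "core id" fields). The helper names parse_cpuinfo and siblings_by_core are hypothetical and are not part of hwloc, PRRTE or Open MPI.

```python
def parse_cpuinfo(text):
    """Map each OS processor number to its (physical id, core id) pair.

    Assumes x86-style /proc/cpuinfo records separated by blank lines.
    Missing fields are reported as -1.
    """
    mapping = {}
    fields = {}
    # The trailing "" flushes the last record if the text lacks a final blank line.
    for line in text.splitlines() + [""]:
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
            continue
        if "processor" in fields:
            mapping[int(fields["processor"])] = (
                int(fields.get("physical id", -1)),
                int(fields.get("core id", -1)),
            )
        fields = {}
    return mapping


def siblings_by_core(mapping):
    """Group OS processor numbers by the (physical id, core id) they share."""
    cores = {}
    for cpu in sorted(mapping):
        cores.setdefault(mapping[cpu], []).append(cpu)
    return cores
```

Calling siblings_by_core(parse_cpuinfo(open("/proc/cpuinfo").read())) on the affected node should list processors 0 and 128 under the same core if the numbering described above holds.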
When attempting the same with Open MPI v5, I get this:
mpirun -np 4 --verbose --bind-to hwt --map-by pe-list=0,1,128,129:ordered:hwtcpus --report-bindings --mca mca_base_verbose debug --mca rmaps_base_verbose 100 hostname
Rank 0 bound to package[0][hwt:0]
Rank 1 bound to package[0][hwt:128]
Rank 2 bound to package[1][hwt:64]
Rank 3 bound to package[1][hwt:192]
To me this looks like the mapping logic assumes:
0 = core 0, hwt 0
1 = core 0, hwt 1
128 = core 64, hwt 0
129 = core 64, hwt 1
instead of the correct map:
0 = core 0, hwt 0
1 = core 1, hwt 0
128 = core 0, hwt 1
129 = core 1, hwt 1
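The two interpretations above can be written out as arithmetic. This is only a sketch of the suspected behaviour, not PRRTE's actual code: the function names are hypothetical, and the constants come from the system described in this report (2 x 64 cores, 2 threads per core). Cores are numbered machine-wide here, so core 64 is the first core of the second package.

```python
CORES = 128            # 2 sockets x 64 cores, as on the system in this report
THREADS_PER_CORE = 2


def sibling_major(cpu):
    """OS numbering on this system: CPUs 0-127 are hwt 0, CPUs 128-255 are hwt 1."""
    return cpu % CORES, cpu // CORES            # (core, hwt)


def core_major(cpu):
    """Numbering the v5 mapper appears to assume: consecutive CPUs share a core."""
    return cpu // THREADS_PER_CORE, cpu % THREADS_PER_CORE   # (core, hwt)


def os_index(core, hwt):
    """Translate a (core, hwt) pair back to the OS CPU number on this system."""
    return core + hwt * CORES


for cpu in (0, 1, 128, 129):
    assumed = core_major(cpu)
    print(cpu, "expected", sibling_major(cpu), "assumed", assumed,
          "-> bound to hwt", os_index(*assumed))
```

Translating the core-major interpretation back to OS numbers gives 0, 128, 64 and 192, which matches the bindings reported by v5 above and is consistent with the suspected mix-up.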
I've tested this with the current Open MPI main branch (the latest release is 5.0.8) and:
- internal pmix + prrte v3.0.11
- pmix master + prrte v4.0.0
Below is the version info for the second case, since it uses the current commit/release of every component.
ompi_info
Open MPI: 5.1.0a1
Open MPI repo revision: cfb5505
Open MPI release date: Unreleased developer copy
MPI API: 3.1.0
Ident string: 5.1.0a1
pmix_info
PMIX: 7.0.0a1
PMIX repo revision: 5ee562a
PMIX release date: Unreleased developer copy
PMIX Standard: 4.1
PMIX Standard ABI: Stable (0.0), Provisional (0.0)
prte_info
PRTE: 4.0.0a1v4.0.0
PRTE repo revision: v4.0.0
PRTE release date: @PMIX_RELEASE_DATE@
PMIx: OpenPMIx 7.0.0a1, repo rev: 5ee562a (PMIx Standard: 4.1, Stable ABI: 0.0, Provisional ABI: 0.0)