Description
I have been testing IMB-IO (IMB 2021.8) over the verbs provider with impi 2021.14 + libfabric v2.0.0, and with ompi 5.0.6 + libfabric v2.0.0. Both MPIs show highly variable times for every test, with t_avg[usec] ranging from 1450 to over 10,000, and runs often abort due to a timeout. My timeout is set to 14400; however, these tests never used to hit this limit.
I am running under Slurm (used for node management) with the following environment and run commands.
PATH=libfabric/bin:mpi/bin:$PATH
LD_LIBRARY_PATH=libfabric/lib:mpi/lib:$LD_LIBRARY_PATH
FI_PROVIDER=verbs;ofi_rxm
I_MPI_JOB_TIMEOUT="14400"
I_MPI_PIN_ORDER="bunch"
I_MPI_PIN_PROCESSOR_LIST="allcores"
IMPI: mpirun -ppn 2 -n 4 /path/to/IMB-IO C_IRead_Expl -aggregate_mode non_aggregate -npmin 4
OMPI: mpirun -n 4 --mca opal_common_ofi_provider_include "verbs;ofi_rxm" --mca mtl ofi --mca btl ^tcp,ofi,vader,openib --bind-to core --map-by ppr:2:node --oversubscribe /path/to/IMB-IO P_IWrite_Shared -aggregate_mode non_aggregate -npmin 4
Is this test expected to be this slow? Is there an option that needs to be set to make it faster?
Timeout error is: "time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit."
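For reference, the limit named in that message can be sketched as below. This is only an illustration of the formula in the error text and of the `-time X` option it suggests; the values 30 and 24 are placeholders, not numbers taken from these runs, and the command is printed rather than executed.

```shell
# Placeholder values for illustration only (assumptions, not measured or default values).
secs_per_sample=30       # value that would be passed as "-time X"
msg_sizes_list_len=24    # number of message sizes the benchmark iterates over

# Limit from the error message: secs_per_sample * msg_sizes_list_len
time_limit=$((secs_per_sample * msg_sizes_list_len))
echo "time limit per benchmark: ${time_limit} s"

# The IMPI command from this report with -time appended (printed, not run).
cmd="mpirun -ppn 2 -n 4 /path/to/IMB-IO C_IRead_Expl -aggregate_mode non_aggregate -npmin 4 -time ${secs_per_sample}"
echo "$cmd"
```

Raising `-time` would only avoid the abort; it would not explain why t_avg varies so widely between runs.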