IMB-IO very slow #62

@zachdworkin

I have been testing IMB-IO (imb 2021.8) over the verbs provider with impi 2021.14 + libfabric v2.0.0, and with ompi 5.0.6 + libfabric v2.0.0. Both MPIs show highly variable times for every test, with t_avg[usec] ranging from 1,450 to over 10,000, and runs often abort due to a timeout. My timeout is set to 14400, but these tests never used to hit this limit.

I am running under Slurm (used for node management) with the following environment and run commands:
PATH=libfabric/bin:mpi/bin:$PATH
LD_LIBRARY_PATH=libfabric/lib:mpi/lib:$LD_LIBRARY_PATH
FI_PROVIDER="verbs;ofi_rxm"
I_MPI_JOB_TIMEOUT="14400"
I_MPI_PIN_ORDER="bunch"
I_MPI_PIN_PROCESSOR_LIST="allcores"
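
If these variables are set from a POSIX shell rather than a job file, a minimal sketch might look like the following. It assumes the same placeholder install paths as in the list above. FI_PROVIDER is quoted because an unquoted semicolon would end the assignment and make the shell try to run ofi_rxm as a command.

```shell
#!/bin/sh
# Sketch: export the environment from this report in a POSIX shell.
# The paths are the placeholders from the report, not real install locations.
export PATH="libfabric/bin:mpi/bin:$PATH"
export LD_LIBRARY_PATH="libfabric/lib:mpi/lib:${LD_LIBRARY_PATH:-}"
# Quote the value: an unquoted ';' would terminate the assignment.
export FI_PROVIDER="verbs;ofi_rxm"
export I_MPI_JOB_TIMEOUT="14400"
export I_MPI_PIN_ORDER="bunch"
export I_MPI_PIN_PROCESSOR_LIST="allcores"

# Print the provider string so the setting can be checked before launching.
echo "FI_PROVIDER=$FI_PROVIDER"
```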

IMPI: mpirun -ppn 2 -n 4 /path/to/IMB-IO C_IRead_Expl -aggregate_mode non_aggregate -npmin 4

OMPI: mpirun -n 4 --mca opal_common_ofi_provider_include "verbs;ofi_rxm" --mca mtl ofi --mca btl ^tcp,ofi,vader,openib --bind-to core --map-by ppr:2:node --oversubscribe /path/to/IMB-IO P_IWrite_Shared -aggregate_mode non_aggregate -npmin 4

Is this test expected to be this slow? Is there an option that needs to be set to make it faster?
The timeout error is: "time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit."
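
The limit in that message is a product, so it can be estimated before a run. The sketch below is only arithmetic on two inputs. The values 10 and 20 are placeholders chosen for illustration, not values read from IMB_settings.h or from this run, and `imb_time_limit` is a hypothetical helper name.

```shell
#!/bin/sh
# imb_time_limit SECS_PER_SAMPLE MSG_SIZES_LIST_LEN
# Prints the overall budget described in the IMB time-out message:
#   secs_per_sample * msg_sizes_list_len
imb_time_limit() {
    echo $(( $1 * $2 ))
}

# Placeholder inputs for illustration only.
imb_time_limit 10 20   # prints 200
```

Under the same assumptions, raising the per-sample value with the "-time X" option named in the message (for example, appending -time 60 to the IMB-IO command line) would scale the budget to 60 * 20 = 1200 seconds.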
