
@yqin commented Mar 23, 2020

This PR should address the discrepancy between what the PCI-e link layer reports and what the driver reports.
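For illustration, the two views can be read side by side from sysfs. This is a minimal bash sketch, not the code in this PR. It assumes the standard ib_core layout: `/sys/class/infiniband/<device>/ports/<port>/rate` holds the driver-reported port rate, and `/sys/class/infiniband/<device>/device/current_link_{speed,width}` hold the PCI-e link values. `SYSFS_IB` and `show_link_views` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: print the driver's view and the PCI-e view of one HCA port.
# SYSFS_IB is a hypothetical override of the sysfs root (handy for testing).
SYSFS_IB="${SYSFS_IB:-/sys/class/infiniband}"

# show_link_views DEVICE PORT   (e.g. mlx5_0 1)
show_link_views() {
    local dev="$1" port="$2" rate speed width
    # Driver (IB) view of the port, e.g. "56 Gb/sec (4X FDR)"
    read -r rate < "$SYSFS_IB/$dev/ports/$port/rate" || return 1
    # PCI-e link layer view of the adapter's slot
    read -r speed < "$SYSFS_IB/$dev/device/current_link_speed" || return 1
    read -r width < "$SYSFS_IB/$dev/device/current_link_width" || return 1
    printf 'IB port %s:%s: %s\n' "$dev" "$port" "$rate"
    printf 'PCI-e link: x%s @ %s\n' "$width" "$speed"
}
```

On an mlx5-based HCA, `show_link_views mlx5_0 1` would print the port rate and the PCI-e link on two lines. If any attribute file is missing, the redirection fails, the shell prints an error, and the function returns 1.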

@Obihoernchen

Works very well!

@kcgthb commented Apr 8, 2021

It works very well for HCAs using the mlx5_core module, indeed.

But AFAICT, mlx4_ib doesn't provide the /sys/class/infiniband/<device>/device/current_link_{speed,width} files, which makes the check fail:

```
# nhc -d -e 'check_hw_ib 56'
DEBUG:  Debugging activated via -d option.
DEBUG:  Evaluating single check line:  check_hw_ib 56
[...]
/etc/nhc/scripts/lbnl_hw.nhc: line 134: /sys/class/infiniband/mlx4_0/device/current_link_speed: No such file or directory
/etc/nhc/scripts/lbnl_hw.nhc: line 137: /sys/class/infiniband/mlx4_0/device/current_link_width: No such file or directory
[1617908392] - DEBUG:  Found ACTIVE (LinkUp) IB Port mlx4_0:1 (56 Gb/sec) with PCI-e link 56x 56 GT/s
```

This is with a ConnectX-3 card: "Mellanox Technologies MT27500 Family [ConnectX-3]"

@mej self-assigned this Apr 18, 2021
@mej added this to the 1.5 Release milestone Apr 18, 2021

@mej (Owner) commented Apr 18, 2021

This looks great; thanks, Yong!

I will address @kcgthb's comment by checking for the existence of the files before merging. So thanks to you both! 😃
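The existence check described above could look roughly like the bash sketch below. It is an assumption about the approach, not the change that was merged: `read_sysfs_attr` and `pcie_link_or_skip` are hypothetical names, and `printf -v` requires bash 3.1 or later. Each `current_link_{speed,width}` file is tested with `[[ -r ... ]]` before it is read, so a missing attribute skips the PCI-e comparison instead of producing "No such file or directory" errors like the ones in the log.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper names, not NHC's actual code).

# read_sysfs_attr VAR FILE
#   If FILE exists and is readable, store its first line in VAR and return 0;
#   otherwise leave VAR untouched and return 1 without printing an error.
read_sysfs_attr() {
    local __var="$1" __file="$2" __val=""
    # Some drivers (e.g. mlx4_ib on ConnectX-3) do not expose every attribute.
    [[ -r "$__file" ]] || return 1
    # Accept a final line even if it lacks a trailing newline.
    read -r __val < "$__file" || [[ -n "$__val" ]] || return 1
    printf -v "$__var" '%s' "$__val"
}

# pcie_link_or_skip DEVDIR
#   DEVDIR stands for /sys/class/infiniband/<device>/device.
#   Prints "x<width> <speed>" when both attributes exist, else a skip notice.
pcie_link_or_skip() {
    local devdir="$1" speed="" width=""
    if read_sysfs_attr speed "$devdir/current_link_speed" &&
       read_sysfs_attr width "$devdir/current_link_width"; then
        echo "x${width} ${speed}"
    else
        echo "PCI-e link attributes unavailable; skipping PCI-e check"
    fi
}
```

Returning a status instead of printing an error lets the caller decide whether a missing attribute matters; on an mlx4-based HCA the check would simply fall back to the IB port state and rate.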
