Hello ~
This system rebooted unexpectedly.
I found the following entries in /var/log/syslog just before the unexpected reboot:
Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 nv_get_p2p_free_callback -- invalid dma_mapping
Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 nv_get_p2p_free_callback -- invalid dma_mapping
What do these log messages mean?
Are they related to the unexpected reboot?
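In case it helps with checking the timing, here is a minimal Python sketch for pulling these entries out of syslog so their timestamps can be compared with the reboot time. The regular expression is an assumption based only on the two lines quoted above, and the function name find_nv_mem_errors is hypothetical, not part of any NVIDIA or Mellanox tool.

```python
import re

# Pattern derived only from the two syslog lines quoted in this post (assumption):
# "<Mon> <day> <HH:MM:SS> <host> kernel: ... nv_get_p2p_free_callback -- invalid dma_mapping"
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:.*"
    r"nv_get_p2p_free_callback -- invalid dma_mapping"
)


def find_nv_mem_errors(lines):
    """Return a list of (timestamp, host) tuples for each matching log line."""
    hits = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            hits.append((match.group("ts"), match.group("host")))
    return hits


# The line quoted from /var/log/syslog, plus one made-up non-matching line.
sample = [
    "Dec 20 18:48:09 A100-42 kernel: nv_mem nv_get_p2p_free_callback:155 "
    "nv_get_p2p_free_callback -- invalid dma_mapping",
    "Dec 20 18:48:10 A100-42 kernel: some unrelated message",
]

for ts, host in find_nv_mem_errors(sample):
    print(ts, host)
```

To run it against the real log, the sample list could be replaced with an open file object, for example open("/var/log/syslog", errors="replace"), since the function accepts any iterable of lines.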
[ENV]
OS: Ubuntu 20.04
Kernel : 5.4.0-42-generic
H/W : Supermicro AS-4124GO-NART (like DGX A100)
[GPU : 8ea]
NVIDIA A100-SXM4-80GB
Driver Version : 470.103.01
CUDA Version : 11.4
[IB : 8ea]
OFED ver : OFED-5.6.0.1.6.1
nv_peer_mem : v1.0
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.32.1010
    Hardware version: 0
    Node GUID: 0x08c0eb0300c8ff40
    System image GUID: 0x08c0eb0300c8ff40
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 173
        LMC: 0
        SM lid: 233
        Capability mask: 0x2651e848
        Port GUID: 0x08c0eb0300c8ff40
        Link layer: InfiniBand
Thanks ~