Hi everyone, I'm trying to get K8s working with RoCE for a distributed, multi-node PyTorch training job.
I'm struggling to get the NVIDIA Network Operator running. From my understanding, I need to do two things:
- Get RDMA verbs passed through to the containers
- Get a secondary network working so that my RoCE NICs can talk to each other
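For reference, the sanity checks I'm relying on from inside a test pod for these two pieces are roughly the following (ibv_devinfo only works if rdma-core is installed in the image):

ls /dev/infiniband/        # uverbs / rdma_cm device nodes present -> verbs are reaching the container
ibv_devinfo                # HCA visible to user-space verbs (requires rdma-core in the image)
ip addr show net1          # the Multus-attached secondary interface and its RoCE-side IP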
My setup:
6 bare-metal nodes of 8x H200s with RoCE interconnects; each node has 8 RoCE NICs.
I have confirmed that RoCE works by running ib_send_bw on the bare-metal nodes.
I have K8s running on the nodes with the GPU Operator installed, and the cluster already works as expected for single-node training.
My goal:
I'm trying to get host device network with RDMA working with the instructions from this doc:
https://docs.nvidia.com/networking/display/kubernetes2570/quick-start/host-device-rdma.html
I've installed the YAML configs shown in the document, and the nodes now report hostdev: 10.
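As I understand it, the configs on that page boil down to a NicClusterPolicy (device plugin plus secondary-network components) and a HostDeviceNetwork. The HostDeviceNetwork I applied is essentially the doc's example, roughly like this (reproduced from memory, so treat the exact field values and IPAM settings as approximate; the range matches the 192.168.3.x addresses below):

apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdev-net
spec:
  networkNamespace: "default"
  resourceName: nvidia.com/hostdev   # the resource exposed by the operator's device plugin
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.3.0/24"
    }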
I've created containers on two nodes, and I can see the net1 IP on both:
Pod A on node 1
547: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc mq state UP group default qlen 1000
    link/ether c4:70:bd:be:30:22 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.60/24 brd 192.168.3.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::c670:bdff:febe:3022/64 scope link
       valid_lft forever preferred_lft forever
Pod B on node 2
543: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc mq state UP group default qlen 1000
    link/ether c4:70:bd:bd:6f:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.210/24 brd 192.168.3.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::c670:bdff:febd:6f8e/64 scope link
       valid_lft forever preferred_lft forever
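Both pods come from a small test deployment along the lines of the quick-start; the parts that matter are the network annotation and the hostdev resource request (this is a paraphrase of my manifest rather than a verbatim copy):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostdev-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hostdev-test
  template:
    metadata:
      labels:
        app: hostdev-test
      annotations:
        k8s.v1.cni.cncf.io/networks: hostdev-net    # attach the secondary (RoCE) network via Multus
    spec:
      containers:
      - name: test
        image: <rdma-capable test image>
        command: ["sleep", "infinity"]
        securityContext:
          capabilities:
            add: ["IPC_LOCK"]         # allows pinning memory for RDMA
        resources:
          limits:
            nvidia.com/hostdev: 1     # one RoCE NIC handed to the pod by the device plugin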
However, if I try to ping Pod B from Pod A, the ping fails with Destination Host Unreachable:
[root@hostdev-test-5c9cdd96cf-bhg7h /]# ping 192.168.3.210
PING 192.168.3.210 (192.168.3.210) 56(84) bytes of data.
From 192.168.3.60 icmp_seq=1 Destination Host Unreachable
From 192.168.3.60 icmp_seq=2 Destination Host Unreachable
From 192.168.3.60 icmp_seq=3 Destination Host Unreachable
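The fact that the replies come back as Destination Host Unreachable from Pod A's own address suggests ARP for the peer never resolves on net1. From inside Pod A I can run and share the output of checks like:

ip route get 192.168.3.210    # confirm traffic to Pod B is sent out of net1
ip neigh show dev net1        # see whether the ARP entry for 192.168.3.210 resolves or stays FAILED/INCOMPLETE
ping -I net1 192.168.3.210    # force the ping out of the secondary interface explicitly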
I believe it has to do with my bare-metal nodes' netplan configuration:
root@worker03:~# cat /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  renderer: networkd
  bonds:
    bond0:
      addresses:
        - 10.2.201.28/24
      dhcp4: false
      dhcp6: false
      interfaces:
        - enp86s0f0
        - enp86s0f1
      nameservers:
        addresses:
          - 1.1.1.1
          - 8.8.8.8
      optional: false
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer3+4
      routes:
        - to: default
          via: 10.2.201.254
  ethernets:
    enp86s0f0:
      dhcp4: false
      dhcp6: false
      optional: true
    enp86s0f1:
      dhcp4: false
      dhcp6: false
      optional: true
    rail1:
      ignore-carrier: true
      addresses: [172.16.0.44/31]
      routes:
        - to: 172.16.0.0/15
          via: 172.16.0.45
      mtu: 9216
    rail5:
      ignore-carrier: true
      addresses: [172.24.0.44/31]
      routes:
        - to: 172.24.0.0/15
          via: 172.24.0.45
      mtu: 9216
    rail2:
      ignore-carrier: true
      addresses: [172.18.0.44/31]
      routes:
        - to: 172.18.0.0/15
          via: 172.18.0.45
      mtu: 9216
    rail6:
      ignore-carrier: true
      addresses: [172.26.0.44/31]
      routes:
        - to: 172.26.0.0/15
          via: 172.26.0.45
      mtu: 9216
    rail3:
      ignore-carrier: true
      addresses: [172.20.0.44/31]
      routes:
        - to: 172.20.0.0/15
          via: 172.20.0.45
      mtu: 9216
    rail7:
      ignore-carrier: true
      addresses: [172.28.0.44/31]
      routes:
        - to: 172.28.0.0/15
          via: 172.28.0.45
      mtu: 9216
    rail4:
      ignore-carrier: true
      addresses: [172.22.0.44/31]
      routes:
        - to: 172.22.0.0/15
          via: 172.22.0.45
      mtu: 9216
    rail8:
      ignore-carrier: true
      addresses: [172.30.0.44/31]
      routes:
        - to: 172.30.0.0/15
          via: 172.30.0.45
      mtu: 9216
I'm pretty sure the issue is due to the netplan configuration and how the IPs are advertised, but I'm not sure how to get it to work with the NVIDIA Network Operator. Any help or pointers would be appreciated!
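To help narrow it down, these are the host-side checks I can run and report back on:

ip -br addr show | grep rail        # which rail interfaces still hold their /31 addresses (the hostdev plugin may have moved some into pods)
ip route show | grep 192.168.3      # whether the hosts have any route for the pods' 192.168.3.0/24 subnet over the rails
ping -I rail1 172.16.0.45           # reachability of rail1's point-to-point peer from the host
ibdev2netdev                        # (MLNX_OFED utility) mapping of RDMA devices to the rail interfaces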