| 1 | +# SriovNetworkNodePolicy API Reference |
| 2 | + |
| 3 | +The SriovNetworkNodePolicy CRD is the key component of the SR-IOV network operator. This custom resource instructs the operator to: |
| 4 | + |
| 5 | +1. Render the spec of SriovNetworkNodeState CR for selected nodes to configure SR-IOV interfaces |
| 6 | +2. Deploy SR-IOV CNI plugin and device plugin on selected nodes |
| 7 | +3. Generate the configuration of SR-IOV device plugin |
| 8 | + |
| 9 | +**NOTE:** In virtual deployments, the VF interface is read-only and some fields have different behavior. |
| 10 | + |
| 11 | +## Basic SriovNetworkNodePolicy Example |
| 12 | + |
| 13 | +```yaml |
| 14 | +apiVersion: sriovnetwork.openshift.io/v1 |
| 15 | +kind: SriovNetworkNodePolicy |
| 16 | +metadata: |
| 17 | +  name: policy-1 |
| 18 | +  namespace: sriov-network-operator |
| 19 | +spec: |
| 20 | +  deviceType: vfio-pci |
| 21 | +  mtu: 1500 |
| 22 | +  nicSelector: |
| 23 | +    deviceID: "1583" |
| 24 | +    rootDevices: |
| 25 | +      - 0000:86:00.0 |
| 26 | +    vendor: "8086" |
| 27 | +  nodeSelector: |
| 28 | +    feature.node.kubernetes.io/network-sriov.capable: "true" |
| 29 | +  numVfs: 4 |
| 30 | +  priority: 90 |
| 31 | +  resourceName: intelnics |
| 32 | +``` |
| 33 | + |
| 34 | +This example selects the Intel XL710 NIC (vendor `8086`, device `1583`) at PCI address `0000:86:00.0` on nodes labeled `feature.node.kubernetes.io/network-sriov.capable=true`. It creates 4 VFs on that NIC, binds them to the `vfio-pci` driver, and sets the MTU to 1500. |
| 35 | + |
| 36 | +## SriovNetworkNodePolicy Spec Fields |
| 37 | + |
| 38 | +### Required Fields |
| 39 | + |
| 40 | +| Field | Type | Description | |
| 41 | +|-------|------|-------------| |
| 42 | +| `nodeSelector` | map[string]string | Kubernetes node selector to target specific nodes | |
| 43 | +| `resourceName` | string | Name for the device plugin resource pool (letters, digits, and underscores only) | |
| 44 | + |
| 45 | +### NIC Selection |
| 46 | + |
| 47 | +| Field | Type | Description | |
| 48 | +|-------|------|-------------| |
| 49 | +| `nicSelector.vendor` | string | PCI vendor ID (e.g., "8086" for Intel) | |
| 50 | +| `nicSelector.deviceID` | string | PCI device ID | |
| 51 | +| `nicSelector.pfName` | []string | Physical function names (e.g., ["eno1", "eno2"]) | |
| 52 | +| `nicSelector.rootDevices` | []string | PCI addresses (e.g., ["0000:86:00.0"]) | |
| 53 | +| `nicSelector.netFilter` | string | Infrastructure network filter (e.g., "openstack/NetworkID:<network-id>") | |
| 54 | + |
| 55 | +### VF Configuration |
| 56 | + |
| 57 | +| Field | Type | Description | Virtual Deployment Notes | |
| 58 | +|-------|------|-------------|--------------------------| |
| 59 | +| `numVfs` | integer | Number of Virtual Functions to create | No effect (always 1 VF) | |
| 60 | +| `deviceType` | string | Driver to bind VFs ("netdevice", "vfio-pci") | Depends on underlying device capabilities | |
| 61 | +| `mtu` | integer | MTU size for VFs | Cannot be changed (set by platform) | |
| 62 | + |
| 63 | +### Advanced Configuration |
| 64 | + |
| 65 | +| Field | Type | Description | |
| 66 | +|-------|------|-------------| |
| 67 | +| `priority` | integer | Policy priority (0 is highest) for conflict resolution | |
| 68 | +| `isRdma` | boolean | Enable RDMA capabilities | |
| 69 | +| `needVhostNet` | boolean | Enable vhost-net for virtualized workloads | |
| 70 | +| `eSwitchMode` | string | Set eSwitch mode ("legacy", "switchdev") | |
| 71 | +| `externallyManaged` | boolean | Skip VF creation (user manages VFs) | |
| 72 | + |
| 73 | +### Link Configuration |
| 74 | + |
| 75 | +| Field | Type | Description | |
| 76 | +|-------|------|-------------| |
| 77 | +| `linkType` | string | Link type ("eth", "ETH", "ib", "IB") | |
| 78 | +| `spoofChk` | string | Spoof checking ("on", "off") | |
| 79 | +| `trust` | string | VF trust mode ("on", "off") | |
| 80 | +| `linkState` | string | VF link state ("auto", "enable", "disable") | |
| 81 | +| `maxTxRate` | integer | Maximum transmit rate (Mbps) | |
| 82 | +| `minTxRate` | integer | Minimum transmit rate (Mbps) | |
| 83 | + |
| 84 | +## Virtual Deployment Considerations |
| 85 | + |
| 86 | +In virtual environments (VMs): |
| 87 | + |
| 88 | +- **MTU**: Set by the underlying virtualization platform, cannot be changed |
| 89 | +- **numVfs**: Has no effect as there is always 1 VF per policy |
| 90 | +- **deviceType**: Depends on whether the device supports native-bifurcating drivers: |
| 91 | +  - Mellanox devices: Use `netdevice` (default) for native-bifurcating support |
| 92 | +  - Intel devices: Use `vfio-pci` for non-bifurcating devices |
| 93 | + |
| 94 | +```yaml |
| 95 | +# Example for virtual deployment with Intel NIC |
| 96 | +apiVersion: sriovnetwork.openshift.io/v1 |
| 97 | +kind: SriovNetworkNodePolicy |
| 98 | +metadata: |
| 99 | +  name: vm-policy |
| 100 | +spec: |
| 101 | +  deviceType: vfio-pci # Required for Intel in VMs |
| 102 | +  nicSelector: |
| 103 | +    rootDevices: ["0000:00:05.0"] # VF PCI address |
| 104 | +  nodeSelector: |
| 105 | +    kubernetes.io/hostname: "vm-worker-1" |
| 106 | +  numVfs: 1 # Ignored in VMs |
| 107 | +  resourceName: intel_vf |
| 108 | +``` |
| 109 | + |
| 110 | +## Multiple Policies and Priority |
| 111 | + |
| 112 | +When multiple SriovNetworkNodePolicy CRs target the same Physical Function, the `priority` field (0 is highest priority) resolves conflicts. |
| 113 | + |
| 114 | +### Policy Processing Order |
| 115 | +1. **Priority** (lowest number first) |
| 116 | +2. **Name** (alphabetical order) |
| 117 | + |
| 118 | +### Policy Merging Rules |
| 119 | +- Policies with the same **priority** are merged if they don't overlap |
| 120 | +- Policies with **non-overlapping VF groups** (using #-notation) are merged |
| 121 | +- **Overlapping policies**: Only the highest priority policy applies |
| 122 | +- **Same priority + overlapping**: Last processed policy wins |
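
The processing order above can be sketched with a plain `sort`. This is a minimal illustration, not the operator's implementation: `order_policies` is a hypothetical helper, and the input format (one `priority name` pair per line) is an assumption made for the example.

```bash
# Hypothetical helper: prints policies in the processing order described above.
# Input (assumed format): one "priority name" pair per line on stdin.
order_policies() {
  # -k1,1n : numeric sort on the priority column (lowest number first)
  # -k2,2  : byte-wise sort on the name column as the tie-breaker
  LC_ALL=C sort -k1,1n -k2,2
}

# Sample input: two policies share priority 90, one has priority 10.
printf '%s\n' '90 policy-b' '10 policy-z' '90 policy-a' | order_policies
```

Under these assumptions the sample prints `10 policy-z`, then `90 policy-a`, then `90 policy-b`: priority decides first, and the name only breaks the tie between the two priority-90 policies.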
| 123 | + |
| 124 | +### VF Group Notation |
| 125 | + |
| 126 | +Use `#` notation to specify VF ranges: |
| 127 | + |
| 128 | +```yaml |
| 129 | +spec: |
| 130 | +  nicSelector: |
| 131 | +    pfName: ["eno1#0-3"] # VFs 0, 1, 2, 3 |
| 132 | +  numVfs: 8 |
| 133 | +  resourceName: group1 |
| 134 | +--- |
| 135 | +spec: |
| 136 | +  nicSelector: |
| 137 | +    pfName: ["eno1#4-7"] # VFs 4, 5, 6, 7 |
| 138 | +  numVfs: 8 |
| 139 | +  resourceName: group2 |
| 140 | +``` |
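
To make the `#` notation concrete, the following sketch expands a `pfName` selector into the PF name and the VF indices it covers. `expand_vf_range` is a hypothetical helper written for this illustration; it is not part of the operator and does no validation beyond splitting the string.

```bash
# Hypothetical helper: expand "pf#first-last" into "pf first ... last".
# A selector without '#' is printed unchanged (it refers to the whole PF).
expand_vf_range() {
  selector=$1
  case "$selector" in
    *'#'*) ;;                               # has a VF range, continue below
    *) printf '%s\n' "$selector"; return 0 ;;
  esac
  pf=${selector%%#*}      # text before the '#'
  range=${selector#*#}    # text after the '#', e.g. "0-3"
  first=${range%-*}       # lower bound of the range
  last=${range#*-}        # upper bound of the range
  printf '%s' "$pf"
  i=$first
  while [ "$i" -le "$last" ]; do
    printf ' %s' "$i"
    i=$((i + 1))
  done
  printf '\n'
}

expand_vf_range 'eno1#0-3'   # prints: eno1 0 1 2 3
expand_vf_range 'eno1#4-7'   # prints: eno1 4 5 6 7
```

The two selectors from the example above expand to disjoint index sets, which is the condition under which the policies can be merged.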
| 141 | + |
| 142 | +## Externally Managed Virtual Functions |
| 143 | + |
| 144 | +Set `externallyManaged: true` when you want to create VFs outside the operator: |
| 145 | + |
| 146 | +```yaml |
| 147 | +apiVersion: sriovnetwork.openshift.io/v1 |
| 148 | +kind: SriovNetworkNodePolicy |
| 149 | +metadata: |
| 150 | +  name: external-vfs |
| 151 | +spec: |
| 152 | +  externallyManaged: true |
| 153 | +  deviceType: vfio-pci |
| 154 | +  nicSelector: |
| 155 | +    pfName: ["eno1"] |
| 156 | +  nodeSelector: |
| 157 | +    feature.node.kubernetes.io/network-sriov.capable: "true" |
| 158 | +  numVfs: 4 |
| 159 | +  resourceName: external_intelnics |
| 160 | +``` |
| 161 | + |
| 162 | +### Externally Managed Behavior |
| 163 | +- **Operator skips**: VF creation/deletion |
| 164 | +- **Operator handles**: Driver binding and device plugin configuration |
| 165 | +- **User responsibility**: Create VFs before applying policy |
| 166 | +- **Policy removal**: VFs are NOT removed |
| 167 | + |
| 168 | +### Use Cases |
| 169 | +- VFs needed for host networking (storage, management) |
| 170 | +- VFs must exist at boot time |
| 171 | +- Integration with other VF management tools |
| 172 | + |
| 173 | +### Creating VFs Externally |
| 174 | + |
| 175 | +Example using a systemd service: |
| 176 | +```ini |
| 177 | +# /etc/systemd/system/create-sriov-vfs.service |
| 178 | +[Unit] |
| 179 | +Description=Create SR-IOV VFs |
| 180 | +Before=kubelet.service |
| 181 | + |
| 182 | +[Service] |
| 183 | +Type=oneshot |
| 184 | +ExecStart=/bin/bash -c 'echo 4 > /sys/class/net/eno1/device/sriov_numvfs' |
| 185 | +RemainAfterExit=yes |
| 186 | + |
| 187 | +[Install] |
| 188 | +WantedBy=multi-user.target |
| 189 | +``` |
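
Before applying a policy with `externallyManaged: true`, it can help to confirm that the VFs already exist on the node. The sketch below reads the same `sriov_numvfs` sysfs file that the unit above writes to. `check_vfs` is a hypothetical helper, the optional third argument (an alternate sysfs root) exists only to make the function easy to try out, and comparing for an exact match with `numVfs` is an assumption of this example.

```bash
# Hypothetical helper: compare the VF count on a PF with the policy's numVfs.
# Usage: check_vfs <pf-name> <expected-numVfs> [sysfs-root]
check_vfs() {
  pf=$1
  expected=$2
  sysfs_root=${3:-/sys}   # override only for testing; defaults to /sys
  numvfs_file="$sysfs_root/class/net/$pf/device/sriov_numvfs"

  if [ ! -r "$numvfs_file" ]; then
    echo "cannot read $numvfs_file" >&2
    return 2
  fi

  actual=$(cat "$numvfs_file")
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $pf has $actual VFs"
  else
    echo "mismatch: $pf has $actual VFs, policy expects $expected" >&2
    return 1
  fi
}

# Example (run on the node itself): check_vfs eno1 4
```

A non-zero exit status signals either an unreadable sysfs file (2) or a count mismatch (1), so the function can gate a later `kubectl apply` in a script.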
| 190 | + |
| 191 | +## RDMA Configuration |
| 192 | + |
| 193 | +For RDMA workloads, set `isRdma: true` and ensure proper RDMA mode configuration: |
| 194 | + |
| 195 | +```yaml |
| 196 | +apiVersion: sriovnetwork.openshift.io/v1 |
| 197 | +kind: SriovNetworkNodePolicy |
| 198 | +metadata: |
| 199 | +  name: rdma-policy |
| 200 | +spec: |
| 201 | +  deviceType: netdevice |
| 202 | +  isRdma: true |
| 203 | +  nicSelector: |
| 204 | +    pfName: ["eno1"] |
| 205 | +  nodeSelector: |
| 206 | +    feature.node.kubernetes.io/network-sriov.capable: "true" |
| 207 | +  numVfs: 4 |
| 208 | +  priority: 90 |
| 209 | +  resourceName: rdma_exclusive_device |
| 210 | +``` |
| 211 | + |
| 212 | +See [RDMA Configuration Guide](rdma-configuration.md) for complete setup. |
| 213 | + |
| 214 | +## Switchdev Mode |
| 215 | + |
| 216 | +For OVS hardware offload, configure NICs in switchdev mode: |
| 217 | + |
| 218 | +```yaml |
| 219 | +apiVersion: sriovnetwork.openshift.io/v1 |
| 220 | +kind: SriovNetworkNodePolicy |
| 221 | +metadata: |
| 222 | +  name: switchdev-policy |
| 223 | +spec: |
| 224 | +  deviceType: netdevice |
| 225 | +  eSwitchMode: switchdev |
| 226 | +  nicSelector: |
| 227 | +    pfName: ["eno1"] |
| 228 | +  nodeSelector: |
| 229 | +    feature.node.kubernetes.io/network-sriov.capable: "true" |
| 230 | +  numVfs: 4 |
| 231 | +  resourceName: switchdev_nics |
| 232 | +``` |
| 233 | + |
| 234 | +## Troubleshooting |
| 235 | + |
| 236 | +### Check Policy Status |
| 237 | +```bash |
| 238 | +kubectl get sriovnetworknodepolicy -n sriov-network-operator |
| 239 | +kubectl describe sriovnetworknodepolicy <policy-name> -n sriov-network-operator |
| 240 | +``` |
| 241 | + |
| 242 | +### Verify Node State |
| 243 | +```bash |
| 244 | +kubectl get sriovnetworknodestate -n sriov-network-operator |
| 245 | +kubectl describe sriovnetworknodestate <node-name> -n sriov-network-operator |
| 246 | +``` |
| 247 | + |
| 248 | +### Common Issues |
| 249 | + |
| 250 | +1. **Webhook Validation Failures** |
| 251 | +   - VF range exceeds maxVfs capability |
| 252 | +   - Invalid PCI addresses or device IDs |
| 253 | +   - Missing required fields |
| 254 | + |
| 255 | +2. **Policy Conflicts** |
| 256 | +   - Multiple policies targeting same PF with different configs |
| 257 | +   - Check priority values and VF group overlaps |
| 258 | + |
| 259 | +3. **Virtual Deployment Issues** |
| 260 | +   - Wrong deviceType for VM environment |
| 261 | +   - Attempting to change read-only properties (MTU, numVfs) |
| 262 | + |
| 263 | +4. **External VF Management** |
| 264 | +   - VFs not created before policy application |
| 265 | +   - Incorrect numVfs value vs actual VFs created |
| 266 | + |
| 267 | +### Policy Validation |
| 268 | + |
| 269 | +The operator includes admission webhooks that validate policies: |
| 270 | + |
| 271 | +```bash |
| 272 | +# Check operator and admission webhook logs |
| 273 | +kubectl logs deployment/sriov-network-operator -n sriov-network-operator |
| 274 | +kubectl logs daemonset/operator-webhook -n sriov-network-operator |
| 275 | +``` |
| 276 | + |
| 277 | +### Node-Level Troubleshooting |
| 278 | + |
| 279 | +For issues with specific nodes, check the config daemon and device plugin logs: |
| 280 | + |
| 281 | +```bash |
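| 282 | +# Check config daemon logs on a specific node (kubectl logs has no --field-selector flag, so resolve the pod name first) |
| 283 | +kubectl logs -n sriov-network-operator "$(kubectl get pods -n sriov-network-operator -l app=sriov-network-config-daemon --field-selector spec.nodeName=<node-name> -o name)" |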
| 284 | + |
| 285 | +# Check device plugin logs on a specific node (same approach: resolve the pod name first) |
| 286 | +kubectl logs -n sriov-network-operator "$(kubectl get pods -n sriov-network-operator -l app=sriov-device-plugin --field-selector spec.nodeName=<node-name> -o name)" |
| 287 | + |
| 288 | +# Alternative: list the pod on the node first, then pass its name to kubectl logs |
| 289 | +kubectl get pods -n sriov-network-operator -l app=sriov-network-config-daemon --field-selector spec.nodeName=<node-name> |
| 290 | +kubectl logs <config-daemon-pod-name> -n sriov-network-operator |
| 291 | + |
| 292 | +kubectl get pods -n sriov-network-operator -l app=sriov-device-plugin --field-selector spec.nodeName=<node-name> |
| 293 | +kubectl logs <device-plugin-pod-name> -n sriov-network-operator |
| 294 | +``` |