---
title: Kubernetes Conditions Integration for SR-IOV Network Operator CRDs
authors:
  - SR-IOV Network Operator Team
reviewers:
  - TBD
creation-date: 21-07-2025
last-updated: 21-07-2025
---

# Kubernetes Conditions Integration for SR-IOV Network Operator CRDs

## Summary

This proposal enhances the observability and operational transparency of the SR-IOV Network Operator by integrating standard Kubernetes conditions into the status of its key Custom Resource Definitions (CRDs). This will enable users and automated systems to easily understand the current state, progress, and health of SR-IOV network configurations and components directly through Kubernetes API objects.

## Motivation

Adding Kubernetes conditions to the SR-IOV Network Operator's CRDs is important for several reasons:

* **Improved Observability:** Conditions provide a standardized, machine-readable way to convey the state of a resource, including its readiness, its progress, and any issues encountered. This makes monitoring and debugging easier.

* **Enhanced User Experience:** Users can quickly determine the health and status of their `SriovNetwork`, `SriovIBNetwork`, `OVSNetwork`, `SriovNetworkNodeState`, `SriovOperatorConfig`, and `SriovNetworkPoolConfig` resources without digging into logs or operator-specific status fields.

* **Standardized API Interaction:** Following Kubernetes best practices for API object status makes the SR-IOV operator consistent with other Kubernetes operators and native resources, which simplifies integration with existing tooling (e.g., `kubectl wait`, Prometheus alerts).

* **Automated Remediation and Orchestration:** External controllers and automation tools can react reliably to changes in resource conditions, enabling more robust orchestration workflows and automated problem resolution.

* **Clearer Error Reporting:** Separate condition types distinguish a failing resource (`Degraded`) from one that is still being reconciled (`Progressing`) or one that is ready for use (`Available`/`Ready`), giving more granular insight into failures.

* **Simplified Troubleshooting:** When a resource is not in the desired state, a condition's `reason` and `message` can point directly to the cause, accelerating troubleshooting.

### Use Cases

1. **Network Resource Provisioning Status:**
   - A user creates a `SriovNetwork`, `SriovIBNetwork`, or `OVSNetwork` custom resource
   - Condition `Available` is set to `True` once the network is successfully provisioned and ready for use by pods
   - Condition `Degraded` is set to `True`, with a reason, if network provisioning fails

2. **Node Configuration Health:**
   - The operator updates the `SriovNetworkNodeState` for a node
   - Condition `Progressing` is set to `True` while the operator applies changes to the node's SR-IOV configuration
   - Condition `Degraded` is set to `True` if the node's SR-IOV configuration is incorrect or fails to apply
   - Condition `Ready` indicates the overall readiness of the SR-IOV components on that specific node

3. **Operator Configuration Status:**
   - An administrator modifies the `SriovOperatorConfig`
   - Condition `Available` indicates that the operator's components are running and healthy
   - Condition `Degraded` is set to `True` if the operator itself encounters issues

4. **Pool Configuration Management:**
   - An administrator creates or updates a `SriovNetworkPoolConfig`
   - Condition `Available` indicates that the pool configuration has been successfully applied to all target nodes
   - Condition `Progressing` is set to `True` while the pool configuration is being applied to the selected nodes
   - Condition `Degraded` is set to `True` if the pool configuration fails to apply or conflicts with existing configurations

### Goals

* Add standard Kubernetes conditions to all major SR-IOV CRDs (`SriovNetwork`, `SriovIBNetwork`, `OVSNetwork`, `SriovNetworkNodeState`, `SriovOperatorConfig`, `SriovNetworkPoolConfig`)
* Implement consistent condition types across all CRDs where applicable
* Update conditions on every reconciliation so that they reflect the current resource state
* Maintain backward compatibility with existing status fields
* Provide comprehensive documentation and examples for condition usage
* Enable `kubectl wait --for=condition=<type>` for all of these resources

### Non-Goals

* Modifying existing status field structures (maintaining backward compatibility)
* Adding conditions to deprecated or legacy CRDs
* Implementing custom condition types beyond standard Kubernetes patterns
* Changing existing controller reconciliation logic beyond condition updates

## Proposal

### Workflow Description

The implementation will follow a phased approach to add conditions to each CRD:

#### Phase 1: API Definition Updates
1. Update CRD status structures to include a `conditions []metav1.Condition` field
2. Define standard condition types and their semantics for each CRD

#### Phase 2: Controller Implementation
1. Modify existing controllers to set and update conditions during reconciliation
2. Implement condition helper functions for consistent condition management
3. Ensure conditions are updated atomically with other status changes

#### Phase 3: Integration and Testing
1. Add comprehensive unit and integration tests for condition behavior
2. Update documentation with condition examples and usage patterns
3. Validate `kubectl wait` functionality

### API Extensions

#### Common Condition Types

The following condition types will be used consistently across applicable CRDs. `SriovNetworkNodeState` reports `Ready`; the other CRDs report `Available`:

```go
const (
	// ConditionProgressing indicates that the resource is being actively reconciled
	ConditionProgressing = "Progressing"

	// ConditionDegraded indicates that the resource is not functioning as expected
	ConditionDegraded = "Degraded"

	// ConditionReady indicates that the resource has reached its desired state
	ConditionReady = "Ready"

	// ConditionAvailable indicates that the resource is provisioned and usable
	ConditionAvailable = "Available"
)
```
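
The kubebuilder markers on `metav1.Condition` require `reason` to be 1 to 1024 characters long and to match `^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$` (and limit `message` to 32768 characters). controller-gen copies these constraints into the generated CRD schema, so a status update carrying a reason such as `Provisioning Failed` is expected to be rejected by the API server. The reason strings chosen for the SR-IOV controllers could therefore be unit-tested against the same rule; this sketch re-implements the check with the standard library, and `validReason` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is the pattern from the kubebuilder marker on metav1.Condition.Reason.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

// validReason reports whether reason satisfies the length (1..1024) and
// pattern constraints that the CRD schema places on a condition reason.
func validReason(reason string) bool {
	return len(reason) >= 1 && len(reason) <= 1024 && reasonPattern.MatchString(reason)
}

func main() {
	for _, r := range []string{
		"NetworkAttachmentDefinitionNotFound", // CamelCase: valid
		"ProvisioningFailed",                  // valid
		"Provisioning Failed",                 // contains a space: invalid
		"1stAttempt",                          // must start with a letter: invalid
		"",                                    // empty: invalid
	} {
		fmt.Printf("%q valid=%v\n", r, validReason(r))
	}
}
```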

#### CRD-Specific Updates

##### SriovNetwork Status Enhancement

```go
type SriovNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### SriovIBNetwork Status Enhancement

```go
type SriovIBNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### OVSNetwork Status Enhancement

```go
type OVSNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### SriovNetworkNodeState Status Enhancement

```go
type SriovNetworkNodeStateStatus struct {
	Interfaces    InterfaceExts `json:"interfaces,omitempty"`
	Bridges       Bridges       `json:"bridges,omitempty"`
	System        System        `json:"system,omitempty"`
	SyncStatus    string        `json:"syncStatus,omitempty"`
	LastSyncError string        `json:"lastSyncError,omitempty"`

	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Ready`: Node's SR-IOV configuration is complete and functional
- `Progressing`: Node is being configured (VF creation, driver loading, node draining, etc.)
- `Degraded`: Node configuration failed or hardware issues detected
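
Because `SyncStatus` and `LastSyncError` remain in the status, one way to keep the legacy fields and the new conditions consistent is to derive the conditions from them. This sketch assumes the config daemon's sync states are the strings `Succeeded`, `InProgress`, and `Failed`; `Condition` is a hypothetical stand-in for `metav1.Condition`, and the reason strings are illustrative.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for the metav1.Condition fields used here.
type Condition struct {
	Type, Status, Reason, Message string
}

// Sync states assumed to match the values the config daemon writes to SyncStatus.
const (
	syncSucceeded  = "Succeeded"
	syncInProgress = "InProgress"
	syncFailed     = "Failed"
)

// nodeStateConditions derives Ready, Progressing and Degraded from the legacy
// SyncStatus/LastSyncError fields so that both views always agree.
func nodeStateConditions(syncStatus, lastSyncError string) []Condition {
	// Default: the daemon has not reported a sync result yet.
	ready, progressing, degraded := "Unknown", "Unknown", "Unknown"
	reason, message := "SyncStatusUnknown", "sync status not reported yet"
	switch syncStatus {
	case syncSucceeded:
		ready, progressing, degraded = "True", "False", "False"
		reason, message = "SyncSucceeded", "SR-IOV configuration applied"
	case syncInProgress:
		ready, progressing, degraded = "False", "True", "False"
		reason, message = "SyncInProgress", "SR-IOV configuration is being applied"
	case syncFailed:
		ready, progressing, degraded = "False", "False", "True"
		reason, message = "SyncFailed", lastSyncError
	}
	return []Condition{
		{Type: "Ready", Status: ready, Reason: reason, Message: message},
		{Type: "Progressing", Status: progressing, Reason: reason, Message: message},
		{Type: "Degraded", Status: degraded, Reason: reason, Message: message},
	}
}

func main() {
	// Illustrative error text, not an actual daemon message.
	for _, c := range nodeStateConditions("Failed", "failed to configure VFs") {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```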

##### SriovOperatorConfig Status Enhancement

```go
type SriovOperatorConfigStatus struct {
	Injector        string `json:"injector,omitempty"`
	OperatorWebhook string `json:"operatorWebhook,omitempty"`

	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: Operator components are running and healthy
- `Degraded`: Operator components are failing or misconfigured
- `Progressing`: Operator configuration is being applied
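
The three `SriovOperatorConfig` conditions summarize several operator-managed workloads. A sketch, assuming each component (for example the injector or the operator webhook) is reduced to hypothetical desired/updated/ready pod counts taken from its Deployment or DaemonSet status. The design choice here is that a rollout still in flight reports `Progressing` rather than `Degraded`, so the conditions do not flap during normal upgrades.

```go
package main

import (
	"fmt"
	"strings"
)

// ComponentStatus is a hypothetical summary of one operator-managed workload.
type ComponentStatus struct {
	Name    string
	Desired int32 // pods that should be running
	Updated int32 // pods already running the latest spec
	Ready   int32 // pods passing readiness checks
}

// operatorConditions derives the Available, Progressing and Degraded statuses.
// Available:   every component that should run has at least one ready pod.
// Progressing: some component has not rolled out the latest spec to all pods.
// Degraded:    a fully rolled-out component still has fewer ready pods than desired.
func operatorConditions(components []ComponentStatus) (available, progressing, degraded bool, notReady []string) {
	available = true
	for _, c := range components {
		if c.Desired > 0 && c.Ready == 0 {
			available = false
		}
		if c.Updated < c.Desired {
			progressing = true
		}
		if c.Updated == c.Desired && c.Ready < c.Desired {
			degraded = true
			notReady = append(notReady, c.Name)
		}
	}
	return available, progressing, degraded, notReady
}

func main() {
	avail, prog, deg, bad := operatorConditions([]ComponentStatus{
		{Name: "injector", Desired: 2, Updated: 2, Ready: 2},
		{Name: "operator-webhook", Desired: 2, Updated: 2, Ready: 1},
	})
	fmt.Println(avail, prog, deg, strings.Join(bad, ",")) // true false true operator-webhook
}
```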

##### SriovNetworkPoolConfig Status Enhancement

```go
type SriovNetworkPoolConfigStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: Pool configuration has been successfully applied to all target nodes
- `Progressing`: Pool configuration is being applied to selected nodes
- `Degraded`: Pool configuration failed to apply or conflicts with existing configurations
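
Pool-level conditions are naturally an aggregate of per-node results. Assuming each target node's outcome is read from its `SriovNetworkNodeState` conditions and reduced to a hypothetical `NodeResult` value, the pool conditions could be computed as below. Node names are sorted because Go map iteration order is random, and an unstable message would cause needless status writes. An empty pool counts as `Available` here, which is a choice the real controller would need to make explicitly.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeResult is a hypothetical per-node outcome derived from SriovNetworkNodeState.
type NodeResult int

const (
	NodeApplied  NodeResult = iota // node reports Ready=True
	NodeApplying                   // node reports Progressing=True
	NodeFailed                     // node reports Degraded=True
)

// PoolConditions holds the derived condition statuses and a summary message.
type PoolConditions struct {
	Available, Progressing, Degraded bool
	Message                          string
}

// poolConditions aggregates the results of all nodes selected by the pool.
func poolConditions(nodes map[string]NodeResult) PoolConditions {
	var applying, failed []string
	for name, r := range nodes {
		switch r {
		case NodeApplying:
			applying = append(applying, name)
		case NodeFailed:
			failed = append(failed, name)
		}
	}
	// Sort so the message is stable across reconciliations.
	sort.Strings(applying)
	sort.Strings(failed)
	return PoolConditions{
		Available:   len(applying) == 0 && len(failed) == 0,
		Progressing: len(applying) > 0,
		Degraded:    len(failed) > 0,
		Message: fmt.Sprintf("%d/%d nodes applied; applying: %v; failed: %v",
			len(nodes)-len(applying)-len(failed), len(nodes), applying, failed),
	}
}

func main() {
	pc := poolConditions(map[string]NodeResult{
		"worker-0": NodeApplied, "worker-1": NodeApplying, "worker-2": NodeFailed,
	})
	fmt.Printf("%+v\n", pc)
}
```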

### Implementation Details/Notes/Constraints

#### Condition Management Helper Functions

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ConditionManager provides helper functions for managing conditions.
type ConditionManager struct{}

// SetCondition adds or updates the condition of the given type.
// meta.SetStatusCondition only sets LastTransitionTime when the condition
// is added or its Status changes.
func (cm *ConditionManager) SetCondition(conditions *[]metav1.Condition, conditionType string, status metav1.ConditionStatus, reason, message string, generation int64) {
	condition := metav1.Condition{
		Type:               conditionType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: generation,
	}
	meta.SetStatusCondition(conditions, condition)
}

// IsConditionTrue reports whether the condition exists and its Status is True.
func (cm *ConditionManager) IsConditionTrue(conditions []metav1.Condition, conditionType string) bool {
	condition := meta.FindStatusCondition(conditions, conditionType)
	return condition != nil && condition.Status == metav1.ConditionTrue
}
```
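
`IsConditionTrue` ignores `ObservedGeneration`. Right after a user edits a resource, the stored condition may still describe the previous generation, so a consumer that needs a current answer should also compare `observedGeneration` with `metadata.generation`, and should treat a missing condition (for example, on an object created before the upgrade and not yet reconciled) as unknown rather than failed. A dependency-free sketch; `Condition` and `conditionState` are hypothetical names.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for the metav1.Condition fields used here.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	ObservedGeneration int64
}

// conditionState returns the condition's Status if it was computed for the
// object's current generation, "Stale" if it describes an older generation,
// and "Unknown" if the condition has not been set at all.
func conditionState(conds []Condition, condType string, generation int64) string {
	for _, c := range conds {
		if c.Type != condType {
			continue
		}
		if c.ObservedGeneration < generation {
			return "Stale"
		}
		return c.Status
	}
	return "Unknown"
}

func main() {
	conds := []Condition{{Type: "Available", Status: "True", ObservedGeneration: 3}}
	fmt.Println(conditionState(conds, "Available", 3)) // True
	fmt.Println(conditionState(conds, "Available", 4)) // Stale
	fmt.Println(conditionState(conds, "Degraded", 4))  // Unknown
}
```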

#### Controller Integration Pattern

```go
func (r *SriovNetworkReconciler) updateConditions(ctx context.Context, sriovNetwork *sriovnetworkv1.SriovNetwork) error {
	cm := &ConditionManager{}

	// Check whether the NetworkAttachmentDefinition for this network exists.
	// The returned object is not needed here, so it is discarded.
	if _, err := r.getNetworkAttachmentDefinition(ctx, sriovNetwork); err != nil {
		// Distinguish a missing NAD from other lookup errors
		// (apierrors is k8s.io/apimachinery/pkg/api/errors).
		reason := "NetworkAttachmentDefinitionError"
		if apierrors.IsNotFound(err) {
			reason = "NetworkAttachmentDefinitionNotFound"
		}
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionAvailable, metav1.ConditionFalse,
			reason, err.Error(),
			sriovNetwork.Generation)
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionDegraded, metav1.ConditionTrue,
			"ProvisioningFailed", err.Error(),
			sriovNetwork.Generation)
	} else {
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionAvailable, metav1.ConditionTrue,
			"NetworkAvailable", "Network is successfully provisioned",
			sriovNetwork.Generation)
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionDegraded, metav1.ConditionFalse,
			"NetworkHealthy", "Network is functioning correctly",
			sriovNetwork.Generation)
	}

	// Persist the conditions through the status subresource.
	return r.Status().Update(ctx, sriovNetwork)
}
```

#### Backward Compatibility

* Existing status fields will be preserved
* Conditions will be added as optional fields
* Controllers will continue to update legacy status fields alongside conditions
* Client code relying on existing status fields will not be affected

#### Error Handling

* Condition updates will not block the main reconciliation logic
* Failed condition updates will be logged but will not cause reconciliation failure
* Conditions will be updated atomically with other status changes when possible

### Upgrade & Downgrade Considerations

#### Upgrade Considerations

* The updated CRDs only add an optional `conditions` field, so they remain backward compatible
* Existing CR instances continue to function; they simply have no conditions until the controller reconciles them
* Controllers will start populating conditions on their first reconciliation after the upgrade
* No manual intervention is required from users

#### Downgrade Considerations

* Older controller versions neither read nor write conditions; conditions already stored may be left stale, or dropped when an older component rewrites the status
* Existing status fields will continue to be populated
* Apart from the conditions themselves, no status data or functionality is lost during a downgrade
* The CRD structure remains compatible with older API versions

This proposal provides a comprehensive foundation for integrating Kubernetes conditions into the SR-IOV Network Operator, significantly improving observability and operational experience while maintaining full backward compatibility.