Commit d6da94e

design proposal for conditions in operator CRDs
Signed-off-by: Sebastian Sch <[email protected]>
1 parent 8b60d24 commit d6da94e

1 file changed

+307
-0
lines changed

@@ -0,0 +1,307 @@
---
title: Kubernetes Conditions Integration for SR-IOV Network Operator CRDs
authors:
  - SR-IOV Network Operator Team
reviewers:
  - TBD
creation-date: 21-07-2025
last-updated: 21-07-2025
---

# Kubernetes Conditions Integration for SR-IOV Network Operator CRDs

## Summary

This proposal enhances the observability and operational transparency of the SR-IOV Network Operator by integrating standard Kubernetes conditions into the status of its key Custom Resource Definitions (CRDs). This will enable users and automated systems to easily understand the current state, progress, and health of SR-IOV network configurations and components directly through Kubernetes API objects.

## Motivation

Adding Kubernetes conditions to the SR-IOV Network Operator's CRDs is crucial for several reasons:

* **Improved Observability:** Conditions provide a standardized, machine-readable way to convey the state of a resource, including its readiness, progress, and any encountered issues. This allows for better monitoring and debugging.

* **Enhanced User Experience:** Users can quickly ascertain the health and status of their `SriovNetwork`, `SriovIBNetwork`, `OVSNetwork`, `SriovNetworkNodeState`, `SriovOperatorConfig`, and `SriovNetworkPoolConfig` resources without needing to delve into logs or complex operator-specific status fields.

* **Standardized API Interaction:** Aligning with Kubernetes' best practices for API object status makes the SR-IOV operator more consistent with other Kubernetes operators and native resources, simplifying integration with existing tooling (e.g., `kubectl wait`, Prometheus alerts).

* **Automated Remediation and Orchestration:** External controllers or automation tools can reliably react to changes in resource conditions, enabling more robust and intelligent orchestration workflows and automated problem resolution.

* **Clearer Error Reporting:** Distinct condition types (e.g., `Available`, `Progressing`, `Degraded`) separate healthy, in-progress, and failed states, and the `reason` and `message` fields of a condition give more granular insight into failures.

* **Simplified Troubleshooting:** When a resource is not in the desired state, conditions can point directly to the reason, accelerating troubleshooting.

### Use Cases

1. **Network Resource Provisioning Status:**
   - A user creates a `SriovNetwork`, `SriovIBNetwork`, or `OVSNetwork` custom resource
   - Condition `Available` is set to `True` once the network is successfully provisioned and ready for use by pods
   - Condition `Degraded` is set to `True` with a reason if the network provisioning fails

2. **Node Configuration Health:**
   - The operator updates the `SriovNetworkNodeState` for a node
   - Condition `Progressing` is set to `True` while the operator is applying changes to the node's SR-IOV configuration
   - Condition `Degraded` is set to `True` if a node's SR-IOV configuration is incorrect
   - Condition `Ready` indicates the overall readiness of the SR-IOV components on that specific node

3. **Operator Configuration Status:**
   - An administrator modifies the `SriovOperatorConfig`
   - Condition `Available` indicates that the operator's components are running and healthy
   - Condition `Degraded` is set to `True` if the operator itself encounters issues

4. **Pool Configuration Management:**
   - An administrator creates or updates a `SriovNetworkPoolConfig`
   - Condition `Available` indicates that the pool configuration has been successfully applied to all target nodes
   - Condition `Progressing` is set to `True` while the pool configuration is being applied to the selected nodes
   - Condition `Degraded` is set to `True` if the pool configuration fails to apply or conflicts with existing configurations

### Goals

* Add standard Kubernetes conditions to all major SR-IOV CRDs (`SriovNetwork`, `SriovIBNetwork`, `OVSNetwork`, `SriovNetworkNodeState`, `SriovOperatorConfig`, `SriovNetworkPoolConfig`)
* Implement consistent condition types across all CRDs where applicable
* Ensure conditions are updated during reconciliation whenever the resource state changes
* Maintain backward compatibility with existing status fields
* Provide comprehensive documentation and examples for condition usage
* Enable `kubectl wait --for=condition=...` for all of the resources listed above

### Non-Goals

* Modifying existing status field structures (maintaining backward compatibility)
* Adding conditions to deprecated or legacy CRDs
* Implementing custom condition types beyond standard Kubernetes patterns
* Changing existing controller reconciliation logic beyond condition updates

## Proposal

### Workflow Description

The implementation will follow a phased approach to add conditions to each CRD:

#### Phase 1: API Definition Updates
1. Update CRD status structures to include `conditions []metav1.Condition` field
2. Define standard condition types and their semantics for each CRD

#### Phase 2: Controller Implementation
1. Modify existing controllers to set and update conditions during reconciliation
2. Implement condition helper functions for consistent condition management
3. Ensure conditions are updated atomically with other status changes

#### Phase 3: Integration and Testing
1. Add comprehensive unit and integration tests for condition behavior
2. Update documentation with condition examples and usage patterns
3. Validate `kubectl wait` functionality
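
To make the `kubectl wait` validation step concrete, the following is a minimal sketch of the client-side behavior it exercises: poll a resource's conditions until one of them reports `True` or a timeout expires. It is not how `kubectl` is implemented. `Condition` is a local stand-in for `metav1.Condition` so the example runs with the standard library only, and `waitForCondition` and its `fetch` callback are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Condition is a local stand-in for metav1.Condition, reduced to the two
// fields this sketch reads.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// waitForCondition polls fetch until the named condition is "True" or the
// timeout elapses. fetch is a hypothetical callback returning the resource's
// current status.conditions.
func waitForCondition(fetch func() []Condition, condType string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		for _, c := range fetch() {
			if c.Type == condType && c.Status == "True" {
				return nil
			}
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition " + condType)
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	// Simulated resource: Available flips to True on the third poll.
	fetch := func() []Condition {
		polls++
		if polls < 3 {
			return []Condition{{Type: "Available", Status: "False"}}
		}
		return []Condition{{Type: "Available", Status: "True"}}
	}
	if err := waitForCondition(fetch, "Available", time.Millisecond, time.Second); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("Available after", polls, "polls")
}
```

An integration test for this phase could follow the same shape, with `fetch` backed by a real API client instead of the simulated closure.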

### API Extensions

#### Common Condition Types

The following condition types will be used consistently across applicable CRDs:

```go
const (
	// Progressing indicates that the resource is being actively reconciled
	ConditionProgressing = "Progressing"

	// Degraded indicates that the resource is not functioning as expected
	ConditionDegraded = "Degraded"

	// Ready indicates that the resource has reached its desired state
	ConditionReady = "Ready"

	// Available indicates that the resource is provisioned and usable
	ConditionAvailable = "Available"
)
```

#### CRD-Specific Updates

##### SriovNetwork Status Enhancement

```go
type SriovNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### SriovIBNetwork Status Enhancement

```go
type SriovIBNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### OVSNetwork Status Enhancement

```go
type OVSNetworkStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: NetworkAttachmentDefinition is created
- `Degraded`: NetworkAttachmentDefinition creation failed or configuration is invalid

##### SriovNetworkNodeState Status Enhancement

```go
type SriovNetworkNodeStateStatus struct {
	Interfaces    InterfaceExts `json:"interfaces,omitempty"`
	Bridges       Bridges       `json:"bridges,omitempty"`
	System        System        `json:"system,omitempty"`
	SyncStatus    string        `json:"syncStatus,omitempty"`
	LastSyncError string        `json:"lastSyncError,omitempty"`

	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Ready`: Node's SR-IOV configuration is complete and functional
- `Progressing`: Node is being configured (VF creation, driver loading, node draining, etc.)
- `Degraded`: Node configuration failed or hardware issues detected

##### SriovOperatorConfig Status Enhancement

```go
type SriovOperatorConfigStatus struct {
	Injector        string `json:"injector,omitempty"`
	OperatorWebhook string `json:"operatorWebhook,omitempty"`

	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: Operator components are running and healthy
- `Degraded`: Operator components are failing or misconfigured
- `Progressing`: Operator configuration is being applied

##### SriovNetworkPoolConfig Status Enhancement

```go
type SriovNetworkPoolConfigStatus struct {
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
```

**Conditions:**
- `Available`: Pool configuration has been successfully applied to all target nodes
- `Progressing`: Pool configuration is being applied to selected nodes
- `Degraded`: Pool configuration failed to apply or conflicts with existing configurations
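
The pool-level conditions depend on the state of every node the pool selects, so one plausible way to compute them is to aggregate per-node results. The sketch below illustrates that idea only. `NodeSummary`, `PoolConditions`, and `aggregatePool` are hypothetical names, and the rule that an empty selection counts as `Available` is an assumption the proposal does not specify.

```go
package main

import "fmt"

// NodeSummary is a hypothetical per-node view, derived from the conditions
// on each node's SriovNetworkNodeState.
type NodeSummary struct {
	Name        string
	Ready       bool
	Progressing bool
	Degraded    bool
}

// PoolConditions holds the three pool-level condition statuses as booleans.
type PoolConditions struct {
	Available   bool
	Progressing bool
	Degraded    bool
}

// aggregatePool reports Available only when every target node is Ready, and
// Progressing or Degraded when at least one node is. An empty node list is
// treated as Available (an assumption made for this sketch).
func aggregatePool(nodes []NodeSummary) PoolConditions {
	out := PoolConditions{Available: true}
	for _, n := range nodes {
		if !n.Ready {
			out.Available = false
		}
		if n.Progressing {
			out.Progressing = true
		}
		if n.Degraded {
			out.Degraded = true
		}
	}
	return out
}

func main() {
	nodes := []NodeSummary{
		{Name: "worker-0", Ready: true},
		{Name: "worker-1", Progressing: true},
	}
	fmt.Printf("%+v\n", aggregatePool(nodes))
}
```

With this rule a single node that is still being configured keeps the pool out of `Available` without marking it `Degraded`, which matches the distinction the condition list draws between in-progress and failed rollouts.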

### Implementation Details/Notes/Constraints

#### Condition Management Helper Functions

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ConditionManager provides helper functions for managing conditions
type ConditionManager struct{}

// SetCondition adds the condition to the list or updates the existing entry of the same type.
func (cm *ConditionManager) SetCondition(conditions *[]metav1.Condition, conditionType string, status metav1.ConditionStatus, reason, message string, generation int64) {
	condition := metav1.Condition{
		Type:               conditionType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: generation,
	}
	meta.SetStatusCondition(conditions, condition)
}

// IsConditionTrue reports whether the condition exists and has status True.
func (cm *ConditionManager) IsConditionTrue(conditions []metav1.Condition, conditionType string) bool {
	condition := meta.FindStatusCondition(conditions, conditionType)
	return condition != nil && condition.Status == metav1.ConditionTrue
}
```
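
The helper above delegates to `meta.SetStatusCondition`, whose key property is that the transition timestamp moves only when the status value changes, while reason, message, and observed generation are always refreshed. The following simplified, standard-library-only sketch illustrates that behavior. `Condition` and `setCondition` are local stand-ins rather than the apimachinery types, and the timestamp is modeled as plain Unix seconds instead of `metav1.Time`.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	ObservedGeneration int64
	LastTransitionTime int64 // Unix seconds here; metav1.Time in the real type
}

// setCondition updates the entry with the same Type or appends a new one.
// LastTransitionTime changes only when Status changes.
func setCondition(conds *[]Condition, c Condition, now int64) {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
		}
		existing.Reason = c.Reason
		existing.Message = c.Message
		existing.ObservedGeneration = c.ObservedGeneration
		return
	}
	c.LastTransitionTime = now
	*conds = append(*conds, c)
}

func main() {
	var conds []Condition
	setCondition(&conds, Condition{Type: "Ready", Status: "False", Reason: "Configuring"}, 100)
	// Same status, new reason: the timestamp stays at 100.
	setCondition(&conds, Condition{Type: "Ready", Status: "False", Reason: "Draining"}, 200)
	// Status flips: the timestamp moves to 300.
	setCondition(&conds, Condition{Type: "Ready", Status: "True", Reason: "Synced"}, 300)
	fmt.Printf("%+v\n", conds)
}
```

Because of this, repeated reconciles that observe the same state do not churn `lastTransitionTime`, so the timestamp remains a usable signal for alerting on how long a resource has been degraded.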

#### Controller Integration Pattern

```go
// Assumes imports of "context", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// and the operator's sriovnetworkv1 API package.
func (r *SriovNetworkReconciler) updateConditions(ctx context.Context, sriovNetwork *sriovnetworkv1.SriovNetwork) error {
	cm := &ConditionManager{}

	// Check if NetworkAttachmentDefinition exists and is valid.
	// The returned object is not needed here, so it is discarded
	// (an unused variable would not compile).
	if _, err := r.getNetworkAttachmentDefinition(ctx, sriovNetwork); err != nil {
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionAvailable, metav1.ConditionFalse,
			"NetworkAttachmentDefinitionNotFound", err.Error(),
			sriovNetwork.Generation)
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionDegraded, metav1.ConditionTrue,
			"ProvisioningFailed", err.Error(),
			sriovNetwork.Generation)
	} else {
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionAvailable, metav1.ConditionTrue,
			"NetworkAvailable", "Network is successfully provisioned",
			sriovNetwork.Generation)
		cm.SetCondition(&sriovNetwork.Status.Conditions,
			ConditionDegraded, metav1.ConditionFalse,
			"NetworkHealthy", "Network is functioning correctly",
			sriovNetwork.Generation)
	}

	return r.Status().Update(ctx, sriovNetwork)
}
```
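
The pattern above records `sriovNetwork.Generation` as the observed generation on every condition. A consumer can use that to avoid acting on a condition that was computed for an older spec. The sketch below shows one way such a check could look. `Condition` is a local stand-in for `metav1.Condition`, and `isCurrentAndTrue` is a hypothetical helper, not part of the proposal.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// isCurrentAndTrue reports whether the named condition is "True" and was
// computed for the given metadata.generation. A stale condition (older
// observedGeneration) is treated as not yet satisfied.
func isCurrentAndTrue(conds []Condition, condType string, generation int64) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True" && c.ObservedGeneration == generation
		}
	}
	return false
}

func main() {
	conds := []Condition{{Type: "Available", Status: "True", ObservedGeneration: 3}}
	fmt.Println(isCurrentAndTrue(conds, "Available", 3)) // condition matches the current spec
	fmt.Println(isCurrentAndTrue(conds, "Available", 4)) // spec changed since the condition was set
}
```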

#### Backward Compatibility

* Existing status fields will be preserved
* Conditions will be added as optional fields
* Controllers will continue to update legacy status fields alongside conditions
* Client code relying on existing status fields will not be affected
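
To keep legacy fields and conditions consistent, a controller could derive the conditions from the same values it writes to the legacy fields. The sketch below does this for the `SyncStatus` and `LastSyncError` fields of `SriovNetworkNodeState`. The `SyncStatus` values `Succeeded`, `InProgress`, and `Failed` are assumed here and are not stated in this proposal. `Condition` and `conditionsFromSync` are hypothetical stand-ins.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// conditionsFromSync maps the legacy syncStatus/lastSyncError pair onto the
// Ready, Progressing, and Degraded conditions. The status strings compared
// against are assumptions made for this sketch.
func conditionsFromSync(syncStatus, lastSyncError string) []Condition {
	status := func(b bool) string {
		if b {
			return "True"
		}
		return "False"
	}
	reason := syncStatus
	if reason == "" {
		reason = "Unknown"
	}
	// Only surface the error text while the node is actually failed.
	degradedMsg := ""
	if syncStatus == "Failed" {
		degradedMsg = lastSyncError
	}
	return []Condition{
		{Type: "Ready", Status: status(syncStatus == "Succeeded"), Reason: reason},
		{Type: "Progressing", Status: status(syncStatus == "InProgress"), Reason: reason},
		{Type: "Degraded", Status: status(syncStatus == "Failed"), Reason: reason, Message: degradedMsg},
	}
}

func main() {
	for _, c := range conditionsFromSync("Failed", "VF creation failed") {
		fmt.Printf("%-12s %-6s %s %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Deriving both representations from one source of truth means a client reading `syncStatus` and a client reading `conditions` cannot observe contradictory states within the same status update.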

#### Error Handling

* Condition updates will not block main reconciliation logic
* Failed condition updates will be logged but won't cause reconciliation failure
* Conditions will be updated atomically with other status changes when possible
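
The first two rules can be sketched as a thin wrapper around the reconcile step: the condition update always runs, its failure is logged, and only the outcome of the main logic is returned. This is an illustration under assumed names. `reconcile`, `apply`, and `updateConditions` are hypothetical and do not correspond to existing operator functions.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Example errors used to drive the sketch.
var (
	errApply    = errors.New("apply failed")
	errConflict = errors.New("status update conflict")
)

// reconcile runs the main logic, then attempts the condition update with the
// outcome of that logic. A failed condition update is logged and dropped so
// that it never turns a successful reconcile into a failure.
func reconcile(apply func() error, updateConditions func(applyErr error) error) error {
	applyErr := apply()
	if err := updateConditions(applyErr); err != nil {
		log.Printf("condition update failed, will retry on next reconcile: %v", err)
	}
	return applyErr
}

func main() {
	// Main logic succeeds, condition update fails: reconcile still succeeds.
	err := reconcile(
		func() error { return nil },
		func(error) error { return errConflict },
	)
	fmt.Println("reconcile error:", err)

	// Main logic fails: its error is returned regardless of the update result.
	err = reconcile(
		func() error { return errApply },
		func(error) error { return nil },
	)
	fmt.Println("reconcile error:", err)
}
```

Passing the apply error into the update callback lets the same code path set `Degraded` on failure and clear it on success.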

### Upgrade & Downgrade Considerations

#### Upgrade Considerations

* New CRD versions with condition fields will be backward compatible
* Existing CR instances will continue to function without conditions
* Controllers will start populating conditions immediately after upgrade
* No manual intervention required from users

#### Downgrade Considerations

* Conditions will be ignored by older controller versions
* Existing status fields will continue to be populated
* No data loss or functionality degradation during downgrade
* CRD structure remains compatible with older API versions
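
The claim that older controllers ignore conditions rests on how Go decodes JSON: `encoding/json` skips fields that the target struct does not declare. The sketch below demonstrates only that client-side behavior. `legacyStatus` is a hypothetical stand-in for a status type from before this proposal, not the operator's real type. Whether the API server itself retains the `conditions` field after a downgrade depends on the CRD schema installed at that point, which this example does not cover.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacyStatus is a stand-in for a status type that predates conditions.
type legacyStatus struct {
	SyncStatus    string `json:"syncStatus,omitempty"`
	LastSyncError string `json:"lastSyncError,omitempty"`
}

// decodeLegacy decodes a status document with the older type. Unknown
// fields such as "conditions" are skipped by encoding/json by default.
func decodeLegacy(raw string) (legacyStatus, error) {
	var s legacyStatus
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	raw := `{"syncStatus":"Succeeded","conditions":[{"type":"Ready","status":"True"}]}`
	s, err := decodeLegacy(raw)
	fmt.Println(s.SyncStatus, err)
}
```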

This proposal provides a comprehensive foundation for integrating Kubernetes conditions into the SR-IOV Network Operator, significantly improving observability and operational experience while maintaining full backward compatibility.
