Commit de84294

Update documentation
Updates documentation to explain the new timeout and retry features. Signed-off-by: Tao Liu <[email protected]>
1 parent 0d0efbb commit de84294

2 files changed: +53 -3 lines changed

docs/user-guide/cluster-configuration.md

Lines changed: 42 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,48 @@ This guide provides instructions for Day‑2 configuration change use cases, to
2222

2323
## Use Cases
2424

25+
Day 2 configuration changes are supported for both hardware configuration updates and policy parameter changes. The system supports retry scenarios even after previous configuration attempts have timed out or failed.
26+
27+
### Hardware Configuration Timeouts and Retry
28+
29+
When a configuration operation times out or fails, the system supports retry through spec changes.
30+
31+
#### Retry Mechanism
32+
33+
* **Configuration timeouts/failures**: Can be retried by updating the ProvisioningRequest spec
34+
* **Provisioning timeouts/failures**: Cannot be retried; the ProvisioningRequest must be deleted and recreated
35+
* **Retry mechanism**: Uses `ConfigTransactionId` (set to ProvisioningRequest generation) to track
36+
configuration changes. When the ProvisioningRequest spec changes, the generation increments, creating
37+
a new `ConfigTransactionId`. The system compares this with `ObservedConfigTransactionId` to detect
38+
spec changes and trigger new configuration attempts.
39+
* **Terminal state override**: The system allows clearing terminal states (timeout/failed) when the ProvisioningRequest is in pending state due to spec changes, **except for hardware provisioning timeouts/failures which require deleting and recreating the ProvisioningRequest**.
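The retry rules above can be sketched as a small decision function. This is an illustrative model only, not the operator's actual code: the `RequestState` class, the `Phase` enum, and the function names are hypothetical, and the only behavior taken from the text is that `ConfigTransactionId` equals the ProvisioningRequest generation, that it is compared with `ObservedConfigTransactionId`, and that provisioning failures cannot be retried.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Hypothetical labels for the phase that timed out or failed."""
    PROVISIONING = "provisioning"
    CONFIGURATION = "configuration"


@dataclass
class RequestState:
    """Hypothetical snapshot of the values the retry decision depends on."""
    generation: int                        # ProvisioningRequest generation
    observed_config_transaction_id: int    # last transaction ID already acted on
    failed_phase: Optional[Phase] = None   # phase in a terminal state, if any


def config_transaction_id(state: RequestState) -> int:
    # Per the text, ConfigTransactionId is set to the ProvisioningRequest generation.
    return state.generation


def should_start_new_configuration(state: RequestState) -> bool:
    # Provisioning timeouts/failures are never retried; delete and recreate instead.
    if state.failed_phase is Phase.PROVISIONING:
        return False
    # A spec change bumps the generation, so the two IDs differ until
    # the new configuration attempt has been observed.
    return config_transaction_id(state) != state.observed_config_transaction_id
```

Under these assumptions, editing the spec after a configuration timeout yields a generation one higher than the observed ID, so the function returns `True`; the same edit after a provisioning failure still returns `False`.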
40+
41+
#### Troubleshooting Configuration Timeouts
42+
43+
To troubleshoot:
44+
45+
1. **Check configuration status**:
46+
47+
```console
48+
oc get provisioningrequest <UUID> -o yaml
49+
```
50+
51+
Look for `HardwareConfigured` condition with `reason: TimedOut`
52+
53+
2. **Check Metal3 plugin logs**:
54+
55+
```console
56+
oc logs -n <metal3-plugin-namespace> -l app=metal3-hardwareplugin-server -f
57+
```
58+
59+
3. **Retry configuration**:
60+
* Update the ProvisioningRequest spec to trigger a new configuration attempt
61+
* The system will clear the terminal state and start a new configuration
62+
* **Before retrying, check BareMetalHost (BMH) state**:
63+
* If BMH is in `servicing` state, wait for it to complete first before retrying
64+
* If BMH is in `servicing error` state, a retry might not succeed, especially when power management errors persist
65+
* Use `oc get bmh -n <namespace>` to check BMH status
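The checks in the steps above can be sketched as a short script that works on the JSON form of the resources. This is a rough illustration under stated assumptions: the helper names are hypothetical, the condition list is assumed to live under `status.conditions` with `type` and `reason` keys (the usual Kubernetes layout, not confirmed by this guide), and the BMH state strings are taken literally from the text.

```python
from typing import Optional


def find_condition(provisioning_request: dict, condition_type: str) -> Optional[dict]:
    """Return the first condition of the given type, or None if absent.

    Assumes conditions are stored under status.conditions (an assumption).
    """
    conditions = provisioning_request.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition
    return None


def configuration_timed_out(provisioning_request: dict) -> bool:
    """True when HardwareConfigured reports reason TimedOut (step 1)."""
    condition = find_condition(provisioning_request, "HardwareConfigured")
    return condition is not None and condition.get("reason") == "TimedOut"


def retry_advice(bmh_state: str) -> str:
    """Map a BMH state to the guidance in step 3 (hypothetical return labels)."""
    if bmh_state == "servicing":
        return "wait"          # let servicing finish before retrying
    if bmh_state == "servicing error":
        return "investigate"   # a retry might not succeed
    return "retry"             # update the ProvisioningRequest spec
```

One way to feed it real data is to save the output of `oc get provisioningrequest <UUID> -o json` and load it with `json.load` before calling `configuration_timed_out`.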
66+
2567
### Updates to the clusterInstanceParameters field under ProvisioningRequest spec.templateParameters
2668

2769
A ProvisioningRequest can be edited to update:

docs/user-guide/cluster-provisioning.md

Lines changed: 11 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -337,24 +337,32 @@ Default timeouts:
337337

338338
- Hardware provisioning: 90m
339339
- Cluster installation: 90m
340-
- Cluster configuration 30m
340+
- Cluster configuration: 30m
341341

342-
These timeouts can be configured in their respective ConfigMaps or resource spec fields. The timeout value should be a duration string. For example:
342+
#### Hardware Provisioning Timeout
343343

344-
For hardware provisioning, set in the `spec.templates.hwTemplate` hardware template resource:
344+
The timeout is configured in the `HardwareTemplate` resource.
345+
346+
Configure hardware provisioning timeout in the `spec.templates.hwTemplate` hardware template resource:
345347

346348
``` yaml
347349
spec:
348350
hardwareProvisioningTimeout: "100m"
349351
```
350352

353+
If not specified, the default timeout value (90m) will be applied.
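The defaulting behavior can be illustrated with a small sketch. It assumes the timeout is a duration string built from `h`, `m`, and `s` units such as `"100m"`; the parser and the function names are hypothetical and are not the implementation used by the operator, which may accept other formats.

```python
import re
from datetime import timedelta
from typing import Optional

# Seconds per supported unit; only h, m, and s are handled in this sketch.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "100m" or "1h30m" into a timedelta."""
    if not re.fullmatch(r"(\d+[hms])+", text):
        raise ValueError(f"unsupported duration string: {text!r}")
    seconds = sum(
        int(amount) * _UNIT_SECONDS[unit]
        for amount, unit in re.findall(r"(\d+)([hms])", text)
    )
    return timedelta(seconds=seconds)


def effective_timeout(configured: Optional[str], default: str = "90m") -> timedelta:
    """Use the configured value when present, otherwise fall back to the default."""
    return parse_duration(configured if configured else default)
```

With `hardwareProvisioningTimeout: "100m"` the sketch yields 100 minutes; with the field omitted it falls back to the 90-minute default stated above.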
354+
355+
#### Cluster Installation Timeout
356+
351357
For cluster installation, set in the `spec.templates.clusterInstanceDefaults` ConfigMap:
352358

353359
``` yaml
354360
data:
355361
clusterInstallationTimeout: "100m"
356362
```
357363
364+
#### Cluster Configuration Timeout
365+
358366
For cluster configuration, set in the `spec.templates.policyTemplateDefaults` ConfigMap:
359367

360368
``` yaml
