Description
Problem
When users update fields in the DocumentDBSpec after a cluster is created, the changes are silently ignored for most fields. There is no validation webhook to reject immutable field changes, and no reconciliation logic to propagate mutable field changes to the underlying CNPG cluster. This creates confusion: users think their changes took effect when they didn't.
Related: #306 (reconciliation loop does not propagate spec changes)
Proposed Solution
We need to make an alignment decision for every spec field: either it should be immutable (reject changes via webhook validation with a clear error) or mutable (ensure changes propagate correctly to the CNPG cluster).
Field-by-Field Analysis
Fields That Currently Propagate
| Field | Mechanism | Notes |
|---|---|---|
| `documentDBImage` | `upgradeDocumentDBIfNeeded()` JSON Patch | Image upgrade path works |
| `gatewayImage` | `upgradeDocumentDBIfNeeded()` + restart annotation | Gateway rolling restart works |
| `postgresImage` | `upgradeDocumentDBIfNeeded()` JSON Patch | PostgreSQL image upgrade works |
| `tls.gateway.*` | TLS secret sync in reconcile loop | Secret name synced to CNPG plugin params |
| `clusterReplication.primary` | `TryUpdateCluster()` JSON Patch | Only for replicated clusters |
| `clusterReplication.clusterList` | `TryUpdateCluster()` JSON Patch | Only for replicated clusters |
Fields That Do NOT Propagate (Silently Ignored)
| Field | Maps to CNPG | Proposed Policy | Rationale |
|---|---|---|---|
| `instancesPerNode` | `spec.instances` | Mutable (propagate) | Core scaling operation; CNPG supports it |
| `resource.storage.pvcSize` | `spec.storage.size` | Mutable (propagate) | PVC resize is supported by CNPG and most CSI drivers |
| `resource.storage.storageClass` | `spec.storage.storageClass` | Immutable (reject) | Cannot change storage class of existing PVCs |
| `resource.storage.persistentVolumeReclaimPolicy` | Applied to PVs | Mutable (propagate) | Can update PV reclaim policy on existing volumes |
| `logLevel` | `spec.logLevel` | Mutable (propagate) | Operational change; CNPG supports it |
| `timeouts.stopDelay` | `spec.maxStopDelay` | Mutable (propagate) | Operational tuning parameter |
| `affinity` | `spec.affinity` | Mutable (propagate) | Scheduling changes; CNPG supports rolling updates |
| `featureGates` | PostgreSQL parameters (e.g., `wal_level`) | Mutable (propagate) | Some may require a restart; document which ones |
| `documentDbCredentialSecret` | Plugin params | Discuss: mutable or immutable? | Changing the credential secret name mid-life is risky |
| `backup.retentionDays` | Backup configuration | Mutable (propagate) | Operational setting |
Fields That Should Be Immutable After Creation
| Field | Rationale |
|---|---|
| `nodeCount` | Currently hardcoded to 1 (min=1, max=1); changing it would require multi-node support |
| `bootstrap` | Only used during initial cluster creation; changing it afterwards has no effect |
| `sidecarInjectorPluginName` | Plugin identity should not change; could break running sidecars |
| `walReplicaPluginName` | Plugin identity should not change |
| `exposeViaService.serviceType` | Changing from LoadBalancer to ClusterIP may require service recreation; discuss whether this should be mutable |
| `environment` | Determines cloud-specific annotations; changing it after deployment may cause issues |
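One way to enforce the immutable fields above without a full webhook is a CEL transition rule (`self == oldSelf`) declared via kubebuilder markers on the API types. A minimal sketch, assuming hypothetical field names based on this issue's tables rather than the actual DocumentDBSpec types:

```go
package main

import "fmt"

// StorageSpec sketches how CEL immutability rules could be attached.
// Field names here are assumptions drawn from the tables in this issue.
type StorageSpec struct {
	// PvcSize stays mutable so PVC expansion can propagate.
	PvcSize string `json:"pvcSize,omitempty"`

	// StorageClass is rejected on update by the CEL transition rule:
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="storageClass is immutable"
	StorageClass string `json:"storageClass,omitempty"`
}

func main() {
	s := StorageSpec{PvcSize: "10Gi", StorageClass: "managed-csi"}
	fmt.Println(s.StorageClass)
}
```

With CEL rules the apiserver rejects the update at admission time, so users get an immediate, clear error instead of a silently ignored change; a validating webhook would be needed only for rules CEL cannot express.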
Fields Needing Discussion
| Field | Question |
|---|---|
| `documentDBVersion` | Currently derived into image fields. Should it be directly mutable, or only through the image fields? |
| `clusterReplication` (turning on/off) | `TryUpdateCluster()` explicitly skips this case with a `// FOR NOW` comment. What's the plan? |
| `exposeViaService.serviceType` | Could we support switching service types with a rolling update? |
Root Cause
The main gap is in `TryUpdateCluster()` (`physical_replication.go` L320-323):

```go
if current.Spec.ReplicaCluster == nil || desired.Spec.ReplicaCluster == nil {
	// FOR NOW assume that we aren't going to turn on or off physical replication
	return nil, -1
}
```

For non-replicated clusters (the common case), this returns immediately, so NO spec changes are compared or patched. The function only handles `ReplicaCluster.Primary` and `ExternalClusters` changes for replicated clusters.
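Closing this gap means diffing mutable fields even when `ReplicaCluster` is nil. A minimal sketch of that diff-and-patch step, using stand-in types instead of the real CNPG `apiv1.Cluster` (the helper name and the chosen field set are assumptions, not existing code):

```go
package main

import "fmt"

// clusterSpec stands in for the CNPG cluster spec fields this issue discusses.
type clusterSpec struct {
	Instances   int
	StorageSize string
	LogLevel    string
}

// patchOp is one JSON Patch operation (RFC 6902).
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value"`
}

// diffMutableFields compares current vs desired mutable fields and emits
// a JSON Patch op for each one that drifted, regardless of replication mode.
func diffMutableFields(current, desired clusterSpec) []patchOp {
	var ops []patchOp
	if current.Instances != desired.Instances {
		ops = append(ops, patchOp{"replace", "/spec/instances", desired.Instances})
	}
	if current.StorageSize != desired.StorageSize {
		ops = append(ops, patchOp{"replace", "/spec/storage/size", desired.StorageSize})
	}
	if current.LogLevel != desired.LogLevel {
		ops = append(ops, patchOp{"replace", "/spec/logLevel", desired.LogLevel})
	}
	return ops
}

func main() {
	cur := clusterSpec{Instances: 1, StorageSize: "10Gi", LogLevel: "info"}
	des := clusterSpec{Instances: 3, StorageSize: "20Gi", LogLevel: "info"}
	fmt.Println(len(diffMutableFields(cur, des))) // two fields drifted
}
```

The replication-specific comparison in `TryUpdateCluster()` could then run after (or alongside) this general diff instead of gating it.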
Implementation Plan
- **Phase 1: Immutability validation.** Add a validating webhook (or CEL validation rules) to reject changes to fields designated as immutable. This is the quick win that prevents user confusion.
- **Phase 2: Mutable field propagation.** Extend `TryUpdateCluster()` (or add a new update mechanism) to diff and patch all mutable fields on the CNPG cluster. This addresses #306 (reconciliation loop does not propagate spec changes to existing CNPG clusters).
- **Phase 3: Status reporting.** Add status conditions that report when a spec change is pending propagation or was rejected.
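For Phase 3, the status condition could follow the standard `metav1.Condition` shape. A dependency-free sketch, where the condition type and reason names are suggestions rather than an existing API:

```go
package main

import (
	"fmt"
	"time"
)

// condition mirrors the fields of metav1.Condition; a local stand-in keeps
// this sketch free of apimachinery imports.
type condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// pendingPropagation reports that a spec change was accepted but has not yet
// been applied to the underlying CNPG cluster. Names here are assumptions.
func pendingPropagation(field string) condition {
	return condition{
		Type:               "SpecPropagated",
		Status:             "False",
		Reason:             "PropagationPending",
		Message:            fmt.Sprintf("waiting to apply change to %s on the CNPG cluster", field),
		LastTransitionTime: time.Now(),
	}
}

func main() {
	c := pendingPropagation("spec.resource.storage.pvcSize")
	fmt.Println(c.Type, c.Status, c.Reason)
}
```

A matching `Reason: "ChangeRejected"` condition could cover the immutable-field case for clients that miss the admission error.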
Acceptance Criteria
- Every DocumentDB spec field has a documented mutability policy (immutable or mutable)
- Immutable fields return a clear validation error when users try to change them
- Mutable fields propagate correctly to the underlying CNPG cluster
- No spec changes are silently ignored