Skip to content

Define and enforce field mutability policy for DocumentDB spec #308

@WentingWu666666

Description

@WentingWu666666

Problem

When users update fields in the DocumentDBSpec after a cluster is created, the changes are silently ignored for most fields. There is no validation webhook to reject immutable field changes, and no reconciliation logic to propagate mutable field changes to the underlying CNPG cluster. This creates confusion users think their changes took effect when they didn't.

Related: #306 (reconciliation loop does not propagate spec changes)

Proposed Solution

We need to make an alignment decision for every spec field: either it should be immutable (reject changes via webhook validation with a clear error) or mutable (ensure changes propagate correctly to the CNPG cluster).

Field-by-Field Analysis

Fields That Currently Propagate

Field Mechanism Notes
documentDBImage upgradeDocumentDBIfNeeded() JSON Patch Image upgrade path works
gatewayImage upgradeDocumentDBIfNeeded() + restart annotation Gateway rolling restart works
postgresImage upgradeDocumentDBIfNeeded() JSON Patch PostgreSQL image upgrade works
tls.gateway.* TLS secret sync in reconcile loop Secret name synced to CNPG plugin params
clusterReplication.primary TryUpdateCluster() JSON Patch Only for replicated clusters
clusterReplication.clusterList TryUpdateCluster() JSON Patch Only for replicated clusters

Fields That Do NOT Propagate (Silently Ignored)

Field Maps to CNPG Proposed Policy Rationale
instancesPerNode spec.instances Mutable propagate Core scaling operation, CNPG supports it
resource.storage.pvcSize spec.storage.size Mutable propagate PVC resize is supported by CNPG and most CSI drivers
resource.storage.storageClass spec.storage.storageClass Immutable reject Cannot change storage class of existing PVCs
resource.storage.persistentVolumeReclaimPolicy Applied to PVs Mutable propagate Can update PV reclaim policy on existing volumes
logLevel spec.logLevel Mutable propagate Operational change, CNPG supports it
timeouts.stopDelay spec.maxStopDelay Mutable propagate Operational tuning parameter
affinity spec.affinity Mutable propagate Scheduling changes, CNPG supports rolling updates
featureGates PostgreSQL parameters (e.g., wal_level) Mutable propagate Some may require restart; document which ones
documentDbCredentialSecret Plugin params Discuss mutable or immutable? Changing credential secret name mid-life is risky
backup.retentionDays Backup configuration Mutable propagate Operational setting

Fields That Should Be Immutable After Creation

Field Rationale
nodeCount Currently hardcoded to 1 (min=1, max=1); changing would require multi-node support
bootstrap Only used during initial cluster creation; changing after has no effect
sidecarInjectorPluginName Plugin identity should not change; could break running sidecars
walReplicaPluginName Plugin identity should not change
exposeViaService.serviceType Changing from LoadBalancerClusterIP may require service recreation; discuss if this should be mutable
environment Determines cloud-specific annotations; changing after deployment may cause issues

Fields Needing Discussion

Field Question
documentDBVersion Currently derived into image fields. Should it be directly mutable or only through image fields?
clusterReplication (turning on/off) TryUpdateCluster() explicitly skips this case with a // FOR NOW comment. What's the plan?
exposeViaService.serviceType Could we support switching service types with a rolling update?

Root Cause

The main gap is in TryUpdateCluster() (physical_replication.go L320-323):

if current.Spec.ReplicaCluster == nil || desired.Spec.ReplicaCluster == nil {
    // FOR NOW assume that we aren't going to turn on or off physical replication
    return nil, -1
}

For non-replicated clusters (the common case), this returns immediately so NO spec changes are compared or patched. The function only handles ReplicaCluster.Primary and ExternalClusters changes for replicated clusters.

Implementation Plan

  1. Phase 1: Immutability validation Add a validating webhook (or CEL validation rules) to reject changes to fields designated as immutable. This is the quick win that prevents user confusion.

  2. Phase 2: Mutable field propagation Extend TryUpdateCluster() (or add a new update mechanism) to diff and patch all mutable fields on the CNPG cluster. This addresses Reconciliation loop does not propagate spec changes to existing CNPG clusters #306.

  3. Phase 3: Status reporting Add status conditions that report when a spec change is pending propagation or was rejected.

Acceptance Criteria

  • Every DocumentDB spec field has a documented mutability policy (immutable or mutable)
  • Immutable fields return a clear validation error when users try to change them
  • Mutable fields propagate correctly to the underlying CNPG cluster
  • No spec changes are silently ignored

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestquestionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions