Open
Labels: enhancement (New feature or request)
Description
Summary
The DocumentDB operator does not currently expose configuration options needed for safe Kubernetes node maintenance (OS updates, hardware replacement, K8s version upgrades). This enhancement requests adding first-class support for node maintenance scenarios through CRD configuration and documentation.
Problem
When users need to drain nodes running DocumentDB pods, they may encounter issues:
- Single-instance clusters blocked by PDBs: With `instancesPerNode: 1`, the operator-created PodDisruptionBudgets (PDBs) block `kubectl drain`, causing it to hang indefinitely. There is no way to disable or override PDBs from the DocumentDB CRD.
- No `enablePDB` configuration: Users cannot toggle PDB creation from the DocumentDB custom resource, which is necessary for single-instance dev/test clusters where node drains are common.
- No `nodeMaintenanceWindow` settings: There is no way to signal to the operator that planned maintenance is in progress, which would disable self-healing and allow pods to be safely evicted and rescheduled.
- No documented guidance: Users performing node maintenance have no official documentation on how to safely drain nodes running DocumentDB workloads.
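To illustrate the first point, a PodDisruptionBudget of roughly the following shape leaves no room for eviction when only one pod matches it. This is a minimal sketch: the object name and labels are hypothetical, and the PDB the operator actually creates may differ.

```yaml
# Illustrative only: the name and labels are assumptions, not copied from the operator.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-documentdb-primary
spec:
  minAvailable: 1          # at least one matching pod must stay available
  selector:
    matchLabels:
      app: example-documentdb
```

With a single matching pod, this budget allows zero voluntary disruptions, so every eviction request issued by `kubectl drain` is refused and retried.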
What CNPG Provides (Reference)
CloudNativePG has comprehensive support for node maintenance via:
- `spec.enablePDB`: Toggle PodDisruptionBudgets on/off (needed for single-instance dev clusters)
- `spec.nodeMaintenanceWindow.inProgress`: Disables self-healing during planned maintenance
- `spec.nodeMaintenanceWindow.reusePVC`: Controls whether to wait for the node to come back (`true`, reuses existing PVCs) or re-clone data to a new node (`false`)
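For reference, these fields sit on the CNPG `Cluster` resource roughly as follows. This is a minimal sketch; the cluster name, instance count, and storage size are placeholder values chosen for illustration.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-cluster      # placeholder name
spec:
  instances: 1               # single-instance dev/test cluster
  enablePDB: false           # do not create PodDisruptionBudgets
  nodeMaintenanceWindow:
    inProgress: true         # planned maintenance is under way
    reusePVC: true           # wait for the node to return and reuse its PVCs
  storage:
    size: 1Gi                # placeholder size
```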
Reference: https://github.com/cloudnative-pg/cloudnative-pg/blob/main/docs/src/kubernetes_upgrade.md
Proposed Scope
- Expose `enablePDB` in the DocumentDB CRD: Allow users to disable PDB creation, or handle it automatically based on `instancesPerNode` (e.g., skip the PDB when `instancesPerNode: 1`).
- Consider exposing `nodeMaintenanceWindow` settings: For advanced use cases such as bare-metal clusters or local storage setups, allow users to set the `inProgress` and `reusePVC` flags.
- Add documentation for Kubernetes node maintenance procedures: Document the recommended steps for safely performing node maintenance (drain, cordon, upgrade, uncordon) with DocumentDB clusters, covering both single-instance and multi-instance configurations.
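One possible shape for the first two items is sketched below as a fragment of a DocumentDB custom resource. None of the new fields exist today: their names simply mirror the CNPG fields, and placing them directly under `spec` is an assumption made for illustration.

```yaml
# Hypothetical fragment of a DocumentDB resource; only instancesPerNode exists today.
spec:
  instancesPerNode: 1
  enablePDB: false           # proposed: skip PDB creation for this cluster
  nodeMaintenanceWindow:     # proposed: would be passed through to the CNPG Cluster
    inProgress: true
    reusePVC: true
```

If the operator instead skipped the PDB automatically when `instancesPerNode: 1`, the `enablePDB` field could be left unset in the common dev/test case.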
Current Workaround
- Multi-instance clusters (2+ instances): Users can drain nodes safely since PDBs allow one pod to be evicted at a time with automatic failover. The operator handles rescheduling automatically.
- Single-instance clusters: There is no workaround without manually editing the underlying CNPG `Cluster` resource to set `enablePDB: false` or configure the maintenance window, bypassing the DocumentDB operator's reconciliation.
Additional Context
This is particularly important for:
- AKS/EKS/GKE users who need to perform regular node pool upgrades
- Dev/test environments running single-instance DocumentDB clusters
- Bare-metal / on-prem deployments where node maintenance is manually managed