Support Kubernetes node upgrade and maintenance operations #305

## Summary

The DocumentDB operator does not currently expose configuration options needed for safe Kubernetes node maintenance (OS updates, hardware replacement, K8s version upgrades). This enhancement requests adding first-class support for node maintenance scenarios through CRD configuration and documentation.

## Problem

When users need to drain nodes running DocumentDB pods, they may encounter issues:

- **Single-instance clusters blocked by PDBs:** With `instancesPerNode: 1`, the operator-created PodDisruptionBudgets (PDBs) block `kubectl drain`, causing it to hang indefinitely. There is no way to disable or override PDBs from the DocumentDB CRD.
- **No `enablePDB` configuration:** Users cannot toggle PDB creation from the DocumentDB custom resource, which is necessary for single-instance dev/test clusters where node drains are common.
- **No `nodeMaintenanceWindow` settings:** There is no way to signal to the operator that planned maintenance is in progress. Such a signal would disable self-healing and allow pods to be safely evicted and rescheduled.
- **No documented guidance:** Users performing node maintenance have no official documentation on how to safely drain nodes running DocumentDB workloads.
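
To illustrate the first point, here is a minimal sketch of a PodDisruptionBudget that would block a drain when only one pod matches it. The resource name and the `app` label are hypothetical and chosen for illustration; the PDBs the operator actually creates may use different names and selectors.

```yaml
# Illustrative PDB only; the name and selector label are assumptions,
# not the operator's actual output.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: documentdb-dev-primary
spec:
  # At least one matching pod must stay available at all times.
  minAvailable: 1
  selector:
    matchLabels:
      app: documentdb-dev
```

With a single matching pod, `minAvailable: 1` leaves zero allowed disruptions, so the eviction request issued by `kubectl drain` is refused and retried until the drain times out or is cancelled.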

## What CNPG Provides (Reference)

CloudNativePG has comprehensive support for node maintenance via:

- **`spec.enablePDB`:** Toggles PodDisruptionBudgets on or off (needed for single-instance dev clusters)
- **`spec.nodeMaintenanceWindow.inProgress`:** Disables self-healing during planned maintenance
- **`spec.nodeMaintenanceWindow.reusePVC`:** Controls whether to wait for the node to come back (`true`, reuses existing PVCs) or re-clone data to a new node (`false`)

Reference: https://github.com/cloudnative-pg/cloudnative-pg/blob/main/docs/src/kubernetes_upgrade.md
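
For orientation, the fields listed above might appear together in a CNPG `Cluster` manifest roughly as follows. This is a sketch: the cluster name, instance count, and storage size are illustrative values, not taken from any DocumentDB deployment.

```yaml
# Sketch of a CloudNativePG Cluster with the maintenance-related fields set.
# Name, instance count, and storage size are illustrative.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 1
  # Do not create PodDisruptionBudgets for this cluster.
  enablePDB: false
  nodeMaintenanceWindow:
    # Planned maintenance is under way; self-healing is disabled.
    inProgress: true
    # Wait for the node to return and reuse the existing PVCs.
    reusePVC: true
  storage:
    size: 1Gi
```

Once maintenance is finished, `inProgress` would be set back to `false` so that normal self-healing resumes.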

## Proposed Scope

1. **Expose `enablePDB` in the DocumentDB CRD:** Allow users to disable PDB creation, or handle it automatically based on `instancesPerNode` (e.g., skip the PDB when `instancesPerNode: 1`).
2. **Consider exposing `nodeMaintenanceWindow` settings:** For advanced use cases such as bare-metal clusters or local storage setups, allow users to set the `inProgress` and `reusePVC` flags.
3. **Add documentation for Kubernetes node maintenance procedures:** Document the recommended steps for safely performing node maintenance (cordon, drain, upgrade, uncordon) with DocumentDB clusters, covering both single-instance and multi-instance configurations.
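
As a rough sketch of what items 1 and 2 could look like from the user's side, the DocumentDB resource might mirror the CNPG field names. Everything here is a proposal: `enablePDB` and `nodeMaintenanceWindow` do not exist in the DocumentDB CRD today, the `apiVersion` is an assumption about the operator's API group, the resource name is illustrative, and other fields a real resource needs are omitted.

```yaml
# Hypothetical shape only; enablePDB and nodeMaintenanceWindow are
# proposed fields, not part of the current DocumentDB CRD.
apiVersion: db.microsoft.com/preview   # assumed API group/version
kind: DocumentDB
metadata:
  name: documentdb-dev                 # illustrative name
spec:
  instancesPerNode: 1
  # Proposed: skip PDB creation so single-instance nodes can be drained.
  enablePDB: false
  # Proposed: passed through to the underlying CNPG Cluster.
  nodeMaintenanceWindow:
    inProgress: true
    reusePVC: true
```

Reusing the CNPG names would keep the mapping to the underlying `Cluster` resource one-to-one, though the final field names and defaults are open for discussion.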

## Current Workaround

- **Multi-instance clusters (2+ instances):** Users can drain nodes safely since PDBs allow one pod to be evicted at a time with automatic failover. The operator handles rescheduling automatically.
- **Single-instance clusters:** There is **no workaround** other than manually editing the underlying CNPG `Cluster` resource to set `enablePDB: false` or configure the maintenance window, which bypasses the DocumentDB operator's reconciliation.

## Additional Context

This is particularly important for:
- **AKS/EKS/GKE users** who need to perform regular node pool upgrades
- **Dev/test environments** running single-instance DocumentDB clusters
- **Bare-metal / on-prem deployments** where node maintenance is manually managed

