Skip to content

fix: add restart annotation when TLS gateway secret name changes#319

Open
WentingWu666666 wants to merge 1 commit intodocumentdb:mainfrom
WentingWu666666:wentingwu/fix-tls-secret-restart
Open

fix: add restart annotation when TLS gateway secret name changes#319
WentingWu666666 wants to merge 1 commit intodocumentdb:mainfrom
WentingWu666666:wentingwu/fix-tls-secret-restart

Conversation

@WentingWu666666
Copy link
Collaborator

@WentingWu666666 WentingWu666666 commented Mar 19, 2026

Problem

When a user changes spec.tls.gateway.provided.secretName on an existing DocumentDB cluster, the operator updates the gatewayTLSSecret plugin parameter on the CNPG Cluster object but does not trigger a pod restart.

The sidecar injector mounts the TLS secret as a Kubernetes volume (SecretVolumeSource in sidecar-injector/internal/lifecycle/lifecycle.go), so changing the secret name requires pods to be recreated to mount the new secret. CNPG does not auto-restart pods for plugin parameter changes — only for PodSpec divergence like ImageVolume changes (see comment at documentdb_controller.go:867).

This means the gateway continues serving with the old TLS certificate until something else causes a pod restart.

Fix

Replace the documentdb.io/gateway-tls-rev annotation (which had no functional effect) with kubectl.kubernetes.io/restartedAt — the annotation CNPG watches to trigger rolling restarts. This is the same mechanism already used for gateway-only image updates (documentdb_controller.go:876).

The annotation is set in the same Update() call that writes the new plugin parameter, keeping it atomic.

Testing

  • Added unit test: should add restart annotation when TLS secret name changes
  • All 148 existing controller tests pass

Changes

  • operator/src/internal/controller/documentdb_controller.go — Replace documentdb.io/gateway-tls-rev with kubectl.kubernetes.io/restartedAt when gatewayTLSSecret changes
  • operator/src/internal/controller/documentdb_controller_test.go — Add test verifying restart annotation is set on TLS secret name change

Copilot AI review requested due to automatic review settings March 19, 2026 19:18
@WentingWu666666 WentingWu666666 force-pushed the wentingwu/fix-tls-secret-restart branch from 0795224 to 60713ed Compare March 19, 2026 19:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures a CNPG rolling restart is triggered when the configured gateway TLS Secret name changes, so gateway pods remount the new Secret volume and start serving the updated certificate.

Changes:

  • Update CNPG Cluster plugin parameter gatewayTLSSecret and add a rolling-restart annotation when the TLS Secret name changes.
  • Add a controller unit test that verifies the restart annotation is applied on TLS Secret changes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
operator/src/internal/controller/documentdb_controller.go Adds logic to force a CNPG rolling restart when gatewayTLSSecret changes.
operator/src/internal/controller/documentdb_controller_test.go Adds a unit test asserting gatewayTLSSecret update and restart annotation behavior.
Comments suppressed due to low confidence (2)

operator/src/internal/controller/documentdb_controller.go:248

  • If adding the restart annotation patch fails (marshal or Patch error), the reconciler still returns success and requeues. Because gatewayTLSSecret was already updated, the next reconcile will see no change and will not attempt to add the restart annotation again, leaving pods stuck mounting the old Secret. Treat patch failures as reconcile errors (or set kubectl.kubernetes.io/restartedAt on currentCnpgCluster.Annotations before the Update so it’s part of the same write) so the restart request is reliably applied.

	if slices.Contains(currentCnpgCluster.Status.InstancesStatus[cnpgv1.PodHealthy], currentCnpgCluster.Status.CurrentPrimary) && replicationContext.IsPrimary() {
		// Check if permissions have already been granted
		checkCommand := "SELECT 1 FROM pg_roles WHERE rolname = 'streaming_replica' AND pg_has_role('streaming_replica', 'documentdb_admin_role', 'USAGE');"
		output, err := r.SQLExecutor(ctx, currentCnpgCluster, checkCommand)
		if err != nil {
			logger.Error(err, "Failed to check if permissions already granted")
			return ctrl.Result{RequeueAfter: RequeueAfterLong}, nil

operator/src/internal/controller/documentdb_controller.go:240

  • The restart-annotation patch construction here duplicates the same MergePatch logic used in upgradeDocumentDBIfNeeded (gateway-only update). Consider extracting a small helper to apply the rolling-restart annotation so future changes (annotation key, timestamp format, patch type, error handling) stay consistent across both call sites.
					return ctrl.Result{RequeueAfter: RequeueAfterShort}, nil
				} else {
					logger.Error(err, "Failed to update CNPG Cluster with TLS settings")
				}
			}
		}
	}

When spec.tls.gateway.provided.secretName changes, the operator
updates the gatewayTLSSecret plugin parameter on the CNPG Cluster
but does not force a pod restart. CNPG does not auto-restart pods
for plugin parameter changes (only for PodSpec divergence like
ImageVolume changes). Since the sidecar injector mounts the TLS
secret as a volume, pods must be restarted to pick up the new
secret name.

Add kubectl.kubernetes.io/restartedAt annotation (the same mechanism
used for gateway-only image updates) to force a rolling restart when
the TLS secret name changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Wenting Wu <wentingwu@microsoft.com>
@WentingWu666666 WentingWu666666 force-pushed the wentingwu/fix-tls-secret-restart branch from 60713ed to d70d6d6 Compare March 19, 2026 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants