Description
While performing Disaster Recovery testing, we found an issue with restoring Ingress GCE resources after a full cluster deletion.
We are using Velero to back up our Kubernetes cluster. The behavior changes depending on the recovery scenario:
- Case 1 (namespace deletion): When we delete a namespace that contains an Ingress GCE, restoring with Velero works as expected. The Ingress is restored and synced correctly.
- Case 2 (cluster deletion): When we delete the entire cluster and then restore, the Ingress resource fails to sync. Instead, we see the following error:
Error syncing to GCP: error running load balancer syncing routine: loadbalancer XYZ does not exist: googleapi: Error 400: Invalid value for field 'resource.IPAddress': 'xx.xx.xxx.xxx'. Specified IP address is in-use and would result in a conflict., invalid
The only workaround we’ve found is to manually delete the LoadBalancer and related resources in GCP.
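For reference, here is a rough sketch of the cleanup order, assuming the conflict comes from a global forwarding rule left behind by the deleted cluster that still holds the IP address. The resource names in the usage note are hypothetical placeholders, and the list of resource kinds is not exhaustive (NEGs, SSL certificates, and firewall rules may also need cleanup). The script only builds the `gcloud` commands and does not run them.

```python
from typing import Dict, List

# Resource kinds behind an external HTTP(S) load balancer, in an order that
# respects references: each kind is referenced by kinds listed before it,
# so deleting top to bottom avoids "resource in use" errors.
DELETE_ORDER = [
    "forwarding-rules",
    "target-https-proxies",
    "target-http-proxies",
    "url-maps",
    "backend-services",
    "health-checks",
]


def find_rule_command(ip_address: str) -> List[str]:
    """Build a command that lists forwarding rules holding the given IP."""
    return [
        "gcloud", "compute", "forwarding-rules", "list",
        f"--filter=IPAddress={ip_address}",
        "--format=value(name)",
    ]


def cleanup_commands(names: Dict[str, List[str]]) -> List[List[str]]:
    """Build (but do not run) delete commands in dependency order.

    `names` maps a resource kind to the names to delete. Assumes all
    resources are global, as for an external Ingress.
    """
    commands = []
    for kind in DELETE_ORDER:
        for name in names.get(kind, []):
            commands.append(
                ["gcloud", "compute", kind, "delete", name, "--global", "--quiet"]
            )
    return commands


if __name__ == "__main__":
    # Hypothetical names; the real ones come from the listing command above
    # and from inspecting the leftover load balancer.
    leftovers = {
        "health-checks": ["example-hc"],
        "forwarding-rules": ["example-fr"],
        "url-maps": ["example-um"],
    }
    print(" ".join(find_rule_command("203.0.113.10")))
    for cmd in cleanup_commands(leftovers):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before running anything against the project.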
Currently, we are tied to using Ingress GCE because the Google Gateway API does not yet support CDN features.
Expected behavior:
Ingress GCE should be able to fully recover after a cluster deletion and restore process without requiring manual cleanup of LoadBalancer resources.
Environment:
- Ingress Type: External
- Kubernetes version: v1.32.2-gke.1182003
- Velero version: v1.9.2
- Cloud provider: GCP
Additional context / Question:
Is there a proper way to handle this scenario so that recovery works without manual intervention?
I saw a related closed issue (#1057), but I didn’t think it made sense to reopen it since this case involves recovery tooling (Velero) and the responses there don’t seem to address this situation.