-
Notifications
You must be signed in to change notification settings - Fork 701
Description
What steps did you take and what happened:
Seemingly out of nowhere, we had increased flakiness in our web applications, where requests would fail with either 503 or 403 requests. Having both of those errors made it extra hard to track down what was going on.
After finding CERTIFICATE_VERIFY_FAILED in the envoy logs, I looked at some resources that matched what I was seeing. We had last deployed Contour 380 days ago, so it appears as though the cert expired. We went ahead and upgraded from v1.30.1 to v1.33.0 and the problem was resolved.
Related resources:
- StreamEndpoints gRPC config stream to contour closed since #6014
- https://stackoverflow.com/questions/78791347/why-are-my-envoy-pods-failing-with-reset-reason-connection-failure-transport
- https://knowledge.broadcom.com/external/article/375521/the-renewed-ca-certificate-dont-get-upda.html
What did you expect to happen:
The envoy pods didn't restart, so it didn't trigger any restart notifications. I'm not sure there's a ton that contour itself could/should do, since alerting is out of scope. Maybe retrigger the cert job on a cron?
Anything else you would like to add:
Here are a sample of the envoy logs. I'm going to close this issue, but I wanted to file it as a reference for anyone else who might find this problem.
[warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:214] StreamClusters gRPC config stream to contour closed since 54424s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end
[warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:214] StreamListeners gRPC config stream to contour closed since 54413s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end
[warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:214] StreamRuntime gRPC config stream to contour closed since 54455s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end
Environment:
- Contour version: v1.30.1
- Kubernetes version: (use
kubectl version): Server Version: v1.33.5-gke.1125000 - Kubernetes installer & version: GKE
- Cloud provider or hardware configuration: GCP
- OS (e.g. from
/etc/os-release):