-
Notifications
You must be signed in to change notification settings - Fork 23
fix: grpc server not shutting down gracefully #2155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…fic spikes Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes graceful shutdown of gRPC servers to prevent request failures when pods receive SIGTERM signals. The current issue was that while HTTP servers shut down gracefully, gRPC servers did not, causing instant failures during pod termination.
Key changes:
- Implemented proper gRPC graceful shutdown with parallel server coordination
- Extended shutdown timeout from 10s to 20s to accommodate GCP Spot VM constraints
- Added Envoy coordination mechanism through
/internal/shutdown-ready
endpoint
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
pkg/rpc/server.go | Core gRPC graceful shutdown implementation with HTTP/gRPC coordination and Envoy endpoint |
pkg/web/cmd/server/server.go | Parallel gRPC server shutdown with wait groups and structured shutdown sequence |
pkg/api/cmd/server.go | Graceful shutdown coordination for API service with proper server ordering |
pkg/batch/cmd/server/server.go | Batch service shutdown improvements with parallel server handling |
pkg/subscriber/cmd/server/server.go | Subscriber service shutdown optimization for PubSub message processing |
pkg/metrics/metrics.go | Updated Prometheus collectors and error handling for metrics server shutdown |
manifests/bucketeer/charts/*/templates/deployment.yaml | Kubernetes deployment updates with termination grace periods and Envoy preStop hooks |
manifests/bucketeer/charts/api/values.yaml | Health check probe timing adjustments for faster readiness detection |
manifests/bucketeer/charts/api/templates/envoy-configmap.yaml | Extended Envoy timeout values to accommodate graceful shutdown timing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
fdb962a
to
f393032
Compare
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
f393032
to
cfdfa18
Compare
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
Fix #2154
The HTTP server is shutting down gracefully, but the gRPC is not, causing requests to fail instantly when the pods get the SIGTERM signal.