
cre8ivejp
Member

Fix #2154

The HTTP server shuts down gracefully, but the gRPC server does not, so requests fail immediately when the pods receive SIGTERM.

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
cre8ivejp requested a review from Copilot on October 9, 2025 02:30

Copilot AI left a comment


Pull Request Overview

This PR fixes the graceful shutdown of gRPC servers so that requests no longer fail when pods receive SIGTERM. Previously, the HTTP servers shut down gracefully but the gRPC servers did not, so requests failed immediately during pod termination.

Key changes:

  • Implemented proper gRPC graceful shutdown with parallel server coordination
  • Extended shutdown timeout from 10s to 20s to accommodate GCP Spot VM constraints
  • Added Envoy coordination mechanism through /internal/shutdown-ready endpoint

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Summary per file:

| File | Description |
|---|---|
| pkg/rpc/server.go | Core gRPC graceful shutdown implementation with HTTP/gRPC coordination and Envoy endpoint |
| pkg/web/cmd/server/server.go | Parallel gRPC server shutdown with wait groups and structured shutdown sequence |
| pkg/api/cmd/server.go | Graceful shutdown coordination for API service with proper server ordering |
| pkg/batch/cmd/server/server.go | Batch service shutdown improvements with parallel server handling |
| pkg/subscriber/cmd/server/server.go | Subscriber service shutdown optimization for PubSub message processing |
| pkg/metrics/metrics.go | Updated Prometheus collectors and error handling for metrics server shutdown |
| manifests/bucketeer/charts/*/templates/deployment.yaml | Kubernetes deployment updates with termination grace periods and Envoy preStop hooks |
| manifests/bucketeer/charts/api/values.yaml | Health check probe timing adjustments for faster readiness detection |
| manifests/bucketeer/charts/api/templates/envoy-configmap.yaml | Extended Envoy timeout values to accommodate graceful shutdown timing |


cre8ivejp marked this pull request as ready for review on October 9, 2025 03:16
cre8ivejp force-pushed the fix-gracefull-shutdown branch 2 times, most recently from fdb962a to f393032, on October 9, 2025 07:05
cre8ivejp force-pushed the fix-gracefull-shutdown branch from f393032 to cfdfa18 on October 9, 2025 07:57

Successfully merging this pull request may close these issues.

fix: grpc server not shutting down gracefully