fix: grpc server not shutting down gracefully #2155

cre8ivejp · 2025-10-09T02:28:49Z

The HTTP server shuts down gracefully, but the gRPC server does not, so requests fail immediately when a pod receives SIGTERM.

Copilot

Pull Request Overview

This PR makes gRPC servers shut down gracefully so that requests no longer fail when pods receive SIGTERM. Previously, HTTP servers shut down gracefully but gRPC servers did not, so requests failed instantly during pod termination.

Key changes:

- Implemented proper gRPC graceful shutdown with parallel server coordination
- Extended shutdown timeout from 10s to 20s to accommodate GCP Spot VM constraints
- Added Envoy coordination mechanism through the `/internal/shutdown-ready` endpoint

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `pkg/rpc/server.go` | Core gRPC graceful shutdown implementation with HTTP/gRPC coordination and Envoy endpoint |
| `pkg/web/cmd/server/server.go` | Parallel gRPC server shutdown with wait groups and structured shutdown sequence |
| `pkg/api/cmd/server.go` | Graceful shutdown coordination for API service with proper server ordering |
| `pkg/batch/cmd/server/server.go` | Batch service shutdown improvements with parallel server handling |
| `pkg/subscriber/cmd/server/server.go` | Subscriber service shutdown optimization for PubSub message processing |
| `pkg/metrics/metrics.go` | Updated Prometheus collectors and error handling for metrics server shutdown |
| `manifests/bucketeer/charts/*/templates/deployment.yaml` | Kubernetes deployment updates with termination grace periods and Envoy preStop hooks |
| `manifests/bucketeer/charts/api/values.yaml` | Health check probe timing adjustments for faster readiness detection |
| `manifests/bucketeer/charts/api/templates/envoy-configmap.yaml` | Extended Envoy timeout values to accommodate graceful shutdown timing |

cre8ivejp added 14 commits October 9, 2025 10:31

fix: improve timeout handling and health check resilience during traffic spikes

b8337b9

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: set mising timeout settings for api server

e616e3d

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: adjust timeout settings

286b464

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

feat: add metrics to monitor grpc server shutdown

e2f1699

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: implement prometheus push gateway

5618c14

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: metrics service name

59fcb8e

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: set prometheusPushGatewayURL for all services

0a4e7d3

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: deprecated collector functions

d4e3fe1

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: grouping label conflict

72941f4

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: split server and service labels

d8bc12e

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: missing server label

2649024

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: remove shutdown metrics

c8c4fa3

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: rebase

6047110

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: gracefull shutdown for all services

cb90076

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

cre8ivejp requested a review from Copilot October 9, 2025 02:30

Copilot AI reviewed Oct 9, 2025

Review comments:

pkg/metrics/metrics.go (resolved)

pkg/rpc/server.go (outdated, resolved)

pkg/web/cmd/server/server.go (outdated, resolved)

cre8ivejp added 2 commits October 9, 2025 11:38

chore: set the timeout for k8s readiness and liveness

95e1943

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: lint error

abe9498

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

cre8ivejp marked this pull request as ready for review October 9, 2025 03:16

cre8ivejp requested review from hvn2k1, nnnkkk7 and t-kikuc as code owners October 9, 2025 03:16

fix: shutting down process

b66f813

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

cre8ivejp force-pushed the fix-gracefull-shutdown branch 2 times, most recently from fdb962a to f393032 Compare October 9, 2025 07:05

fix: shutdown order

cfdfa18

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

cre8ivejp force-pushed the fix-gracefull-shutdown branch from f393032 to cfdfa18 Compare October 9, 2025 07:57

cre8ivejp added 3 commits October 9, 2025 17:40

chore: remove drain_listeners from envoy prestop

cc11c98

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

fix: 503 errors when shutting down the server

1f51d03

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>

chore: remove internal shutdown ready handler

e910a7b

Signed-off-by: Alessandro Yuichi Okimoto <[email protected]>
