fix(sandbox-manager): use postCtx for InitRuntime after resume wait#328
fix(sandbox-manager): use postCtx for InitRuntime after resume wait#328deepak0x wants to merge 1 commit into
Conversation
The postCtx fallback at sandbox.go:299-305 builds a fresh 30s context when the original ctx deadline is consumed by the resume wait. Three of the four post-wait callsites already use postCtx (InplaceRefresh, resolveCSIMountConfigs, ProcessCSIMounts); runtime.InitRuntime was missed and still received the expired ctx, so its /init request to agent-runtime failed with context deadline exceeded before reaching the sidecar. A retry-loop quirk in InitRuntime swallows the error, making this a silent failure: Resume() returns nil while the runtime sidecar was never re-initialized. Add a regression test that arranges the deadline-boundary case and asserts the /init request lands on a fake envd HTTP server.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #328 +/- ##
==========================================
+ Coverage 74.65% 74.69% +0.04%
==========================================
Files 141 141
Lines 9836 9836
==========================================
+ Hits 7343 7347 +4
+ Misses 2183 2181 -2
+ Partials 310 308 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@deepak0x: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Ⅰ. Describe what this PR does
Changes line 318 of
pkg/sandbox-manager/infra/sandboxcr/sandbox.goto usepostCtxinstead of the originalctxwhen callingruntime.InitRuntime, matching the three sister callsites (InplaceRefresh at L306, resolveCSIMountConfigs at L336, ProcessCSIMounts at L341) that already usepostCtx.Adds
sandbox_resume_postctx_test.go, a regression test that arranges the deadline-boundary scenario and asserts the/initrequest lands on a fake envd HTTP server. Without the fix the request fails on the client side withcontext deadline exceededbefore reaching the sidecar; with the fix it lands on the sidecar and Resume returns nil cleanly.Ⅱ. Does this pull request fix one issue?
fixes #327
Ⅲ. Describe how to verify it
Test passes after the fix and fails on the previous
master(was the bug-finder repro). Full sandboxcr package tests stay green (go test ./pkg/sandbox-manager/infra/sandboxcr/).Ⅳ. Special notes for reviews
One-token change to make line 318 consistent with the three callsites updated in
f2be4665. The new test follows the existingTestSandbox_Resume_ContextExpiredAfterWaitpattern inpause_resume_test.go.Note (out of scope for this PR, mentioned in #327):
runtime.InitRuntime's retry loop appears to swallow the final error and return nil even when all attempts fail. That behavior is what made the line-318 bug a silent failure rather than a loud one — happy to send a separate PR for it once this lands.