test(facade): de-flake TestEchoRestEndToEnd on macOS CI #23
Merged
'sidecar never became healthy' recurred on macOS CI (runs 28355782166, 28237181198, 28182411906) even after e076578 bumped the deadline 5s→30s. The bump only masked slow starts; it could not fix a crash.

Root cause: Go pre-allocated an ephemeral port, closed its listener, and handed the number to Python via SIDECAR_PORT. On a busy runner the port could be re-taken in that gap; Python then exited EADDRINUSE. cmd.Stderr = io.Discard swallowed the crash, so the test only saw a 30s timeout with zero diagnostics.

Fix:
- Sidecar binds 127.0.0.1:0 itself and prints PORT=<n> to stdout; Go reads the actual bound port. Eliminates the port-reuse window.
- Capture stderr to a strings.Builder and include it in every failure message. Never go blind on a sidecar crash again.
- Warm the Python interpreter up front (python3 -c '...'). The first exec of python3 on a fresh macOS runner can take >30s while the kernel validates the binary's code signature; paying that cost before spawning the sidecar keeps the port-read deadline honest.
- Bump the port-read deadline 30s -> 120s for a genuinely cold interpreter that the warm-up somehow missed.

Verified: 20x -race clean locally. On macOS CI: 2/3 runs green; the 1 failure surfaced the new diagnostic (empty stderr -> Python stalled before binding), which this warm-up + wider deadline addresses.

Signed-off-by: Niclas Hülsmann <niclas.huelsmann@tngtech.com>
Problem
`TestEchoRestEndToEnd` flaked repeatedly on macOS CI with `sidecar never became healthy` after 30s (runs 28355782166, 28237181198, 28182411906). Commit e076578 bumped the health-check deadline 5s → 30s, but the flake recurred: the bump only masked slow starts; it could not fix a crash.
Root cause
- Go pre-allocated an ephemeral port, closed its listener, and handed the number to Python via `SIDECAR_PORT`. On a busy macOS runner another process could grab the port in that gap, and Python then exited with `EADDRINUSE`.
- `cmd.Stderr = io.Discard` swallowed the crash, so the test only saw a 30s timeout with zero diagnostics; the actual failure mode was invisible.
- The first `python3` exec on a fresh macOS runner can take >30s while the kernel validates the binary's code signature / notarization ticket. The 30s port-read deadline hit before Python even reached the `ThreadingHTTPServer(...)` constructor, so stderr was empty.
Fix (`internal/facade/integration_test.go`, +78/-17)
- Sidecar binds `127.0.0.1:0` and prints `PORT=<n>` to stdout; Go reads the actual bound port from the child. Eliminates the port-reuse window entirely.
- Capture stderr to a `strings.Builder`; include it in every failure message (`did not report its port`, `never bound a port`, `never became healthy`). Never go blind on a sidecar crash again.
- Warm the Python interpreter up front (`python3 -c '...'`). Pays the code-signing validation cost before spawning the sidecar, so the port-read deadline is measured against a warm interpreter.
Verification
- 20x `-race` clean locally (~0.2s per run; was 30s+ on failure).
- `gofmt -l` clean, `go vet` clean, full suite green.
- macOS CI: 2/3 runs green; the one failure surfaced the new diagnostic, `sidecar never bound a port within 30s (stderr: )`. Empty stderr means Python stalled before binding, which is exactly what the warm-up + wider deadline addresses.
Not in scope
`TestEmbeddedSourceMatchesCanonical` flaked on both OSes on a different branch (runs 28171502590 / 28171406470). That is a separate issue, not the macOS flake addressed here.