sidecar sync: retry SSH on boot, simplify transient error detection, resolve base from sidecar#315
Open
schurchleycci wants to merge 4 commits into
Open
sidecar sync: retry SSH on boot, simplify transient error detection, resolve base from sidecar#315schurchleycci wants to merge 4 commits into
schurchleycci wants to merge 4 commits into
Conversation
z00b
reviewed
May 8, 2026
|
|
||
| // openSessionWithRetry calls OpenSession, retrying on transient errors to give | ||
| // a newly-created sidecar time to finish booting before its SSH service is ready. | ||
| func openSessionWithRetry(ctx context.Context, client *circleci.Client, sidecarID, identityFile, authSock string, status iostream.StatusFunc) (*Session, error) { |
Contributor
There was a problem hiding this comment.
@danmux or @pete-woods do we have a standard pattern or lib for retries? this feels just like something we would have solved elsewhere...
Contributor
Author
There was a problem hiding this comment.
This is using the standard lib for retries now! Thanks @danmux for the link
8cda1bc to
f19d8bf
Compare
Retry opening the SSH session up to 12 times (5s apart) so a freshly created sidecar has time for its SSH service to become ready. Run git fetch origin on the sidecar before git reset --hard so the merge base commit is always available, even when the sidecar was booted from an older snapshot. Add a status message for the fetch step so users see progress rather than an unexplained pause. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace blocklist isTransientSSHError with net.Error allowlist (safer default) - Drop git fetch before sync; use rev-parse origin/HEAD on sidecar instead of local MergeBase - Emit status message on first SSH retry so users know why the CLI is waiting - Add tests for isTransientSSHError Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e test fixture - Replace hand-rolled retry loop in openSessionWithRetry with backoff.RetryNotify using ExponentialBackOff (2s→15s, 90s cap); non-transient errors wrapped in backoff.Permanent for immediate failure - Fall back to git rev-parse HEAD when origin/HEAD is not configured on the sidecar - Add SetResultFunc to fakes.SSHServer for per-command results; update TestSync_NonApplyFailureReturnsImmediately to trigger RemoteBaseError via a failing rev-parse rather than the removed local MergeBase() call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2721b6b to
47421ad
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Syncnow retriesOpenSessionwith exponential backoff (2s initial, 15s max interval, 90s max elapsed) on transientnet.Errorfailures, so freshly-created sidecars don't require the user to re-run the command while SSH is still starting. Emits a status message on the first retry so users know what's happening.isTransientSSHError: replaced the blocklist approach (exclude known-permanent errors, retry everything else) with an allowlist (net.Erroronly). The allowlist has a safer default — unknown errors are not retried.origin/HEADas sync base: replaced localgitutil.MergeBase()withgit rev-parse origin/HEADon the sidecar, falling back toHEADiforigin/HEADis not configured. This gives a more reliable base when the sidecar was freshly cloned, and removes thegit fetchstep that was adding latency on every sync.isTransientSSHErrorcovering transient and non-transient cases.Test plan
task testpasses (860 tests, 0 failures) — verified on sidecarTestIsTransientSSHErrorcovers timeout, connection refused, wrapped net error, auth failure, status error, key-not-found, and generic error cases🤖 Generated with Claude Code