sidecar sync: retry SSH on boot, simplify transient error detection, resolve base from sidecar #315

Open
schurchleycci wants to merge 4 commits into main from sidecar-sync-retry-fetch

schurchleycci (Contributor) commented May 7, 2026

Summary

  • SSH retry on boot: Sync now retries OpenSession with exponential backoff (2s initial, 15s max interval, 90s max elapsed) on transient net.Error failures, so freshly created sidecars don't require the user to re-run the command while SSH is still starting. Emits a status message on the first retry so users know what's happening.
  • Simpler isTransientSSHError: replaced the blocklist approach (exclude known-permanent errors, retry everything else) with an allowlist (net.Error only). The allowlist has a safer default — unknown errors are not retried.
  • Remote origin/HEAD as sync base: replaced local gitutil.MergeBase() with git rev-parse origin/HEAD on the sidecar, falling back to HEAD if origin/HEAD is not configured. This gives a more reliable base when the sidecar was freshly cloned, and removes the git fetch step that was adding latency on every sync.
  • Tests for isTransientSSHError covering transient and non-transient cases.

Test plan

  • task test passes (860 tests, 0 failures) — verified on sidecar
  • New TestIsTransientSSHError covers timeout, connection refused, wrapped net error, auth failure, status error, key-not-found, and generic error cases

🤖 Generated with Claude Code

schurchleycci changed the title from "sidecar sync: retry SSH on boot, use remote origin/HEAD as base" to "sidecar sync: retry SSH on boot, simplify transient error detection, resolve base from sidecar" on May 8, 2026
Comment thread internal/sidecar/sync.go

// openSessionWithRetry calls OpenSession, retrying on transient errors to give
// a newly-created sidecar time to finish booting before its SSH service is ready.
func openSessionWithRetry(ctx context.Context, client *circleci.Client, sidecarID, identityFile, authSock string, status iostream.StatusFunc) (*Session, error) {
Contributor


@danmux or @pete-woods, do we have a standard pattern or library for retries? This feels like something we would have solved elsewhere.

Contributor Author


This is using the standard retry library (backoff) now! Thanks @danmux for the link.

schurchleycci and others added 4 commits May 14, 2026 09:53
Retry opening the SSH session up to 12 times (5s apart) so a freshly
created sidecar has time for its SSH service to become ready. Run
git fetch origin on the sidecar before git reset --hard so the merge
base commit is always available, even when the sidecar was booted from
an older snapshot. Add a status message for the fetch step so users
see progress rather than an unexplained pause.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace blocklist isTransientSSHError with net.Error allowlist (safer default)
- Drop git fetch before sync; use rev-parse origin/HEAD on sidecar instead of local MergeBase
- Emit status message on first SSH retry so users know why the CLI is waiting
- Add tests for isTransientSSHError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e test fixture

- Replace hand-rolled retry loop in openSessionWithRetry with backoff.RetryNotify
  using ExponentialBackOff (2s→15s, 90s cap); non-transient errors wrapped in
  backoff.Permanent for immediate failure
- Fall back to git rev-parse HEAD when origin/HEAD is not configured on the sidecar
- Add SetResultFunc to fakes.SSHServer for per-command results; update
  TestSync_NonApplyFailureReturnsImmediately to trigger RemoteBaseError via a
  failing rev-parse rather than the removed local MergeBase() call

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
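The base-resolution fallback from this commit can be sketched with a hypothetical run function that executes a command on the sidecar and returns its stdout. A map of per-command results plays the role of the fake SSH server, in the spirit of SetResultFunc (whose real signature is not shown here). The PR mentions RemoteBaseError but not its fields, so this sketch returns a plain wrapped error instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// runFunc runs a command on the sidecar and returns its stdout (hypothetical).
type runFunc func(cmd string) (string, error)

// resolveRemoteBase returns the commit to sync against: origin/HEAD when it
// resolves on the sidecar, otherwise the sidecar's own HEAD.
func resolveRemoteBase(run runFunc) (string, error) {
	if out, err := run("git rev-parse origin/HEAD"); err == nil {
		return strings.TrimSpace(out), nil
	}
	out, err := run("git rev-parse HEAD")
	if err != nil {
		return "", fmt.Errorf("resolve sync base on sidecar: %w", err)
	}
	return strings.TrimSpace(out), nil
}

// fakeRunner returns canned stdout per command and fails any other command,
// loosely mimicking a fake SSH server with per-command results.
func fakeRunner(results map[string]string) runFunc {
	return func(cmd string) (string, error) {
		if out, ok := results[cmd]; ok {
			return out, nil
		}
		return "", errors.New("exit status 128")
	}
}

func main() {
	withOrigin := fakeRunner(map[string]string{
		"git rev-parse origin/HEAD": "aaa111\n",
		"git rev-parse HEAD":        "bbb222\n",
	})
	noOrigin := fakeRunner(map[string]string{"git rev-parse HEAD": "bbb222\n"})
	broken := fakeRunner(nil)

	fmt.Println(resolveRemoteBase(withOrigin)) // aaa111 <nil>
	fmt.Println(resolveRemoteBase(noOrigin))   // bbb222 <nil>
	_, err := resolveRemoteBase(broken)
	fmt.Println(err) // resolve sync base on sidecar: exit status 128
}
```

This sketch falls back on any error from the first rev-parse; a stricter version could fall back only when origin/HEAD is genuinely unset and surface transport failures directly. The PR does not say which behaviour it chose.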
schurchleycci force-pushed the sidecar-sync-retry-fetch branch from 2721b6b to 47421ad on May 14, 2026 13:53
2 participants