
fix: stabilize live linux and windows e2e agent flows #5

Merged
Microck merged 12 commits into main from fix/live-e2e-followups
Apr 9, 2026

Microck commented Apr 9, 2026

Summary

  • fix the installed agent command shape on Linux and Windows to use the supported agent --once path
  • keep the live E2E coverage on real Tailscale-backed credentials in isolated environments
  • add unit coverage so installer-generated agent commands cannot regress back to the unsupported form
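
The regression check described in the last bullet could look roughly like the Go sketch below. This is only an illustration: `agentArgs` and `checkAgentShape` are hypothetical names, and the real installer helpers and binary path may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// agentArgs is a hypothetical stand-in for the installer helper that builds
// the agent command line. The real function name and arguments may differ.
func agentArgs(binary string) []string {
	return []string{binary, "agent", "--once"}
}

// checkAgentShape returns an error if args use the unsupported
// "agent run ..." form or do not pass --once after the agent subcommand.
func checkAgentShape(args []string) error {
	joined := strings.Join(args, " ")
	for i, a := range args {
		if a != "agent" {
			continue
		}
		rest := args[i+1:]
		if len(rest) > 0 && rest[0] == "run" {
			return fmt.Errorf("unsupported form: %q", joined)
		}
		for _, r := range rest {
			if r == "--once" {
				return nil
			}
		}
		return fmt.Errorf("missing --once: %q", joined)
	}
	return fmt.Errorf("no agent subcommand: %q", joined)
}

func main() {
	// Supported shape produced by the (hypothetical) installer helper.
	fmt.Println(checkAgentShape(agentArgs("tool")))
	// Regressed shape that the unit coverage is meant to reject.
	fmt.Println(checkAgentShape([]string{"tool", "agent", "run", "--once"}))
}
```

In a real test file the same check would run against the generated systemd unit text and the Windows launcher contents rather than a hand-built slice.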

Verification

  • go test ./...
  • bash -n tests/live/linux-live-e2e.sh
  • PowerShell parse check for tests/live/windows-live-e2e.ps1
  • GitHub Actions live matrix: 24214688856
    • Linux: 70692222375
    • Windows: 70692222401

Root Cause

  • the Linux systemd unit and Windows scheduled-task launcher were emitting agent run ..., but the CLI only supports agent ...
  • on Linux that caused the startup agent to ignore the intended runtime flags, read the wrong state, and self-remove during the post-enroll assertions
  • Windows had the same latent installer bug even though earlier fixes had already stabilized the observable test flow
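
One plausible mechanism for the ignored flags, assuming the CLI parses the agent subcommand with Go's standard `flag` package (the PR does not confirm this): `flag` stops parsing at the first non-flag argument, so a stray `run` leaves every later flag at its default. The `--state-dir` flag and its default value below are hypothetical and only illustrate the "wrong state" symptom.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseAgent parses the arguments that follow the "agent" subcommand.
// The --state-dir flag and its default are illustrative assumptions.
func parseAgent(args []string) (once bool, stateDir string, rest []string, err error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the example
	fs.BoolVar(&once, "once", false, "run a single pass and exit")
	fs.StringVar(&stateDir, "state-dir", "/var/lib/example", "state directory")
	err = fs.Parse(args)
	return once, stateDir, fs.Args(), err
}

func main() {
	// Unsupported form: "run" is a positional argument, so parsing stops
	// there and the flags after it are never applied.
	once, dir, rest, _ := parseAgent([]string{"run", "--once", "--state-dir", "/tmp/x"})
	fmt.Println(once, dir, rest)

	// Supported form: the flags are parsed as intended.
	once, dir, rest, _ = parseAgent([]string{"--once", "--state-dir", "/tmp/x"})
	fmt.Println(once, dir, rest)
}
```

Under this assumption the first call silently falls back to defaults instead of failing, which matches an agent that starts cleanly but reads the wrong state.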

Microck added 12 commits April 9, 2026 20:29
Use a short Windows launcher script for scheduled tasks so the real hosted E2E run stays under schtasks /TR limits, and delete the launcher during agent self-removal.

Fix the Linux live wrapper generation by preventing heredoc expansion while writing the isolated tailscale wrapper, then assert the Windows launcher exists and is removed in the live test.
Generate short-lived Linux and Windows auth keys from the Tailscale API during the live workflow so reruns do not depend on stale or one-shot stored auth keys.

Also move the Linux live workdir to /var/tmp and recreate it after package installation, then update the docs to match the new secret requirements.
Fix the Linux wrapper script so its internal variables expand at runtime, and switch the Windows live test to verify scheduled tasks through schtasks instead of Get-ScheduledTask.
Avoid starting the Linux oneshot agent service during enrollment so the timer does not race the initial lease state, and treat a 404 device delete as already cleaned for ephemeral Windows nodes.
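
The "404 means already cleaned" rule from the commit above can be sketched as follows. The live test itself is a PowerShell script, so this Go version is only an illustration; `deleteOutcome` and `deleteDevice` are hypothetical names, and the caller supplies the device URL and any authentication.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// deleteOutcome maps an HTTP status from a device DELETE call to an error.
// A 404 is treated as success because an ephemeral node may already have
// been removed by the time cleanup runs.
func deleteOutcome(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusNotFound:
		return nil // already cleaned up
	default:
		return fmt.Errorf("delete device: unexpected status %d", status)
	}
}

// deleteDevice issues the DELETE request and applies deleteOutcome.
// The URL is passed in by the caller; no endpoint is assumed here.
func deleteDevice(ctx context.Context, client *http.Client, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return deleteOutcome(resp.StatusCode)
}

func main() {
	fmt.Println(deleteOutcome(http.StatusNotFound))
	fmt.Println(deleteOutcome(http.StatusInternalServerError))
}
```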
Add per-request timeouts and explicit progress markers to the Windows live E2E script so stalled Tailscale API calls do not hang the workflow without actionable output.
Wrap the Windows live E2E CLI invocations with explicit subprocess timeouts and captured output so hangs report the exact failing phase instead of consuming the entire job timeout.
Use a detached PowerShell Start-Process launcher for delayed Windows self-deletion so agent --once can return after cleanup instead of blocking on the cmd start chain.
Microck merged commit a3ca0a7 into main Apr 9, 2026
5 checks passed
Microck deleted the fix/live-e2e-followups branch April 9, 2026 21:47