
test: harden retry delay timing assertions#78

Merged
tmustier merged 1 commit into main from fix/ci-min-retry-interval-flake
Apr 8, 2026

Conversation

@tmustier
Owner

@tmustier tmustier commented Apr 8, 2026

Summary

  • replace fixed 50ms sleeps in retry-delay assertions with polling helpers that wait for the retry entry
  • assert retry delay from the immediate observed remaining time instead of measuring after an arbitrary sleep
  • cover the continuation, failure backoff, min retry interval, and rate-limit floor cases with the same helper
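A polling helper along these lines would fit the description above (the name `wait_for_retry_attempt/2`, the `retry_attempts` state key, and the `due_at_ms` field come from this PR's discussion; the exact shapes in `core_test.exs` may differ):

```elixir
# Sketch only: poll the GenServer state until the retry entry appears,
# then return it together with the timestamp at which it was observed.
defp wait_for_retry_attempt(pid, issue_id, timeout_ms \\ 2_000) do
  deadline = System.monotonic_time(:millisecond) + timeout_ms
  do_wait(pid, issue_id, deadline)
end

defp do_wait(pid, issue_id, deadline) do
  state = :sys.get_state(pid)

  case state.retry_attempts[issue_id] do
    nil ->
      if System.monotonic_time(:millisecond) >= deadline do
        flunk("retry for #{inspect(issue_id)} was never scheduled")
      else
        Process.sleep(10)
        do_wait(pid, issue_id, deadline)
      end

    attempt ->
      # Record when the entry was observed, so callers can compute the
      # remaining delay without relying on a fixed sleep.
      {attempt, System.monotonic_time(:millisecond)}
  end
end
```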

Why

CI run 24133141504 failed on a flaky lower-bound assertion in "min_retry_interval_ms enforces a floor on failure retry delays". The test measured the remaining time after an arbitrary sleep, so scheduler latency on a busy runner could drop the observed value below the threshold even when the retry was scheduled correctly.
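The pre-fix shape of the assertion was roughly the following (field and variable names are illustrative, reconstructed from the description above):

```elixir
# Flaky: the remaining delay is measured after a fixed sleep, so any
# scheduler latency on the runner is subtracted from the observed value.
Process.sleep(50)
%{retry_attempts: %{^issue_id => %{due_at_ms: due_at_ms}}} = :sys.get_state(pid)
assert due_at_ms - System.monotonic_time(:millisecond) >= min_retry_interval_ms
```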

Validation

  • make check-elixir
  • mise exec -- mix test test/symphony_elixir/core_test.exs
  • repeated mix test test/symphony_elixir/core_test.exs:1003 (10x)

@tmustier tmustier force-pushed the fix/ci-min-retry-interval-flake branch from d10285c to 42f3529 on April 8, 2026 12:16
@tmustier
Owner Author

tmustier commented Apr 8, 2026

Findings

  • [P2] Scheduling-window assertion can miss wrong retry delays
    orchestrator/elixir/test/symphony_elixir/core_test.exs:1378
    assert_due_scheduled_between/5 compares due_at_ms to the broad [window_start_ms, window_end_ms] interval rather than to the time the retry was actually scheduled. Any latency while handling {:DOWN, ...} can mask a bad delay on either side: for example, a continuation retry scheduled for 500ms will still satisfy the new 1000ms lower bound if the handler spends ~500ms before updating state.
    Assert against the observation time returned by wait_for_retry_attempt/2 (due_at_ms - observed_at_ms) or capture the schedule timestamp explicitly, so handler latency cannot hide a retry-delay regression.
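Concretely, the suggestion is to derive the delay from the observation timestamp the helper already has, rather than from a broad window (a sketch; `wait_for_retry_attempt/2` returning `{attempt, observed_at_ms}` is an assumption):

```elixir
# Measure the delay against the moment the retry entry was observed,
# so handler latency before the state update cannot widen the window.
{attempt, observed_at_ms} = wait_for_retry_attempt(pid, issue_id)
remaining_ms = attempt.due_at_ms - observed_at_ms

assert remaining_ms >= expected_delay_ms - slack_ms
assert remaining_ms <= expected_delay_ms + slack_ms
```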

Verdict

needs attention — the new helper removes the flaky sleep, but it also weakens the timing checks enough that incorrect retry delays can now pass.

@tmustier
Owner Author

tmustier commented Apr 8, 2026

Findings

  • [P2] Retry-delay tests no longer verify when the retry was scheduled
    orchestrator/elixir/test/symphony_elixir/core_test.exs:689
    The new pattern waits until retry_attempts[issue_id] exists and only then measures due_at_ms - observed_at_ms. That means a regression that delays scheduling by hundreds of milliseconds still passes, because the assertion starts the clock after the late state update instead of at the {:DOWN, ...} event.
    Capture a timestamp immediately around send(pid, {:DOWN, ...}) and assert due_at_ms relative to that, or add a separate upper bound on how long wait_for_retry_attempt/2 is allowed to take.
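A sketch of the first suggestion (variable names are illustrative):

```elixir
# Start the clock at the event itself, not at the later state update.
sent_at_ms = System.monotonic_time(:millisecond)
send(pid, {:DOWN, ref, :process, worker_pid, :crashed})

{attempt, _observed_at_ms} = wait_for_retry_attempt(pid, issue_id)

# due_at_ms is fixed when the retry is scheduled, so bounding it against
# sent_at_ms catches a scheduling delay that a post-update clock would hide.
assert attempt.due_at_ms - sent_at_ms >= expected_delay_ms
assert attempt.due_at_ms - sent_at_ms <= expected_delay_ms + slack_ms
```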

  • [P2] The new 250ms tolerance is still tight enough to keep these tests flaky
    orchestrator/elixir/test/symphony_elixir/core_test.exs:1363
    assert_remaining_delay/3 allows only ±250ms, but observed_at_ms is taken after a synchronous :sys.get_state/1 round-trip inside a polling loop. A correct implementation will still fail whenever CI takes >250ms to observe the updated state, especially on the 1s continuation-retry case.
    Widen the slack materially, or make these assertions one-sided floor checks instead of exact-delay checks.
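The one-sided variant could look like this (sketch; note the asymmetry between the two bounds):

```elixir
{attempt, observed_at_ms} = wait_for_retry_attempt(pid, issue_id)
remaining_ms = attempt.due_at_ms - observed_at_ms

# Observation latency can only shrink remaining_ms, never grow it,
# so a tight upper bound is safe to keep.
assert remaining_ms <= expected_delay_ms

# The lower bound is the latency-sensitive side: keep it a loose floor
# (or assert it against an event timestamp) instead of a +/-250ms match.
assert remaining_ms >= min_retry_interval_ms - generous_slack_ms
```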

Verdict

needs attention — the follow-up change weakens the timing coverage and still leaves the tests sensitive to scheduler jitter.

@tmustier tmustier merged commit ef98021 into main Apr 8, 2026
3 checks passed
@tmustier tmustier deleted the fix/ci-min-retry-interval-flake branch April 8, 2026 12:25
