Summary
- Replace fixed 50ms sleeps in retry-delay assertions with polling helpers that wait for the retry entry.
- Assert the retry delay from the remaining time observed immediately, instead of measuring after an arbitrary sleep.
- Cover the continuation, failure-backoff, min-retry-interval, and rate-limit-floor cases with the same helper.
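A minimal sketch of the polling pattern, written in Elixir to match the test suite. The helper name wait_for_retry_attempt comes from the review comments, but its body, the :retry_attempts state key, and the :due_at_ms entry field are assumptions for illustration. An Agent stands in for the orchestrator process so the example runs on its own.

```elixir
defmodule RetryPolling do
  # Hypothetical helper: assumes the process state is a map (or struct) with a
  # :retry_attempts map keyed by issue id, where each entry has :due_at_ms in
  # System.monotonic_time(:millisecond) units.
  def wait_for_retry_attempt(pid, issue_id, timeout_ms \\ 1_000, interval_ms \\ 10) do
    deadline_ms = System.monotonic_time(:millisecond) + timeout_ms
    poll(pid, issue_id, deadline_ms, interval_ms)
  end

  defp poll(pid, issue_id, deadline_ms, interval_ms) do
    state = :sys.get_state(pid)
    # Timestamp taken right after the state read, so the remaining time is
    # measured as close as possible to when the entry was first seen.
    observed_at_ms = System.monotonic_time(:millisecond)

    case Map.get(state.retry_attempts, issue_id) do
      nil when observed_at_ms >= deadline_ms ->
        raise "timed out waiting for a retry entry for #{inspect(issue_id)}"

      nil ->
        Process.sleep(interval_ms)
        poll(pid, issue_id, deadline_ms, interval_ms)

      entry ->
        {entry, observed_at_ms}
    end
  end
end

# Usage: an Agent stands in for the orchestrator and records a retry 30ms later.
{:ok, agent} = Agent.start_link(fn -> %{retry_attempts: %{}} end)

spawn(fn ->
  Process.sleep(30)
  due_at_ms = System.monotonic_time(:millisecond) + 1_000
  Agent.update(agent, &put_in(&1, [:retry_attempts, "issue-1"], %{due_at_ms: due_at_ms}))
end)

{entry, observed_at_ms} = RetryPolling.wait_for_retry_attempt(agent, "issue-1")
remaining_ms = entry.due_at_ms - observed_at_ms
IO.puts("remaining_ms=#{remaining_ms}")
```

Polling with a deadline replaces the fixed sleep: the test proceeds as soon as the entry exists and fails loudly if it never appears.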
Why
CI run 24133141504 failed on a flaky lower-bound assertion in the test "min_retry_interval_ms enforces a floor on failure retry delays". The test measured the remaining time after an arbitrary sleep, so scheduler latency on a busy runner could push the observed value below the threshold even when the retry was scheduled correctly.
Validation
- make check-elixir
- mise exec -- mix test test/symphony_elixir/core_test.exs
- mix test test/symphony_elixir/core_test.exs:1003, repeated 10 times
[P2] Scheduling-window assertion can miss wrong retry delays
orchestrator/elixir/test/symphony_elixir/core_test.exs:1378
assert_due_scheduled_between/5 compares due_at_ms to the broad [window_start_ms, window_end_ms] interval rather than to the time the retry was actually scheduled. Any latency while handling {:DOWN, ...} can mask a bad delay on either side: for example, a continuation retry scheduled for 500ms will still satisfy the new 1000ms lower bound if the handler spends ~500ms before updating state.
Assert against the observation time returned by wait_for_retry_attempt/2 (due_at_ms - observed_at_ms) or capture the schedule timestamp explicitly, so handler latency cannot hide a retry-delay regression.
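A sketch that replays the 500ms example on a hypothetical clock, contrasting the broad-window check with an observation-based check. It assumes assert_due_scheduled_between/5 accepts any due_at_ms in [window_start_ms + expected, window_end_ms + expected]; that reading, both predicate names, and the 250ms tolerance are illustrative rather than taken from the repository.

```elixir
defmodule RetryWindowExample do
  # Broad-window style (assumed shape): accepts any due_at_ms inside
  # [window_start_ms + expected_ms, window_end_ms + expected_ms].
  def window_ok?(due_at_ms, window_start_ms, window_end_ms, expected_ms) do
    due_at_ms >= window_start_ms + expected_ms and
      due_at_ms <= window_end_ms + expected_ms
  end

  # Observation style: for a correct delay, the remaining time seen when the
  # entry first appears is at most expected_ms; it must also be within
  # tolerance_ms below expected_ms.
  def observed_ok?(due_at_ms, observed_at_ms, expected_ms, tolerance_ms) do
    remaining_ms = due_at_ms - observed_at_ms
    remaining_ms <= expected_ms and remaining_ms >= expected_ms - tolerance_ms
  end
end

# Review scenario on a hypothetical clock (ms): the handler spends 500ms, then
# schedules a continuation retry with a wrong 500ms delay instead of 1_000ms.
window_start_ms = 0
scheduled_at_ms = window_start_ms + 500
due_at_ms = scheduled_at_ms + 500
observed_at_ms = scheduled_at_ms + 5
window_end_ms = observed_at_ms

RetryWindowExample.window_ok?(due_at_ms, window_start_ms, window_end_ms, 1_000)  # true
RetryWindowExample.observed_ok?(due_at_ms, observed_at_ms, 1_000, 250)           # false (495ms remaining)
```

The window check passes because the handler latency shifts due_at_ms back into the window, while the observation-based check sees only 495ms remaining and fails.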
Verdict
needs attention — the new helper removes the flaky sleep, but it also weakens the timing checks enough that incorrect retry delays can now pass.
[P2] Retry-delay tests no longer verify when the retry was scheduled
orchestrator/elixir/test/symphony_elixir/core_test.exs:689
The new pattern waits until retry_attempts[issue_id] exists and only then measures due_at_ms - observed_at_ms. That means a regression that delays scheduling by hundreds of milliseconds still passes, because the assertion starts the clock after the late state update instead of at the {:DOWN, ...} event.
Capture a timestamp immediately around send(pid, {:DOWN, ...}) and assert due_at_ms relative to that, or add a separate upper bound on how long wait_for_retry_attempt/2 is allowed to take.
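A sketch of combining both suggestions, under stated assumptions: in a real test, sent_at_ms would be read with System.monotonic_time(:millisecond) immediately before send(pid, {:DOWN, ...}), and observed_at_ms would come from the polling helper. The module name, the check/5 function, and the 200ms max_wait_ms budget are hypothetical. If the retry is scheduled at some time s between those two timestamps with delay d, then due_at_ms = s + d, which brackets d.

```elixir
defmodule RetryBracket do
  # Hypothetical check. sent_at_ms is read immediately before
  # send(pid, {:DOWN, ...}); observed_at_ms is when the retry entry was first
  # seen. If the retry was scheduled at some time s in
  # [sent_at_ms, observed_at_ms] with delay d, then due_at_ms = s + d, so
  #   due_at_ms - observed_at_ms <= d <= due_at_ms - sent_at_ms.
  def check(due_at_ms, sent_at_ms, observed_at_ms, expected_ms, max_wait_ms) do
    cond do
      # Upper bound on how long the state update took after the :DOWN message.
      observed_at_ms - sent_at_ms > max_wait_ms ->
        {:error, :scheduled_too_late}

      # expected_ms must fall inside the bracket derived above.
      expected_ms < due_at_ms - observed_at_ms or expected_ms > due_at_ms - sent_at_ms ->
        {:error, :wrong_delay}

      true ->
        :ok
    end
  end
end

# Hypothetical timelines in ms, with sent_at_ms = 0 and a 1_000ms expected delay.
RetryBracket.check(1_020, 0, 30, 1_000, 200)   # :ok (scheduled at 20, delay 1_000)
RetryBracket.check(520, 0, 30, 1_000, 200)     # {:error, :wrong_delay} (delay 500)
RetryBracket.check(1_400, 0, 410, 1_000, 200)  # {:error, :scheduled_too_late}
```

The bracket alone cannot catch a slow handler that still applies the right delay, which is why the wait budget is checked first; together they bound any undetected delay error by max_wait_ms.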
[P2] The new 250ms tolerance is still tight enough to keep these tests flaky
orchestrator/elixir/test/symphony_elixir/core_test.exs:1363
assert_remaining_delay/3 allows only ±250ms, but observed_at_ms is taken after a synchronous :sys.get_state/1 round-trip inside a polling loop. A correct implementation will still fail whenever CI takes >250ms to observe the updated state, especially on the 1s continuation-retry case.
Widen the slack materially, or make these assertions one-sided floor checks instead of exact-delay checks.
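A sketch contrasting an exact-delay check with a one-sided floor check. The function names, the 1_000ms floor, and the 500ms slack are hypothetical values chosen for illustration; in practice the slack must stay well below the gap between the floor and the delay a regression would produce, or the check stops catching anything.

```elixir
defmodule RetryFloor do
  # Exact-delay style (for contrast): remaining time must be within
  # tolerance_ms of expected_ms in either direction.
  def exact_ok?(due_at_ms, observed_at_ms, expected_ms, tolerance_ms) do
    abs(due_at_ms - observed_at_ms - expected_ms) <= tolerance_ms
  end

  # One-sided floor style: only fails when the remaining time is clearly
  # below floor_ms; slack_ms absorbs observation latency. There is no upper
  # bound, so a longer backoff above the floor also passes.
  def floor_ok?(due_at_ms, observed_at_ms, floor_ms, slack_ms) do
    due_at_ms - observed_at_ms >= floor_ms - slack_ms
  end
end

# Hypothetical values in ms: a retry scheduled at 0 with the 1_000ms floor,
# observed 400ms late on a busy runner.
RetryFloor.exact_ok?(1_000, 400, 1_000, 250)  # false: 600ms remaining is 400ms off
RetryFloor.floor_ok?(1_000, 400, 1_000, 500)  # true: 600 >= 500
# A regression that ignores the floor and schedules 10ms out still fails:
RetryFloor.floor_ok?(410, 400, 1_000, 500)    # false: 10 < 500
```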
Verdict
needs attention — the follow-up change weakens the timing coverage and still leaves the tests sensitive to scheduler jitter.