@popojk popojk commented Aug 15, 2025

Why are these changes needed?

The e2e test raycluster_autoscaler_part2_test is currently flaky. In the test case TestRayClusterAutoscalerV2IdleTimeout, logs show that the autoscaler takes ~84 seconds to scale down (10 seconds idle + 74 seconds to scale down), whereas the test allows only 40 seconds (10 seconds idle + 30 seconds buffer), so the test fails. This PR increases the timeout buffer from 30 seconds to 100 seconds, for a total of 110 seconds, to reduce flakiness.
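The timeout arithmetic above can be checked with a small sketch. This is illustrative Python only, not the actual KubeRay test code. The names `wait_deadline`, `fits`, and `wait_until` are hypothetical, and the constants are the figures quoted in this description.

```python
import time
from typing import Callable

# Figures quoted in the PR description (seconds).
IDLE_TIMEOUT_S = 10          # idle timeout configured by the test
OLD_BUFFER_S = 30            # buffer before this PR
NEW_BUFFER_S = 100           # buffer proposed by this PR
OBSERVED_SCALE_DOWN_S = 74   # scale-down time seen in the logs


def wait_deadline(idle_timeout_s: float, buffer_s: float) -> float:
    """Total time the test is willing to wait for scale-down."""
    return idle_timeout_s + buffer_s


def fits(observed_total_s: float, idle_timeout_s: float, buffer_s: float) -> bool:
    """True if the observed scale-down time is within the test's deadline."""
    return observed_total_s <= wait_deadline(idle_timeout_s, buffer_s)


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting (hypothetical helper, for illustration).
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    observed = IDLE_TIMEOUT_S + OBSERVED_SCALE_DOWN_S
    print("observed total:", observed)
    print("fits old deadline:", fits(observed, IDLE_TIMEOUT_S, OLD_BUFFER_S))
    print("fits new deadline:", fits(observed, IDLE_TIMEOUT_S, NEW_BUFFER_S))
```

With the quoted numbers, the observed total exceeds the old deadline but falls within the new one. This only explains why the test fails today. It does not address why scale-down takes 74 seconds.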

[Screenshot, 2025-08-15 11:20:27 AM]

Related issue number

Closes #3940

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@popojk popojk changed the title from "fix: increase timeout buffer to prevent flaky test" to "[Flaky] Test Autoscaler E2E Part 2 (nightly operator) is flaky" on Aug 15, 2025
Contributor

@owenowenisme owenowenisme left a comment

There’s an issue #3909 regarding the scaling speed (I think @Future-Outlier is working on that). Maybe we should wait for it to be resolved before changing the timeout value?

Author

popojk commented Aug 15, 2025

> There’s an issue #3909 regarding the scaling speed (I think @Future-Outlier is working on that). Maybe we should wait for it to be resolved before changing the timeout value?

Hi @owenowenisme, thanks for pointing this out! Yes, it makes sense to wait until the scaling speed PR is merged first.

cc @win5923

Development

Successfully merging this pull request may close these issues.

[Flaky] "Test Autoscaler E2E Part 2 (nightly operator)" is flaky