forked from rapidsai/devcontainers
-
Notifications
You must be signed in to change notification settings - Fork 0
add retries for robustness when using mitmproxy #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
msarahan
wants to merge
9
commits into
trxcllnt:fix/vendor-common-utils
Choose a base branch
from
msarahan:feature_scripts_retries
base: fix/vendor-common-utils
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
add retries for robustness when using mitmproxy #18
msarahan
wants to merge
9
commits into
trxcllnt:fix/vendor-common-utils
from
msarahan:feature_scripts_retries
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
`cl14.44` has an internal compiler error building stdexec
## Summary With help from `claude`, I am implementing my favorite feature from the old `rapids-compose`: `test-*` commands. - Add `test-<lib>-cpp` commands that run C++ test scripts (e.g., `test-cudf-cpp` runs `ci/run_cudf_ctests.sh`) - Add `test-<lib>-python` commands that run Python test scripts (e.g., `test-rmm-python` runs `ci/run_pytests.sh`) - Add `test-<repo>` commands that run both C++ and Python tests for a repo - Add `test-all`, `test-all-cpp`, and `test-all-python` commands Test script paths are specified in `manifest.yaml` since they vary across repos. All arguments are forwarded to the underlying scripts. ## Test plan - [x] Rebuild devcontainer and verify `test-*` commands are generated - [x] Run `test-rmm-cpp -h` to verify help text - [x] Run `test-cudf-cpp` (via script content inspection) to verify C++ tests invoke correct CI scripts - [x] Run `test-rmm-python` to verify Python tests invoke correct CI scripts - [x] Run `test-all -h` to verify aggregated command works ## Testing notes Validated using `rapidsai/devcontainers:26.02-cpp-rapids-build-utils-ubuntu24.04` with modified `rapids-build-utils` mounted: ``` # Commands generated correctly $ ls /usr/bin/test-rmm* /usr/bin/test-rmm /usr/bin/test-rmm-cpp /usr/bin/test-rmm-python # Help text works $ test-rmm-cpp -h Usage: test-rmm-cpp [OPTION]... Run rmm C++ tests. Boolean options: -h,--help Print this text. # test-rmm-cpp correctly invokes ci/run_ctests.sh $ test-rmm-cpp ./ci/run_ctests.sh: line 8: cd: /usr/bin/gtests/librmm/: No such file or directory # test-rmm-python correctly invokes ci/run_pytests.sh $ test-rmm-python ./ci/run_pytests.sh: line 10: pytest: command not found # test-all commands work $ test-all-cpp -h Usage: test-all-cpp [OPTION]... Runs test-<repo>-cpp for each repo in 'rmm' 'ucxx' ... ```
…nt proxies Added retry logic and better error handling to wget commands in devcontainer features to handle intermittent network issues when building through proxies: Ninja feature (features/src/ninja/install.sh): - Added --tries=3 --timeout=30 to wget for ninja binary download - Made bash-completion download non-fatal (continues on failure) - Added echo statements to show download progress Utils feature (features/src/utils/install.sh): - Added --tries=3 --timeout=30 to wget for gh-nv-gha-aws download - Added explicit error handling with exit 1 on failure - Added echo statement to show download progress These changes ensure builds can complete successfully even with transient network issues in CI environments using mitmproxy for traffic capture.
cmake feature (features/src/cmake/install.sh): - Added --tries=3 --timeout=30 to both wget commands - Prevents indefinite hangs when downloading CMake installers from GitHub rapids-build-utils feature (features/src/rapids-build-utils/install.sh): - Added --tries=3 --timeout=30 to wget command for yq download - Prevents indefinite hangs when downloading yq from GitHub These timeouts prevent the 5-minute SSL connection timeouts observed with git-lfs and ensure builds fail fast rather than hanging.
Changes to the gh-nv-gha-aws download: - Replace -q with --verbose for detailed connection debugging - Add --ca-certificate=/etc/ssl/certs/ca-certificates.crt to explicitly use CA bundle - Add --waitretry=5 to wait 5 seconds between retry attempts (reduces hammering) - Add --dns-timeout=10 for faster DNS failure detection - Add --connect-timeout=30 separate from read timeout - Improve error message to show the actual URL being attempted This helps diagnose SSL/connection issues when traffic goes through mitmproxy's transparent proxy, and makes retries more robust with delays between attempts.
…ainers into feature_scripts_retries
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I found these testing locally. They may not be a problem remotely, but the changes seem nondisruptive either way.