@msarahan
I found these issues while testing locally. They may not cause problems remotely, but the changes seem nondisruptive either way.

trxcllnt and others added 4 commits December 9, 2025 14:42
`cl14.44` has an internal compiler error building stdexec

## Summary

With help from `claude`, I am implementing my favorite feature from the
old `rapids-compose`: `test-*` commands.

- Add `test-<lib>-cpp` commands that run C++ test scripts (e.g.,
`test-cudf-cpp` runs `ci/run_cudf_ctests.sh`)
- Add `test-<lib>-python` commands that run Python test scripts (e.g.,
`test-rmm-python` runs `ci/run_pytests.sh`)
- Add `test-<repo>` commands that run both C++ and Python tests for a
repo
- Add `test-all`, `test-all-cpp`, and `test-all-python` commands

Test script paths are specified in `manifest.yaml` since they vary
across repos. All arguments are forwarded to the underlying scripts.
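
The forwarding behavior described above can be sketched as a small wrapper. This is only an illustration under stated assumptions, not the generated script itself: the function name `run_repo_test_script` and its two positional parameters are hypothetical, and the real commands look up the script path in `manifest.yaml` rather than taking it as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what a generated test-<lib>-cpp command might do.
# run_repo_test_script and its parameters are illustrative names only.
run_repo_test_script() {
    local repo_dir="$1"     # root of the repo checkout (assumed layout)
    local test_script="$2"  # path taken from manifest.yaml, e.g. ci/run_ctests.sh
    shift 2
    # Run from the repo root in a subshell so the caller's working directory
    # is untouched, and pass every remaining argument through unchanged.
    ( cd "$repo_dir" && "./$test_script" "$@" )
}
```

With this sketch, `run_repo_test_script <repo-dir> ci/run_ctests.sh <args>` would run `./ci/run_ctests.sh <args>` from the repo root, which matches the `./ci/run_ctests.sh` invocation visible in the testing notes.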

## Test plan

- [x] Rebuild devcontainer and verify `test-*` commands are generated
- [x] Run `test-rmm-cpp -h` to verify help text
- [x] Run `test-cudf-cpp` (via script content inspection) to verify C++
tests invoke correct CI scripts
- [x] Run `test-rmm-python` to verify Python tests invoke correct CI
scripts
- [x] Run `test-all -h` to verify aggregated command works

## Testing notes

Validated using
`rapidsai/devcontainers:26.02-cpp-rapids-build-utils-ubuntu24.04` with
modified `rapids-build-utils` mounted:

```
# Commands generated correctly
$ ls /usr/bin/test-rmm*
/usr/bin/test-rmm
/usr/bin/test-rmm-cpp
/usr/bin/test-rmm-python

# Help text works
$ test-rmm-cpp -h
Usage:
 test-rmm-cpp [OPTION]...

Run rmm C++ tests.

Boolean options:
 -h,--help  Print this text.

# test-rmm-cpp correctly invokes ci/run_ctests.sh
$ test-rmm-cpp
./ci/run_ctests.sh: line 8: cd: /usr/bin/gtests/librmm/: No such file or directory

# test-rmm-python correctly invokes ci/run_pytests.sh
$ test-rmm-python
./ci/run_pytests.sh: line 10: pytest: command not found

# test-all commands work
$ test-all-cpp -h
Usage:
 test-all-cpp [OPTION]...

Runs test-<repo>-cpp for each repo in 'rmm' 'ucxx' ...
```
…nt proxies

Added retry logic and better error handling to wget commands in devcontainer
features to handle intermittent network issues when building through proxies:

Ninja feature (features/src/ninja/install.sh):
- Added --tries=3 --timeout=30 to wget for ninja binary download
- Made bash-completion download non-fatal (continues on failure)
- Added echo statements to show download progress

Utils feature (features/src/utils/install.sh):
- Added --tries=3 --timeout=30 to wget for gh-nv-gha-aws download
- Added explicit error handling with exit 1 on failure
- Added echo statement to show download progress

These changes help builds complete successfully despite transient
network issues in CI environments that use mitmproxy for traffic capture.
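
The two download policies described in this commit (fatal for required tools, non-fatal for extras such as bash-completion) might look roughly like the following. It is a sketch, not the contents of either `install.sh`: the helper names `fetch_required` and `fetch_optional` are hypothetical, and only the `--tries=3 --timeout=30` flags come from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_required and fetch_optional are hypothetical helper names.

# Required download: retry up to 3 times with a 30 s timeout; report failure.
fetch_required() {
    local url="$1" dest="$2"
    echo "Downloading ${url}..."
    if ! wget --tries=3 --timeout=30 -q -O "$dest" "$url"; then
        echo "Error: failed to download ${url}" >&2
        return 1   # an install script would typically `exit 1` here
    fi
}

# Optional download (e.g. shell completions): warn and carry on.
fetch_optional() {
    local url="$1" dest="$2"
    echo "Downloading ${url} (optional)..."
    if ! wget --tries=3 --timeout=30 -q -O "$dest" "$url"; then
        echo "Warning: could not download ${url}; continuing" >&2
        rm -f "$dest"   # wget -O can leave an empty or partial file behind
    fi
    return 0
}
```

Keeping the optional path separate means a missing completion file never aborts the feature install, while a missing binary still stops the build with a clear message.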

@msarahan msarahan changed the base branch from main to fix/vendor-common-utils December 11, 2025 17:44
msarahan and others added 5 commits December 11, 2025 10:01

cmake feature (features/src/cmake/install.sh):
- Added --tries=3 --timeout=30 to both wget commands
- Prevents indefinite hangs when downloading CMake installers from GitHub

rapids-build-utils feature (features/src/rapids-build-utils/install.sh):
- Added --tries=3 --timeout=30 to wget command for yq download
- Prevents indefinite hangs when downloading yq from GitHub

These timeouts prevent the 5-minute SSL connection timeouts observed with
git-lfs and ensure builds fail fast rather than hanging.

Changes to the gh-nv-gha-aws download:
- Replace -q with --verbose for detailed connection debugging
- Add --ca-certificate=/etc/ssl/certs/ca-certificates.crt to explicitly use CA bundle
- Add --waitretry=5 to back off between retry attempts, waiting up to 5 seconds (reduces hammering)
- Add --dns-timeout=10 for faster DNS failure detection
- Add --connect-timeout=30 separate from read timeout
- Improve error message to show the actual URL being attempted

This helps diagnose SSL/connection issues when traffic goes through mitmproxy's
transparent proxy, and makes retries more robust with delays between attempts.
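
Put together, the flags listed in this commit could be combined as in the sketch below. The helper name `download_with_diagnostics` is hypothetical, and keeping `--tries=3` from the earlier commit is an assumption; the actual `install.sh` may differ.

```shell
#!/usr/bin/env bash
# Sketch only: download_with_diagnostics is a hypothetical helper name.
download_with_diagnostics() {
    local url="$1" dest="$2"
    # --verbose            print connection details instead of -q
    # --ca-certificate     point wget at the system CA bundle explicitly
    # --tries=3            assumed to carry over from the earlier change
    # --waitretry=5        linear backoff between retries, capped at 5 s
    # --dns-timeout=10     give up on DNS lookups after 10 s
    # --connect-timeout=30 give up on TCP connects after 30 s
    if ! wget --verbose \
            --ca-certificate=/etc/ssl/certs/ca-certificates.crt \
            --tries=3 \
            --waitretry=5 \
            --dns-timeout=10 \
            --connect-timeout=30 \
            -O "$dest" "$url"; then
        # Include the URL so the failing endpoint is visible in CI logs.
        echo "Error: failed to download ${url}" >&2
        return 1
    fi
}
```

Setting the DNS and connect timeouts individually, rather than through the single `--timeout` option, makes it easier to tell from the log which phase of the connection stalled.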

3 participants