Use sccache-dist build cluster for conda and wheel builds
#341
Greptile Overview
Greptile Summary
This PR enables the use of RAPIDS' autoscaling cloud build cluster (sccache-dist) to accelerate conda and wheel builds across all CI workflows.
Key Changes:
- Added `sccache-dist-token-secret-name: GIST_REPO_READ_ORG_GITHUB_TOKEN` to all conda and wheel build/test jobs in workflows to authenticate with the distributed build cluster
- Added `node_type: cpu8` to C++ and wheel build jobs to specify appropriate node types for the build cluster
- Set `SCCACHE_NO_DIST_COMPILE=1` in `cmake/rapids_config.cmake` to disable distributed compilation for CMake's compiler tests (optimization to avoid connection overhead for trivial test compilations)
- Added `NVCC_APPEND_FLAGS` environment variable with default empty string in conda recipes to prevent build failures when the variable is unset in the build environment
The changes are consistent across all three workflow files (build, PR, test) and align with RAPIDS' broader infrastructure initiative to use GCC 14 and improve build performance.
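As a rough sketch of the workflow changes listed above, a build job that calls a shared RAPIDS workflow might pass the two new inputs as shown here. The job name, the `uses:` path and ref, `secrets: inherit`, `build_type`, and the script path are hypothetical placeholders rather than values taken from this PR; only `node_type: cpu8` and `sccache-dist-token-secret-name: GIST_REPO_READ_ORG_GITHUB_TOKEN` come from the summary.

```yaml
jobs:
  conda-cpp-build:  # hypothetical job name
    # Hypothetical reusable-workflow reference; the real path and ref may differ.
    uses: rapidsai/shared-workflows/.github/workflows/conda-cpp-build.yaml@main
    secrets: inherit  # assumed, so the called workflow can read the named secret
    with:
      build_type: pull-request  # assumed input
      script: ci/build_cpp.sh   # assumed script path
      # Inputs added by this PR, per the summary:
      node_type: cpu8
      sccache-dist-token-secret-name: GIST_REPO_READ_ORG_GITHUB_TOKEN
```

Because both inputs are described as optional parameters of the shared workflows, leaving them out would presumably fall back to whatever defaults those workflows define.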
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- All changes are configuration-only, following established RAPIDS patterns for sccache-dist integration. The additions are non-breaking: `node_type` and `sccache-dist-token-secret-name` are optional parameters handled by shared workflows, `SCCACHE_NO_DIST_COMPILE` is a standard optimization, and the `NVCC_APPEND_FLAGS` default value fix prevents potential build failures. No logic changes or new code paths introduced.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .github/workflows/build.yaml | 5/5 | Added sccache-dist-token-secret-name and node_type: cpu8 to all conda and wheel build jobs for distributed compilation support |
| .github/workflows/pr.yaml | 5/5 | Added sccache-dist-token-secret-name and node_type: cpu8 to all build and test jobs for distributed compilation in PR workflows |
| cmake/rapids_config.cmake | 5/5 | Set SCCACHE_NO_DIST_COMPILE=1 to disable sccache-dist for CMake compiler tests, preventing connection overhead for simple test compilations |
| conda/recipes/libwholegraph/recipe.yaml | 5/5 | Added NVCC_APPEND_FLAGS environment variable with default empty string to prevent build failures when variable is unset |
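The `NVCC_APPEND_FLAGS` default described in the last row could look roughly like the fragment below. This is a minimal sketch that assumes the recipe uses the rattler-build `recipe.yaml` format; the surrounding keys and the build command are illustrative and not copied from the PR.

```yaml
build:
  script:
    env:
      # Fall back to an empty string when NVCC_APPEND_FLAGS is unset in the build environment.
      NVCC_APPEND_FLAGS: ${{ env.get("NVCC_APPEND_FLAGS", default="") }}
    content: |
      # hypothetical build command
      ./build.sh
```

Supplying an explicit default means the template still renders when the variable is absent, which matches the failure mode the summary says this change prevents.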
Sequence Diagram
sequenceDiagram
participant GH as GitHub Actions
participant Node as Build Node (cpu8)
participant SCC as sccache-dist Cluster
participant S3 as S3 Cache
GH->>GH: Authenticate with GIST_REPO_READ_ORG_GITHUB_TOKEN
GH->>Node: Start build job on cpu8 node
Node->>Node: Set SCCACHE_NO_DIST_COMPILE=1 (CMake config)
Node->>Node: Run CMake compiler tests (local only)
Node->>Node: Unset SCCACHE_NO_DIST_COMPILE
Node->>SCC: Connect to sccache-dist cluster
SCC->>S3: Check cache for compiled objects
alt Cache Hit
S3-->>SCC: Return cached objects
SCC-->>Node: Deliver cached compilation results
else Cache Miss
Node->>SCC: Submit compilation tasks
SCC->>SCC: Distribute compilation across cluster
SCC-->>Node: Return compiled objects
SCC->>S3: Store results in cache
end
Node->>Node: Link final artifacts
Node-->>GH: Upload build artifacts
6 files reviewed, no comments
…on in the conda build starts it
9 files reviewed, no comments
9 files reviewed, no comments
This reverts commit 59696cd.
9 files reviewed, no comments
9 files reviewed, 3 comments
9 files reviewed, no comments
…fea/use-sccache-build-cluster
9 files reviewed, no comments
9 files reviewed, no comments
10 files reviewed, no comments
CI changes are consistent with those seen in other repos undergoing sccache-dist work.
CI errors appear unrelated to this PR.
Yeah, @alexbarghi-nv mentioned they're being fixed in another PR and that it's fine to merge this.
Description
RAPIDS has deployed an autoscaling cloud build cluster that can be used to accelerate building large RAPIDS projects.
This PR updates the conda and wheel builds to use the build cluster.
This contributes to rapidsai/build-planning#228.