-
Notifications
You must be signed in to change notification settings - Fork 29
[CI] issue: HPCINFRA-3388 Run all CI steps on containers #488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5da7cb9 to
b522a02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates the weekly disaster-recovery CI pipeline from executing on the Jenkins master node to running inside Kubernetes-managed containers. The change removes runs_on_agents: master and replaces it with a kubernetes configuration block (cloud, namespace, resource limits/requests, architecture-specific node selectors) plus a runs_on_dockers entry pointing to quay.io/podman/stable:v5.0.2. The agentSelector: master directives are removed from both the "Determine release_tag" and "pipeline_stop" steps. This aligns with the existing CI framework documented in .ci/README.md, which describes a matrix-driven approach where all steps run in Docker containers defined in the runs_on_dockers catalog.
Potential Issues
-
Git command execution in container: Line 43 runs
git describe --tags --abbrev=0to determine the release tag. Thequay.io/podman/stable:v5.0.2image may not havegitinstalled, which would cause the step to fail silently or error out. You should verify that the container includesgitor switch to an image that does (similar to how other pipelines use purpose-built CI images fromharbor.mellanox.com). -
Jenkins API access: Line53 uses
Jenkins.instance.getItemByFullName('libxlio/LibXLIO-release')to retrieve job metadata. This requires the container to have access to Jenkins internals, which may not be possible from a standard Podman container. Typically, Jenkins API calls require the script to run with elevated privileges or on the Jenkins master. This could break the "pipeline_stop" step. -
Resource limits may be too restrictive: The
limitsare set to 256Mi memory and 100m CPU. If the Groovy script performs any non-trivial operations (parsing Jenkins job metadata, running git commands, etc.), these limits may be insufficient and cause OOM kills or throttling. Consider profiling the actual resource usage or aligning with other pipelines that use more generous limits. -
Inconsistency with other CI files: Other matrix files in
.ci/(e.g.,matrix_job.yaml,release_matrix_job.yaml) use purpose-built Docker images fromharbor.mellanox.comwith pre-installed toolchains and theswx-jenkinsuser. Using a generic Podman image here diverges from that pattern and may introduce environment inconsistencies or credential issues. -
No
jnlpImagespecified for the Docker step: Thekubernetessection definesjnlpImage: 'harbor.mellanox.com/toolbox/c3po-jnlp:latest'for the agent, but theruns_on_dockersentry does not specify a separate JNLP sidecar. If the pipeline framework expects each Docker step to have its own JNLP container, this may cause connectivity issues.
Confidence Score: 2/5
The change correctly addresses the security requirement to move off the master node, but there are multiple execution risks (git availability, Jenkins API access, resource limits) that could cause the pipeline to fail. The use of a generic Podman image instead of a purpose-built CI image is inconsistent with the rest of the .ci/ directory and suggests the change may not have been fully tested in the target environment.
1 file reviewed, no comments
b522a02 to
0b7d027
Compare
|
Swapped image to |
0b7d027 to
0da0ff0
Compare
0da0ff0 to
10701bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates the disaster-recovery (DR) weekly CI pipeline from executing on the Jenkins master node to running in Kubernetes containers. The runs_on_agents section with nodeLabel: 'master' has been replaced with a runs_on_dockers configuration that deploys a c3po-jnlp Jenkins agent image in a Kubernetes pod (namespace runai-hpcx-nightly-prod-sw, cloud kubernetes). Additionally, all agentSelector references pointing to the "base" category have been removed from pipeline steps, since the unified Docker-based approach no longer requires explicit agent selection. This change eliminates the security risk of running CI steps directly on the master node, which is scheduled for decommissioning per HPCINFRA-3388.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
.ci/pipeline/dr_weekly_matrix.yaml |
4/5 | Replaced master node agent with Kubernetes pod running c3po-jnlp container and removed agentSelector references from pipeline steps |
Confidence score: 4/5
- This PR is safe to merge with careful attention to the container image URL
- Score reflects a straightforward configuration migration with clear intent, but relies on an external dependency (the c3po-jnlp image) and a previous review noted the Harbor URL may not work during disaster-recovery scenarios requiring geoDNS URLs
- Pay close attention to the Docker image reference at line 17 to ensure it resolves correctly in both normal operations and disaster-recovery situations
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant Kubernetes
participant GroovyContainer
participant ReleaseJob as LibXLIO-release Job
participant Email as Email System
User->>Jenkins: Trigger LIBXLIO-dr-weekly-launcher
Jenkins->>Kubernetes: Request pod with groovy container
Kubernetes->>GroovyContainer: Deploy nbu-harbor.gtm.nvidia.com/toolbox/c3po-jnlp:latest
alt MAIL_TO parameter provided
GroovyContainer->>Email: Send "weekly DR build started" email
end
GroovyContainer->>GroovyContainer: Determine release_tag from params or git
GroovyContainer->>Jenkins: Check if LibXLIO-release job is enabled
alt Job is disabled
GroovyContainer->>ReleaseJob: Enable job
end
GroovyContainer->>ReleaseJob: Trigger build with parameters (release_tag, release_folder, notification_email, sha1)
ReleaseJob-->>GroovyContainer: Return build result
alt Build failed
GroovyContainer->>Jenkins: Set currentBuild.result = 'FAILURE'
end
alt MAIL_TO parameter provided
GroovyContainer->>Email: Send "weekly DR build ended" email with result
end
Jenkins-->>User: Build complete
1 file reviewed, no comments
10701bb to
be27701
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates the weekly disaster-recovery (DR) CI pipeline away from the Jenkins master node to container-based execution for security compliance (HPCINFRA-3388). The changes delete obsolete JJB and matrix files that were tied to the master node, add a new do_dr_build parameter to the release job to optionally trigger DR builds from the containerized pipeline, update the release matrix to auto-determine release tags for DR runs, and ensure the release script checks out tags explicitly when running in ephemeral containers. The net effect is that all DR and release workflows now execute exclusively in Docker containers instead of the persistent master environment, which is scheduled for decommission.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/dr_weekly_jjb.yaml | 5/5 | Deleted entire JJB template that ran daily DR launcher on master node |
| .ci/pipeline/dr_weekly_matrix.yaml | 4/5 | Removed weekly DR matrix file with master nodeLabel and job orchestration logic |
| .ci/pipeline/release_jjb.yaml | 5/5 | Added new boolean parameter do_dr_build to release job template |
| .ci/pipeline/release_matrix_job.yaml | 3/5 | Updated jnlpImage registry and added conditional release_tag determination for DR builds |
| .ci/do_release.sh | 4/5 | Added explicit git checkout of release tag before version extraction |
Confidence score: 3/5
- This PR requires careful review as critical DR automation logic is being restructured with potential gaps in coverage.
- Score reflects the deletion of substantial orchestration and notification logic in dr_weekly_matrix.yaml without clear evidence it was recreated elsewhere, the use of a harbor registry URL that a previous reviewer flagged as incompatible with DR failover scenarios (needs geoDNS URL), and the missing error handling in do_release.sh for the new git checkout command. The auto-tag determination logic in release_matrix_job.yaml also risks picking the wrong tag if the repository state is unexpected.
- Pay close attention to .ci/pipeline/release_matrix_job.yaml (registry URL must support geoDNS for DR), dr_weekly_matrix.yaml (verify email notifications and job orchestration are preserved elsewhere), and .ci/do_release.sh (add error handling for tag checkout).
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant Git
participant Docker
participant NFS as "NFS Release Storage"
User->>Jenkins: "Trigger release job with release_tag, revision, release_folder"
Jenkins->>Jenkins: "Execute pipeline_start (set displayName, check do_dr_build)"
Jenkins->>Docker: "Start privileged container (rhel8.6)"
Docker->>Git: "git clone libdpcp repository"
Git-->>Docker: "libdpcp source code"
Docker->>Docker: "Build libdpcp (autogen, configure, make install)"
Docker->>Git: "git checkout tags/${release_tag}"
Git-->>Docker: "Repository at release_tag"
Docker->>Docker: "Parse version from configure.ac"
Docker->>Docker: "Validate release_tag matches configure.ac version"
Docker->>Docker: "Build source RPM and tarball (build_pkg.sh -s -t)"
Docker->>Docker: "Check if do_release=true"
alt do_release is true
Docker->>NFS: "Create destination directory ${release_folder}/${release_tag}"
Docker->>NFS: "Copy ${pkg_name} to ${DST_DIR}"
Docker->>NFS: "Copy ${tarball_name} to ${DST_DIR}"
Docker->>NFS: "Create symlink to ${pkg_name}"
NFS-->>Docker: "Release artifacts stored"
else do_release is false
Docker->>Docker: "Skip package release"
end
Docker-->>Jenkins: "Build artifacts (RPM, logs)"
Jenkins->>Jenkins: "Archive artifacts (build_pkg.log, packages/*.rpm)"
Jenkins->>Jenkins: "Execute pipeline_stop"
Jenkins->>User: "Send email notification with build status"
5 files reviewed, 1 comment
be27701 to
1431ff6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates the libxlio CI pipelines from running on the Jenkins master node to running exclusively in Docker containers for security compliance. The key changes consolidate the disaster-recovery (DR) weekly pipeline into the main release pipeline by deleting the standalone dr_weekly_jjb.yaml and dr_weekly_matrix.yaml files and adding a do_dr_build boolean parameter to release_jjb.yaml. When do_dr_build=true, the release pipeline now auto-detects the Git tag via git describe in release_matrix_job.yaml and adjusts email subjects accordingly. The do_release.sh script was updated to explicitly checkout the release tag before version extraction, ensuring the packaged code matches the tagged release. The JNLP agent image reference was updated to NVIDIA's internal registry (nbu-harbor.gtm.nvidia.com).
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
.ci/pipeline/dr_weekly_jjb.yaml |
1/5 | Completely deleted the standalone disaster-recovery weekly pipeline job template that ran on master node |
.ci/pipeline/dr_weekly_matrix.yaml |
1/5 | Completely removed the weekly DR pipeline orchestration matrix with no replacement in this PR |
.ci/pipeline/release_jjb.yaml |
4/5 | Added do_dr_build boolean parameter to enable dual-purpose release/DR pipeline behavior |
.ci/pipeline/release_matrix_job.yaml |
2/5 | Migrated to new container registry, added conditional tag detection for DR builds, and customized email subjects |
.ci/do_release.sh |
2/5 | Added explicit git tag checkout before version extraction to ensure correct code is packaged |
Confidence score: 2/5
- This PR has significant issues that could break CI functionality and requires careful review before merging.
- Score reflects: (1) complete deletion of DR weekly pipeline files without visible replacement, (2) unguarded parameter references in Groovy that will throw exceptions if
do_dr_buildis undefined, (3) unvalidated git checkout operations that can fail silently, (4) inconsistent handling of the new container registry URL (prior review noted geoDNS issues for DR scenarios), and (5) missing error handling for thegit describecommand that could set empty release tags. - Pay close attention to
release_matrix_job.yaml(lines 58-60, 72) anddo_release.sh(line47) - these contain unguarded operations that will fail at runtime. Verify that the DR weekly functionality has truly been moved elsewhere or document the intentional removal.
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant Git
participant Docker
participant Kubernetes
participant BuildSystem
participant ReleaseFolder
participant Email
User->>Jenkins: Trigger release job (with release_tag, revision, do_release params)
Jenkins->>Jenkins: Execute pipeline_start (set displayName, get tag if DR build)
Jenkins->>Git: Clone repository at specified tag/commit
Jenkins->>Kubernetes: Request privileged pod (8Gi memory, 7 CPU)
Kubernetes->>Docker: Spin up RHEL 8.6 container
Docker->>Git: Clone libdpcp repository
Docker->>BuildSystem: Run autogen.sh, configure, make install for libdpcp
Docker->>Git: Checkout release tag
Docker->>BuildSystem: Parse version from configure.ac
Docker->>BuildSystem: Execute build_pkg.sh to create src.rpm and tarball
Docker->>Docker: Validate release_tag matches configure.ac version
alt do_release is true
Docker->>ReleaseFolder: Create destination directory as swx-jenkins user
Docker->>ReleaseFolder: Copy src.rpm to release folder
Docker->>ReleaseFolder: Copy tarball to release folder
Docker->>ReleaseFolder: Create symbolic link to package
else do_release is false
Docker->>Docker: Skip package release
end
Docker->>Jenkins: Archive build artifacts (build_pkg.log, *.rpm)
Jenkins->>Jenkins: Execute pipeline_stop
alt notification_email is set
Jenkins->>Email: Send build completion notification with status
end
Jenkins->>User: Report build result
Additional Comments (1)
-
.ci/pipeline/dr_weekly_matrix.yamllogic: the entire file was deleted - this completely breaks the weekly DR (disaster recovery) pipeline. The referenced Jira ticket HPCINFRA-3388 only mentions migrating from master nodes to containers, not removing the pipeline entirely. Unless this functionality was moved to a different file, the weekly DR builds will no longer run. Was this functionality moved to another file (e.g., merged into another matrix file), or is the weekly DR pipeline intentionally being removed?
5 files reviewed, 4 comments
1431ff6 to
fd65f49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates libxlio's disaster-recovery (DR) weekly and release pipelines from master-node execution to container-based execution, addressing security requirements in HPCINFRA-3388. The changes delete two legacy DR pipeline files (dr_weekly_jjb.yaml, dr_weekly_matrix.yaml), consolidate DR and release workflows by adding a do_dr_build parameter, and modify the release matrix and JJB configurations to run exclusively in Docker containers. The do_release.sh script now explicitly checks out the release tag before version validation, making it self-contained for ephemeral container environments. These changes align with the broader .ci/ infrastructure documented in the repository, which uses Jenkins Job Builder (JJB) templates, matrix YAML files, and Docker images to define reproducible CI/CD pipelines.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
.ci/pipeline/dr_weekly_jjb.yaml |
4/5 | Entire JJB template for weekly DR launcher deleted; functionality must be reimplemented elsewhere |
.ci/do_release.sh |
3/5 | Adds explicit git checkout of release tag before version extraction, but lacks error handling for invalid tags |
.ci/pipeline/dr_weekly_matrix.yaml |
4/5 | Complete removal of weekly DR matrix definition that ran on master node |
.ci/pipeline/release_matrix_job.yaml |
3/5 | Updates harbor registry URL and adds conditional git-describe logic for DR builds, but missing error handling |
.ci/pipeline/release_jjb.yaml |
2/5 | Changes default sha1 to 'vNext' and adds do_dr_build parameter, breaking manual release workflow expectations |
Confidence score: 2/5
- This PR introduces breaking changes to the manual release workflow and has multiple unhandled edge cases that could cause silent failures in production pipelines
- Score reduced because: (1) the
sha1default change inrelease_jjb.yamldiverges from documented behavior and will cause manual releases to check out the wrong commit unless users override parameters; (2) no error handling for git operations (git checkout,git describe) that can fail silently; (3) harbor registry URL may not work in disaster-recovery scenarios (per previous review feedback); (4) missing validation fordo_dr_buildparameter existence - Pay close attention to
.ci/pipeline/release_jjb.yaml(breaking default change),.ci/do_release.sh(needs tag validation), and.ci/pipeline/release_matrix_job.yaml(needs error handling for git describe failure)
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant Git
participant Docker
participant Build System
participant NFS Release Folder
participant Email Service
User->>Jenkins: Trigger release job with parameters (release_tag, revision, do_release)
Jenkins->>Jenkins: Execute pipeline_start (update display name)
alt do_dr_build is true
Jenkins->>Git: Get latest tag (git describe)
Git-->>Jenkins: Return tag
end
Jenkins->>Docker: Start privileged Kubernetes pod (rhel8.6 container)
Docker-->>Jenkins: Container ready
Jenkins->>Build System: Execute Build-dpcp step
Build System->>Git: Clone libdpcp repository
Git-->>Build System: Repository cloned
Build System->>Build System: ./autogen.sh && ./configure --prefix=/usr
Build System->>Build System: sudo make install
Jenkins->>Build System: Execute Release step (.ci/do_release.sh)
Build System->>Git: Checkout release tag
Build System->>Build System: Parse configure.ac for version
Build System->>Build System: Validate tag matches version
Build System->>Build System: Build packages (contrib/build_pkg.sh -s -t)
alt do_release is true
Build System->>NFS Release Folder: Create destination directory
Build System->>NFS Release Folder: Copy src.rpm package
Build System->>NFS Release Folder: Copy tarball
Build System->>NFS Release Folder: Create symbolic link
NFS Release Folder-->>Build System: Packages released
end
Build System-->>Jenkins: Packages built and released
Jenkins->>Jenkins: Archive artifacts (build_pkg.log, *.rpm)
Jenkins->>Jenkins: Execute pipeline_stop
alt notification_email is not empty
Jenkins->>Email Service: Send notification email with build status
Email Service-->>User: Email delivered
end
Jenkins-->>User: Release job completed
5 files reviewed, 3 comments
fd65f49 to
e8dd249
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR migrates the XLIO CI pipelines to eliminate execution on the Jenkins master node, addressing security requirements per HPCINFRA-3388. The changes remove two legacy weekly disaster-recovery (DR) job definitions (dr_weekly_jjb.yaml, dr_weekly_matrix.yaml) that relied on nodeLabel: master, consolidate DR functionality into the existing release pipeline by introducing a do_dr_build parameter, migrate the JNLP agent image to the new nbu-harbor.gtm.nvidia.com registry, and add an explicit git checkout tags/${release_tag} in do_release.sh to ensure version extraction occurs from the correct commit. The DR and release workflows are now unified in release_matrix_job.yaml and release_jjb.yaml, with conditional logic to distinguish between DR builds (which auto-detect the latest tag via git describe) and manual release builds (which use the release_tag parameter).
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
.ci/pipeline/dr_weekly_jjb.yaml |
5/5 | Deleted the JJB template for the weekly DR job; functionality consolidated elsewhere |
.ci/pipeline/dr_weekly_matrix.yaml |
4/5 | Deleted the matrix definition for weekly DR launcher; orchestration logic must now exist in other files |
.ci/pipeline/release_matrix_job.yaml |
3/5 | Updated JNLP registry, added conditional logic for DR builds, customized email subjects |
.ci/pipeline/release_jjb.yaml |
2/5 | Added do_dr_build parameter but changed default branch to 'vNext', breaking release workflow expectations |
.ci/do_release.sh |
4/5 | Added explicit checkout of release tag to ensure version validation reads from correct commit |
Confidence score: 2/5
- This PR introduces significant risk due to the
release_jjb.yamlchanges that alter the default checkout behavior for release builds, potentially breaking manual release workflows if not carefully coordinated. - Score reflects three critical unresolved issues: (1) the default
sha1change to 'vNext' inrelease_jjb.yamlwill cause release builds to checkout the wrong branch by default, contradicting the parameter description; (2) the geoDNS harbor URL concern raised in previous review remains unaddressed, which will break DR builds in PDC; (3) the complete removal ofdr_weekly_matrix.yamlwithout visible replacement logic leaves uncertainty about whether email notifications, job orchestration, and release-tag determination have been properly migrated to the consolidated pipeline. - Pay close attention to
.ci/pipeline/release_jjb.yaml(verify whether 'vNext' default is intentional and whether conditional logic is missing) and.ci/pipeline/release_matrix_job.yaml(confirm the harbor URL works in all environments and that thegit describeerror handling is sufficient for empty repositories or missing tags).
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant KubernetesPod as "Kubernetes Pod<br/>(il-ipp-blossom-prod)"
participant DockerContainer as "Docker Container<br/>(rhel8.6)"
participant GitHub
participant ReleaseFolder as "Release Folder<br/>(/auto/sw/release)"
User->>Jenkins: "Trigger release job<br/>(release_tag, revision)"
Jenkins->>GitHub: "Checkout code from tag"
GitHub-->>Jenkins: "Repository content"
Jenkins->>KubernetesPod: "Create privileged pod<br/>(swx-media namespace)"
KubernetesPod->>DockerContainer: "Launch container<br/>(nbu-harbor registry)"
DockerContainer->>DockerContainer: "Clone libdpcp from GitHub"
DockerContainer->>DockerContainer: "Build and install libdpcp"
DockerContainer->>DockerContainer: "Execute .ci/do_release.sh"
DockerContainer->>DockerContainer: "Validate version matches tag"
DockerContainer->>DockerContainer: "Build source RPM and tarball"
alt do_release == true
DockerContainer->>ReleaseFolder: "Create release directory"
DockerContainer->>ReleaseFolder: "Copy .src.rpm and .tar.gz"
DockerContainer->>ReleaseFolder: "Create symbolic link"
end
DockerContainer-->>Jenkins: "Archive artifacts<br/>(build_pkg.log, *.rpm)"
Jenkins->>User: "Send email notification<br/>(success/failure)"
5 files reviewed, 1 comment
e8dd249 to
399ce6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR consolidates CI infrastructure to eliminate the security risk of running jobs on the master Jenkins node by deleting the DR weekly standalone pipeline and folding its functionality into the existing release pipeline via a new do_dr_build parameter.
Key changes:
- Deleted
dr_weekly_jjb.yamlanddr_weekly_matrix.yamlthat usednodeLabel: 'master' - Added
do_dr_buildparameter to release pipeline to support both regular releases and DR weekly builds - Changed default
sha1from'tags/$release_tag'to'vNext'inrelease_jjb.yaml:24 - Updated jnlpImage registry from
harbor.mellanox.comtonbu-harbor.gtm.nvidia.cominrelease_matrix_job.yaml:18 - Added logic to dynamically determine
release_tagfromgit describewhendo_dr_build=true - Disabled shallow clone and set depth=2 to support tag fetching
Critical issue:
- The added
git checkout tags/${release_tag}atrelease_matrix_job.yaml:51runs unconditionally for all builds, but will fail for regular release builds when the default branch is 'vNext' and the tag hasn't been merged there yet
Confidence Score: 2/5
- This PR has a critical logic error that will break regular release builds
- The unconditional
git checkout tags/${release_tag}will fail for normal release builds when the default branch is 'vNext' and the tag exists only on master or another branch. The change from'tags/$release_tag'to'vNext'as the default sha1 creates a mismatch where the pipeline checks out vNext but then tries to checkout a tag that may not exist in that branch's history - Pay close attention to
.ci/pipeline/release_matrix_job.yamlwhich has the logic error that needs fixing
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 3/5 | Added do_dr_build parameter, changed default sha1 from 'tags/$release_tag' to 'vNext', disabled shallow clone |
| .ci/pipeline/release_matrix_job.yaml | 2/5 | Added git checkout command, do_dr_build logic, and changed jnlpImage registry from harbor.mellanox.com to nbu-harbor.gtm.nvidia.com |
Sequence Diagram
sequenceDiagram
participant User
participant JJB as Jenkins Job Builder
participant Pipeline as Release Pipeline
participant Container as Docker Container
participant Git as Git Repository
participant Release as Release Folder
alt DR Weekly Build
User->>JJB: Trigger with do_dr_build=true
JJB->>Pipeline: Start on vNext branch
Pipeline->>Git: Clone vNext (depth=2, no shallow)
Pipeline->>Container: Start in nbu-harbor.gtm.nvidia.com/toolbox/c3po-jnlp
Container->>Git: git describe --tags --abbrev=0
Git-->>Container: Return latest tag (e.g. 3.50.3)
Container->>Container: Set release_tag env var
Container->>Git: git checkout tags/${release_tag}
Container->>Container: Run do_release.sh
Container->>Release: Copy packages to /auto/sw/release/
Pipeline->>User: Email "Weekly DR build ended"
else Regular Release Build
User->>JJB: Trigger with release_tag parameter
JJB->>Pipeline: Start on vNext branch (default)
Pipeline->>Git: Clone vNext (depth=2, no shallow)
Pipeline->>Container: Start in nbu-harbor.gtm.nvidia.com/toolbox/c3po-jnlp
Container->>Git: git checkout tags/${release_tag}
Note over Container,Git: This will fail if tag not in vNext!
Container->>Container: Run do_release.sh
Container->>Release: Copy packages to /auto/sw/release/
Pipeline->>User: Email "Release build ended"
end
4 files reviewed, 1 comment
399ce6f to
f8a1b2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR consolidates the DR weekly build pipeline into the existing release pipeline by deleting the standalone DR weekly files and adding a do_dr_build boolean parameter to control behavior.
Key Changes:
- Removed dedicated
dr_weekly_jjb.yamlanddr_weekly_matrix.yamlfiles - Added
do_dr_buildparameter to release pipeline to distinguish DR builds from regular releases - Updated container image registry from
harbor.mellanox.comtonbu-harbor.gtm.nvidia.com - Changed default
sha1checkout fromtags/$release_tagtovNext - Added explicit
git checkout tags/${release_tag}step in release script - Changed notification email subject to differentiate DR vs Release builds
Critical Issues Found:
- The unconditional
git checkout tags/${release_tag}on line 51 ofrelease_matrix_job.yamlwill fail for regular release builds because the JJB now checks outvNextby default (where the tag doesn't exist yet), breaking the normal release workflow - The change in default
sha1fromtags/$release_tagtovNextinrelease_jjb.yamlcreates a breaking change for manual release builds where users must now explicitly set both parameters
Confidence Score: 1/5
- This PR will break regular release builds due to incorrect git checkout logic
- Score reflects critical logic error where unconditional git checkout of tag will fail for non-DR builds. The default
sha1change fromtags/$release_tagtovNextcombined with unconditional checkout means regular releases will attempt to checkout a tag that doesn't exist on the vNext branch, causing immediate failure release_matrix_job.yamlandrelease_jjb.yamlrequire immediate fixes before merge
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 2/5 | Changed default sha1 from 'tags/$release_tag' to 'vNext', added do_dr_build param, disabled shallow clone - introduces breaking change for manual release builds |
| .ci/pipeline/release_matrix_job.yaml | 1/5 | Added git checkout and conditional release_tag resolution, changed image registry - has critical bug where checkout will fail for non-DR builds on vNext branch |
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant ReleaseJob as LibXLIO-release
participant Container as Kubernetes Container
participant Git
alt DR Weekly Build (do_dr_build=true)
User->>Jenkins: Trigger with do_dr_build=true
Jenkins->>ReleaseJob: Checkout vNext (sha1 default)
ReleaseJob->>Git: git describe --tags --abbrev=0
Git-->>ReleaseJob: returns latest tag
ReleaseJob->>Container: Start build with release_tag
Container->>Git: git checkout tags/${release_tag}
Container->>Container: Build dpcp dependency
Container->>Container: Run do_release.sh
Container-->>ReleaseJob: Build artifacts
ReleaseJob->>User: Email "Weekly DR build ended"
else Regular Release (do_dr_build=false)
User->>Jenkins: Trigger with release_tag param
Jenkins->>ReleaseJob: Checkout vNext (sha1 default)
ReleaseJob->>Container: Start build with release_tag
Container->>Git: git checkout tags/${release_tag}
Note over Container,Git: ERROR: Tag doesn't exist on vNext branch!
Container->>Container: Build dpcp dependency
Container->>Container: Run do_release.sh
Container-->>ReleaseJob: Build artifacts
ReleaseJob->>User: Email "Release build ended"
end
4 files reviewed, 2 comments
|
bot:retest |
3 similar comments
|
bot:retest |
|
bot:retest |
|
bot:retest |
f8a1b2a to
7695ee8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR consolidates DR weekly and release builds into a single pipeline that runs on Kubernetes containers instead of the master node, improving security posture.
Key changes:
- Deleted separate DR weekly pipeline files (
dr_weekly_jjb.yaml,dr_weekly_matrix.yaml) that usednodeLabel: 'master' - Added
do_dr_buildboolean parameter to existing release pipeline to handle both release and DR builds - Updated container image registry from
harbor.mellanox.comtonbu-harbor.gtm.nvidia.com - Changed git clone from shallow (
shallow-clone: true) to non-shallow (shallow-clone: false, depth: 2)
Issues found:
- Critical: refspec in
release_jjb.yamlmissing tags reference - DR builds will fail when attempting to checkouttags/${release_tag}
Confidence Score: 3/5
- This PR addresses the security concern but has a critical bug that will break DR builds
- The refactoring successfully eliminates the master node security risk, but the missing tags refspec in release_jjb.yaml:58 will cause DR builds to fail when attempting to checkout tags. Regular release builds should continue working.
- Pay close attention to
.ci/pipeline/release_jjb.yaml- the refspec must include tags for DR builds to function
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 3/5 | Added do_dr_build parameter and changed shallow-clone to false; refspec missing tags which breaks DR builds |
| .ci/pipeline/release_matrix_job.yaml | 3/5 | Added DR build logic with conditional tag checkout and dynamic release_tag; runs on Kubernetes containers instead of master node |
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant ReleasePipeline as Release Pipeline (Container)
participant Git
alt Regular Release Build (do_dr_build=false)
User->>Jenkins: Trigger with release_tag param
Jenkins->>Git: Checkout $sha1 (defaults to tags/$release_tag)
Jenkins->>ReleasePipeline: Run in Kubernetes container
ReleasePipeline->>ReleasePipeline: Skip tag checkout (do_dr_build=false)
ReleasePipeline->>ReleasePipeline: Run .ci/do_release.sh
ReleasePipeline->>User: Send release notification email
end
alt DR Weekly Build (do_dr_build=true)
User->>Jenkins: Trigger with do_dr_build=true
Jenkins->>Git: Checkout $sha1 (uses vNext branch)
Jenkins->>ReleasePipeline: Run in Kubernetes container
ReleasePipeline->>Git: git describe --tags to get release_tag
ReleasePipeline->>Git: git checkout tags/${release_tag}
ReleasePipeline->>ReleasePipeline: Run .ci/do_release.sh
ReleasePipeline->>User: Send Weekly DR notification email
end
Note over Jenkins,ReleasePipeline: Previously: DR builds ran on master node<br/>Now: All builds run in containers
4 files reviewed, 1 comment
7695ee8 to
ceb8e9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Consolidates DR weekly builds into the release pipeline by removing standalone DR job files and adding a do_dr_build flag to toggle between release and DR weekly builds.
Major changes:
- Deleted
dr_weekly_jjb.yamlanddr_weekly_matrix.yamlthat ran on master node (nodeLabel: 'master') - Added
do_dr_buildboolean parameter torelease_jjb.yamlfor DR build mode - Updated registry from
harbor.mellanox.comtonbu-harbor.gtm.nvidia.com - Modified release pipeline to conditionally checkout tags and determine release_tag for DR builds
Critical issues found:
- Shell variable syntax error on line 51 of
release_matrix_job.yaml-${do_dr_build}won't work in bash context - Git refspec in
release_jjb.yamlmissing tags/branches, breaking DR builds that need to checkout tags git describewill fail due to missing tags from incomplete refspec
Confidence Score: 1/5
- This PR will break DR weekly builds due to syntax errors and missing git refspec configuration
- Score reflects critical blocking issues: shell variable syntax incompatibility will cause immediate script failure, and incomplete git refspec prevents tag checkout operations required for DR builds. While the security goal of removing master node usage is correct, the implementation has fundamental errors that will prevent the DR build feature from functioning
- Pay close attention to
release_jjb.yaml(git refspec) andrelease_matrix_job.yaml(shell syntax)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 1/5 | Added do_dr_build flag, disabled shallow-clone, but refspec missing tags causing DR builds to fail |
| .ci/pipeline/release_matrix_job.yaml | 1/5 | Added DR build logic and updated registry, but shell variable syntax error will cause failures |
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant ReleaseJJB as release_jjb.yaml
participant ReleaseMatrix as release_matrix_job.yaml
participant Git
participant Container as Docker Container
User->>Jenkins: Trigger release pipeline (do_dr_build=true/false)
Jenkins->>ReleaseJJB: Load job config
ReleaseJJB->>Git: Fetch (refspec: PRs only)
Note over ReleaseJJB,Git: Missing tags & branches!
ReleaseJJB->>Container: Spawn container (nbu-harbor registry)
Container->>ReleaseMatrix: Execute pipeline_start
alt do_dr_build == true
ReleaseMatrix->>Git: git describe --tags --abbrev=0
Note over ReleaseMatrix,Git: Fails - no tags fetched!
Git-->>ReleaseMatrix: release_tag
end
ReleaseMatrix->>ReleaseMatrix: Build-dpcp step
ReleaseMatrix->>ReleaseMatrix: Release step
alt do_dr_build == true
ReleaseMatrix->>Git: git checkout tags/${release_tag}
Note over ReleaseMatrix,Git: Syntax error: ${do_dr_build}
end
ReleaseMatrix->>ReleaseMatrix: Execute do_release.sh
ReleaseMatrix->>User: Send email (Weekly DR / Release)
4 files reviewed, 3 comments
|
bot:retest |
ceb8e9f to
c72b3aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Consolidates DR weekly builds into the release pipeline by removing separate DR pipeline files and adding a do_dr_build boolean parameter. Also updates container registry from harbor.mellanox.com to nbu-harbor.gtm.nvidia.com.
Major changes:
- Deleted
dr_weekly_jjb.yamlanddr_weekly_matrix.yaml(previously ran on master node) - Added
do_dr_buildparameter to release pipeline to support both release and DR builds - Updated jnlp container image registry path
- Modified pipeline to conditionally checkout tags and set release_tag based on build type
- Updated email subject to differentiate between "Weekly DR" and "Release" builds
Critical issues found:
- Git refspec in
release_jjb.yaml:58only fetches PRs, missing tags and branches needed for DR builds - Shell conditional syntax error in
release_matrix_job.yaml:51- uses bash syntax${do_dr_build}for Jenkins parameter git describe --tagswill fail because tags aren't fetched by the refspec
Confidence Score: 1/5
- This PR has critical logic errors that will break DR weekly builds in production
- The DR build workflow will fail immediately due to missing git refs (tags not fetched) and incorrect shell conditional syntax. The refspec only fetches PRs but DR builds require tags for both checkout and
git describeoperations. These are blocking issues that must be fixed before merge. .ci/pipeline/release_jjb.yamland.ci/pipeline/release_matrix_job.yamlboth require fixes before DR builds will work
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 1/5 | Added do_dr_build parameter but refspec incomplete - missing tags/branches needed for DR builds to work |
| .ci/pipeline/release_matrix_job.yaml | 1/5 | Added DR build logic and updated container registry, but shell conditional syntax incorrect and depends on missing git refs |
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant GitRepo as Git Repository
participant Container as K8s Container
participant ReleaseFolder as Release Folder
alt DR Weekly Build (do_dr_build=true)
User->>Jenkins: Trigger with do_dr_build=true
Jenkins->>GitRepo: Fetch refs (PRs only - missing tags)
Note over Jenkins,GitRepo: ⚠️ Refspec incomplete
Jenkins->>Jenkins: Run pipeline_start
Jenkins->>Jenkins: git describe --tags --abbrev=0
Note over Jenkins: ⚠️ Fails - no tags fetched
Jenkins->>Container: Spawn build container
Container->>Container: Build libdpcp
Container->>GitRepo: git checkout tags/${release_tag}
Note over Container: ⚠️ Syntax error in conditional
Container->>Container: Run .ci/do_release.sh
Container->>ReleaseFolder: Copy packages (if do_release=true)
Container->>Jenkins: Report results
Jenkins->>User: Send "Weekly DR build ended" email
else Regular Release (do_dr_build=false)
User->>Jenkins: Trigger with release_tag
Jenkins->>GitRepo: Checkout tags/$release_tag
Jenkins->>Container: Spawn build container
Container->>Container: Build libdpcp
Container->>Container: Run .ci/do_release.sh (no git checkout)
Container->>ReleaseFolder: Copy packages
Container->>Jenkins: Report results
Jenkins->>User: Send "Release build ended" email
end
4 files reviewed, 3 comments
dpressle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your commit message is not good i want to see something about DR weekly release test job not running on master
c72b3aa to
ab05353
Compare
|
Added mention that the DR weekly now runs on the release pipeline and without running on master: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR consolidates the DR weekly build pipeline into the release pipeline by removing separate DR pipeline files that ran on the master node, and adding a do_dr_build boolean parameter to the release pipeline to support both workflows securely within containers.
Key changes:
- Removed
dr_weekly_jjb.yamlanddr_weekly_matrix.yamlfiles that usednodeLabel: 'master'(security risk) - Added
do_dr_buildparameter torelease_jjb.yamlto enable DR build mode - Updated
release_matrix_job.yamlto conditionally checkout tags and determine release tag for DR builds - Changed jnlpImage from harbor.mellanox.com to nbu-harbor.gtm.nvidia.com
Critical issue found:
The refspec in release_jjb.yaml:58 only fetches PRs (+refs/pull/*:refs/remotes/origin/pr/*), but DR builds require tags for:
git checkout tags/${release_tag}on line 52 ofrelease_matrix_job.yamlgit describe --tags --abbrev=0on line 62 ofrelease_matrix_job.yaml
Without tags in the refspec, DR builds (do_dr_build=true) will fail with "pathspec 'tags/x.x.x' did not match any file(s)" errors.
Confidence Score: 0/5
- This PR cannot be merged - DR builds will fail immediately due to missing tags
- The refspec configuration is missing tag fetching, which breaks the core DR build functionality added by this PR. The
git checkout tags/${release_tag}andgit describe --tagscommands will fail because no tags are available in the repository after checkout. .ci/pipeline/release_jjb.yamlrequires immediate attention - the refspec must include tags for DR builds to function
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/pipeline/release_jjb.yaml | 0/5 | Added do_dr_build parameter but refspec only fetches PRs - DR builds will fail when trying to checkout/describe tags |
| .ci/pipeline/release_matrix_job.yaml | 2/5 | Added DR build logic and updated jnlpImage, but git operations depend on tags that won't be fetched by JJB refspec |
Sequence Diagram
sequenceDiagram
participant User
participant JJB as Jenkins Job Builder
participant SCM as Git SCM
participant Pipeline as release_matrix_job.yaml
participant Script as do_release.sh
User->>JJB: Trigger with do_dr_build=true
JJB->>SCM: Fetch with refspec (PRs only)
Note over SCM: ❌ Tags not fetched!
SCM-->>Pipeline: Checkout $sha1
alt do_dr_build == true
Pipeline->>SCM: git checkout tags/${release_tag}
Note over SCM: ❌ Fails - tag not fetched
Pipeline->>SCM: git describe --tags --abbrev=0
Note over SCM: ❌ Fails - no tags available
end
Pipeline->>Script: Execute do_release.sh
Script->>Script: Build packages
4 files reviewed, 1 comment
Today we have a few CI pipelines that still use nodeLabel: 'master' in their matrix file, causing some steps to run on the master node. This is a security risk and we should run only on containers, the master node will be taken down in the future not allowing this type of run. Removed weekly dr job that uses nodelabel definition, move this ability to the release pipeline Weekly DR test job now runs using the release pipeline, which doesnt use the master node (runs on containers only) Signed-off-by: NirWolfer <[email protected]>
ab05353 to
dfc5491
Compare
Greptile OverviewGreptile SummaryThis PR successfully addresses the security requirement by removing Key changes:
Critical issues found:
Confidence Score: 2/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant User
participant Jenkins
participant JJB as release_jjb.yaml
participant Matrix as release_matrix_job.yaml
participant Git
participant Container as Kubernetes Pod
User->>Jenkins: Trigger release build
Note over User,Jenkins: Parameters: do_dr_build, release_tag, sha1
Jenkins->>JJB: Load job configuration
JJB->>Git: Fetch code (refspec: +refs/pull/*)
Note over JJB,Git: ⚠️ Only fetches PRs, missing tags!
Jenkins->>Matrix: Execute pipeline_start
alt do_dr_build == true
Matrix->>Git: git describe --tags --abbrev=0
Note over Matrix,Git: ⚠️ Will fail - no tags fetched!
Git-->>Matrix: Returns latest tag
Matrix->>Matrix: Set env.release_tag
end
Matrix->>Matrix: Update build display name
Jenkins->>Container: Provision Kubernetes pod
Note over Container: nodeSelector: kubernetes.io/arch=amd64<br/>Image: nbu-harbor.gtm.nvidia.com/toolbox/c3po-jnlp
Container->>Container: Build-dpcp step
Container->>Git: Clone libdpcp
Container->>Container: Build and install libdpcp
Container->>Container: Release step
alt do_dr_build == true
Container->>Git: git checkout tags/${release_tag}
Note over Container,Git: ⚠️ Will fail - tags not in refspec!
end
Container->>Container: Execute do_release.sh
Container->>Container: Archive artifacts
Jenkins->>Matrix: Execute pipeline_stop
Matrix->>User: Send notification email
Note over Matrix,User: Subject includes "Weekly DR" or "Release"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, 2 comments
|
bot:retest |
Description
Today we have a few CI pipelines that still use nodeLabel: 'master' in their matrix file, causing some steps to run on the master node.
This is a security risk and we should run only on containers, the master node will be taken down in the future not allowing this type of run.
What
Replace master agent definition with docker definition
Why ?
HPCINFRA-3388
How ?
It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.
Change type
What kind of change does this PR introduce?
Check list