Skip to content

Conversation

@ntsemah
Copy link

@ntsemah ntsemah commented Nov 6, 2025

Description

The compiler step currently takes a long time to run (about 20–40 minutes).

What

Add a separate step for each compiler to run in parallel

Why ?

HPCINFRA-4016

How ?

It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds a new specialized Docker container for the compiler CI step by copying compiler modulefiles locally to reduce NFS dependency during the build process.

  • Created Dockerfile.rhel8.6.compiler that copies modulefiles from NFS to container image
  • Updated matrix_job.yaml to define new compiler container and switch the Compiler step from toolbox to dedicated compiler image
  • Minor whitespace cleanup in DOCA installation steps

Critical Issue: The Dockerfile only copies modulefiles (metadata) but not the actual compiler binaries. When module load executes during testing, it will still need to access compiler binaries from /hpc/local/... via NFS, defeating the optimization's purpose. The modulefiles contain paths to binaries that must either be copied into the container or already present in the base image.

Confidence Score: 1/5

  • This PR has a critical logic flaw that prevents it from achieving its stated goal
  • The implementation only copies compiler modulefiles but not the actual compiler binaries, so NFS access will still be required during the compiler test execution, failing to deliver the promised performance improvement
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires significant changes to copy compiler binaries or use a base image that includes them

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 New Dockerfile that copies compiler modulefiles but not the actual compiler binaries, which won't achieve the goal of avoiding NFS access
.ci/matrix_job.yaml 4/5 Adds new compiler container definition and switches compiler step from toolbox to dedicated compiler container, plus minor whitespace fixes

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Docker as Docker Build
    participant CompilerImg as Compiler Container
    participant NFS as NFS (/hpc/local)
    participant Test as Compiler Test Script
    
    CI->>Docker: Build Dockerfile.rhel8.6.compiler
    Docker->>NFS: Access /hpc/local/etc/modulefiles/
    NFS-->>Docker: Copy modulefiles to /opt/local/modulefiles
    Docker->>CompilerImg: Create image with modulefiles
    
    CI->>CompilerImg: Run compiler step
    CompilerImg->>Test: Execute contrib/jenkins_tests/compiler.sh
    Test->>Test: Loop through compiler_list
    Test->>CompilerImg: module load dev/clang-9.0.1
    CompilerImg->>NFS: Read modulefile and access compiler binaries
    Note over CompilerImg,NFS: Still requires NFS access<br/>for actual compiler binaries
    NFS-->>CompilerImg: Return compiler paths
    CompilerImg->>Test: Run configure && make with compiler
    Test->>NFS: Access compiler binaries during compilation
    NFS-->>Test: Execute compilation
Loading

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Introduces a new specialized Docker image (compiler:20251106) to optimize the compiler CI step by embedding compiler modulefiles and binaries locally, aiming to reduce NFS access during runtime and cut the 20-40 minute execution time.

Key changes:

  • Created Dockerfile.rhel8.6.compiler that copies compiler installations from /hpc/local into the image
  • Updated matrix_job.yaml to use the new compiler container instead of toolbox for compiler tests
  • Compiler list matches contrib/jenkins_tests/compiler.sh (6 compilers: clang-9.0.1, gcc-8.3.0/9.3.0/10.1.0, ics-18.0.4/19.1.1)

Critical issue:
The Dockerfile's COPY instructions access /hpc/local on the build host during docker build, not from runtime volume mounts. If /hpc/local is NFS-mounted, the build will still hit NFS, failing to achieve the optimization goal. The build will also fail if /hpc/local isn't accessible on the build host.

Confidence Score: 2/5

  • This PR has a fundamental architectural issue that may prevent it from achieving its optimization goal or cause build failures
  • The approach has a critical flaw in how Docker COPY works - it accesses the build host's filesystem during image build, not runtime volumes. If /hpc/local is NFS-mounted on build hosts, this won't reduce NFS access as intended. Additionally, previous comments already flagged concerns about modulefiles referencing binaries and potential list mismatches. The compiler list consistency is correct, but the core optimization mechanism is questionable.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires verification that the build-time COPY approach will work with the infrastructure

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 2/5 New Dockerfile that copies compiler modulefiles and binaries from /hpc/local to create a self-contained compiler image; has critical issue with build-time file access
.ci/matrix_job.yaml 4/5 Updates compiler step to use new compiler container instead of toolbox; adds Docker definition with proper tag and build args; includes minor whitespace fixes

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant BuildHost as Build Host
    participant Registry as Harbor Registry
    participant CompilerImg as Compiler Container
    participant NFS as NFS Storage

    Note over CI,NFS: Image Build Phase (docker build)
    CI->>BuildHost: Trigger docker build
    BuildHost->>NFS: COPY /hpc/local/etc/modulefiles
    NFS-->>BuildHost: Return modulefiles
    BuildHost->>NFS: COPY /hpc/local/oss/* (compilers)
    NFS-->>BuildHost: Return compiler binaries
    BuildHost->>NFS: COPY /hpc/local/commercial/* (Intel)
    NFS-->>BuildHost: Return Intel compilers
    BuildHost->>BuildHost: Build image with local copies
    BuildHost->>Registry: Push compiler:20251106
    
    Note over CI,NFS: Runtime Phase (compiler tests)
    CI->>Registry: Pull compiler:20251106
    Registry-->>CI: Return image
    CI->>CompilerImg: Start container
    CompilerImg->>CompilerImg: module load (uses local /hpc/local)
    CompilerImg->>CompilerImg: Run compiler tests
    CompilerImg-->>CI: Test results
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR optimizes the compiler CI step by baking compilers into a Docker image instead of mounting them from NFS at runtime. A new multi-stage Dockerfile copies compilers from /hpc/local during image build and removes runtime NFS volume mounts.

Key Changes:

  • Created .ci/dockerfiles/Dockerfile.rhel8.6.compiler with multi-stage build that copies 6 compilers (clang-9.0.1, gcc-8.3.0/9.3.0/10.1.0, intel-18.0.4/19.1.1) and their modulefiles
  • Modified .ci/matrix_job.yaml to use new compiler image and removed 4 NFS volume mounts for compiler-related paths
  • Compiler step now uses dedicated image instead of generic toolbox image

Critical Issues Identified:

  1. Build-time NFS dependency: The Dockerfile's COPY instructions require /hpc/local to be available during docker build via --volume mount. This still accesses NFS during build, just shifts it from runtime to build-time.
  2. Modulefile path issues: Modulefiles typically contain hardcoded absolute paths to compiler binaries. Simply copying modulefiles won't work if they still reference original /hpc/local/... paths (which coincidentally may work here since files are copied to the same /hpc/local location).
  3. Compiler list synchronization: The compiler list in Dockerfile must stay in sync with contrib/jenkins_tests/compiler.sh:13 or tests will fail.

Runtime Optimization: If the issues above are resolved, this should successfully eliminate runtime NFS access for the compiler step, potentially reducing the 20-40 minute duration significantly.

Confidence Score: 2/5

  • This PR has high risk due to unresolved issues with build-time NFS access and potential modulefile path problems that may prevent compilers from working
  • Score of 2/5 reflects multiple critical concerns: (1) previous comments identified that COPY during docker build still requires NFS access, potentially defeating the optimization goal; (2) modulefiles may contain hardcoded paths that won't work after copying (though copying to same /hpc/local location may mitigate this); (3) build will fail if /hpc/local isn't available on build host; (4) compiler list synchronization risk. The approach is sound in principle but execution has fundamental issues that need resolution before merging.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires verification that: (1) build host has /hpc/local available locally (not via NFS), (2) modulefiles work after copying, (3) all compiler binaries are self-contained

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 2/5 New multi-stage Dockerfile that copies compilers from /hpc/local into image to eliminate runtime NFS access. Critical issues: relies on build-time NFS access, modulefiles may contain hardcoded paths that break after copying.
.ci/matrix_job.yaml 4/5 Removes compiler-related NFS volume mounts and switches compiler step to use new dedicated compiler container image. Changes are consistent with optimization goal, minor whitespace fixes included.

Sequence Diagram

sequenceDiagram
    participant BuildHost as Build Host
    participant NFS as NFS (/hpc/local)
    participant Builder as Builder Stage
    participant CompilerImg as Compiler Image
    participant Runtime as Runtime Container
    
    Note over BuildHost,NFS: Image Build Phase (One-time)
    BuildHost->>NFS: Mount /hpc/local:ro via --volume
    BuildHost->>Builder: FROM rhel8.6/builder:inbox
    Builder->>NFS: cp -r compilers & modulefiles
    Builder->>Builder: Store in /tmp/compilers
    BuildHost->>CompilerImg: COPY --from=builder /tmp/compilers
    CompilerImg->>CompilerImg: Place in /hpc/local
    BuildHost->>CompilerImg: Push to Harbor registry
    
    Note over Runtime,CompilerImg: Compiler Step Runtime (Every PR)
    Runtime->>CompilerImg: Pull pre-built image
    Runtime->>Runtime: Compilers already in /hpc/local
    Runtime->>Runtime: module load compiler/version
    Runtime->>Runtime: Run compilation tests
    
    Note over Runtime: No NFS mount needed at runtime!
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a multi-stage Dockerfile to optimize the compiler CI step by baking compiler installations into a container image, reducing runtime NFS access overhead.

Key Changes:

  • Created Dockerfile.rhel8.6.compiler that copies compilers from /hpc/local during build (with NFS mounted read-only via --volume flag)
  • Updated .ci/matrix_job.yaml to use new compiler container instead of toolbox for compiler tests
  • Approach uses multi-stage build: Stage 1 copies from mounted NFS, Stage 2 contains baked-in compilers

Critical Issues Found:

  • Previous reviewers noted that while this reduces runtime NFS access, the build phase still requires NFS access (via --volume /hpc/local:/hpc/local:ro), which may not fully address the 20-40 minute bottleneck if build hosts also face NFS latency
  • The compiler list in the Dockerfile doesn't perfectly align with contrib/jenkins_tests/compiler.sh:13, which could cause module load failures
  • Module files typically reference absolute paths to compiler binaries, so even with files copied, modules may still attempt NFS access if paths aren't rewritten

Trade-offs:

  • One-time build cost with NFS access vs. repeated runtime NFS access across multiple CI runs
  • Image becomes large (contains full compiler installations) but enables faster repeated tests

Confidence Score: 2/5

  • This PR has fundamental design issues that may not achieve the stated optimization goal
  • The PR aims to eliminate NFS access bottlenecks, but critical concerns remain: (1) build phase still requires NFS access, potentially not solving the root performance issue, (2) compiler list mismatches may cause runtime failures, (3) module files typically contain hardcoded paths that may still reference NFS locations even after files are copied. These issues suggest the optimization may not work as intended or could introduce test failures.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires careful attention - verify that copied compilers actually work without NFS access at runtime, and ensure compiler list matches contrib/jenkins_tests/compiler.sh:13

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 2/5 Multi-stage Dockerfile to cache compilers locally - has critical issues with NFS access during build and runtime module loading
.ci/matrix_job.yaml 4/5 Configuration changes to use new compiler image - correctly configured with volume mount and container selector update

Sequence Diagram

sequenceDiagram
    participant CI as CI System
    participant BuildHost as Build Host
    participant NFS as NFS Storage
    participant Registry as Harbor Registry
    participant Runner as CI Runner
    
    Note over CI,Runner: Image Build Phase
    CI->>BuildHost: Trigger docker build
    BuildHost->>BuildHost: Mount /hpc/local:ro
    BuildHost->>NFS: Access /hpc/local during build (COPY)
    Note over BuildHost,NFS: --volume flag provides read-only NFS access
    BuildHost->>BuildHost: Stage 1: Copy compilers to /tmp
    BuildHost->>BuildHost: Stage 2: COPY from builder stage
    BuildHost->>Registry: Push image with baked-in compilers
    
    Note over CI,Runner: Compiler Test Phase
    CI->>Runner: Start compiler container
    Runner->>Registry: Pull compiler image
    Note over Runner: Compilers now in /hpc/local inside container
    Runner->>Runner: Load module (e.g., dev/gcc-9.3.0)
    Note over Runner: Module files reference /hpc/local paths
    Runner->>Runner: Execute compile tests
    Runner->>CI: Report results
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a new Docker image to optimize the compiler CI step by copying compiler binaries locally to avoid NFS access during test execution.

Changes:

  • Added Dockerfile.rhel8.6.compiler using multi-stage build to copy compiler binaries and modulefiles from /hpc/local into the image
  • Updated matrix_job.yaml to use the new compiler image instead of toolbox for the Compiler step
  • Minor whitespace cleanup in DOCA installation steps

Critical Issues:
Previous comments identified several blocking issues that remain unaddressed:

  • The docker build still accesses NFS via the --volume /hpc/local:/hpc/local:ro mount during build time, defeating the optimization goal
  • Modulefiles alone won't work - they reference binaries on NFS paths that must also be copied
  • Compiler binary directory name mismatch: copying parallel_studio_xe_2020_update1 but modulefiles expect ics-19.1.1/ics-18.0.4

Confidence Score: 1/5

  • This PR likely won't achieve its optimization goal and may break compiler tests
  • Multiple critical issues remain from previous review: NFS still accessed during build (defeating optimization), and compiler path mismatches will likely cause module load failures at runtime
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires significant rework to address NFS access and path mismatch issues

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 2/5 New Dockerfile to copy compilers locally; has critical issues with NFS access during build and compiler binary/modulefile path mismatches
.ci/matrix_job.yaml 4/5 Updates compiler step to use new compiler image; minor whitespace cleanup on DOCA install lines

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant BuildHost as Build Host
    participant NFS as NFS /hpc/local
    participant Builder as Builder Stage
    participant Final as Final Image
    participant CompilerStep as Compiler Test Step
    
    Note over CI,CompilerStep: Build Phase (docker build)
    CI->>BuildHost: docker build with --volume /hpc/local:/hpc/local:ro
    BuildHost->>NFS: Mount /hpc/local (read-only)
    BuildHost->>Builder: Start builder stage
    Builder->>NFS: Access /hpc/local paths during COPY/RUN
    Note over Builder,NFS: Issue: Still accessing NFS during build
    Builder->>Builder: Copy compilers to /tmp/compilers
    Builder->>Final: Multi-stage copy to final image
    Final->>Final: COPY --from=builder /tmp/compilers/ to /hpc/local/
    CI->>CI: Push image to harbor
    
    Note over CI,CompilerStep: Runtime Phase (tests)
    CI->>CompilerStep: Start compiler container
    CompilerStep->>CompilerStep: Access /hpc/local (now local, no NFS)
    CompilerStep->>CompilerStep: module load compiler modules
    CompilerStep->>CompilerStep: Run compiler tests
Loading

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@ntsemah ntsemah changed the title [CI] issue: HPCINFRA-4016 Optimize compiler step [skip CI] issue: HPCINFRA-4016 Optimize compiler step Nov 9, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a new Docker-based approach to optimize the compiler CI step by copying compilers locally instead of accessing them via NFS at runtime. A multi-stage Dockerfile copies compilers and modulefiles from /hpc/local during the build stage (with NFS mounted), then packages them into the final image.

Key changes:

  • Added .ci/dockerfiles/Dockerfile.rhel8.6.compiler with multi-stage build to copy 4 GCC versions and 1 Clang version locally
  • Updated .ci/matrix_job.yaml to use new compiler image instead of toolbox for compiler testing
  • Intel compilers (ics-18.0.4, ics-19.1.1) modulefiles are copied but the actual binaries are commented out

Issues found:

  • Intel compiler binaries are not copied (lines 16-17 commented), but modulefiles are copied (lines 22-23). This will cause module load intel/ics-* to fail when the test tries to use them
  • The build_args include --volume /hpc/local:/hpc/local:ro, which means the build still accesses NFS (not just runtime). The optimization only helps runtime, not the build itself
  • Minor whitespace-only changes to unrelated DOCA installation lines

Confidence Score: 2/5

  • Intel compiler tests will fail due to missing binaries; build still accesses NFS defeating partial optimization goal
  • The PR attempts to optimize compiler step by copying compilers locally, but has critical issues: (1) Intel compiler binaries are commented out while modulefiles remain, causing inevitable test failures when compiler.sh:13 tries to load intel/ics-18.0.4 and intel/ics-19.1.1, and (2) the build still accesses NFS via mounted volume, so only runtime is optimized not the build itself. Previous comments correctly identified these issues.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler needs Intel compiler binaries uncommented or modulefiles removed to match actual copied content

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 2/5 New multi-stage Dockerfile to copy compilers locally, but missing Intel compilers and has potential NFS timing issues
.ci/matrix_job.yaml 3/5 Updated compiler step to use new Docker image, includes minor whitespace fixes

Sequence Diagram

sequenceDiagram
    participant CI as CI System
    participant NFS as NFS Storage
    participant Builder as Builder Stage
    participant Final as Final Image
    participant Runtime as Runtime Container

    CI->>Builder: Start docker build with NFS mounted
    Builder->>NFS: Access /hpc/local during build
    Builder->>Builder: Copy compilers to /tmp/compilers
    Builder->>Builder: Copy modulefiles to /tmp/compilers
    Builder->>Final: COPY --from=builder /tmp/compilers to /hpc/local
    CI->>Runtime: Run compiler tests with final image
    Runtime->>Runtime: module load reads local /hpc/local
    Runtime->>Runtime: Compiler binaries execute from local copy
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds a specialized Docker image for the compiler step to reduce the 20-40 minute runtime by copying compilers locally instead of accessing them via NFS.

Key changes:

  • Created Dockerfile.rhel8.6.compiler with multi-stage build that copies GCC/Clang compilers and modulefiles from /hpc/local
  • Removed NFS volume mounts (/hpc/local/bin, /hpc/local/oss, /hpc/local/commercial, /hpc/local/etc/modulefiles) from matrix_job.yaml
  • Updated Compiler step to use new compiler container instead of toolbox

Critical issues identified:

  • Intel compiler binaries (lines 16-17 in Dockerfile) are commented out, but Intel modulefiles (lines 22-23) are copied. When compiler.sh:13 attempts module load intel/ics-18.0.4 or intel/ics-19.1.1, these will fail because the binaries don't exist
  • The build_args: '--volume /hpc/local:/hpc/local:ro' means Docker build still accesses NFS during image creation, which defeats the stated optimization goal of avoiding NFS to reduce runtime
  • The optimization approach needs reconsideration: either uncomment Intel binary copies, use cached/pre-built images, or accept that Intel compilers won't be tested

Confidence Score: 1/5

  • This PR will likely cause CI failures due to missing Intel compiler binaries and won't achieve the stated performance optimization
  • Critical implementation flaws: Intel compiler modulefiles are copied but binaries are commented out, causing guaranteed module load failures. Additionally, the --volume flag in build_args means the build still accesses NFS at build time, completely defeating the optimization goal of reducing the 20-40 minute runtime
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires uncommenting Intel binary copy lines (16-17) or removing Intel modulefiles (22-23). .ci/matrix_job.yaml needs build strategy revision to avoid NFS access during docker build

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 New Dockerfile to copy compilers locally, but Intel binaries are commented out while modulefiles are copied, causing module load failures
.ci/matrix_job.yaml 2/5 Removes NFS volume mounts and adds compiler docker config, but --volume in build_args still accesses NFS at build time

Sequence Diagram

sequenceDiagram
    participant CI as CI System
    participant Docker as Docker Build
    participant NFS as NFS Storage
    participant Builder as Builder Stage
    participant Final as Final Image
    participant Runtime as Runtime Container

    Note over CI,Runtime: Current Implementation (with issues)
    
    CI->>Docker: docker build with --volume /hpc/local:/hpc/local:ro
    Docker->>NFS: Access /hpc/local during build
    Note over Docker,NFS: Build still accesses NFS<br/>(defeats optimization goal)
    
    Docker->>Builder: FROM rhel8.6/builder:inbox
    Builder->>NFS: cp -r compilers from /hpc/local to /tmp/compilers
    Note over Builder: Copies GCC & Clang binaries + modulefiles<br/>Intel modulefiles copied but binaries commented out
    
    Builder->>Final: COPY --from=builder /tmp/compilers/ /hpc/local/
    Note over Final: Compilers embedded in image
    
    CI->>Runtime: Run compiler step with compiler image
    Runtime->>Runtime: module load intel/ics-18.0.4
    Runtime-->>Runtime: FAILS - Intel binaries missing
    Runtime->>Runtime: module load dev/gcc-8.3.0
    Runtime->>Runtime: Success - GCC binaries present
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@ntsemah ntsemah changed the title [skip CI] issue: HPCINFRA-4016 Optimize compiler step [CI] issue: HPCINFRA-4016 Optimize compiler step Nov 10, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a new Dockerfile that attempts to optimize the compiler step by copying compiler binaries and modulefiles locally during image build, then removes NFS volume mounts from the CI configuration.

Key changes:

  • Created .ci/dockerfiles/Dockerfile.rhel8.6.compiler using multi-stage build to copy GCC/Clang/Intel compilers from /hpc/local to the image
  • Updated .ci/matrix_job.yaml to use new compiler container and removed /hpc/local/bin, /hpc/local/oss, /hpc/local/commercial, and /hpc/local/etc/modulefiles volume mounts
  • Changed compiler step to use new specialized container instead of generic toolbox container

Critical issue: Intel compiler binaries (lines 16-17 in Dockerfile) are commented out but Intel modulefiles (lines 22-23) are still copied. When compiler.sh:13 attempts to load intel/ics-18.0.4 and intel/ics-19.1.1 modules, they will fail because the binaries don't exist in the image, causing the compiler tests to fail.

Confidence Score: 1/5

  • This PR will fail when the compiler step runs due to missing Intel compiler binaries
  • The Dockerfile copies Intel modulefiles but not the actual Intel compiler binaries (lines 16-17 are commented out). When the compiler test suite runs, it will attempt to load intel/ics-18.0.4 and intel/ics-19.1.1 modules per compiler.sh:13, and these will fail because the binaries referenced by those modulefiles don't exist in the container. This is a critical logic error that will prevent the compiler step from completing successfully.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires immediate attention - Intel compiler lines must be uncommented or Intel modulefiles removed

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 New multi-stage Dockerfile to optimize compiler step by copying compilers locally, but Intel compilers are commented out causing module load failures
.ci/matrix_job.yaml 3/5 Removed NFS volume mounts and added new compiler container configuration, minor whitespace fixes

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Docker as Docker Build
    participant NFS as NFS (/hpc/local)
    participant Builder as Builder Stage
    participant Final as Final Image
    participant Compiler as Compiler Test

    Note over CI,Compiler: Build Phase (with --volume /hpc/local:/hpc/local:ro)
    CI->>Docker: Build Dockerfile.rhel8.6.compiler
    Docker->>NFS: Mount /hpc/local (read-only)
    Docker->>Builder: Start builder stage
    Builder->>NFS: Copy GCC/Clang binaries to /tmp/compilers
    Builder->>NFS: Copy modulefiles to /tmp/compilers
    Note over Builder,NFS: Intel binaries (lines 16-17) COMMENTED OUT
    Builder->>NFS: Copy Intel modulefiles anyway
    Docker->>Final: Start final stage
    Docker->>Final: COPY /tmp/compilers to /hpc/local
    
    Note over CI,Compiler: Runtime Phase (no NFS volumes)
    CI->>Compiler: Run compiler step in new container
    Compiler->>Compiler: module load intel/ics-18.0.4
    Note over Compiler: FAILS: modulefiles reference<br/>binaries that don't exist
    Compiler-->>CI: Test failure
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@ntsemah ntsemah changed the title [CI] issue: HPCINFRA-4016 Optimize compiler step [skip CI] issue: HPCINFRA-4016 Optimize compiler step Nov 10, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR attempts to optimize the compiler CI step (currently 20-40 mins) by creating a Docker image that pre-copies compilers from NFS into the image, avoiding repeated NFS access during test runs.

Key changes:

  • New Dockerfile.rhel8.6.compiler that copies GCC and Clang compilers + modulefiles from /hpc/local using multi-stage build
  • Updated .ci/matrix_job.yaml to use new compiler container instead of toolbox
  • Added debug output in compiler.sh to verify mounts

Critical issues preventing optimization goal:

  • The --volume /hpc/local:/hpc/local:ro flag in build_args (.ci/matrix_job.yaml:96) doesn't work with docker build - volumes can't be mounted during build, only at runtime
  • Even if the volume were available, the Dockerfile's RUN cp commands (.ci/dockerfiles/Dockerfile.rhel8.6.compiler:12-23) still access NFS during the build itself, meaning the build will take 20-40 mins
  • Intel compiler binaries are commented out (lines 16-17) but their modulefiles are copied (lines 22-23), causing module load intel/* to fail when tests run
  • The optimization strategy fundamentally won't work as designed - NFS is still accessed during build time, not avoided

Confidence Score: 1/5

  • This PR has critical design flaws that will prevent it from achieving its optimization goal and will likely cause test failures
  • Score of 1 due to fundamental implementation issues: (1) the --volume flag doesn't work in docker build context, (2) NFS is still accessed during build defeating the 20-40min optimization goal, (3) Intel compiler binaries are missing while modulefiles are included causing module load failures. The approach needs architectural redesign to actually avoid NFS during builds.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires complete redesign - current approach accesses NFS during build. .ci/matrix_job.yaml needs build_args corrected.

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 New Dockerfile to optimize compiler step by copying compilers from NFS - has critical issues: Intel binaries commented out but modulefiles included, --volume flag in build_args won't work at build time, still accesses NFS defeating optimization goal
.ci/matrix_job.yaml 2/5 Configures new compiler container with --volume flag in build_args which won't mount volumes during docker build, only at runtime; container switched from toolbox to compiler

Sequence Diagram

sequenceDiagram
    participant CI as CI System
    participant Docker as Docker Build
    participant NFS as NFS Storage
    participant Builder as Builder Stage
    participant Final as Final Image
    participant Test as Compiler Test
    
    Note over CI,NFS: Build Phase (with --volume flag)
    CI->>Docker: docker build --volume /hpc/local:/hpc/local:ro
    Docker->>NFS: Mount /hpc/local (at build time)
    Note over Docker,NFS: ⚠️ --volume doesn't work in build context
    Docker->>Builder: Start builder stage
    Builder->>NFS: RUN cp -r /hpc/local/oss/* /tmp/compilers/
    Builder->>NFS: RUN cp -r /hpc/local/etc/modulefiles/* /tmp/compilers/
    Note over Builder,NFS: Still accessing NFS during build
    Builder->>Builder: Store compilers in /tmp/compilers
    Docker->>Final: Start final stage
    Docker->>Final: COPY --from=builder /tmp/compilers/ /hpc/local/
    Docker->>CI: Return built image
    
    Note over CI,Test: Runtime Phase
    CI->>Test: Run compiler.sh in container
    Test->>Test: module load intel/ics-18.0.4
    Note over Test: ⚠️ Intel binaries missing (commented out)
    Test->>Test: Compilation fails
Loading

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds a new Dockerfile to optimize the CI compiler step by copying compiler binaries into a Docker image, avoiding repeated NFS access during test runs. The approach uses a multi-stage Docker build with bind mounts to copy compilers from /hpc/local once during image build, then reuses the image for tests.

Key changes:

  • Created Dockerfile.rhel8.6.compiler that copies GCC 8.3.0, 9.3.0, 10.1.0 and Clang 9.0.1 compilers along with their modulefiles
  • Modified .ci/matrix_job.yaml to build this new image and use it for the compiler test step instead of the generic toolbox image
  • Added debug output to compiler.sh to verify mount status

Critical issue:

  • Intel compiler binaries (lines 17-18) are commented out, but their modulefiles (lines 23-24) are still copied. When compiler.sh:16 attempts to load intel/ics-18.0.4 and intel/ics-19.1.1, the module load will fail because the binaries don't exist, causing 2 out of 6 compiler tests to fail.

Confidence Score: 1/5

  • This PR will cause compiler tests to fail due to missing Intel compiler binaries
  • Score reflects a critical logic error where Intel compiler modulefiles are copied but the actual compiler binaries are commented out (lines 17-18 of Dockerfile). The compiler.sh script expects to test 6 compilers including intel/ics-18.0.4 and intel/ics-19.1.1, but these will fail with module load errors. This defeats the purpose of the optimization since the CI step will fail.
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler requires immediate attention - uncomment lines 17-18 to copy Intel binaries, or remove lines 23-24 if Intel compilers are intentionally excluded

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 New Dockerfile to optimize compiler step by copying compilers into image. Critical issues: Intel compiler binaries commented out (lines 17-18) but modulefiles copied (lines 23-24), causing module load failures for intel/ics-18.0.4 and intel/ics-19.1.1.
.ci/matrix_job.yaml 3/5 Added new compiler docker config and switched compiler step from toolbox to new compiler image. Minor whitespace cleanup on unrelated lines.
contrib/jenkins_tests/compiler.sh 5/5 Added debug line to verify NFS mount status. Non-functional change for troubleshooting.

Sequence Diagram

sequenceDiagram
    participant CI as CI System
    participant Build as Docker Build Host
    participant NFS as NFS (/hpc/local)
    participant Builder as Builder Stage
    participant Final as Final Image
    participant Test as Compiler Test

    CI->>Build: Trigger docker build
    Note over Build: DOCKER_BUILDKIT=1
    Build->>Builder: Create builder stage
    Builder->>NFS: Mount /hpc/local (readonly bind)
    Note over Builder,NFS: One-time NFS access during build
    NFS-->>Builder: Provide compiler files
    Builder->>Builder: Copy GCC compilers to /tmp/compilers
    Builder->>Builder: Copy Clang compilers to /tmp/compilers
    Note over Builder: Intel binaries SKIPPED (commented)
    Builder->>Builder: Copy all modulefiles to /tmp/compilers
    Build->>Final: Create final stage
    Build->>Final: COPY /tmp/compilers from builder
    Note over Final: Compilers now in image at /hpc/local
    CI->>Test: Run compiler.sh in final image
    Test->>Test: module load dev/clang-9.0.1 ✓
    Test->>Test: module load dev/gcc-8.3.0 ✓
    Test->>Test: module load intel/ics-18.0.4 ✗
    Note over Test: FAILS: binaries missing
    Test->>Test: module load intel/ics-19.1.1 ✗
    Note over Test: FAILS: binaries missing
Loading

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR optimizes the CI compiler step by creating a pre-built Docker image with compilers baked in, avoiding NFS mount access during test runs. The approach uses a multi-stage Dockerfile to copy compilers from /hpc/local/* into the image.

Key changes:

  • New Dockerfile.rhel8.6.compiler that copies GCC (8.3.0, 9.3.0, 10.1.0), Clang (9.0.1), and Intel compilers into the image
  • Updated matrix_job.yaml to use pre-built compiler:20251106 image and removed NFS volume mounts for compilers
  • Modified compiler.sh to remove Intel compilers from the active test list (only GCC and Clang are tested)
  • Deleted generated json_config.h file (unrelated cleanup)

Issues identified in previous comments:

  • Intel compiler binaries are copied in Dockerfile but not tested (compiler.sh excludes them)
  • NFS access still occurs during Docker image build time, though runtime access is eliminated
  • The optimization goal is achieved for CI runtime, but image building still requires NFS

Confidence Score: 3/5

  • This PR is safe to merge for CI optimization, but contains unused Intel compiler setup and minor unrelated changes
  • The core functionality works correctly - the pre-built image eliminates NFS access during CI runtime, achieving the optimization goal. Intel compilers are included in the Dockerfile but properly excluded from tests. Previous comments identified that NFS is still accessed during Docker build time (not runtime), which is acceptable since the image is pre-built. The unrelated json_config.h deletion is minor. Score reflects functional correctness with some inefficiency (unused Intel compilers in image)
  • .ci/dockerfiles/Dockerfile.rhel8.6.compiler has Intel compiler setup that isn't used, and previous comments identified concerns about NFS access during build time

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 3/5 New Dockerfile to pre-copy compilers into image; Intel compilers included but unused; previous comments identified NFS access still occurs during build
.ci/matrix_job.yaml 4/5 Switches compiler step to pre-built compiler image; removes NFS volume mounts for compilers; minor whitespace cleanup
contrib/jenkins_tests/compiler.sh 5/5 Removes Intel compilers from test list; matches compilers available in Dockerfile (excluding Intel)

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Harbor as Harbor Registry
    participant Docker as Compiler Container
    participant NFS as NFS Storage
    participant Build as Build Process
    
    Note over Harbor: Pre-built compiler:20251106<br/>image available
    
    CI->>Harbor: Pull compiler:20251106 image
    Harbor->>CI: Return image with pre-copied compilers
    
    CI->>Docker: Start compiler container
    Note over Docker: Compilers at /hpc/local/<br/>(gcc, clang, intel)
    
    Docker->>Build: Run compiler.sh
    Note over Build: Test only gcc & clang<br/>(intel commented out)
    
    loop For each compiler
        Build->>Docker: module load compiler
        Docker->>Build: Compiler loaded from local path
        Build->>Build: configure && make
    end
    
    Build->>CI: Return test results
    
    Note over NFS: NFS not accessed during runtime<br/>(optimization achieved)
Loading

Additional Comments (1)

  1. third_party/json-c/json_config.h

    style: Unrelated change - this file deletion should be in a separate commit or explained in the PR description

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a compiler optimization for CI by creating a pre-built Docker image containing all necessary compilers and modulefiles. The multi-stage Dockerfile copies compilers from NFS during the one-time build phase, then the resulting image is used for repeated CI runs without requiring NFS access.

Key changes:

  • New Dockerfile.rhel8.6.compiler with multi-stage build that packages 6 compilers (clang-9.0.1, gcc-8.3.0/9.3.0/10.1.0, intel ics-18.0.4/19.1.1) along with their modulefiles
  • Commented out runtime NFS volume mounts for /hpc/local/oss, /hpc/local/commercial, and /hpc/local/etc/modulefiles in matrix_job.yaml
  • Compiler step now uses pre-built compiler:20251106 image instead of toolbox image

Previous issues addressed:

  • Intel compiler binaries are now included (lines 16-17 uncommented)
  • All compilers from contrib/jenkins_tests/compiler.sh:13 are present
  • The approach correctly uses NFS only during image build, not during CI runtime

Architecture:
The optimization works by front-loading NFS access into the image build phase. During CI runs (20-40 min), containers pull the pre-built image from Harbor registry and access compilers locally, avoiding slow NFS operations.

Confidence Score: 4/5

  • This PR is safe to merge with low risk - it's a CI optimization that doesn't affect production code
  • Score of 4 reflects solid implementation with one minor concern: the Dockerfile build still requires NFS access during image creation, so build hosts must have /hpc/local available. However, this achieves the stated goal of optimizing the 20-40 min compiler step by avoiding runtime NFS access. All required compilers are properly included, volume mounts correctly removed, and the multi-stage approach is sound.
  • No files require special attention - both files implement straightforward CI infrastructure changes

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 4/5 New multi-stage Dockerfile that copies compiler binaries and modulefiles from NFS during build, enabling faster CI runs without runtime NFS access. All required compilers from compiler.sh:13 are now included.
.ci/matrix_job.yaml 5/5 Removes runtime NFS mounts for compilers and modulefiles, switches Compiler step to use pre-built compiler image. Minor whitespace cleanup.

Sequence Diagram

sequenceDiagram
    participant Build as Docker Build<br/>(one-time, with NFS)
    participant Registry as Harbor Registry
    participant CI as CI Pipeline<br/>(repeated runs)
    participant Container as Compiler Container
    
    Note over Build: Image Creation Phase
    Build->>Build: Mount /hpc/local from NFS
    Build->>Build: Copy compilers to /tmp/compilers<br/>(gcc, clang, intel)
    Build->>Build: Copy modulefiles to /tmp/compilers
    Build->>Build: Create final image with compilers<br/>at /hpc/local (no NFS)
    Build->>Registry: Push compiler:20251106
    
    Note over CI,Container: CI Runtime Phase (20-40 min)
    CI->>Registry: Pull compiler:20251106
    Registry->>Container: Download pre-built image
    Note over Container: No NFS mounts needed!<br/>Compilers already in image
    Container->>Container: module load dev/clang-9.0.1
    Container->>Container: module load intel/ics-18.0.4
    Container->>Container: module load dev/gcc-8.3.0
    Container->>Container: Run compiler tests<br/>(fast, local access)
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR attempts to optimize the compiler CI step (currently taking 20-40 minutes) by creating a Docker image with locally copied compilers to avoid NFS access. However, the implementation has fundamental issues that prevent it from achieving this goal:

Critical Issues:

  • The multi-stage build's RUN commands execute during docker build and will still access /hpc/local via NFS, since the --volume flag in build_args (.ci/matrix_job.yaml:97) only applies at container runtime, not build time
  • Commenting out the NFS volume mounts (.ci/matrix_job.yaml:32,35-36) means the copied compilers won't function because modulefiles contain hardcoded paths to original NFS locations
  • The compiler list in the Dockerfile doesn't match contrib/jenkins_tests/compiler.sh:13 (modulefile names like ics-18.0.4 vs directory names like parallel_studio_xe_2018_update4)

What was changed:

  • Added new Dockerfile that uses multi-stage build to copy compilers and modulefiles from /hpc/local
  • Updated matrix_job.yaml to define the new compiler image and switch the Compiler step to use it
  • Commented out three NFS volume mounts that were previously required

The fundamental problem: Docker COPY and RUN commands during docker build execute on the build host filesystem, not inside a running container. If /hpc/local is only available via NFS, the build will still be slow (accessing NFS) or fail entirely (if NFS isn't mounted on the build host).

Confidence Score: 1/5

  • This PR will not achieve its optimization goal and may cause CI failures
  • The implementation has fundamental architectural issues: Docker build commands still access NFS (defeating the optimization), commented-out volume mounts will break compiler functionality, and there are mismatches between compiler names in the Dockerfile vs the test script
  • Both files require significant rework - .ci/dockerfiles/Dockerfile.rhel8.6.compiler needs a different approach (pre-built base image or build-time NFS access), and .ci/matrix_job.yaml volume configuration conflicts with the Dockerfile design

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 Multi-stage build that copies compilers from NFS during build time - will still access NFS and not achieve the stated optimization goal
.ci/matrix_job.yaml 2/5 Updates compiler step to use new image and comments out NFS volume mounts, but build_args volume flag won't work as intended

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Builder as Docker Build
    participant NFS as NFS Storage
    participant Runtime as Container Runtime
    
    Note over CI,Runtime: Current Implementation (Will Not Work As Intended)
    
    CI->>Builder: docker build --volume /hpc/local:/hpc/local:ro
    Note over Builder: --volume flag ignored during build
    
    Builder->>NFS: COPY /hpc/local/oss/* (accesses NFS)
    Note over Builder,NFS: Still slow - accessing NFS!
    Builder->>NFS: COPY /hpc/local/commercial/* (accesses NFS)
    Builder->>NFS: COPY /hpc/local/etc/modulefiles/* (accesses NFS)
    
    Builder-->>CI: Image built (compilers copied to /hpc/local)
    
    CI->>Runtime: Start container (no NFS volumes mounted)
    Note over Runtime: Volumes commented out in matrix_job.yaml:32,35-36
    
    Runtime->>Runtime: module load intel/ics-18.0.4
    Note over Runtime: Modulefiles reference /hpc/local paths
    Runtime->>Runtime: Check /hpc/local/commercial/intel/...
    Note over Runtime: Path exists (was copied in image)
    
    Runtime->>Runtime: Execute compiler from modulefile
    Note over Runtime: May fail if modulefile has wrong paths or missing deps
    
    Note over CI,Runtime: Issue: Build time still accesses NFS (slow)
    Note over CI,Runtime: Issue: Runtime may fail due to missing NFS mounts
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR attempts to optimize the compiler CI step (currently 20-40 minutes) by creating a Docker image with compilers copied locally instead of accessing them via NFS. However, the implementation has critical flaws that will prevent it from working.

Key Changes:

  • Added Dockerfile.rhel8.6.compiler with multi-stage build to copy compilers from /hpc/local into the image
  • Modified matrix_job.yaml to define the new compiler docker image and switch the Compiler step to use it

Critical Issues Found:

  • Container won't be selected: Docker category is 'build' but Compiler step selector expects 'tool' (line 336)
  • Optimization won't work: The --volume flag in build_args is for runtime, not build time. The COPY and RUN commands at build time will still access NFS
  • Modulefiles incomplete: Only modulefiles are copied, but they reference compiler binaries that remain on NFS paths. The compilers won't actually work without NFS access
  • Compiler list mismatch: Some Intel compiler references in the Dockerfile don't match what compiler.sh:13 expects

The approach needs fundamental rework - either the base image must include compilers, or a different caching strategy is needed.

Confidence Score: 0/5

  • This PR has critical implementation flaws that will prevent it from functioning - the container won't be selected and the optimization won't work as intended
  • Score of 0 reflects multiple blocking issues: (1) Category mismatch prevents container selection entirely, (2) Volume mount flag applies to runtime not build time so NFS will still be accessed during build, (3) Modulefiles reference compiler binaries that remain on NFS paths defeating the optimization goal. The PR cannot achieve its stated objective without fundamental rework
  • Both files require attention - .ci/matrix_job.yaml has the critical category mismatch bug, and .ci/dockerfiles/Dockerfile.rhel8.6.compiler needs architectural changes to actually avoid NFS access

Important Files Changed

File Analysis

Filename Score Overview
.ci/dockerfiles/Dockerfile.rhel8.6.compiler 1/5 Multi-stage Dockerfile to copy compilers from NFS to image. Critical issues: modulefiles reference binaries still on NFS, defeating optimization goal; Intel compilers included but won't work at runtime
.ci/matrix_job.yaml 0/5 Configures compiler docker with category 'build' but step selector expects 'tool', causing container selection to fail. Volume mount in build_args won't help at image build time

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Builder as Docker Build
    participant NFS as NFS (/hpc/local)
    participant Image as Compiler Image
    participant Runtime as Compiler Step

    Note over CI,Runtime: Current Implementation (Broken)
    
    CI->>Builder: docker build with --volume /hpc/local (runtime flag)
    Note over Builder: Multi-stage build starts
    Builder->>NFS: COPY compilers during build (still accesses NFS!)
    Note over Builder,NFS: Build time: volume flag ignored
    NFS-->>Builder: Compiler binaries & modulefiles
    Builder->>Image: Create image with /hpc/local/compilers
    
    CI->>Runtime: Start container with category='build'
    Note over Runtime: Compiler step selector: category='tool'
    Runtime--xCI: Container not selected (category mismatch)
    
    Note over CI,Runtime: Additional Issue
    Note over Runtime: Even if selected, modulefiles reference<br/>original NFS paths, defeating optimization
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. contrib/jenkins_tests/compiler.sh, line 32 (link)

    syntax: single quotes prevent variable expansion - ${WORKSPACE}, $compiler_dir, $cc, $cxx, $jenkins_test_custom_configure, and $make_opt will be literal strings when do_check_result calls eval (globals.sh:193), causing configure to fail

  2. contrib/jenkins_tests/compiler.sh, line 37-39 (link)

    style: make distclean between tests is still present but now unnecessary since each parallel job runs only one compiler test (due to COMPILER_SPEC override). This cleanup was needed when all 6 compilers ran sequentially in the same workspace. Consider removing to reduce runtime

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 7 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

@dpressle dpressle self-requested a review November 17, 2025 10:29
@ntsemah ntsemah force-pushed the HPCINFRA-4016 branch 2 times, most recently from b2c7b55 to 110b4e7 Compare November 17, 2025 11:03
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. contrib/jenkins_tests/compiler.sh, line 37-39 (link)

    logic: When parallel execution is enabled (after fixing parallel: false in matrix_job.yaml), all jobs will race on make distclean in the shared ${WORKSPACE}/third_party/json-c directory. One job's cleanup will corrupt another job's build.

2 files reviewed, 10 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. contrib/jenkins_tests/compiler.sh, line 37-39 (link)

    logic: When parallel execution is enabled in matrix_job.yaml, all 4 parallel steps share WORKSPACE=$PWD (lines 335, 350, 365, 380). Running make distclean in the shared ${WORKSPACE}/third_party/json-c will cause race conditions where one job's cleanup corrupts another job's build.

    Solutions:

    1. Remove this cleanup (json-c configure is idempotent enough), or
    2. Use separate workspace directories per job: WORKSPACE=$PWD/build-clang-9.0.1

2 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

dpressle
dpressle previously approved these changes Nov 18, 2025
The compiler step currently takes a long time to run (about 20–40 minutes).
Split the compiler step into 6 separate steps (one per compiler) running in parallel to reduce build duration.

Signed-off-by: Noam Tsemah <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

@galnoam galnoam merged commit dbc0d03 into Mellanox:vNext Nov 19, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants