Skip to content

Conversation

@NirWolfer
Copy link
Contributor

Description

Today we are running the Valgrind test in R2C mode only, meaning we are not testing the new worker threads mode in our CI process.

What

Add Worker threads mode test to the vg.sh based on WORKER_THREADS env var

Why ?

HPCINFRA-3936

How ?

It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@NirWolfer NirWolfer force-pushed the vg_worker_threads branch 2 times, most recently from f644db3 to 85ddb4e Compare October 22, 2025 16:16
@NirWolfer NirWolfer marked this pull request as ready for review October 22, 2025 16:19
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds CI coverage for XLIO's worker-threads execution mode in Valgrind testing, addressing a gap where only R2C (run-to-completion) mode was validated. The implementation introduces a WORKER_THREADS environment variable to vg.sh that switches between two test configurations: worker-threads mode (TCP-only,512MB memory, XLIO_WORKER_THREADS=1) and R2C mode (TCP+UDP, 256MB memory). In .ci/matrix_job.yaml, a new Docker container vg-worker-threads and parallel CI step are added to run both modes independently, mirroring the existing dual-mode Test/Test(Worker thread mode) pattern used elsewhere in the pipeline.

Potential Issues

  1. Memory allocation may be excessive for CI containers: The worker-threads mode allocates 512MB (XLIO_MEMORY_LIMIT=512MB) compared to 256MB for R2C mode. While this reflects architectural differences, verify that CI pods have sufficient memory headroom given the 10Gi pod limit specified in matrix_job.yaml.

  2. UDP testing skipped in worker-threads mode: The new mode only tests TCP (test_list="tcp:--tcp"), while R2C mode tests both TCP and UDP. If worker-threads architecture supports UDP, this creates a coverage gap. If UDP is unsupported in worker-threads mode, consider adding a comment explaining why.

  3. Duplicate container definition increases maintenance burden: The vg-worker-threads container (lines 126-140 in matrix_job.yaml) is nearly identical to the existing vg container except for the tag (20251022vs implicit earlier tag). Future Dockerfile changes will need to be validated against both containers. Consider whether a single container with runtime configuration would reduce duplication.

  4. Missing explicit Docker image build for new tag: The vg-worker-threads container references tag 20251022 but there's no indication whether this image has been built and pushed to the registry. CI will fail if this tag doesn't exist. Verify the image exists or update the tag to match the existing vg container.

Confidence Score

4/5 - This is a low-risk, well-structured CI enhancement that follows existing patterns in the codebase. The conditional logic in vg.sh is straightforward, and the matrix duplication mirrors the Test step pattern (lines 382-410). The main concerns are operational (memory sizing, missing UDP tests, Docker image availability) rather than correctness issues. Score reduced by one point due to uncertainty around the Docker image tag and potential UDP coverage gap.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@BasharRadya
Copy link
Collaborator

BasharRadya commented Oct 26, 2025

The *-tcp-output-cl logs do not include XLIO logs, which limits the ability to thoroughly review test results.

Possible Root Cause in CI Code:
The server process uses 2>&1 | tee, which correctly captures both stdout and stderr in the output log.
The client process, however, uses | tee without 2>&1, so it only captures stdout, omitting stderr.

Other than that, logs seem fine.

BasharRadya
BasharRadya previously approved these changes Oct 26, 2025
@BasharRadya
Copy link
Collaborator

My approval is for the test logs only, they look good.

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds Valgrind test coverage for the new worker-threads mode to the CI pipeline, addressing a gap where only R2C mode was being tested. The changes introduce a WORKER_THREADS environment variable that parameterizes the test behavior in vg.sh: when true, the script enables worker-threads mode with TCP-only testing, nonblocked mode, 512MB memory limit, and the XLIO_WORKER_THREADS=1 flag; when false (default), it retains the existing R2C mode with TCP+UDP testing and 256MB memory. The CI matrix configuration is updated to run both modes in parallel by duplicating the Valgrind step and introducing a new 'vg-worker-threads' container with identical Kubernetes resource settings (SRIOV networking, 10Gi memory/CPU/hugepages, IPC_LOCK/SYS_RESOURCE capabilities). This change integrates with the existing CI infrastructure (contrib/test_jenkins.sh, matrix_job.yaml) by reusing the same test framework and adding the new container to DOCA installation and artifact collection steps.

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 4/5 Adds conditional logic to switch between R2C and worker-threads modes based on WORKER_THREADS env var; consolidates LD_PRELOAD, adjusts memory limits and test parameters per mode
.ci/matrix_job.yaml 5/5 Adds new 'vg-worker-threads' container definition, duplicates Valgrind step with WORKER_THREADS=true, and changes 'sockperf-worker-threads' to pre-built image URL

Confidence score: 4/5

  • This PR is safe to merge with low risk, as it adds new test coverage without modifying existing production code or breaking backward compatibility.
  • Score reflects the straightforward nature of the changes (parameterized testing, CI configuration updates) and proper isolation of new functionality. One point deducted because the 512MB memory limit for worker-threads mode appears to be estimated rather than empirically validated, which could cause intermittent test failures if worker thread overhead exceeds expectations under certain workloads.
  • Pay close attention to contrib/jenkins_tests/vg.sh to verify the memory limit (512MB) is sufficient for all worker-thread test scenarios, particularly under high concurrency or stress conditions.

Sequence Diagram

sequenceDiagram
    participant User
    participant CI System
    participant Jenkins
    participant Kubernetes Pod
    participant Docker Container
    participant Test Script (vg.sh)
    participant Sockperf Server
    participant Sockperf Client
    participant Valgrind

    User->>CI System: Trigger PR or Schedule
    CI System->>Jenkins: Load matrix_job.yaml
    Jenkins->>Jenkins: Read matrix configuration
    Jenkins->>Kubernetes Pod: Create pod with vg or vg-worker-threads container
    Kubernetes Pod->>Docker Container: Start container with networking
    Docker Container->>Test Script (vg.sh): Execute contrib/test_jenkins.sh
    Test Script (vg.sh)->>Test Script (vg.sh): Configure environment (ulimit, hugepages)
    Test Script (vg.sh)->>Test Script (vg.sh): Build XLIO with Valgrind support
    Test Script (vg.sh)->>Test Script (vg.sh): Detect net1 interface IP
    Test Script (vg.sh)->>Test Script (vg.sh): Set test parameters based on WORKER_THREADS
    
    loop For each test (tcp/udp) and IP (IPv4/IPv6)
        Test Script (vg.sh)->>Valgrind: Start Valgrind with sockperf server
        Valgrind->>Sockperf Server: "LD_PRELOAD=$test_lib sockperf sr"
        Sockperf Server->>Sockperf Server: Wait for warmup stage
        Test Script (vg.sh)->>Valgrind: Start Valgrind with sockperf client
        Valgrind->>Sockperf Client: "LD_PRELOAD=$test_lib sockperf pp"
        Sockperf Client->>Sockperf Server: Send ping-pong traffic
        Sockperf Server->>Sockperf Client: Respond to traffic
        Sockperf Client->>Valgrind: Complete test
        Valgrind->>Test Script (vg.sh): Generate valgrind log
        Test Script (vg.sh)->>Test Script (vg.sh): Kill sockperf processes
        Test Script (vg.sh)->>Test Script (vg.sh): Parse Valgrind results
        Test Script (vg.sh)->>Test Script (vg.sh): Check for memory errors/leaks
    end
    
    Test Script (vg.sh)->>Docker Container: Return exit code
    Docker Container->>Jenkins: Archive artifacts (logs, TAP results)
    Jenkins->>CI System: Report test status
    CI System->>User: Update PR status
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@NirWolfer
Copy link
Contributor Author

bot:retest

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR extends the CI Valgrind test coverage to validate XLIO's worker threads mode alongside the existing R2C (run-to-completion) mode. In vg.sh, the script now branches based on a WORKER_THREADS environment variable: when true it enables worker threads (tests TCP only with 512MB memory limit and non-blocking mode), when false or unset it uses the legacy R2C configuration (tests both TCP and UDP with 256MB limit). The script also increases ulimits for file descriptors and locked memory, adds the --fair-sched=yes Valgrind flag for better thread race detection, and refactors LD_PRELOAD into a consolidated test_lib_env variable. In .ci/matrix_job.yaml, a new vg-worker-threads container definition and corresponding test step are added to execute Valgrind tests with WORKER_THREADS=true, while existing steps now explicitly set WORKER_THREADS=false for clarity. Both test variants use dedicated pre-built Docker images with SR-IOV network attachments and identical resource allocations. This change addresses HPCINFRA-3936 by ensuring both execution models are validated in CI.

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 2/5 Adds conditional logic for worker threads vs R2C mode testing; LD_PRELOAD appears duplicated in test_lib_env and eval commands causing potential conflicts
.ci/matrix_job.yaml 4/5 Introduces vg-worker-threads container and test step; refactors sockperf-worker-threads to use pre-built image; all changes follow existing patterns

Confidence score: 2/5

  • This PR requires careful review due to a critical bug in vg.sh that will likely cause test failures.
  • Score lowered primarily because LD_PRELOAD is now defined in test_lib_env but the old LD_PRELOAD=$test_lib syntax remains in the eval commands on lines 110 and 125 of vg.sh, resulting in duplicate or conflicting LD_PRELOAD definitions. Additionally, the refactoring moves LD_PRELOAD into test_lib_env but this appears incomplete—if the intent was to consolidate environment variables, all uses should be updated consistently.
  • Pay close attention to contrib/jenkins_tests/vg.sh lines 49, 110, and 125 where the LD_PRELOAD duplication occurs; verify this script actually works with the current changes before merging.

Sequence Diagram

sequenceDiagram
    participant CI as CI/CD System
    participant Matrix as Matrix Pipeline
    participant Container as Docker Container
    participant Script as vg.sh Script
    participant System as System Environment
    participant Sockperf as Sockperf Application
    participant Valgrind as Valgrind Tool
    participant Artifacts as Artifact Storage

    CI->>Matrix: Trigger Pipeline (PR/Schedule)
    Matrix->>Matrix: Load matrix_job.yaml configuration
    Matrix->>Matrix: Evaluate ${do_valgrind} condition
    
    alt Valgrind R2C Mode
        Matrix->>Container: Start vg container
        Container->>Script: Execute jenkins_test_vg with WORKER_THREADS=false
    else Valgrind Worker Thread Mode
        Matrix->>Container: Start vg-worker-threads container
        Container->>Script: Execute jenkins_test_vg with WORKER_THREADS=true
    end
    
    Script->>System: Configure ulimit and hugepages
    Script->>System: Create vg_dir directory
    Script->>Script: Configure and build with --with-valgrind
    Script->>System: Detect network interface (net1 or eth)
    Script->>System: Extract IPv4/IPv6 addresses
    
    alt WORKER_THREADS=true
        Script->>Script: Set test_list="tcp:--tcp"
        Script->>Script: Set test_lib_env with XLIO_WORKER_THREADS=1
        Script->>Script: Set test_params="--nonblocked"
    else WORKER_THREADS=false
        Script->>Script: Set test_list="tcp:--tcp udp:"
        Script->>Script: Set test_params=""
    end
    
    Script->>Sockperf: Check for sockperf binary
    
    loop For each IP (IPv4/IPv6)
        loop For each protocol (TCP/UDP)
            Script->>Valgrind: Start valgrind with sockperf server
            Valgrind->>Sockperf: Execute "sockperf sr" with LD_PRELOAD
            Sockperf->>Sockperf: Wait for "Warmup stage" message
            Script->>Valgrind: Start valgrind with sockperf client
            Valgrind->>Sockperf: Execute "sockperf pp" for 10 seconds
            Sockperf-->>Valgrind: Return test results
            Valgrind-->>Script: Generate valgrind logs
            Script->>Script: Kill sockperf processes
            Script->>Artifacts: Archive valgrind and output logs
            Script->>Script: Parse valgrind error summary
            
            alt Errors detected (ret > 0)
                Script->>Script: Write "not ok" to vg.tap
                Script->>Script: Increment nerrors counter
            else No errors
                Script->>Script: Write "ok" to vg.tap
            end
        end
    end
    
    Script->>Script: Calculate total errors (rc + nerrors)
    
    alt Errors found (nerrors > 0)
        Script-->>Container: Exit with error code
        Container->>Matrix: Report failure
        Matrix->>Artifacts: Archive valgrind logs (onfail)
    else No errors
        Script-->>Container: Exit with success
        Container->>Matrix: Report success
    end
    
    Matrix->>Artifacts: Archive all artifacts
    Matrix-->>CI: Report pipeline status
Loading

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR extends the Valgrind CI test suite to support testing XLIO in Worker Threads mode in addition to the existing R2C (Run-to-Completion) mode. The changes introduce a WORKER_THREADS environment variable that controls the test configuration: when enabled, it runs TCP-only tests with non-blocking sockets, 512MB memory limit, and the XLIO_WORKER_THREADS=1 setting; when disabled (default), it maintains the existing R2C behavior with TCP+UDP tests and 256MB memory limit. The implementation consolidates LD_PRELOAD into the test_lib_env variable and adds --fair-sched=yes to Valgrind for better multi-threaded testing. The .ci/matrix_job.yaml file was also updated to enable this new test configuration in the Jenkins pipeline.

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 3/5 Added conditional logic to support Worker Threads mode testing with mode-specific parameters, memory limits, and test configurations
.ci/matrix_job.yaml 4/5 Updated CI matrix configuration to enable Worker Threads mode Valgrind testing in the pipeline

Confidence score: 3/5

  • This PR introduces important testing coverage but has implementation issues that could cause runtime failures.
  • Score reflects a critical LD_PRELOAD duplication bug (previously identified) where the library is loaded twice, potential issues with the string-based boolean check for WORKER_THREADS, and the 2x memory limit increase (512MB vs 256MB) that lacks justification in the PR description. The .ci/matrix_job.yaml changes were not visible in the diff but are referenced in the file list.
  • Pay close attention to contrib/jenkins_tests/vg.sh lines 49-66 (environment variable setup) and lines 110, 125 (where env $test_lib_env is used with commands) to ensure LD_PRELOAD is not duplicated and that the Worker Threads configuration is correct.

Sequence Diagram

sequenceDiagram
    participant User
    participant Jenkins
    participant Matrix Pipeline
    participant Docker Container (vg/vg-worker-threads)
    participant Valgrind Tool
    participant Sockperf Application
    
    User->>Jenkins: "Trigger CI (bot:retest or PR)"
    Jenkins->>Matrix Pipeline: "Load matrix_job.yaml"
    Matrix Pipeline->>Docker Container (vg/vg-worker-threads): "Start container with net1 interface"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Install DOCA"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Run autogen.sh"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Configure with --with-valgrind"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Build and install library"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Check WORKER_THREADS mode"
    
    alt WORKER_THREADS=true
        Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set XLIO_WORKER_THREADS=1"
        Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set test_list=tcp only"
    else WORKER_THREADS=false
        Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set test_list=tcp,udp"
    end
    
    Docker Container (vg/vg-worker-threads)->>Valgrind Tool: "Start Valgrind with libxlio.so"
    Valgrind Tool->>Sockperf Application: "Launch sockperf server (sr)"
    Sockperf Application->>Sockperf Application: "Wait for warmup stage"
    Valgrind Tool->>Sockperf Application: "Launch sockperf client (pp)"
    Sockperf Application->>Sockperf Application: "Run ping-pong test for 10s"
    Sockperf Application->>Valgrind Tool: "Complete test"
    Valgrind Tool->>Docker Container (vg/vg-worker-threads): "Generate valgrind logs"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Check for memory errors"
    Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Archive logs"
    Docker Container (vg/vg-worker-threads)->>Matrix Pipeline: "Return test results (TAP format)"
    Matrix Pipeline->>Jenkins: "Archive artifacts on failure"
    Jenkins->>User: "Report CI status"
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR extends the CI Valgrind test suite to support Worker Threads execution mode alongside the existing R2C (Run-to-Completion) mode. The change adds a WORKER_THREADS environment variable that switches between two test configurations: Worker Threads mode runs TCP-only tests with 512MB memory limit and non-blocking I/O, while R2C mode tests both TCP and UDP with 256MB memory. The implementation refactors contrib/jenkins_tests/vg.sh by consolidating LD_PRELOAD into the test_lib_env variable, adding resource limit adjustments (ulimit), and introducing Valgrind's --fair-sched=yes flag for better thread scheduling. The .ci/matrix_job.yaml configuration is updated to enable this new test mode in the CI pipeline. This addresses a gap where Worker Threads mode was not being validated during continuous integration.

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 2/5 Added Worker Threads mode support with conditional test configuration, but contains a critical LD_PRELOAD duplication issue that needs resolution
.ci/matrix_job.yaml 4/5 Updated CI matrix configuration to enable the new Worker Threads test mode in the pipeline

Confidence score: 2/5

  • This PR introduces a critical issue that will likely cause problems during test execution due to environment variable duplication
  • Score reflects the LD_PRELOAD duplication bug identified in line 49 where it's set in test_lib_env, then implicitly expanded in lines 110 and 125, potentially causing dynamic linker warnings or unpredictable behavior. Additionally, the string comparison pattern [[ "$WORKER_THREADS" = "true" ]] lacks input validation and could behave unexpectedly with case variations or whitespace. The ulimit changes (lines 39-40) are not well-justified and could mask resource leaks in tests.
  • Pay close attention to contrib/jenkins_tests/vg.sh, specifically the LD_PRELOAD handling and the conditional logic for mode selection

Sequence Diagram

sequenceDiagram
    participant CI as "CI Pipeline"
    participant Docker as "Docker Container"
    participant Script as "vg.sh Script"
    participant SockperfSR as "Sockperf Server"
    participant Valgrind as "Valgrind Tool"
    participant SockperfCL as "Sockperf Client"
    participant Logs as "Log Files"

    CI->>Docker: "Trigger Valgrind test step"
    Docker->>Script: "Execute vg.sh with WORKER_THREADS env"
    Script->>Script: "Configure test environment (ulimit, hugepages)"
    Script->>Script: "Build XLIO with valgrind support"
    Script->>Script: "Detect net1 interface IP addresses"
    Script->>Script: "Determine test mode (R2C or Worker Threads)"
    
    alt Worker Threads Mode
        Script->>Script: "Set test_list=tcp only"
        Script->>Script: "Add XLIO_WORKER_THREADS=1 to env"
    else R2C Mode
        Script->>Script: "Set test_list=tcp + udp"
    end
    
    loop For each test (tcp/udp) and IP
        Script->>Valgrind: "Launch valgrind with sockperf server"
        Valgrind->>SockperfSR: "Start server with instrumentation"
        SockperfSR->>Logs: "Log server output"
        Script->>Script: "Wait for server warmup (max 120s)"
        Script->>Valgrind: "Launch valgrind with sockperf client"
        Valgrind->>SockperfCL: "Start ping-pong test with instrumentation"
        SockperfCL->>SockperfSR: "Exchange test traffic"
        SockperfCL->>Logs: "Log client output"
        Script->>SockperfSR: "Kill server processes"
        Script->>Logs: "Archive valgrind and output logs"
        Script->>Logs: "Parse valgrind logs for errors"
        Logs-->>Script: "Return error count"
        Script->>Logs: "Write TAP test result"
    end
    
    Script->>Script: "Calculate total error count"
    
    alt Errors found
        Script->>CI: "Exit with error code"
        CI->>CI: "Archive failure artifacts"
    else No errors
        Script->>CI: "Exit with success"
    end
Loading

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR extends Valgrind CI testing to support Worker Threads mode alongside the existing R2C mode. The changes modify contrib/jenkins_tests/vg.sh to conditionally configure test parameters based on a new WORKER_THREADS environment variable, and update .ci/matrix_job.yaml to run the Valgrind test in both modes. When Worker Threads mode is enabled (WORKER_THREADS=true), the test runs TCP-only with increased memory limits (512MB), sets XLIO_WORKER_THREADS=1, adds --nonblocked to sockperf, and enables Valgrind's --fair-sched=yes option. This change integrates with the existing CI matrix infrastructure documented in .ci/, allowing the Jenkins pipeline to expand testing coverage by simply varying the WORKER_THREADS environment variable across matrix entries.

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 2/5 Adds Worker Threads mode support with conditional test configuration, but introduces a critical LD_PRELOAD duplication bug
.ci/matrix_job.yaml 4/5 Adds two Valgrind test matrix entries to run both R2C and Worker Threads modes

Confidence score: 2/5

  • This PR requires careful review before merging due to a critical bug that will cause test failures
  • Score lowered because of the LD_PRELOAD duplication issue in vg.sh: test_lib_env includes LD_PRELOAD=$test_lib (line 49) but lines 110 and 125 still use bare env $test_lib_env without removing the now-redundant explicit LD_PRELOAD references, resulting in duplicate environment variable definitions
  • Pay close attention to contrib/jenkins_tests/vg.sh lines 110, 125 where the eval commands invoke env $test_lib_env - the explicit LD_PRELOAD=$test_lib portions must be removed since they're already included in test_lib_env

Sequence Diagram

sequenceDiagram
    participant Jenkins
    participant CI Pipeline
    participant Docker Container
    participant Test Environment
    participant Valgrind
    participant Sockperf

    Jenkins->>CI Pipeline: Trigger vg.sh (WORKER_THREADS env var set)
    CI Pipeline->>Docker Container: Launch vg or vg-worker-threads container
    Docker Container->>Test Environment: Setup (ulimit, hugepages, configure)
    Test Environment->>Test Environment: Build and install XLIO with Valgrind
    
    alt Container Environment
        Test Environment->>Test Environment: Get IP from net1 interface
    else Host Environment
        Test Environment->>Test Environment: Get IP from eth interface
    end
    
    alt WORKER_THREADS=true
        Test Environment->>Test Environment: Set test_list="tcp:--tcp", XLIO_WORKER_THREADS=1
    else WORKER_THREADS=false
        Test Environment->>Test Environment: Set test_list="tcp:--tcp udp:", R2C mode
    end
    
    loop For each test (tcp/udp) and IP (IPv4/IPv6)
        Test Environment->>Valgrind: Start Valgrind with sockperf server
        Valgrind->>Sockperf: Run "sockperf sr" with LD_PRELOAD=libxlio.so
        Sockperf-->>Test Environment: Wait for warmup (up to 120s)
        Test Environment->>Valgrind: Start Valgrind with sockperf client
        Valgrind->>Sockperf: Run "sockperf pp" for 10 seconds
        Sockperf-->>Valgrind: Generate latency results
        Valgrind-->>Test Environment: Write valgrind logs
        Test Environment->>Test Environment: Kill sockperf processes
        Test Environment->>Test Environment: Archive logs and analyze errors
        
        alt Valgrind found errors
            Test Environment->>CI Pipeline: Report "not ok" in TAP format
        else No errors found
            Test Environment->>CI Pipeline: Report "ok" in TAP format
        end
    end
    
    Test Environment->>CI Pipeline: Return exit code (0 or error count)
    CI Pipeline->>Jenkins: Archive artifacts and publish results
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

dpressle
dpressle previously approved these changes Oct 28, 2025
@galnoam
Copy link
Collaborator

galnoam commented Oct 30, 2025

@NirWolfer , need to rebase

Today we are running the Valgrind test in R2C mode only, meaning we are
not testing the new worker threads mode in our CI process.

Add Worker threads mode test to the vg.sh based on WORKER_THREADS env
var

Signed-off-by: NirWolfer <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds Valgrind testing support for worker threads mode alongside existing R2C mode testing

Key Changes:

  • New vg-worker-threads container configuration in CI pipeline
  • Separate Valgrind test step with WORKER_THREADS=true environment variable
  • Mode-specific test configuration: worker threads uses TCP-only with --nonblocked flag and 512MB memory, while R2C continues with TCP+UDP and 256MB
  • Adds --fair-sched=yes flag to Valgrind for improved thread scheduling
  • Refactors LD_PRELOAD into test_lib_env variable to eliminate duplication
  • Adds ulimit configuration at script start (50000 file descriptors, unlimited locked memory)
  • Updates sockperf-worker-threads container to use pre-built Harbor image

Confidence Score: 4/5

  • This PR is safe to merge with low risk - adds parallel test coverage without modifying existing tests
  • The changes properly extend CI testing to cover worker threads mode while preserving existing R2C tests. The implementation follows established patterns, uses appropriate mode-specific configurations, and improves code quality by consolidating LD_PRELOAD. Minor confidence reduction due to the test parameter changes that should be verified in actual CI runs
  • No files require special attention - both changes follow CI/test patterns correctly

Important Files Changed

File Analysis

Filename Score Overview
.ci/matrix_job.yaml 5/5 Adds vg-worker-threads container config and new Valgrind test step with WORKER_THREADS=true, updates sockperf-worker-threads to use pre-built image
contrib/jenkins_tests/vg.sh 4/5 Adds WORKER_THREADS env var support with mode-specific config (tcp-only, --nonblocked, 512MB memory for worker mode), moves LD_PRELOAD into test_lib_env, adds ulimit and --fair-sched flag

Sequence Diagram

sequenceDiagram
    participant CI as CI Pipeline
    participant Matrix as matrix_job.yaml
    participant Jenkins as test_jenkins.sh
    participant VG as vg.sh
    participant Valgrind as Valgrind Tool
    participant Sockperf as Sockperf App

    CI->>Matrix: Trigger Valgrind tests
    
    alt R2C Mode (WORKER_THREADS=false)
        Matrix->>Jenkins: Run with vg container
        Jenkins->>VG: Execute with WORKER_THREADS=false
        VG->>VG: Configure R2C mode (TCP+UDP, 256MB)
        VG->>Valgrind: Start server (sr) with R2C config
        Valgrind->>Sockperf: Launch server
        VG->>Valgrind: Start client (pp) with R2C config
        Valgrind->>Sockperf: Launch client
        Sockperf-->>VG: Test results
        VG-->>Jenkins: Report R2C test results
    end
    
    alt Worker Threads Mode (WORKER_THREADS=true)
        Matrix->>Jenkins: Run with vg-worker-threads container
        Jenkins->>VG: Execute with WORKER_THREADS=true
        VG->>VG: Configure Worker mode (TCP only, 512MB, --nonblocked)
        VG->>Valgrind: Start server with Worker config + --fair-sched
        Valgrind->>Sockperf: Launch server with --nonblocked
        VG->>Valgrind: Start client with Worker config + --fair-sched
        Valgrind->>Sockperf: Launch client with --nonblocked
        Sockperf-->>VG: Test results
        VG-->>Jenkins: Report Worker mode test results
    end
    
    Jenkins-->>Matrix: All test results
    Matrix-->>CI: Final status
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@NirWolfer NirWolfer dismissed stale reviews from dpressle and BasharRadya via 6493e0a October 30, 2025 09:34
@NirWolfer NirWolfer requested a review from dpressle October 30, 2025 09:45
@galnoam
Copy link
Collaborator

galnoam commented Oct 30, 2025

@NirWolfer , need approval from reviwer

@dpressle
Copy link
Collaborator

dpressle commented Nov 2, 2025

@galnoam please merge

@galnoam
Copy link
Collaborator

galnoam commented Nov 2, 2025

@NirWolfer, are you signing commits with -s or just writing the signoff in the comment as text?

@dpressle
Copy link
Collaborator

dpressle commented Nov 2, 2025

@NirWolfer, are you signing commits with -s or just writing the signoff in the comment as text?

we use -s as always

@galnoam galnoam merged commit 3835320 into Mellanox:vNext Nov 3, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants