-
Notifications
You must be signed in to change notification settings - Fork 29
[CI] issue: HPCINFRA-3936 Valgrind worker threads mode #490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f644db3 to
85ddb4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds CI coverage for XLIO's worker-threads execution mode in Valgrind testing, addressing a gap where only R2C (run-to-completion) mode was validated. The implementation introduces a WORKER_THREADS environment variable to vg.sh that switches between two test configurations: worker-threads mode (TCP-only,512MB memory, XLIO_WORKER_THREADS=1) and R2C mode (TCP+UDP, 256MB memory). In .ci/matrix_job.yaml, a new Docker container vg-worker-threads and parallel CI step are added to run both modes independently, mirroring the existing dual-mode Test/Test(Worker thread mode) pattern used elsewhere in the pipeline.
Potential Issues
-
Memory allocation may be excessive for CI containers: The worker-threads mode allocates 512MB (
XLIO_MEMORY_LIMIT=512MB) compared to 256MB for R2C mode. While this reflects architectural differences, verify that CI pods have sufficient memory headroom given the 10Gi pod limit specified inmatrix_job.yaml. -
UDP testing skipped in worker-threads mode: The new mode only tests TCP (
test_list="tcp:--tcp"), while R2C mode tests both TCP and UDP. If worker-threads architecture supports UDP, this creates a coverage gap. If UDP is unsupported in worker-threads mode, consider adding a comment explaining why. -
Duplicate container definition increases maintenance burden: The
vg-worker-threadscontainer (lines 126-140 inmatrix_job.yaml) is nearly identical to the existingvgcontainer except for the tag (20251022vs implicit earlier tag). Future Dockerfile changes will need to be validated against both containers. Consider whether a single container with runtime configuration would reduce duplication. -
Missing explicit Docker image build for new tag: The
vg-worker-threadscontainer references tag20251022but there's no indication whether this image has been built and pushed to the registry. CI will fail if this tag doesn't exist. Verify the image exists or update the tag to match the existingvgcontainer.
Confidence Score
4/5 - This is a low-risk, well-structured CI enhancement that follows existing patterns in the codebase. The conditional logic in vg.sh is straightforward, and the matrix duplication mirrors the Test step pattern (lines 382-410). The main concerns are operational (memory sizing, missing UDP tests, Docker image availability) rather than correctness issues. Score reduced by one point due to uncertainty around the Docker image tag and potential UDP coverage gap.
2 files reviewed, no comments
85ddb4e to
97d0067
Compare
|
The *-tcp-output-cl logs do not include XLIO logs, which limits the ability to thoroughly review test results. Possible Root Cause in CI Code: Other than that, logs seem fine. |
|
My approval is for the test logs only, they look good. |
97d0067 to
fa104d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds Valgrind test coverage for the new worker-threads mode to the CI pipeline, addressing a gap where only R2C mode was being tested. The changes introduce a WORKER_THREADS environment variable that parameterizes the test behavior in vg.sh: when true, the script enables worker-threads mode with TCP-only testing, nonblocked mode, 512MB memory limit, and the XLIO_WORKER_THREADS=1 flag; when false (default), it retains the existing R2C mode with TCP+UDP testing and 256MB memory. The CI matrix configuration is updated to run both modes in parallel by duplicating the Valgrind step and introducing a new 'vg-worker-threads' container with identical Kubernetes resource settings (SRIOV networking, 10Gi memory/CPU/hugepages, IPC_LOCK/SYS_RESOURCE capabilities). This change integrates with the existing CI infrastructure (contrib/test_jenkins.sh, matrix_job.yaml) by reusing the same test framework and adding the new container to DOCA installation and artifact collection steps.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| contrib/jenkins_tests/vg.sh | 4/5 | Adds conditional logic to switch between R2C and worker-threads modes based on WORKER_THREADS env var; consolidates LD_PRELOAD, adjusts memory limits and test parameters per mode |
| .ci/matrix_job.yaml | 5/5 | Adds new 'vg-worker-threads' container definition, duplicates Valgrind step with WORKER_THREADS=true, and changes 'sockperf-worker-threads' to pre-built image URL |
Confidence score: 4/5
- This PR is safe to merge with low risk, as it adds new test coverage without modifying existing production code or breaking backward compatibility.
- Score reflects the straightforward nature of the changes (parameterized testing, CI configuration updates) and proper isolation of new functionality. One point deducted because the 512MB memory limit for worker-threads mode appears to be estimated rather than empirically validated, which could cause intermittent test failures if worker thread overhead exceeds expectations under certain workloads.
- Pay close attention to contrib/jenkins_tests/vg.sh to verify the memory limit (512MB) is sufficient for all worker-thread test scenarios, particularly under high concurrency or stress conditions.
Sequence Diagram
sequenceDiagram
participant User
participant CI System
participant Jenkins
participant Kubernetes Pod
participant Docker Container
participant Test Script (vg.sh)
participant Sockperf Server
participant Sockperf Client
participant Valgrind
User->>CI System: Trigger PR or Schedule
CI System->>Jenkins: Load matrix_job.yaml
Jenkins->>Jenkins: Read matrix configuration
Jenkins->>Kubernetes Pod: Create pod with vg or vg-worker-threads container
Kubernetes Pod->>Docker Container: Start container with networking
Docker Container->>Test Script (vg.sh): Execute contrib/test_jenkins.sh
Test Script (vg.sh)->>Test Script (vg.sh): Configure environment (ulimit, hugepages)
Test Script (vg.sh)->>Test Script (vg.sh): Build XLIO with Valgrind support
Test Script (vg.sh)->>Test Script (vg.sh): Detect net1 interface IP
Test Script (vg.sh)->>Test Script (vg.sh): Set test parameters based on WORKER_THREADS
loop For each test (tcp/udp) and IP (IPv4/IPv6)
Test Script (vg.sh)->>Valgrind: Start Valgrind with sockperf server
Valgrind->>Sockperf Server: "LD_PRELOAD=$test_lib sockperf sr"
Sockperf Server->>Sockperf Server: Wait for warmup stage
Test Script (vg.sh)->>Valgrind: Start Valgrind with sockperf client
Valgrind->>Sockperf Client: "LD_PRELOAD=$test_lib sockperf pp"
Sockperf Client->>Sockperf Server: Send ping-pong traffic
Sockperf Server->>Sockperf Client: Respond to traffic
Sockperf Client->>Valgrind: Complete test
Valgrind->>Test Script (vg.sh): Generate valgrind log
Test Script (vg.sh)->>Test Script (vg.sh): Kill sockperf processes
Test Script (vg.sh)->>Test Script (vg.sh): Parse Valgrind results
Test Script (vg.sh)->>Test Script (vg.sh): Check for memory errors/leaks
end
Test Script (vg.sh)->>Docker Container: Return exit code
Docker Container->>Jenkins: Archive artifacts (logs, TAP results)
Jenkins->>CI System: Report test status
CI System->>User: Update PR status
2 files reviewed, no comments
|
bot:retest |
fa104d3 to
f7ec47d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR extends the CI Valgrind test coverage to validate XLIO's worker threads mode alongside the existing R2C (run-to-completion) mode. In vg.sh, the script now branches based on a WORKER_THREADS environment variable: when true it enables worker threads (tests TCP only with 512MB memory limit and non-blocking mode), when false or unset it uses the legacy R2C configuration (tests both TCP and UDP with 256MB limit). The script also increases ulimits for file descriptors and locked memory, adds the --fair-sched=yes Valgrind flag for better thread race detection, and refactors LD_PRELOAD into a consolidated test_lib_env variable. In .ci/matrix_job.yaml, a new vg-worker-threads container definition and corresponding test step are added to execute Valgrind tests with WORKER_THREADS=true, while existing steps now explicitly set WORKER_THREADS=false for clarity. Both test variants use dedicated pre-built Docker images with SR-IOV network attachments and identical resource allocations. This change addresses HPCINFRA-3936 by ensuring both execution models are validated in CI.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| contrib/jenkins_tests/vg.sh | 2/5 | Adds conditional logic for worker threads vs R2C mode testing; LD_PRELOAD appears duplicated in test_lib_env and eval commands causing potential conflicts |
| .ci/matrix_job.yaml | 4/5 | Introduces vg-worker-threads container and test step; refactors sockperf-worker-threads to use pre-built image; all changes follow existing patterns |
Confidence score: 2/5
- This PR requires careful review due to a critical bug in
vg.shthat will likely cause test failures. - Score lowered primarily because
LD_PRELOADis now defined intest_lib_envbut the oldLD_PRELOAD=$test_libsyntax remains in the eval commands on lines 110 and 125 ofvg.sh, resulting in duplicate or conflicting LD_PRELOAD definitions. Additionally, the refactoring movesLD_PRELOADintotest_lib_envbut this appears incomplete—if the intent was to consolidate environment variables, all uses should be updated consistently. - Pay close attention to
contrib/jenkins_tests/vg.shlines 49, 110, and 125 where the LD_PRELOAD duplication occurs; verify this script actually works with the current changes before merging.
Sequence Diagram
sequenceDiagram
participant CI as CI/CD System
participant Matrix as Matrix Pipeline
participant Container as Docker Container
participant Script as vg.sh Script
participant System as System Environment
participant Sockperf as Sockperf Application
participant Valgrind as Valgrind Tool
participant Artifacts as Artifact Storage
CI->>Matrix: Trigger Pipeline (PR/Schedule)
Matrix->>Matrix: Load matrix_job.yaml configuration
Matrix->>Matrix: Evaluate ${do_valgrind} condition
alt Valgrind R2C Mode
Matrix->>Container: Start vg container
Container->>Script: Execute jenkins_test_vg with WORKER_THREADS=false
else Valgrind Worker Thread Mode
Matrix->>Container: Start vg-worker-threads container
Container->>Script: Execute jenkins_test_vg with WORKER_THREADS=true
end
Script->>System: Configure ulimit and hugepages
Script->>System: Create vg_dir directory
Script->>Script: Configure and build with --with-valgrind
Script->>System: Detect network interface (net1 or eth)
Script->>System: Extract IPv4/IPv6 addresses
alt WORKER_THREADS=true
Script->>Script: Set test_list="tcp:--tcp"
Script->>Script: Set test_lib_env with XLIO_WORKER_THREADS=1
Script->>Script: Set test_params="--nonblocked"
else WORKER_THREADS=false
Script->>Script: Set test_list="tcp:--tcp udp:"
Script->>Script: Set test_params=""
end
Script->>Sockperf: Check for sockperf binary
loop For each IP (IPv4/IPv6)
loop For each protocol (TCP/UDP)
Script->>Valgrind: Start valgrind with sockperf server
Valgrind->>Sockperf: Execute "sockperf sr" with LD_PRELOAD
Sockperf->>Sockperf: Wait for "Warmup stage" message
Script->>Valgrind: Start valgrind with sockperf client
Valgrind->>Sockperf: Execute "sockperf pp" for 10 seconds
Sockperf-->>Valgrind: Return test results
Valgrind-->>Script: Generate valgrind logs
Script->>Script: Kill sockperf processes
Script->>Artifacts: Archive valgrind and output logs
Script->>Script: Parse valgrind error summary
alt Errors detected (ret > 0)
Script->>Script: Write "not ok" to vg.tap
Script->>Script: Increment nerrors counter
else No errors
Script->>Script: Write "ok" to vg.tap
end
end
end
Script->>Script: Calculate total errors (rc + nerrors)
alt Errors found (nerrors > 0)
Script-->>Container: Exit with error code
Container->>Matrix: Report failure
Matrix->>Artifacts: Archive valgrind logs (onfail)
else No errors
Script-->>Container: Exit with success
Container->>Matrix: Report success
end
Matrix->>Artifacts: Archive all artifacts
Matrix-->>CI: Report pipeline status
2 files reviewed, 2 comments
f7ec47d to
d955898
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR extends the Valgrind CI test suite to support testing XLIO in Worker Threads mode in addition to the existing R2C (Run-to-Completion) mode. The changes introduce a WORKER_THREADS environment variable that controls the test configuration: when enabled, it runs TCP-only tests with non-blocking sockets, 512MB memory limit, and the XLIO_WORKER_THREADS=1 setting; when disabled (default), it maintains the existing R2C behavior with TCP+UDP tests and 256MB memory limit. The implementation consolidates LD_PRELOAD into the test_lib_env variable and adds --fair-sched=yes to Valgrind for better multi-threaded testing. The .ci/matrix_job.yaml file was also updated to enable this new test configuration in the Jenkins pipeline.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| contrib/jenkins_tests/vg.sh | 3/5 | Added conditional logic to support Worker Threads mode testing with mode-specific parameters, memory limits, and test configurations |
| .ci/matrix_job.yaml | 4/5 | Updated CI matrix configuration to enable Worker Threads mode Valgrind testing in the pipeline |
Confidence score: 3/5
- This PR introduces important testing coverage but has implementation issues that could cause runtime failures.
- Score reflects a critical LD_PRELOAD duplication bug (previously identified) where the library is loaded twice, potential issues with the string-based boolean check for WORKER_THREADS, and the 2x memory limit increase (512MB vs 256MB) that lacks justification in the PR description. The
.ci/matrix_job.yamlchanges were not visible in the diff but are referenced in the file list. - Pay close attention to contrib/jenkins_tests/vg.sh lines 49-66 (environment variable setup) and lines 110, 125 (where
env $test_lib_envis used with commands) to ensure LD_PRELOAD is not duplicated and that the Worker Threads configuration is correct.
Sequence Diagram
sequenceDiagram
participant User
participant Jenkins
participant Matrix Pipeline
participant Docker Container (vg/vg-worker-threads)
participant Valgrind Tool
participant Sockperf Application
User->>Jenkins: "Trigger CI (bot:retest or PR)"
Jenkins->>Matrix Pipeline: "Load matrix_job.yaml"
Matrix Pipeline->>Docker Container (vg/vg-worker-threads): "Start container with net1 interface"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Install DOCA"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Run autogen.sh"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Configure with --with-valgrind"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Build and install library"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Check WORKER_THREADS mode"
alt WORKER_THREADS=true
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set XLIO_WORKER_THREADS=1"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set test_list=tcp only"
else WORKER_THREADS=false
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Set test_list=tcp,udp"
end
Docker Container (vg/vg-worker-threads)->>Valgrind Tool: "Start Valgrind with libxlio.so"
Valgrind Tool->>Sockperf Application: "Launch sockperf server (sr)"
Sockperf Application->>Sockperf Application: "Wait for warmup stage"
Valgrind Tool->>Sockperf Application: "Launch sockperf client (pp)"
Sockperf Application->>Sockperf Application: "Run ping-pong test for 10s"
Sockperf Application->>Valgrind Tool: "Complete test"
Valgrind Tool->>Docker Container (vg/vg-worker-threads): "Generate valgrind logs"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Check for memory errors"
Docker Container (vg/vg-worker-threads)->>Docker Container (vg/vg-worker-threads): "Archive logs"
Docker Container (vg/vg-worker-threads)->>Matrix Pipeline: "Return test results (TAP format)"
Matrix Pipeline->>Jenkins: "Archive artifacts on failure"
Jenkins->>User: "Report CI status"
2 files reviewed, no comments
d955898 to
f5bda64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR extends the CI Valgrind test suite to support Worker Threads execution mode alongside the existing R2C (Run-to-Completion) mode. The change adds a WORKER_THREADS environment variable that switches between two test configurations: Worker Threads mode runs TCP-only tests with 512MB memory limit and non-blocking I/O, while R2C mode tests both TCP and UDP with 256MB memory. The implementation refactors contrib/jenkins_tests/vg.sh by consolidating LD_PRELOAD into the test_lib_env variable, adding resource limit adjustments (ulimit), and introducing Valgrind's --fair-sched=yes flag for better thread scheduling. The .ci/matrix_job.yaml configuration is updated to enable this new test mode in the CI pipeline. This addresses a gap where Worker Threads mode was not being validated during continuous integration.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| contrib/jenkins_tests/vg.sh | 2/5 | Added Worker Threads mode support with conditional test configuration, but contains a critical LD_PRELOAD duplication issue that needs resolution |
| .ci/matrix_job.yaml | 4/5 | Updated CI matrix configuration to enable the new Worker Threads test mode in the pipeline |
Confidence score: 2/5
- This PR introduces a critical issue that will likely cause problems during test execution due to environment variable duplication
- Score reflects the
LD_PRELOADduplication bug identified in line 49 where it's set intest_lib_env, then implicitly expanded in lines 110 and 125, potentially causing dynamic linker warnings or unpredictable behavior. Additionally, the string comparison pattern[[ "$WORKER_THREADS" = "true" ]]lacks input validation and could behave unexpectedly with case variations or whitespace. The ulimit changes (lines 39-40) are not well-justified and could mask resource leaks in tests. - Pay close attention to contrib/jenkins_tests/vg.sh, specifically the
LD_PRELOADhandling and the conditional logic for mode selection
Sequence Diagram
sequenceDiagram
participant CI as "CI Pipeline"
participant Docker as "Docker Container"
participant Script as "vg.sh Script"
participant SockperfSR as "Sockperf Server"
participant Valgrind as "Valgrind Tool"
participant SockperfCL as "Sockperf Client"
participant Logs as "Log Files"
CI->>Docker: "Trigger Valgrind test step"
Docker->>Script: "Execute vg.sh with WORKER_THREADS env"
Script->>Script: "Configure test environment (ulimit, hugepages)"
Script->>Script: "Build XLIO with valgrind support"
Script->>Script: "Detect net1 interface IP addresses"
Script->>Script: "Determine test mode (R2C or Worker Threads)"
alt Worker Threads Mode
Script->>Script: "Set test_list=tcp only"
Script->>Script: "Add XLIO_WORKER_THREADS=1 to env"
else R2C Mode
Script->>Script: "Set test_list=tcp + udp"
end
loop For each test (tcp/udp) and IP
Script->>Valgrind: "Launch valgrind with sockperf server"
Valgrind->>SockperfSR: "Start server with instrumentation"
SockperfSR->>Logs: "Log server output"
Script->>Script: "Wait for server warmup (max 120s)"
Script->>Valgrind: "Launch valgrind with sockperf client"
Valgrind->>SockperfCL: "Start ping-pong test with instrumentation"
SockperfCL->>SockperfSR: "Exchange test traffic"
SockperfCL->>Logs: "Log client output"
Script->>SockperfSR: "Kill server processes"
Script->>Logs: "Archive valgrind and output logs"
Script->>Logs: "Parse valgrind logs for errors"
Logs-->>Script: "Return error count"
Script->>Logs: "Write TAP test result"
end
Script->>Script: "Calculate total error count"
alt Errors found
Script->>CI: "Exit with error code"
CI->>CI: "Archive failure artifacts"
else No errors
Script->>CI: "Exit with success"
end
2 files reviewed, 1 comment
f5bda64 to
394b76c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR extends Valgrind CI testing to support Worker Threads mode alongside the existing R2C mode. The changes modify contrib/jenkins_tests/vg.sh to conditionally configure test parameters based on a new WORKER_THREADS environment variable, and update .ci/matrix_job.yaml to run the Valgrind test in both modes. When Worker Threads mode is enabled (WORKER_THREADS=true), the test runs TCP-only with increased memory limits (512MB), sets XLIO_WORKER_THREADS=1, adds --nonblocked to sockperf, and enables Valgrind's --fair-sched=yes option. This change integrates with the existing CI matrix infrastructure documented in .ci/, allowing the Jenkins pipeline to expand testing coverage by simply varying the WORKER_THREADS environment variable across matrix entries.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| contrib/jenkins_tests/vg.sh | 2/5 | Adds Worker Threads mode support with conditional test configuration, but introduces a critical LD_PRELOAD duplication bug |
| .ci/matrix_job.yaml | 4/5 | Adds two Valgrind test matrix entries to run both R2C and Worker Threads modes |
Confidence score: 2/5
- This PR requires careful review before merging due to a critical bug that will cause test failures
- Score lowered because of the LD_PRELOAD duplication issue in vg.sh:
test_lib_envincludesLD_PRELOAD=$test_lib(line 49) but lines 110 and 125 still use bareenv $test_lib_envwithout removing the now-redundant explicit LD_PRELOAD references, resulting in duplicate environment variable definitions - Pay close attention to contrib/jenkins_tests/vg.sh lines 110, 125 where the eval commands invoke
env $test_lib_env- the explicitLD_PRELOAD=$test_libportions must be removed since they're already included intest_lib_env
Sequence Diagram
sequenceDiagram
participant Jenkins
participant CI Pipeline
participant Docker Container
participant Test Environment
participant Valgrind
participant Sockperf
Jenkins->>CI Pipeline: Trigger vg.sh (WORKER_THREADS env var set)
CI Pipeline->>Docker Container: Launch vg or vg-worker-threads container
Docker Container->>Test Environment: Setup (ulimit, hugepages, configure)
Test Environment->>Test Environment: Build and install XLIO with Valgrind
alt Container Environment
Test Environment->>Test Environment: Get IP from net1 interface
else Host Environment
Test Environment->>Test Environment: Get IP from eth interface
end
alt WORKER_THREADS=true
Test Environment->>Test Environment: Set test_list="tcp:--tcp", XLIO_WORKER_THREADS=1
else WORKER_THREADS=false
Test Environment->>Test Environment: Set test_list="tcp:--tcp udp:", R2C mode
end
loop For each test (tcp/udp) and IP (IPv4/IPv6)
Test Environment->>Valgrind: Start Valgrind with sockperf server
Valgrind->>Sockperf: Run "sockperf sr" with LD_PRELOAD=libxlio.so
Sockperf-->>Test Environment: Wait for warmup (up to 120s)
Test Environment->>Valgrind: Start Valgrind with sockperf client
Valgrind->>Sockperf: Run "sockperf pp" for 10 seconds
Sockperf-->>Valgrind: Generate latency results
Valgrind-->>Test Environment: Write valgrind logs
Test Environment->>Test Environment: Kill sockperf processes
Test Environment->>Test Environment: Archive logs and analyze errors
alt Valgrind found errors
Test Environment->>CI Pipeline: Report "not ok" in TAP format
else No errors found
Test Environment->>CI Pipeline: Report "ok" in TAP format
end
end
Test Environment->>CI Pipeline: Return exit code (0 or error count)
CI Pipeline->>Jenkins: Archive artifacts and publish results
2 files reviewed, no comments
|
@NirWolfer , need to rebase |
Today we are running the Valgrind test in R2C mode only, meaning we are not testing the new worker threads mode in our CI process. Add Worker threads mode test to the vg.sh based on WORKER_THREADS env var Signed-off-by: NirWolfer <[email protected]>
394b76c to
6493e0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Adds Valgrind testing support for worker threads mode alongside existing R2C mode testing
Key Changes:
- New
vg-worker-threadscontainer configuration in CI pipeline - Separate Valgrind test step with
WORKER_THREADS=trueenvironment variable - Mode-specific test configuration: worker threads uses TCP-only with
--nonblockedflag and 512MB memory, while R2C continues with TCP+UDP and 256MB - Adds
--fair-sched=yesflag to Valgrind for improved thread scheduling - Refactors
LD_PRELOADintotest_lib_envvariable to eliminate duplication - Adds ulimit configuration at script start (50000 file descriptors, unlimited locked memory)
- Updates
sockperf-worker-threadscontainer to use pre-built Harbor image
Confidence Score: 4/5
- This PR is safe to merge with low risk - adds parallel test coverage without modifying existing tests
- The changes properly extend CI testing to cover worker threads mode while preserving existing R2C tests. The implementation follows established patterns, uses appropriate mode-specific configurations, and improves code quality by consolidating LD_PRELOAD. Minor confidence reduction due to the test parameter changes that should be verified in actual CI runs
- No files require special attention - both changes follow CI/test patterns correctly
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .ci/matrix_job.yaml | 5/5 | Adds vg-worker-threads container config and new Valgrind test step with WORKER_THREADS=true, updates sockperf-worker-threads to use pre-built image |
| contrib/jenkins_tests/vg.sh | 4/5 | Adds WORKER_THREADS env var support with mode-specific config (tcp-only, --nonblocked, 512MB memory for worker mode), moves LD_PRELOAD into test_lib_env, adds ulimit and --fair-sched flag |
Sequence Diagram
sequenceDiagram
participant CI as CI Pipeline
participant Matrix as matrix_job.yaml
participant Jenkins as test_jenkins.sh
participant VG as vg.sh
participant Valgrind as Valgrind Tool
participant Sockperf as Sockperf App
CI->>Matrix: Trigger Valgrind tests
alt R2C Mode (WORKER_THREADS=false)
Matrix->>Jenkins: Run with vg container
Jenkins->>VG: Execute with WORKER_THREADS=false
VG->>VG: Configure R2C mode (TCP+UDP, 256MB)
VG->>Valgrind: Start server (sr) with R2C config
Valgrind->>Sockperf: Launch server
VG->>Valgrind: Start client (pp) with R2C config
Valgrind->>Sockperf: Launch client
Sockperf-->>VG: Test results
VG-->>Jenkins: Report R2C test results
end
alt Worker Threads Mode (WORKER_THREADS=true)
Matrix->>Jenkins: Run with vg-worker-threads container
Jenkins->>VG: Execute with WORKER_THREADS=true
VG->>VG: Configure Worker mode (TCP only, 512MB, --nonblocked)
VG->>Valgrind: Start server with Worker config + --fair-sched
Valgrind->>Sockperf: Launch server with --nonblocked
VG->>Valgrind: Start client with Worker config + --fair-sched
Valgrind->>Sockperf: Launch client with --nonblocked
Sockperf-->>VG: Test results
VG-->>Jenkins: Report Worker mode test results
end
Jenkins-->>Matrix: All test results
Matrix-->>CI: Final status
2 files reviewed, no comments
|
@NirWolfer , need approval from reviwer |
|
@galnoam please merge |
|
@NirWolfer, are you signing commits with -s or just writing the signoff in the comment as text? |
we use -s as always |
Description
Today we are running the Valgrind test in R2C mode only, meaning we are not testing the new worker threads mode in our CI process.
What
Add Worker threads mode test to the vg.sh based on WORKER_THREADS env var
Why ?
HPCINFRA-3936
How ?
It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.
Change type
What kind of change does this PR introduce?
Check list