Skip to content

Conversation

@NirWolfer
Copy link
Contributor

Description

Valgrind today is being terminated with -9 which prevents it from printing all the summaries at the end which are needed for debuging.

What

Change -9 signal to -SIGINT af the first try of shutting down Valgrind

Why ?

HPCINFRA-3937

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from 3211dee to 1cf3db2 Compare September 30, 2025 18:01
@dpressle dpressle self-requested a review October 2, 2025 17:42
@NirWolfer
Copy link
Contributor Author

bot:retest

@NirWolfer NirWolfer marked this pull request as ready for review October 15, 2025 10:51
@NirWolfer
Copy link
Contributor Author

Depends on #468

@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from 1cf3db2 to 54199bf Compare October 26, 2025 11:45
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR modifies the Valgrind test cleanup script in the CI infrastructure to replace SIGKILL (-9) with SIGINT as the initial termination signal for sockperf processes. The change enables Valgrind to execute its cleanup routines and generate complete memory leak summaries, which are essential for debugging memory issues in the XLIO library during CI validation. The script retains a three-tier escalation strategy (SIGINT with 10s wait → SIGTERM with 3s wait → SIGKILL) to ensure that unresponsive processes are eventually terminated. This modification aligns with standard best practices for graceful process termination while maintaining CI pipeline reliability.

PR Description Notes:

  • Minor typo: "debuging" should be "debugging"
  • Typo: "af the first try" should be "at the first try"

Important Files Changed

Filename Score Overview
contrib/jenkins_tests/vg.sh 5/5 Changed initial process termination from SIGKILL to SIGINT to allow Valgrind graceful shutdown and summary generation

Confidence score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects a simple, well-understood signal handling change in a CI script with proper fallback mechanisms already in place to prevent hanging processes
  • No files require special attention

Sequence Diagram

sequenceDiagram
    participant User
    participant CI as CI System
    participant Script as vg.sh
    participant Valgrind
    participant Sockperf as Sockperf Server
    participant SockperfClient as Sockperf Client
    
    User->>CI: Trigger CI Pipeline
    CI->>Script: Execute vg.sh
    Script->>Script: Configure hugepages
    Script->>Script: Build XLIO with Valgrind support
    Script->>Script: Detect network interface (net1/eth)
    Script->>Script: Get IPv4/IPv6 addresses
    
    loop For each IP (IPv4/IPv6)
        loop For each test (TCP/UDP)
            Script->>Valgrind: Start Valgrind with memory checks
            Valgrind->>Sockperf: Launch sockperf server (sr mode)
            Sockperf-->>Script: Log warmup stage message
            Script->>Script: Wait for warmup (max 120s)
            Script->>Valgrind: Start Valgrind for client
            Valgrind->>SockperfClient: Launch sockperf client (pp mode, 10s)
            SockperfClient->>Sockperf: Send test traffic
            Sockperf-->>SockperfClient: Process requests
            SockperfClient-->>Script: Complete test run
            Script->>Sockperf: Send SIGINT signal
            Script->>Script: Wait 10 seconds
            alt Process still running
                Script->>Sockperf: Send SIGTERM signal
                Script->>Script: Wait 3 seconds
            end
            alt Process still running
                Script->>Sockperf: Send SIGKILL signal
            end
            Valgrind-->>Script: Generate memory check logs
            Script->>Script: Archive logs
            Script->>Script: Parse ERROR SUMMARY
            alt Errors found > 0
                Script->>Script: Record test failure in TAP
            else No errors and latency found
                Script->>Script: Record test success in TAP
            end
        end
    end
    
    Script->>CI: Return exit code
    CI-->>User: Report test results
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Valgrind today is being terminated with -9 which prevents it from
printing all the summaries at the end which are needed for debuging.

Change -9 signal to -SIGINT af the first try of shutting down Valgrind

Signed-off-by: NirWolfer <[email protected]>
@NirWolfer NirWolfer force-pushed the vg_mem_summary_bugfix branch from 54199bf to b03556f Compare October 30, 2025 09:44
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Changed the initial process termination signal from SIGKILL (-9) to SIGINT in the Valgrind test cleanup routine. This allows Valgrind to gracefully shutdown and print its full error summaries and leak reports, which are essential for debugging.

Key changes:

  • Line 117: Changed pkill -9 to pkill -SIGINT for the first termination attempt
  • Removed obsolete comment about SIGINT not working (it was previously referring to the wrong signal)
  • Maintains existing escalation path: SIGINTSIGTERMSIGKILL with appropriate wait times

The change follows standard Unix signal handling best practices where graceful signals are attempted before forceful termination.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The change is minimal, well-understood, and follows Unix best practices. The signal escalation path ensures that even if graceful shutdown fails, the process will still be terminated. The existing 10-second wait and fallback mechanisms prevent any potential hangs.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
contrib/jenkins_tests/vg.sh 5/5 Changed first termination signal from SIGKILL to SIGINT, allowing Valgrind to print summaries gracefully

Sequence Diagram

sequenceDiagram
    participant Script as vg.sh
    participant Process as sockperf (Valgrind)
    
    Note over Script,Process: After tests complete
    
    Script->>Process: Check if process still running
    alt Process is running
        Script->>Process: pkill -SIGINT (graceful shutdown)
        Note over Process: Valgrind catches SIGINT<br/>and prints summaries
        Script->>Script: Wait 10 seconds
        
        alt Process still running
            Script->>Process: pkill -SIGTERM (terminate)
            Script->>Script: Wait 3 seconds
            
            alt Process still running
                Script->>Process: pkill -SIGKILL (force kill)
                Note over Process: Process terminated immediately
            end
        end
    end
    
    Script->>Script: Archive logs and check results
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@dpressle
Copy link
Collaborator

dpressle commented Nov 2, 2025

@galnoam please merge

@galnoam galnoam merged commit 5412ecf into Mellanox:vNext Nov 4, 2025
1 check passed
@NirWolfer NirWolfer deleted the vg_mem_summary_bugfix branch November 4, 2025 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants