Skip to content

Conversation

@pasis
Copy link
Member

@pasis pasis commented Oct 2, 2024

Description

Avoid start_thread with each reg action.
Refactor destructor.

What

Avoid start_thread with each reg action.
Refactor destructor.

Why ?

Potential CPU improvement for CPS and code cleanup.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@pasis pasis added the draft Not to review yet label Oct 2, 2024
@galnoam
Copy link
Collaborator

galnoam commented Mar 16, 2025

@pasis, can we keep this in your privare fork?

@galnoam
Copy link
Collaborator

galnoam commented Mar 17, 2025

bot:retest

@galnoam galnoam removed the draft Not to review yet label Mar 17, 2025
@pasis pasis changed the title issue: Refactor event_handler_manager issue: 4432879 Refactor event_handler_manager May 5, 2025
@galnoam
Copy link
Collaborator

galnoam commented May 12, 2025

bot:retest

1 similar comment
@galnoam
Copy link
Collaborator

galnoam commented May 13, 2025

bot:retest

@galnoam
Copy link
Collaborator

galnoam commented May 14, 2025

/review

@galnoam
Copy link
Collaborator

galnoam commented May 14, 2025

/improve

@pr-review-bot-app
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Resource Cleanup

Ensure that the m_epfd resource is properly closed and reset in the destructor to avoid resource leaks or undefined behavior.

if (m_epfd >= 0) {
    SYSCALL(close, m_epfd);
}
m_epfd = -1;
Thread Affinity Handling

Validate the logic for setting thread affinity and fallback mechanisms in start_thread to ensure compatibility and robustness across different environments.

bool affinity_requested = false;

if (!m_b_continue_running) {
    errno = ECANCELED;
    return -1;
}
if (m_event_handler_tid != 0) {
    return 0;
}

ret = pthread_attr_init(&tattr);
if (ret != 0) {
    return -1;
}

cpu_set = safe_mce_sys().internal_thread_affinity;
if (strcmp(safe_mce_sys().internal_thread_affinity_str, "-1") &&
    !strcmp(safe_mce_sys().internal_thread_cpuset, MCE_DEFAULT_INTERNAL_THREAD_CPUSET)) {
    ret = pthread_attr_setaffinity_np(&tattr, sizeof(cpu_set), &cpu_set);
    if (ret != 0) {
        evh_logwarn("Failed to set event handler thread affinity. (errno=%d)", errno);
    }
    affinity_requested = (ret == 0);
} else {
    evh_logdbg("Internal thread affinity not set.");
}

ret = pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
if (ret != 0 && affinity_requested) {
    // Try without affinity in case this is a cset issue.
    evh_logwarn("Failed to start event handler thread with a thread affinity. Trying default "
                "thread affinity. (errno=%d)",
                errno);
    pthread_attr_destroy(&tattr);
    ret = pthread_attr_init(&tattr)
        ?: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
}
// Destroy will either succeed or return EINVAL if the init fails in the above block.
pthread_attr_destroy(&tattr);

if (ret == 0) {
    evh_logdbg("Started event handler thread.");
} else {
    evh_logerr("Failed to start event handler thread. (errno=%d)", errno);
}
return ret;
Exception Handling

Confirm that throwing an exception when start_thread fails in do_global_ctors_helper is the appropriate behavior and won't cause unintended side effects during initialization.

if (rc != 0) {
    BULLSEYE_EXCLUDE_BLOCK_START
    throw_xlio_exception("Failed to start event handler thread.\n");
    BULLSEYE_EXCLUDE_BLOCK_END
}

@pr-review-bot-app
Copy link

pr-review-bot-app bot commented May 14, 2025

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add error handling for thread initialization

Add error handling for the pthread_attr_init and pthread_create calls to ensure that
any failure in these operations is logged or handled appropriately, avoiding
potential undefined behavior.

src/core/event/event_handler_manager.cpp [347-348]

-ret = pthread_attr_init(&tattr)
-    ?: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
+ret = pthread_attr_init(&tattr);
+if (ret != 0) {
+    evh_logerr("Failed to initialize thread attributes. (errno=%d)", errno);
+    return ret;
+}
+ret = pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
+if (ret != 0) {
+    evh_logerr("Failed to create event handler thread. (errno=%d)", errno);
+    pthread_attr_destroy(&tattr);
+    return ret;
+}
Suggestion importance[1-10]: 8

__

Why: The suggestion improves robustness by adding error handling for pthread_attr_init and pthread_create. This ensures that failures in thread initialization are logged and handled properly, reducing the risk of undefined behavior.

Medium
Add null check for pointer access

Add a null check for m_p_reg_action_q_to_push_to before accessing it to prevent
potential segmentation faults if the pointer is null.

src/core/event/event_handler_manager.cpp [446]

-is_empty = m_p_reg_action_q_to_push_to->empty();
+is_empty = (m_p_reg_action_q_to_push_to && m_p_reg_action_q_to_push_to->empty());
Suggestion importance[1-10]: 8

__

Why: Adding a null check for m_p_reg_action_q_to_push_to prevents potential segmentation faults, which is a significant improvement for code safety and stability.

Medium
Prevent resource leaks in thread attributes

Ensure that the pthread_attr_destroy is called after the second pthread_attr_init to
avoid resource leaks in case of repeated initialization.

src/core/event/event_handler_manager.cpp [341-349]

 if (ret != 0 && affinity_requested) {
     evh_logwarn("Failed to start event handler thread with a thread affinity. Trying default "
                 "thread affinity. (errno=%d)",
                 errno);
     pthread_attr_destroy(&tattr);
-    ret = pthread_attr_init(&tattr)
-        ?: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
+    ret = pthread_attr_init(&tattr);
+    if (ret == 0) {
+        ret = pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
+        pthread_attr_destroy(&tattr);
+    }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion ensures that pthread_attr_destroy is called after the second pthread_attr_init, preventing resource leaks. This is a good improvement for resource management and aligns with best practices.

Medium
General
Log error code before exception

Log the error code before throwing the exception to provide more context for
debugging failures in starting the event handler thread.

src/core/main.cpp [1000-1003]

 rc = g_p_event_handler_manager->start_thread();
 if (rc != 0) {
+    evh_logerr("Failed to start event handler thread. (rc=%d)", rc);
     BULLSEYE_EXCLUDE_BLOCK_START
     throw_xlio_exception("Failed to start event handler thread.\n");
     BULLSEYE_EXCLUDE_BLOCK_END
 }
Suggestion importance[1-10]: 6

__

Why: Logging the error code before throwing an exception provides valuable debugging information. While not critical, it enhances the ability to diagnose issues when the thread fails to start.

Low

@galnoam
Copy link
Collaborator

galnoam commented May 14, 2025

Hi @pasis , @AlexanderGrissik , please check pr agent review comments.

@pasis
Copy link
Member Author

pasis commented Oct 20, 2025

bot:retest

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Refactors event handler manager initialization to start the thread once during XLIO initialization instead of on every registration action.

Key changes:

  • Moved start_thread() call from post_new_reg_action() to do_global_ctors_helper() in main.cpp
  • Made start_thread() public to enable explicit initialization control
  • Removed free_evh_resources() method, inlining cleanup logic into destructor
  • Simplified constructor by setting m_b_continue_running = true unconditionally before mode check
  • Improved error handling in start_thread() with better logging and explicit error returns

Benefits:

  • Eliminates redundant thread creation checks on every registration (CPS performance improvement)
  • Cleaner separation of concerns - initialization vs. registration
  • More predictable thread lifecycle management

Critical issue found:

  • Line 338-339 in event_handler_manager.cpp: Logic error in fallback path where pthread_create may be called with uninitialized attributes if second pthread_attr_init fails

Confidence Score: 2/5

  • This PR contains a critical logic bug that must be fixed before merging
  • The refactoring concept is sound and improves performance, but the retry logic in start_thread() (lines 338-339) has a critical bug where pthread_create can be called with an uninitialized tattr if the second pthread_attr_init fails. This could cause undefined behavior or crashes during thread creation fallback scenarios.
  • src/core/event/event_handler_manager.cpp (lines 338-339) - fix the compound statement logic error

Important Files Changed

File Analysis

Filename Score Overview
src/core/event/event_handler_manager.cpp 2/5 Refactored start_thread with improved error handling but has critical logic bug in retry path (line 338-339)
src/core/event/event_handler_manager.h 5/5 Made start_thread() public and removed free_evh_resources() declaration - clean refactoring
src/core/main.cpp 5/5 Added explicit start_thread() call with error handling during initialization - improves control flow

Sequence Diagram

sequenceDiagram
    participant Main as main.cpp
    participant EHM as event_handler_manager
    participant Thread as event_handler_thread
    participant Reg as Registration callers

    Note over Main: XLIO initialization (do_global_ctors_helper)
    Main->>EHM: NEW_CTOR(g_p_event_handler_manager, event_handler_manager())
    activate EHM
    EHM->>EHM: m_b_continue_running = true
    EHM->>EHM: epoll_create(m_epfd)
    EHM->>EHM: wakeup_set_epoll_fd(m_epfd)
    EHM-->>Main: return
    deactivate EHM
    
    Main->>EHM: start_thread()
    activate EHM
    EHM->>EHM: pthread_attr_init(&tattr)
    EHM->>EHM: pthread_attr_setaffinity_np (optional)
    EHM->>Thread: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this)
    activate Thread
    EHM-->>Main: return 0 (success)
    deactivate EHM
    
    Note over Main: Continue with initialization<br/>Create g_p_ib_ctx_handler_collection,<br/>g_p_net_device_table_mgr, etc.
    
    par Event Handler Thread Loop
        Thread->>Thread: thread_loop()
        loop while m_b_continue_running
            Thread->>Thread: m_timer.update_timeout()
            Thread->>Thread: epoll_wait(m_epfd, ...)
            Thread->>Thread: process registration actions
            Thread->>Thread: process events
        end
    and Registration Actions (from other components)
        Reg->>EHM: register_ibverbs_event()
        activate EHM
        EHM->>EHM: post_new_reg_action(reg_action)
        Note over EHM: OLD: Called start_thread() here<br/>NEW: Thread already running
        EHM->>EHM: m_p_reg_action_q_to_push_to->push_back()
        EHM->>EHM: do_wakeup()
        EHM-->>Reg: return
        deactivate EHM
        EHM-->>Thread: wakeup signal
        Thread->>Thread: handle_registration_action()
    end
    
    Note over Main: Shutdown sequence
    Main->>EHM: ~event_handler_manager()
    activate EHM
    EHM->>EHM: stop_thread()
    EHM->>EHM: m_b_continue_running = false
    EHM->>Thread: do_wakeup()
    Thread->>Thread: exit thread_loop()
    Thread->>Thread: process_remaining_registration_actions()
    deactivate Thread
    EHM->>EHM: pthread_join(m_event_handler_tid)
    EHM->>EHM: close(m_epfd)
    deactivate EHM
Loading

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +338 to +339
ret = pthread_attr_init(&tattr)
?: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: logic error with ternary operator: if pthread_attr_init fails, pthread_create still executes with potentially uninitialized tattr

Suggested change
ret = pthread_attr_init(&tattr)
?: pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
if (pthread_attr_init(&tattr) != 0) {
ret = -1;
} else {
ret = pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this);
}

Event handler manager is a global object with a single internal thread.
Once XLIO is initialized, there is no option not to start the internal
thread and it's used starting from the initialization method (to create
a timer event for netlink).

Therefore, there is no point in trying to start the thread with each
post action to the event handler. This can save CPU in CPS scenario.

Signed-off-by: Dmytro Podgornyi <[email protected]>
This method simply calls stop_thread(). Remove it to simply the code and
separate thread stopping from epfd closing.

Signed-off-by: Dmytro Podgornyi <[email protected]>
@greptile-apps
Copy link

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Moved event handler thread initialization from lazy (on-demand) to eager (during XLIO init) to improve CPS performance by avoiding repeated thread startup attempts
  • Simplified destructor by inlining free_evh_resources() and moving epfd cleanup from stop_thread() to destructor
  • Improved error handling in start_thread() with proper return codes, better retry logic without affinity, and informative logging

Confidence Score: 3/5

  • This PR has a critical logic bug that must be fixed before merging
  • The refactoring improves performance and code clarity, but line 338-339 contains a critical ternary operator bug where pthread_create executes with potentially uninitialized tattr if pthread_attr_init fails, which could cause undefined behavior or crashes
  • src/core/event/event_handler_manager.cpp requires immediate attention to fix the ternary operator logic error

Important Files Changed

Filename Overview
src/core/event/event_handler_manager.cpp Refactored thread initialization and cleanup logic; contains a critical ternary operator bug at line 338-339 that could result in using uninitialized thread attributes

Sequence Diagram

sequenceDiagram
    participant User
    participant "do_global_ctors_helper" as Init
    participant "event_handler_manager" as EHM
    participant "pthread" as Thread
    
    User->>Init: "Initialize XLIO"
    Init->>EHM: "NEW_CTOR(g_p_event_handler_manager, event_handler_manager())"
    activate EHM
    EHM->>EHM: "m_b_continue_running = true"
    EHM->>EHM: "epoll_create()"
    EHM->>EHM: "wakeup_set_epoll_fd()"
    EHM-->>Init: "Return manager instance"
    deactivate EHM
    
    Init->>EHM: "start_thread()"
    activate EHM
    EHM->>EHM: "Check m_b_continue_running"
    EHM->>EHM: "Check m_event_handler_tid == 0"
    EHM->>EHM: "pthread_attr_init(&tattr)"
    EHM->>EHM: "Set affinity if requested"
    EHM->>Thread: "pthread_create(&m_event_handler_tid, &tattr, event_handler_thread, this)"
    Thread-->>EHM: "Return thread ID"
    EHM->>EHM: "pthread_attr_destroy(&tattr)"
    EHM-->>Init: "Return 0 on success"
    deactivate EHM
    
    Init->>Init: "Check rc != 0"
    Init->>User: "Throw exception if failed"
    
    Note over EHM,Thread: "Thread now runs continuously<br/>handling events via thread_loop()"
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants