Skip to content

Conversation

@tomerdbz
Copy link
Collaborator

@tomerdbz tomerdbz commented Mar 31, 2025

Description

This change replaces a thread-local dummy locker with a global one in ring_slave.cpp to fix a use-after-free issue.

What

Replace thread-local dummy locker with a global one in ring_slave.

Why ?

During XLIO shutdown, when one thread attempts to access a socket's locker that was created by a terminated thread, a use-after-free issue occurs because the thread-local dummy locker object is freed when its creator thread terminates. This can lead to segmentation faults during cleanup.

How ?

The solution is straightforward:

  1. Replace static thread_local lock_dummy t_lock_dummy_ring with a global static lock_dummy g_lock_dummy_ring
  2. Update get_new_lock() to use this global locker
  3. The global locker is initialized with a name for better debugging: "global_dummy_ring"

This change ensures the dummy locker remains valid throughout the program's lifetime, regardless of thread termination status.

Performance considerations

Pros of global implementation:

  1. The dummy lock is accessed frequently in the data path
  2. Each access to a thread-local variable requires the more complex memory access pattern
  3. This compounds the overhead across many accesses

The global was aligned to be in a different cache line to prevent false-sharing - thus not introducing perf hit,

Tests

added a system-test (gtest) that discovered same issue occured on src/core/sock/sockinfo_tcp.cpp.
Applied the same fix there as well for the test to pass.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@tomerdbz tomerdbz requested a review from galnoam March 31, 2025 08:11
@tomerdbz tomerdbz marked this pull request as draft March 31, 2025 08:13
@tomerdbz tomerdbz marked this pull request as ready for review March 31, 2025 08:29
@tomerdbz tomerdbz force-pushed the issue_4347777 branch 4 times, most recently from 02361d9 to ab41c35 Compare April 1, 2025 07:49
The thread-local dummy locker in ring_slave could cause use-after-free
issues during XLIO shutdown when one thread attempts to access
a socket's locker that was created by a terminated thread.
This occurs because the thread-local object is freed
when its creator thread terminates.

Replace the thread-local dummy locker with a global one to prevent this
issue. To maintain data path performance, optimize the dummy lock for
a different cache-line to prevent false sharing by aligning the lock on
a 64-byte boundary.

Signed-off-by: Tomer Cabouly <[email protected]>
@tomerdbz
Copy link
Collaborator Author

bot:retest

@tomerdbz
Copy link
Collaborator Author

/review

@pr-review-bot-app
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The replacement of the thread-local t_lock_dummy_ring with the global g_lock_dummy_ring may introduce unintended side effects in multi-threaded environments. Ensure that the global lock is thread-safe and does not cause contention or race conditions.

static padded_lock_dummy g_lock_dummy_ring;

static lock_base *get_new_lock(const char *name, bool real_lock)
{
    return (real_lock
                ? static_cast<lock_base *>(multilock::create_new_lock(MULTILOCK_RECURSIVE, name))
                : static_cast<lock_base *>(&g_lock_dummy_ring.lock));
Thread Safety

The change from t_lock_dummy_socket to g_lock_dummy_socket introduces a global lock. Verify that this global lock does not lead to false sharing or performance degradation in high-concurrency scenarios.

static padded_lock_dummy g_lock_dummy_socket;

/*
 * The following socket options are inherited by a connected TCP socket from the listening socket:
 * SO_DEBUG, SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_RCVLOWAT, SO_SNDBUF,
 * SO_SNDLOWAT, TCP_MAXSEG, TCP_NODELAY.
 */
static bool is_inherited_option(int __level, int __optname)
{
    bool ret = false;
    if (__level == SOL_SOCKET) {
        switch (__optname) {
        case SO_DEBUG:
        case SO_DONTROUTE:
        case SO_KEEPALIVE:
        case SO_LINGER:
        case SO_OOBINLINE:
        case SO_RCVBUF:
        case SO_RCVLOWAT:
        case SO_SNDBUF:
        case SO_SNDLOWAT:
        case SO_XLIO_RING_ALLOC_LOGIC:
            ret = true;
        }
    } else if (__level == IPPROTO_TCP) {
        switch (__optname) {
        case TCP_MAXSEG:
        case TCP_NODELAY:
        case TCP_KEEPIDLE:
#if LWIP_TCP_KEEPALIVE
        case TCP_KEEPINTVL:
        case TCP_KEEPCNT:
#endif
        case TCP_USER_TIMEOUT:
            ret = true;
        }
    } else if (__level == IPPROTO_IP) {
        switch (__optname) {
        case IP_TTL:
            ret = true;
        }
    } else if (__level == IPPROTO_IPV6) {
        switch (__optname) {
        case IPV6_V6ONLY:
            ret = true;
        }
    }

    return ret;
}

event_handler_manager *sockinfo_tcp::get_event_mgr()
{
    if (is_xlio_socket()) {
        return m_p_group->get_event_handler();
    } else if (safe_mce_sys().tcp_ctl_thread ==
               option_tcp_ctl_thread::CTL_THREAD_DELEGATE_TCP_TIMERS) {
        return &g_event_handler_manager_local;
    } else {
        return g_p_event_handler_manager;
    }
}

tcp_timers_collection *sockinfo_tcp::get_tcp_timer_collection()
{
    if (is_xlio_socket()) {
        return m_p_group->get_tcp_timers();
    } else if (safe_mce_sys().tcp_ctl_thread ==
               option_tcp_ctl_thread::CTL_THREAD_DELEGATE_TCP_TIMERS) {
        return &g_thread_local_tcp_timers;
    } else {
        return g_tcp_timers_collection;
    }
}

static lock_base *get_new_tcp_lock()
{
    return (
        safe_mce_sys().tcp_ctl_thread != option_tcp_ctl_thread::CTL_THREAD_DELEGATE_TCP_TIMERS
            ? static_cast<lock_base *>(multilock::create_new_lock(MULTILOCK_RECURSIVE, "tcp_con"))
            : static_cast<lock_base *>(&g_lock_dummy_socket.lock));

@tomerdbz tomerdbz changed the title issue: 4321234 Replace thread-local dummy lock with global one issue: 4347777 Replace thread-local dummy lock with global one Apr 21, 2025
@galnoam galnoam merged commit 98ffee6 into Mellanox:vNext May 5, 2025
1 check passed
@galnoam
Copy link
Collaborator

galnoam commented May 5, 2025

@tomerdbz , please move to Fix/closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants