@tomerdbz
Collaborator

@tomerdbz tomerdbz commented Apr 9, 2025

Description

Resolve a race condition where TCP segments leak during SYN-RCVD
timeout handling, causing "still N tcp segs in use" warnings.

Root cause: When tcp_slowtmr() times out a connection stuck in
SYN-RCVD state, it calls TCP_EVENT_ERR() which triggers
handle_incoming_handshake_failure(). This function calls close()
on the child connection, which attempts to send FIN and
allocates new TCP segments. These segments were never cleaned up.

Fix: Call abort_connection() before close() in
handle_incoming_handshake_failure(). The tcp_abort() ->
tcp_abandon() path explicitly does NOT send RST for SYN-RCVD
connections (matching Linux kernel behavior),
and properly purges all segments before setting state to CLOSED.
This prevents segment allocation during the subsequent close() call.

Also enhanced destructor logging to show PCB state and queue
pointers for better debugging of any remaining segment leaks.

What

Fix TCP segment leak in SYN flood.

Why

Solves 4050516.

How
  1. Identified that the TCP_EVENT_ERR() callback triggered close(), which
    allocated segments that were never freed
  2. Added an abort_connection() call before close() to preemptively
    clean up the PCB and set its state to CLOSED
  3. The tcp_abort() -> tcp_abandon() code path explicitly skips
    sending RST for SYN-RCVD connections (matching Linux kernel
    behavior) and properly purges all queued segments
  4. When close() runs afterward, the PCB is already in CLOSED state,
    preventing any attempt to send FIN/RST or allocate segments

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@tomerdbz tomerdbz requested a review from galnoam April 9, 2025 06:35
@tomerdbz tomerdbz force-pushed the 4398221_fin_on_syn branch from 661ca2d to bcc47e5 April 10, 2025 10:02
@tomerdbz
Collaborator Author

/review

@pr-review-bot-app

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

RFC Compliance

Ensure the implementation correctly adheres to the RFC specification for handling connections in SYN_RCVD state, particularly the transition to CLOSED without sending FIN or RST packets.

if (get_tcp_state(pcb) == SYN_RCVD) {
    // according to the RFC, in case we get a SYN and no more data
    // we should just close w/o FIN or RST
    tcp_pcb_purge(pcb);
    set_tcp_state(pcb, CLOSED);
    return ERR_OK;
}

Resource Cleanup

Validate that tcp_pcb_purge(pcb) effectively cleans up resources associated with the PCB to prevent memory leaks or dangling references.

tcp_pcb_purge(pcb);

@tomerdbz
Collaborator Author

tomerdbz commented Apr 11, 2025

To trigger the bug in any environment:

  1. Compile the program below.
  2. SYN-flood it from a client (I chose scapy, but many tools are available).
  3. The bug will trigger on vNext; after the patch, you can see it's fixed :)
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
  int server_fd;
  struct sockaddr_in address;
  int opt = 1;

  printf("TCP server cycling program starting...\n");

  // Create socket file descriptor
  if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket failed");
    exit(EXIT_FAILURE);
  }

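  // Set socket options. SO_REUSEADDR and SO_REUSEPORT are separate option
  // names, so each needs its own setsockopt() call (OR-ing them is invalid).
  if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) ||
      setsockopt(server_fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt))) {
    perror("setsockopt failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }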

  // Setup server address structure
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = INADDR_ANY;
  address.sin_port = htons(55400);

  // Bind socket to the port
  if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
    perror("bind failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }

  // Listen for connections
  if (listen(server_fd, 3) < 0) {
    perror("listen failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }

  while (1) {
    // Accept a connection
    int new_socket;
    struct sockaddr_in client_addr;
    socklen_t client_addrlen = sizeof(client_addr);
    if ((new_socket = accept(server_fd, (struct sockaddr *)&client_addr,
                             &client_addrlen)) < 0) {
      perror("accept failed");
      close(server_fd);
      exit(EXIT_FAILURE);
    }
    // Close the accepted connection
    close(new_socket);
  }

  // This point will never be reached in the current code
  close(server_fd);
  return 0;
}

    set_tcp_state(pcb, CLOSED);
    return ERR_OK;
} else {
    return tcp_close_shutdown(pcb, 1);
Collaborator


Inside tcp_close_shutdown() we can see the code below. We could fix the logic in "case SYN_RCVD" instead of adding this if condition.

switch (get_tcp_state(pcb)) {
...
case SYN_RCVD:
    err = tcp_send_fin(pcb);
    if (err == ERR_OK) {
        set_tcp_state(pcb, FIN_WAIT_1);
    }
    ...

Collaborator Author


Fixed, thanks :) Note that I needed to add an if before the switch-case, as I don't want to address rst_on_unacked_data.

@tomerdbz tomerdbz force-pushed the 4398221_fin_on_syn branch from bcc47e5 to 1772089 April 20, 2025 07:32
BasharRadya
BasharRadya previously approved these changes Apr 20, 2025
@galnoam galnoam requested a review from pasis May 5, 2025 11:22
@galnoam
Collaborator

galnoam commented May 5, 2025

@pasis, please review

Comment on lines 151 to 158
if (get_tcp_state(pcb) == SYN_RCVD) {
    // according to the RFC, in case we get a SYN and no more data
    // we should just close w/o FIN or RST
    tcp_pcb_purge(pcb);
    set_tcp_state(pcb, CLOSED);
    return ERR_OK;
}

Member


The RFC states that a FIN needs to be sent in this case, so it looks like the original code makes sense.

Section 3.10.4:

3.10.4.  CLOSE Call
...
SYN-RECEIVED STATE

If no SENDs have been issued and there is no pending data to send, then form a FIN segment and send it, and enter FIN-WAIT-1 state; otherwise, queue for processing after entering ESTABLISHED state.

Collaborator Author


@pasis
Actually, I instrumented the code, adding this to tcp.c line 760:
printf("DEBUG: tcp_slowtmr PATH1 BEFORE purge: state=%d, unsent=%p unacked=%p\n",
       get_tcp_state(pcb), (void *)pcb->unsent, (void *)pcb->unacked);

It confirmed that we are aborting because the connection was "too long in SYN-RCVD"; we are not in the CLOSE flow (3.10.4).

I looked online for what to do on a SYN-RCVD timeout, but it's not well-defined.
The Linux kernel, though, simply abandons the connection without sending RST or FIN, as we do here, so I believe this solution is correct.

Collaborator Author

@tomerdbz tomerdbz Oct 28, 2025


Please see the Linux kernel developers' answer to a developer who opened a bug on this:
https://bugzilla.redhat.com/show_bug.cgi?id=150611

"This is not a bug, there is no standard that specifies that we should
elicit a reset when we've only seen a SYN from the other end.

It's quite clear why too, because if the host won't respond to our
SYN+ACK response packets, there is no reason to believe it will receive
and process correctly any RST frame we send out as well.

If your firewall is relying on such behavior, it really is the problem
not the Linux TCP stack."

Collaborator Author

@tomerdbz tomerdbz Oct 28, 2025


But this instrumentation does show that the fix is not quite right; please look at the updated fix :)

@galnoam galnoam requested a review from pasis June 29, 2025 08:25
@galnoam
Collaborator

galnoam commented Jun 29, 2025

@pasis, please review.
If approved, is it relevant to VMA as well?

@tomerdbz tomerdbz force-pushed the 4398221_fin_on_syn branch 2 times, most recently from 4855d58 to 11e9241 October 28, 2025 12:12

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR fixes a TCP connection closure bug in the SYN_RCVD state by calling abort_connection() before close() in the handle_incoming_handshake_failure() path. The issue manifested as a race condition in Nginx CPS scenarios: when prepare_to_close() was invoked on a socket in SYN_RCVD state before the slow timer could catch it, the normal tcp_close() path would send a FIN packet and leak a TCP segment. RFC 9293 section 3.10.4 covers only the user CLOSE call (from SYN-RECEIVED with no pending data, it sends a FIN and enters FIN-WAIT-1); it does not define what to do when a SYN_RCVD connection times out, so the fix follows Linux kernel behavior and abandons the connection silently. The fix leverages the existing abort_connection() method, which calls tcp_abandon(), a function that specifically does NOT send RST for SYN_RCVD state (matching Linux kernel behavior). This ensures no control packets are sent and no segments are allocated during handshake-failure cleanup. Enhanced destructor logging was also added to aid in debugging segment leaks by displaying PCB state and segment pointers.

Important Files Changed

Filename | Score | Overview
src/core/sock/sockinfo_tcp.cpp | 5/5 | Added abort_connection() call before close() in handle_incoming_handshake_failure() to prevent FIN transmission and segment leaks in SYN_RCVD state; enhanced destructor logging with PCB state.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk.
  • Score reflects a well-targeted fix to a documented race condition with no changes to data-path logic, clear alignment with RFC behavior for SYN_RCVD state, and improved observability via enhanced logging.
  • No files require special attention.

Sequence Diagram

sequenceDiagram
    participant User as "User Application"
    participant Socket as "sockinfo_tcp"
    participant PCB as "TCP PCB"
    participant Network as "Network Layer"

    User->>Socket: close() connection
    Socket->>Socket: prepare_to_close(false)
    Socket->>PCB: get_tcp_state(&m_pcb)
    PCB-->>Socket: SYN_RCVD
    
    alt State is SYN_RCVD and not process_shutdown
        Socket->>PCB: set_tcp_state(&m_pcb, CLOSED)
        Note over Socket,Network: Skip FIN/RST transmission
        Socket->>Socket: unlock_tcp_con()
        Socket-->>User: return (closable)
    else Other states
        Socket->>Socket: continue normal shutdown
        Socket->>PCB: tcp_close(&m_pcb)
        PCB->>Network: Send FIN/RST packets
        Socket->>Socket: unlock_tcp_con()
        Socket-->>User: return result
    end

1 file reviewed, no comments


@tomerdbz tomerdbz changed the title issue: 4398221 Fix connection closure on SYN_RCVD issue: 4050516 Fix TCP segment leak in SYN flood Oct 28, 2025
@tomerdbz tomerdbz changed the title issue: 4050516 Fix TCP segment leak in SYN flood issue: 4050516 Fix TCP segment leak Oct 28, 2025
@galnoam
Collaborator

galnoam commented Nov 10, 2025

@BasharRadya can you review?

Signed-off-by: Tomer Cabouly <[email protected]>
@greptile-apps

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Fixes TCP segment leak during SYN flood by calling abort_connection() before close() in handle_incoming_handshake_failure()
  • The tcp_abandon() path skips RST for SYN_RCVD connections and purges all segments before setting state to CLOSED, preventing segment allocation during subsequent close()
  • Enhanced destructor logging adds PCB state and queue pointers for better debugging of segment leaks

Confidence Score: 5/5

  • This PR is safe to merge - it correctly fixes a resource leak without introducing new issues
  • The fix properly addresses the segment leak by calling abort_connection() before close(). The tcp_abandon() code path explicitly avoids sending RST for SYN_RCVD state (line 324 in tcp.c) and calls tcp_pcb_purge() to free all segments before setting state to CLOSED. Setting m_parent=nullptr before abort prevents recursive calls. Enhanced logging aids future debugging.
  • No files require special attention

Important Files Changed

Filename | Overview
src/core/sock/sockinfo_tcp.cpp | Fixes TCP segment leak in SYN-RCVD timeout by calling abort_connection() before close() to prevent FIN segment allocation

Sequence Diagram

sequenceDiagram
    participant Timer as "tcp_slowtmr()"
    participant PCB as "TCP PCB (SYN_RCVD)"
    participant ErrCB as "TCP_EVENT_ERR"
    participant Parent as "Parent Listen Socket"
    participant Child as "Child Connection"
    participant lwIP as "lwIP tcp_abandon()"
    
    Timer->>PCB: "Timeout detected"
    Timer->>ErrCB: "TCP_EVENT_ERR(ERR_TIMEOUT)"
    ErrCB->>Parent: "err_lwip_cb() -> handle_incoming_handshake_failure()"
    Parent->>Parent: "Lock parent, remove from m_syn_received"
    Parent->>Child: "Set m_parent = nullptr"
    Parent->>Child: "abort_connection()"
    Child->>lwIP: "tcp_abort() -> tcp_abandon()"
    lwIP->>lwIP: "Skip RST for SYN_RCVD state"
    lwIP->>lwIP: "tcp_pcb_remove() -> tcp_pcb_purge()"
    lwIP->>lwIP: "Free unsent/unacked segments"
    lwIP->>lwIP: "Set state = CLOSED"
    lwIP->>ErrCB: "TCP_EVENT_ERR(ERR_ABRT)"
    Note over ErrCB: "m_parent is nullptr, no recursive call"
    Parent->>Child: "close()"
    Note over Child: "State is CLOSED, no FIN sent"
    Child->>Child: "No segment allocation"


@greptile-apps greptile-apps bot left a comment


1 file reviewed, no comments



4 participants