-
Notifications
You must be signed in to change notification settings - Fork 29
issue: 4050516 Fix TCP segment leak #334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: vNext
Are you sure you want to change the base?
Conversation
661ca2d to
bcc47e5
Compare
|
/review |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
|
to trigger the bug on any environment:
|
src/core/lwip/tcp.c
Outdated
| set_tcp_state(pcb, CLOSED); | ||
| return ERR_OK; | ||
| } else { | ||
| return tcp_close_shutdown(pcb, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside tcp_close_shutdown we can see this code(below), we can fix the logic in "case SYN_RCVD" instead of adding this if condition.
switch (get_tcp_state(pcb)) {
...
case SYN_RCVD:
err = tcp_send_fin(pcb);
if (err == ERR_OK) {
set_tcp_state(pcb, FIN_WAIT_1);
}
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. thanks :) note that I needed to add an if before the switch-case as I don't want to address rst_on_unacked_data.
bcc47e5 to
1772089
Compare
|
@pasis, please review |
src/core/lwip/tcp.c
Outdated
| if (get_tcp_state(pcb) == SYN_RCVD) { | ||
| // according to the RFC, in case we get a SYN and no more data | ||
| // we should just close w/o FIN or RST | ||
| tcp_pcb_purge(pcb); | ||
| set_tcp_state(pcb, CLOSED); | ||
| return ERR_OK; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC states that FIN needs to be sent in this case. So, looks like original code makes sense.
3.10.4. CLOSE Call
...
SYN-RECEIVED STATE
If no SENDs have been issued and there is no pending data to send, then form a FIN segment and send it, and enter FIN-WAIT-1 state; otherwise, queue for processing after entering ESTABLISHED state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pasis
Actually, I instrumented the code, adding this to tcp.c line 760:
printf("DEBUG: tcp_slowtmr PATH1 BEFORE purge: state=%d, unsent=%p unacked=%p\n",
get_tcp_state(pcb), pcb->unsent, pcb->unacked);
it validated we are aborting due to being "too long in SYN-RCVD" - we are not in CLOSE flow (3.10.4).
I've tried to look online what to do on timeout on SYN-RCVD, but it's not well-defined.
the linux kernel though simply abandons w/o sending RST, w/o sending FIN, as we do here - so I believe this solution is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see linux kernel answer to a dev opening a bug on this:
https://bugzilla.redhat.com/show_bug.cgi?id=150611
"This is not a bug, there is no standard that specifies that we should
elicit a reset when we've only seen a SYN from the other end.
It's quite clear why too, because if the host won't respond to our
SYN+ACK response packets, there is no reason to belive it will receive
and process correctly any RST frame we send out as well.
If your firewall is relying on such behavior, it really is the problem
not the Linux TCP stack."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this instrumentation does show the fix is not quite right - look at this updated fix please :)
|
@pasis , please review. |
4855d58 to
11e9241
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR fixes a TCP connection closure bug in the SYN_RCVD state by calling abort_connection() before close() in the handle_incoming_handshake_failure() path. The issue manifested as a race condition in Nginx CPS scenarios: when prepare_to_close() was invoked on a socket in SYN_RCVD state before the slow-timer could catch it, the normal tcp_close() path would incorrectly send a FIN packet and leak a TCP segment. Per RFC 9293 section 3.10.4, a SYN_RCVD socket with no pending data should transition directly to CLOSED. The fix leverages the existing abort_connection() method, which calls tcp_abandon() - a function that specifically does NOT send RST for SYN_RCVD state (matching Linux kernel behavior). This ensures no control packets are sent and no segments are allocated during handshake-failure cleanup. Enhanced destructor logging was also added to aid in debugging segment leaks by displaying PCB state and segment pointers.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| src/core/sock/sockinfo_tcp.cpp | 5/5 | Added abort_connection() call before close() in handle_incoming_handshake_failure() to prevent FIN transmission and segment leaks in SYN_RCVD state; enhanced destructor logging with PCB state. |
Confidence score: 5/5
- This PR is safe to merge with minimal risk.
- Score reflects a well-targeted fix to a documented race condition with no changes to data-path logic, clear alignment with RFC behavior for SYN_RCVD state, and improved observability via enhanced logging.
- No files require special attention.
Sequence Diagram
sequenceDiagram
participant User as "User Application"
participant Socket as "sockinfo_tcp"
participant PCB as "TCP PCB"
participant Network as "Network Layer"
User->>Socket: close() connection
Socket->>Socket: prepare_to_close(false)
Socket->>PCB: get_tcp_state(&m_pcb)
PCB-->>Socket: SYN_RCVD
alt State is SYN_RCVD and not process_shutdown
Socket->>PCB: set_tcp_state(&m_pcb, CLOSED)
Note over Socket,Network: Skip FIN/RST transmission
Socket->>Socket: unlock_tcp_con()
Socket-->>User: return (closable)
else Other states
Socket->>Socket: continue normal shutdown
Socket->>PCB: tcp_close(&m_pcb)
PCB->>Network: Send FIN/RST packets
Socket->>Socket: unlock_tcp_con()
Socket-->>User: return result
end
1 file reviewed, no comments
|
@BasharRadya can you review? |
Resolve race condition where TCP segments leak during SYN-RCVD timeout handling, causing "still N tcp segs in use" warnings. Root cause: When tcp_slowtmr() times out a connection stuck in SYN-RCVD state, it calls TCP_EVENT_ERR() which triggers handle_incoming_handshake_failure(). This function calls close() on the child connection, which attempts to send FIN and allocates new TCP segments. These segments were never cleaned up. Fix: Call abort_connection() before close() in handle_incoming_handshake_failure(). The tcp_abort() -> tcp_abandon() path explicitly does NOT send RST for SYN-RCVD connections (matching Linux kernel behavior), and properly purges all segments before setting state to CLOSED. This prevents segment allocation during subsequent close() call. Also enhanced destructor logging to show PCB state and queue pointers for better debugging of any remaining segment leaks. Signed-off-by: Tomer Cabouly <[email protected]>
11e9241 to
6415bcc
Compare
Greptile Summary
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Timer as "tcp_slowtmr()"
participant PCB as "TCP PCB (SYN_RCVD)"
participant ErrCB as "TCP_EVENT_ERR"
participant Parent as "Parent Listen Socket"
participant Child as "Child Connection"
participant lwIP as "lwIP tcp_abandon()"
Timer->>PCB: "Timeout detected"
Timer->>ErrCB: "TCP_EVENT_ERR(ERR_TIMEOUT)"
ErrCB->>Parent: "err_lwip_cb() -> handle_incoming_handshake_failure()"
Parent->>Parent: "Lock parent, remove from m_syn_received"
Parent->>Child: "Set m_parent = nullptr"
Parent->>Child: "abort_connection()"
Child->>lwIP: "tcp_abort() -> tcp_abandon()"
lwIP->>lwIP: "Skip RST for SYN_RCVD state"
lwIP->>lwIP: "tcp_pcb_remove() -> tcp_pcb_purge()"
lwIP->>lwIP: "Free unsent/unacked segments"
lwIP->>lwIP: "Set state = CLOSED"
lwIP->>ErrCB: "TCP_EVENT_ERR(ERR_ABRT)"
Note over ErrCB: "m_parent is nullptr, no recursive call"
Parent->>Child: "close()"
Note over Child: "State is CLOSED, no FIN sent"
Child->>Child: "No segment allocation"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, no comments
Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format
Description
Resolve race condition where TCP segments leak during SYN-RCVD
timeout handling, causing "still N tcp segs in use" warnings.
Root cause: When tcp_slowtmr() times out a connection stuck in
SYN-RCVD state, it calls TCP_EVENT_ERR() which triggers
handle_incoming_handshake_failure(). This function calls close()
on the child connection, which attempts to send FIN and
allocates new TCP segments. These segments were never cleaned up.
Fix: Call abort_connection() before close() in
handle_incoming_handshake_failure(). The tcp_abort() ->
tcp_abandon() path explicitly does NOT send RST for SYN-RCVD
connections (matching Linux kernel behavior),
and properly purges all segments before setting state to CLOSED.
This prevents segment allocation during subsequent close() call.
Also enhanced destructor logging to show PCB state and queue
pointers for better debugging of any remaining segment leaks.
What
Fix TCP segment leak in SYN flood.
Why
Solves 4050516.
How
allocated segments that were never freed
clean up the PCB and set state to CLOSED
sending RST for SYN-RCVD connections (matching Linux kernel
behavior) and properly purges all queued segments
preventing any attempt to send FIN/RST or allocate segments
Change type
What kind of change does this PR introduce?
Check list