issue: 4724535 Fix TSO SS aggressive cwnd/ssthresh
Resolve 30-second throughput ramp-up issue for TSO-enabled TCP
connections by implementing proper initial congestion window (cwnd)
and slow start threshold (ssthresh) values.
Problem:
Applications using TSO experienced ~30 seconds of near-zero throughput
before achieving line-rate. Debug analysis revealed that ssthresh was
being unconditionally reset to 10*MSS (14,600 bytes) during SYN-ACK
processing in tcp_in.c line 582, forcing TCP into congestion avoidance
mode immediately. This caused linear cwnd growth instead of exponential
slow start, resulting in extremely slow ramp-up.
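The difference between the two growth modes can be illustrated with a deliberately simplified model of RFC 5681 window growth: one doubling per round trip while cwnd < ssthresh, one MSS per round trip otherwise. This is only a sketch; rtts_to_reach_sketch() is a hypothetical name, not part of the patch, and the model ignores delayed ACKs, loss, pacing and the receiver window.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): count the round trips an
 * idealized sender needs to grow cwnd to at least `target` bytes.
 *   - slow start (cwnd < ssthresh): cwnd doubles every round trip
 *   - congestion avoidance:         cwnd grows by one MSS every round trip
 */
static unsigned rtts_to_reach_sketch(uint64_t cwnd, uint64_t ssthresh,
                                     uint32_t mss, uint64_t target)
{
    unsigned rtts = 0;

    /* A zero cwnd or MSS would never grow in this model. */
    assert(cwnd > 0 && mss > 0);

    while (cwnd < target) {
        if (cwnd < ssthresh) {
            cwnd *= 2;      /* exponential growth */
        } else {
            cwnd += mss;    /* linear growth */
        }
        rtts++;
    }
    return rtts;
}
```

With the illustrative inputs cwnd = 14,600, MSS = 1,460 and a 1,000,000-byte target, this model needs 675 round trips when ssthresh is 14,600 and 7 round trips when ssthresh is 0x7FFFFFFF. The absolute numbers are not measurements from XLIO; they only show the shape of the two curves.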
Solution:
Created centralized helper function tcp_set_initial_cwnd_ssthresh() that
sets TSO-aware parameters:
For TSO-enabled connections:
- cwnd = TSO_max_payload / 4 (64KB with default 256KB TSO)
- ssthresh = 0x7FFFFFFF (2^31 - 1 bytes, ~2GB; effectively unlimited)
For non-TSO connections:
- cwnd = RFC 3390 compliant: min(4*MSS, max(2*MSS, 4380 bytes))
- ssthresh = 10 * MSS
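A minimal sketch of the selection logic described above, assuming a cut-down PCB. The struct, its field names, the convention that tso_max_payload == 0 means TSO is disabled, and the *_sketch function names are all assumptions made for illustration; the real tcp_set_initial_cwnd_ssthresh() signature and PCB layout are not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the TCP PCB: only the fields used here. */
struct pcb_sketch {
    uint32_t cwnd;            /* congestion window, bytes */
    uint32_t ssthresh;        /* slow start threshold, bytes */
    uint16_t mss;             /* negotiated MSS, bytes */
    uint32_t tso_max_payload; /* assumption: 0 means TSO is disabled */
};

#define SSTHRESH_UNLIMITED_SKETCH 0x7FFFFFFFu

/* RFC 3390 initial window: min(4*MSS, max(2*MSS, 4380 bytes)). */
static uint32_t rfc3390_initial_cwnd_sketch(uint16_t mss)
{
    uint32_t two_mss = 2u * mss;
    uint32_t four_mss = 4u * mss;
    uint32_t lower = (two_mss > 4380u) ? two_mss : 4380u;

    return (four_mss < lower) ? four_mss : lower;
}

/* Set the initial cwnd/ssthresh pair according to the rules in the list above. */
static void set_initial_cwnd_ssthresh_sketch(struct pcb_sketch *pcb)
{
    assert(pcb != NULL && pcb->mss > 0);

    if (pcb->tso_max_payload > 0) {
        /* TSO: window is derived from the hardware payload limit, not MSS. */
        pcb->cwnd = pcb->tso_max_payload / 4;
        pcb->ssthresh = SSTHRESH_UNLIMITED_SKETCH;
    } else {
        /* Non-TSO: RFC 3390 initial window and a 10*MSS threshold. */
        pcb->cwnd = rfc3390_initial_cwnd_sketch(pcb->mss);
        pcb->ssthresh = 10u * pcb->mss;
    }
}
```

For an MSS of 1460 this yields cwnd = 65536 and ssthresh = 0x7FFFFFFF with a 256KB TSO payload, and cwnd = 4380 and ssthresh = 14600 without TSO. Routing every initialization site through one function of this kind is what keeps a later step, such as the SYN-ACK handler, from overwriting the values with a different rule.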
Technical Rationale:
1. Very high ssthresh (2GB) follows industry best practices, allowing
slow start to run until network conditions dictate otherwise rather
than artificially limiting growth (Excentis research on optimizing
TCP for gigabit networks).
2. TSO max payload is independent of negotiated MSS (determined by
hardware capabilities), so initial window should also be independent
of MSS for TSO connections.
3. Initial cwnd of 64KB (TSO_max/4) balances aggressive throughput
with conservative buffer management. This exceeds RFC 6928's
recommendation of 10 segments (~15KB) but is appropriate for
XLIO's controlled environment, where TSO hardware handles
segmentation and applications target high-throughput scenarios.
Empirically verified to achieve 200 Gbps in <1 second.
Implementation Details:
- Replaced duplicate TSO initialization logic in 6 locations:
* tcp_pcb_init() - initial PCB setup
* tcp_pcb_recycle() - PCB reuse after TIME_WAIT
* tcp_connect() - client-side connection initiation
* tcp_in.c SYN-ACK handler - CRITICAL FIX (line 584)
* lwip_conn_init() - LWIP CC module initialization
* cubic_conn_init() - Cubic CC module initialization
Performance Impact:
Before: 20+ seconds to reach line-rate (200 Gbps)
After: Line-rate achieved in <1 second
Verification: GDB debugging confirmed ssthresh was being overwritten
during SYN-ACK processing. After fix, cwnd=64KB and ssthresh=2GB are
maintained throughout connection establishment, enabling exponential
growth as designed.
References:
- RFC 3390: Increasing TCP's Initial Window
- RFC 5681: TCP Congestion Control
- RFC 6928: Increasing TCP's Initial Window (10-segment initial window,
Experimental)
- Excentis: "Optimizing TCP Congestion Avoidance Parameters for
Gigabit Networks" - recommends very high ssthresh (approaching 2^31)
for fast networks
- NASA: "Performance Analysis of TCP with Large Segmentation Offload"
- analysis of TSO impact on congestion control
Signed-off-by: Tomer Cabouly <[email protected]>