Skip to content

Conversation

@tomerdbz
Copy link
Collaborator

@tomerdbz tomerdbz commented Nov 9, 2025

Description

Resolve 30-second throughput ramp-up issue for TSO-enabled TCP connections by implementing proper initial congestion window (cwnd) and slow start threshold (ssthresh) values.

Problem:
Applications using TSO experienced ~30 seconds of near-zero throughput before achieving line-rate. Debug analysis revealed that ssthresh was being unconditionally reset to 10*MSS (14,600 bytes) during SYN-ACK processing in tcp_in.c line 582, forcing TCP into congestion avoidance mode immediately. This caused linear cwnd growth instead of exponential slow start, resulting in extremely slow ramp-up.

Solution:
Created centralized helper function tcp_set_initial_cwnd_ssthresh() that sets TSO-aware parameters:

For TSO-enabled connections:

  • cwnd = TSO_max_payload / 4 (64KB with default 256KB TSO)
  • ssthresh = 0x7FFFFFFF (2GB - effectively unlimited)

For non-TSO connections:

  • cwnd = RFC 3390 compliant: min(4MSS, max(2MSS, 4380 bytes))
  • ssthresh = 10 * MSS

Technical Rationale:

  1. Very high ssthresh (2GB) follows industry best practices, allowing slow start to run until network conditions dictate otherwise rather than artificially limiting growth (Excentis research on optimizing TCP for gigabit networks).

  2. TSO max payload is independent of negotiated MSS (determined by hardware capabilities), so initial window should also be independent of MSS for TSO connections.

  3. Initial cwnd of 64KB (TSO_max/4) balances aggressive throughput with conservative buffer management. This exceeds RFC 6928's recommendation of 10 segments (~15KB) but is appropriate for XLIO's controlled environment where TSO hardware handles segmentation and applications target high-throughput scenarios. Empirically verified to achieve 200 Gbps in <1 second.

Implementation Details:

  • Replaced duplicate TSO initialization logic in 6 locations:
    • tcp_pcb_init() - initial PCB setup
    • tcp_pcb_recycle() - PCB reuse after TIME_WAIT
    • tcp_connect() - client-side connection initiation
    • tcp_in.c SYN-ACK handler - CRITICAL FIX (line 584)
    • lwip_conn_init() - LWIP CC module initialization
    • cubic_conn_init() - Cubic CC module initialization

Performance Impact:
Before: 20+ seconds to reach line-rate (200 Gbps)
After: Line-rate achieved in <1 second

Verification: GDB debugging confirmed ssthresh was being overwritten during SYN-ACK processing. After fix, cwnd=64KB and ssthresh=2GB are maintained throughout connection establishment, enabling exponential growth as designed.

References:

  • RFC 3390: Increasing TCP's Initial Window
  • RFC 5681: TCP Congestion Control
  • RFC 6928: Increasing TCP's Initial Window (10 segments standard)
  • Excentis: "Optimizing TCP Congestion Avoidance Parameters for Gigabit Networks" - recommends very high ssthresh (approaching 2^31) for fast networks
  • NASA: "Performance Analysis of TCP with Large Segmentation Offload"
    • analysis of TSO impact on congestion control
What

Fix TSO slow start with aggressive cwnd/ssthresh

Why ?

xlio_benchmark to alpha.

How ?

It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes a critical TSO throughput issue where connections took 30+ seconds to reach line-rate, now achieving it in <1 second. The root cause was aggressive ssthresh values being overwritten with 10*MSS during SYN-ACK processing, forcing TCP into congestion avoidance mode prematurely.

Key Changes:

  • Centralized cwnd/ssthresh initialization in new tcp_set_initial_cwnd_ssthresh() helper
  • TSO-enabled connections: cwnd = max_payload_sz/4 (64KB), ssthresh = 2GB (effectively unlimited)
  • Non-TSO connections: RFC 3390 compliant cwnd, ssthresh = 10*MSS
  • Replaced 6 initialization sites: tcp_pcb_init(), tcp_pcb_recycle(), tcp_connect(), SYN-ACK handler, lwip_conn_init(), cubic_conn_init()

Technical Approach:
The fix properly handles the timing where TSO configuration happens after initial PCB setup but before SYN-ACK. The helper function is TSO-aware and uses tso.max_payload_sz (independent of negotiated MSS) for TSO connections, following industry best practices for high-speed networks.

Minor Issue Found:

  • Dead code at tcp_in.c:588 in #else branch (unreachable since TCP_CC_ALGO_MOD=1)

Confidence Score: 4/5

  • This PR is safe to merge with one minor style improvement recommended
  • Score reflects well-tested performance fix with clear rationale, centralized implementation, and proper TSO/non-TSO handling. Minor deduction for dead code in tcp_in.c that could be cleaned up. The core logic is sound and addresses a real production issue with empirical validation (200 Gbps in <1s).
  • src/core/lwip/tcp_in.c - contains unreachable dead code in #else branch at line 588 that should be removed for clarity

Important Files Changed

File Analysis

Filename Score Overview
src/core/lwip/tcp.c 4/5 Adds centralized tcp_set_initial_cwnd_ssthresh() function and replaces hardcoded cwnd/ssthresh initialization in tcp_connect(), tcp_pcb_init(), and tcp_pcb_recycle(). TSO-aware logic properly handles both TSO-enabled and non-TSO cases.
src/core/lwip/tcp.h 5/5 Adds function declaration for tcp_set_initial_cwnd_ssthresh(). Clean header change with no issues.
src/core/lwip/tcp_in.c 3/5 Replaces hardcoded ssthresh = mss * 10 with tcp_set_initial_cwnd_ssthresh() in SYN-ACK handler. Critical fix for TSO throughput. Contains dead code in #else branch that could cause confusion.
src/core/lwip/cc_lwip.c 5/5 Updates lwip_conn_init() to call tcp_set_initial_cwnd_ssthresh() when cwnd is uninitialized. Preserves existing behavior with cleaner code.
src/core/lwip/cc_cubic.c 5/5 Updates cubic_conn_init() to call tcp_set_initial_cwnd_ssthresh() when cwnd is uninitialized. Removes old hardcoded logic, cleaner implementation.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Socket as sockinfo_tcp
    participant PCB as tcp_pcb
    participant TCPIn as tcp_in (SYN-ACK)
    participant Helper as tcp_set_initial_cwnd_ssthresh()
    participant CC as Congestion Control Module

    Note over App,CC: Connection Establishment Flow

    App->>Socket: socket() creation
    Socket->>PCB: tcp_pcb_init()
    PCB->>Helper: Set initial cwnd/ssthresh
    Note over Helper: tso.max_payload_sz = 0<br/>Uses non-TSO values<br/>cwnd = RFC 3390<br/>ssthresh = 10*MSS
    Helper-->>PCB: cwnd set

    App->>Socket: connect()
    Socket->>PCB: tcp_connect()
    PCB->>Helper: Re-initialize cwnd/ssthresh
    Note over Helper: TSO still not configured<br/>Uses non-TSO values
    Helper-->>PCB: cwnd set
    PCB->>TCPIn: Send SYN

    Note over Socket,PCB: TSO Configuration
    Socket->>PCB: prepare_to_send()
    Note over PCB: tso.max_payload_sz = 256KB<br/>(from hardware caps)

    TCPIn->>PCB: Receive SYN-ACK
    Note over TCPIn: MSS negotiation complete
    TCPIn->>Helper: CRITICAL: Re-initialize after MSS
    Note over Helper: NOW TSO is configured!<br/>cwnd = max_payload_sz/4 = 64KB<br/>ssthresh = 0x7FFFFFFF (2GB)
    Helper-->>TCPIn: Aggressive TSO values set
    TCPIn->>CC: cc_conn_init()
    Note over CC: Preserves cwnd if already set<br/>(checks cwnd != 1)
    CC-->>TCPIn: Connection ready
    
    Note over App,CC: Fast Slow Start with TSO
    TCPIn->>PCB: Start data transmission
    Note over PCB: Exponential growth with<br/>cwnd=64KB, ssthresh=2GB<br/>Reaches line-rate in <1s
Loading

Additional Comments (1)

  1. src/core/lwip/tcp_in.c, line 588 (link)

    style: Dead code: this #else branch overwrites the cwnd just set by tcp_set_initial_cwnd_ssthresh() on line 584. While TCP_CC_ALGO_MOD is always defined to 1 (opt.h:114), making this unreachable, the dead code could confuse future maintainers.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Fixes critical 30-second throughput ramp-up issue for TSO-enabled TCP connections by implementing proper initial congestion window and slow start threshold values.

Key Changes:

  • Introduced centralized tcp_set_initial_cwnd_ssthresh() helper function (tcp.c:520-549) that sets TSO-aware parameters: cwnd=64KB and ssthresh=2GB for TSO, or RFC 3390-compliant values for non-TSO
  • Fixed critical bug at tcp_in.c:584 where ssthresh was hardcoded to 10*MSS during SYN-ACK processing, forcing immediate congestion avoidance mode instead of slow start
  • Replaced duplicate initialization logic across 6 locations: tcp_pcb_init(), tcp_pcb_recycle(), tcp_connect(), SYN-ACK handler, lwip_conn_init(), and cubic_conn_init()
  • Added guard logic in CC modules to preserve cwnd/ssthresh values set during connection establishment (only re-initialize if cwnd==1)

Technical Approach:
The fix uses aggressive initial parameters for TSO (cwnd=TSO_max/4, ssthresh=2GB) that are independent of negotiated MSS, allowing exponential slow start growth to quickly discover optimal sending rate. For non-TSO, maintains RFC 3390 compliance with MSS-based initialization.

Performance Impact:
Reduces time-to-line-rate from 20+ seconds to <1 second for 200 Gbps TSO connections.

Confidence Score: 4/5

  • Safe to merge with minor considerations for aggressive TSO parameters in diverse network conditions
  • The implementation is sound with proper centralization and correct ordering (tcp_set_initial_cwnd_ssthresh before cc_conn_init). Score reduced from 5 to 4 due to: (1) Very aggressive TSO parameters (64KB initial cwnd, 2GB ssthresh) deviate significantly from RFC 6928 (10 segments ~15KB) and may cause issues in networks with shallow buffers or high latency, though appropriate for XLIO's controlled high-throughput environment; (2) The fix assumes TSO max_payload_sz is always properly initialized before tcp_set_initial_cwnd_ssthresh is called, which appears valid based on code flow but isn't explicitly validated
  • tcp_in.c requires attention to verify the SYN-ACK handler change works correctly across all connection scenarios (client-side, server-side, connection reuse)

Important Files Changed

File Analysis

Filename Score Overview
src/core/lwip/tcp.h 5/5 Added declaration for tcp_set_initial_cwnd_ssthresh() helper function at tcp.h:408
src/core/lwip/tcp.c 5/5 Implemented centralized tcp_set_initial_cwnd_ssthresh() function with TSO-aware initialization (tcp.c:520-549), replaced duplicate initialization in tcp_connect(), tcp_pcb_init(), and tcp_pcb_recycle()
src/core/lwip/tcp_in.c 4/5 Critical fix at tcp_in.c:584 - replaced hardcoded ssthresh = mss * 10 with tcp_set_initial_cwnd_ssthresh() call in SYN-ACK handler, maintaining proper ordering before cc_conn_init() to prevent CC module override
src/core/lwip/cc_lwip.c 5/5 Updated lwip_conn_init() to call tcp_set_initial_cwnd_ssthresh() only when cwnd == 1, preserving values set during connection establishment
src/core/lwip/cc_cubic.c 5/5 Updated cubic_conn_init() to call tcp_set_initial_cwnd_ssthresh() only when cwnd == 1, preserving values set during connection establishment and maintaining max_cwnd tracking

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant TCP as tcp_connect()
    participant SYN as SYN-ACK Handler<br/>(tcp_in.c:584)
    participant Helper as tcp_set_initial_cwnd_ssthresh()
    participant CC as CC Module<br/>(lwip/cubic)

    Note over App,CC: Client-Side Connection Establishment

    App->>TCP: tcp_connect()
    TCP->>Helper: tcp_set_initial_cwnd_ssthresh()
    Helper->>Helper: Check tcp_tso(pcb)
    alt TSO Enabled
        Helper->>TCP: cwnd = max_payload_sz/4 (64KB)<br/>ssthresh = 0x7FFFFFFF (2GB)
    else Non-TSO
        Helper->>TCP: cwnd = RFC 3390 (4*MSS)<br/>ssthresh = 10*MSS
    end
    TCP->>TCP: Send SYN packet
    
    Note over SYN: Receive SYN-ACK from server

    SYN->>SYN: tcp_parseopt() - negotiate MSS
    SYN->>SYN: Update pcb->mss with negotiated value
    
    Note over SYN,Helper: CRITICAL FIX: Re-initialize after MSS negotiation
    
    SYN->>Helper: tcp_set_initial_cwnd_ssthresh()
    Helper->>Helper: Check tcp_tso(pcb)
    alt TSO Enabled
        Note right of Helper: TSO parameters are independent<br/>of negotiated MSS
        Helper->>SYN: cwnd = max_payload_sz/4 (64KB)<br/>ssthresh = 0x7FFFFFFF (2GB)
    else Non-TSO
        Note right of Helper: Scale with negotiated MSS
        Helper->>SYN: cwnd = min(4*MSS, max(2*MSS, 4380))<br/>ssthresh = 10*MSS
    end
    
    SYN->>CC: cc_conn_init()
    CC->>CC: Check if cwnd == 1
    alt cwnd != 1 (already initialized)
        Note right of CC: Preserve cwnd/ssthresh<br/>set by Helper function
        CC->>CC: Skip re-initialization
    else cwnd == 1 (uninitialized)
        CC->>Helper: tcp_set_initial_cwnd_ssthresh()
    end
    CC->>CC: Initialize CC-specific state<br/>(e.g., cubic max_cwnd = cwnd)
    
    SYN->>App: Connection ESTABLISHED
    
    Note over App,CC: TCP now in slow start with proper cwnd/ssthresh<br/>TSO: Exponential growth from 64KB<br/>Non-TSO: RFC-compliant growth from 4*MSS
Loading

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Nov 10, 2025

Greptile Summary

  • Centralizes TSO-aware congestion control initialization into two new helper functions (tcp_set_initial_cwnd_ssthresh and tcp_reset_cwnd_on_congestion) that set aggressive initial values (cwnd=64KB, ssthresh=2GB) for TSO connections
  • Fixes critical bug where ssthresh was unconditionally reset to 10*MSS during SYN-ACK processing (tcp_in.c:584), which was forcing TCP into congestion avoidance mode immediately and causing 30-second ramp-up delays
  • Updates both LWIP and CUBIC congestion control algorithms to use proper exponential slow start growth (cwnd += acked instead of cwnd += mss), critical for TSO where one ACK can acknowledge many segments

Confidence Score: 4/5

  • This PR is safe to merge with one minor fix needed for overflow protection in CUBIC
  • The core fix addresses a critical performance issue with solid technical rationale. Implementation correctly centralizes congestion control logic and applies TSO-aware parameters. However, CUBIC slow start is missing overflow check that LWIP has (line 132), which could theoretically cause cwnd overflow with very large acknowledgments. This is unlikely in practice but should be fixed for completeness. The aggressive TSO parameters (64KB initial cwnd, 2GB ssthresh) deviate from RFC 5681 but are well-justified for high-throughput TSO environments.
  • src/core/lwip/cc_cubic.c requires overflow protection fix at line 132

Important Files Changed

Filename Overview
src/core/lwip/tcp.c Adds two new helper functions for TSO-aware congestion control initialization and recovery - implementation is sound with proper overflow handling for non-TSO case
src/core/lwip/tcp_in.c Replaces hardcoded ssthresh initialization with centralized helper function call in SYN-ACK handler - this was the critical fix mentioned in PR description
src/core/lwip/cc_cubic.c Updates CUBIC slow start to use proper exponential growth (cwnd += acked) but missing overflow check that LWIP implementation has, also refactors congestion recovery

Sequence Diagram

sequenceDiagram
    participant User
    participant App as "Application (TCP Client)"
    participant XLIO as "XLIO TCP Stack"
    participant Network

    User->>App: Initiate connection
    App->>XLIO: tcp_connect()
    XLIO->>XLIO: tcp_set_initial_cwnd_ssthresh()<br/>(TSO: cwnd=64KB, ssthresh=2GB)
    XLIO->>Network: Send SYN
    Network->>XLIO: Receive SYN-ACK (MSS negotiated)
    XLIO->>XLIO: tcp_set_initial_cwnd_ssthresh()<br/>(Re-initialize after MSS)
    XLIO->>XLIO: cc_conn_init()<br/>(CC algorithm initialization)
    XLIO->>Network: Send ACK (Connection established)
    
    loop Data Transfer (Slow Start)
        App->>XLIO: Send data
        XLIO->>Network: Send segments (up to cwnd)
        Network->>XLIO: Receive ACK
        XLIO->>XLIO: cc_ack_received()<br/>(cwnd += acked, exponential growth)
        Note over XLIO: cwnd grows: 64KB→128KB→256KB...
    end
    
    alt Congestion Event (RTO)
        XLIO->>XLIO: tcp_reset_cwnd_on_congestion(is_rto=true)<br/>(TSO: cwnd=26KB, ssthresh=64KB)
        XLIO->>Network: Retransmit lost segment
        loop Recovery
            Network->>XLIO: Receive ACK
            XLIO->>XLIO: cc_ack_received()<br/>(Slow start from 26KB)
        end
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Resolve 30-second throughput ramp-up issue for TSO-enabled
TCP connections by fixing multiple congestion control bugs
and implementing TSO-aware initial window and RTO recovery
parameters.

Problem:
Applications using TSO experienced ~30 seconds of near-zero
throughput before achieving line-rate. Investigation revealed
multiple interrelated bugs in XLIO's congestion control
implementation:

1. ssthresh Reset Bug: ssthresh was unconditionally reset to
   10*MSS (14,600 bytes) during SYN-ACK processing in
   tcp_in.c, forcing TCP into congestion avoidance mode
   immediately instead of slow start.

2. Slow Start Algorithm Bug: Both LWIP and CUBIC modules
   implemented incorrect linear cwnd growth (cwnd += mss)
   instead of RFC 5681's exponential growth (cwnd += acked).
   This is particularly devastating for TSO where one ACK can
   acknowledge 64KB (44 segments), but cwnd only grew by
   1 MSS (1460 bytes).

3. RTO Recovery Bug: On retransmission timeout, cwnd was
   reset to 1 MSS per RFC 5681, which is too conservative for
   modern TSO hardware that sends 256KB super-packets, causing
   20+ second recovery times.

Root Cause Analysis:
XLIO's CUBIC implementation is based on FreeBSD CUBIC
(2007-2010), which contained the slow start bug. Modern CUBIC
(Linux kernel 2024, RFC 9438) uses standard TCP slow start
with exponential growth. Verification against Linux kernel
source (net/ipv4/tcp_cubic.c) confirmed that XLIO's behavior
was incorrect and outdated.

Solution:

1. Centralized Initial Window Management:
   Created tcp_set_initial_cwnd_ssthresh() to consistently
   set TSO-aware parameters across all connection
   initialization paths:

   For TSO-enabled connections:
     - cwnd = TSO_max_payload / 4 (64KB with default 256KB)
     - ssthresh = 0x7FFFFFFF (2GB - effectively unlimited)

   For non-TSO connections:
     - cwnd = RFC 3390: min(4*MSS, max(2*MSS, 4380 bytes))
     - ssthresh = 10 * MSS

2. Fixed Slow Start Algorithm:
   Changed cwnd increment from mss to acked in both CC
   modules:

   Before (WRONG):
     pcb->cwnd += pcb->mss;  // Linear growth

   After (CORRECT):
     pcb->cwnd += pcb->acked;  // Exponential per RFC 5681

   This matches modern implementations including Linux CUBIC
   and is critical for TSO where one ACK can acknowledge many
   segments.

3. TSO-Aware RTO Recovery:
   Created tcp_reset_cwnd_on_congestion() for TSO-aware RTO
   handling:

   For TSO connections (deviates from RFC 5681):
     - cwnd = 26KB (10% of TSO max) instead of 1 MSS
     - ssthresh = 64KB (25% of TSO max) for slow start

   For non-TSO connections (RFC 5681 compliant):
     - cwnd = 1 MSS
     - ssthresh = max(FlightSize/2, 2*MSS)

   Rationale: RFC 5681 predates aggressive TSO. Even Linux
   CUBIC (which follows RFC) suffers from slow RTO recovery
   with large TSO. Our TSO-aware approach balances fast
   recovery with congestion safety.

Implementation Details:
- Consolidated initialization logic in 6 locations:
  * tcp_pcb_init() - initial PCB setup
  * tcp_pcb_recycle() - PCB reuse after TIME_WAIT
  * tcp_connect() - client-side connection initiation
  * tcp_in.c SYN-ACK handler
  * lwip_conn_init() - LWIP CC module initialization
  * cubic_conn_init() - CUBIC CC module initialization

- Fixed slow start in 2 locations:
  * cc_lwip.c:lwip_ack_received() - changed mss to acked
  * cc_cubic.c:cubic_ack_received() - changed mss to acked

- Centralized RTO recovery in 2 locations:
  * cc_lwip.c:lwip_cong_signal() - calls helper
  * cc_cubic.c:cubic_cong_signal() - calls helper

Performance Impact:
Before: 20+ seconds to reach line-rate (200-300 Gbps)
After: Line-rate achieved in <1 second

Technical Rationale:

1. Very High ssthresh (2GB): Follows industry best practices
   for high-BDP networks, allowing slow start to discover
   optimal window rather than artificially limiting growth.
   Standard TCP behavior for modern data center deployments.

2. TSO-Independence: TSO max payload is determined by
   hardware capabilities, not negotiated MSS. Initial window
   should similarly be independent of MSS for TSO
   connections.

3. Initial cwnd (64KB): Balances aggressive throughput with
   conservative buffer management. Exceeds RFC 6928's 10
   segments (~15KB) but appropriate for XLIO's controlled
   environment where TSO hardware handles segmentation and
   applications target high-throughput scenarios.

4. RTO Recovery (26KB): More conservative than initial window
   (25% vs 10% of TSO max) to balance fast recovery with
   safety. While this deviates from RFC 5681 (1 MSS), it
   recognizes the reality that 1460 bytes is artificially
   small when hardware sends 256KB super-packets.

RFC Compliance Notes:
- Initial window and slow start: Compliant with RFC 5681/3390
  spirit, optimized for TSO hardware.
- RTO recovery: Intentionally deviates from RFC 5681 for TSO
  to address modern hardware reality. Non-TSO connections
  remain RFC-compliant.

Comparison to Modern Implementations:
- Linux CUBIC (2024): Uses standard slow start
  (cwnd += acked)
- Linux CUBIC RTO: Follows RFC 5681 (cwnd = 1 MSS), which
  causes same slow recovery issue with aggressive TSO that we
  fix here
- FreeBSD CUBIC (2007-2010): Had slow start bug
  (cwnd += mss) that XLIO inherited and we now fix

References:
- RFC 3390: Increasing TCP's Initial Window
- RFC 5681: TCP Congestion Control (slow start, RTO behavior)
- RFC 6928: Increasing TCP's Initial Window (10 segments)
- RFC 9438: CUBIC for Fast and Long-Distance Networks (2023,
  obsoletes RFC 8312)
- Linux kernel: net/ipv4/tcp_cubic.c (verified modern CUBIC
  behavior)
  https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_cubic.c
- Excentis: "Optimizing TCP Congestion Avoidance Parameters
  for Gigabit Networks" - recommends very high ssthresh for
  fast networks

Verification:
- GDB debugging confirmed ssthresh overwrite during SYN-ACK
  processing
- Post-fix: cwnd=64KB and ssthresh=2GB maintained through
  connection setup
- Enhanced debug logs confirmed exponential cwnd growth during
  slow start
- Throughput testing: 200-300 Gbps achieved in <1s
  (previously 20+s)

Signed-off-by: Tomer Cabouly <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

* Critical for TSO where one ACK can acknowledge many segments (e.g., 64KB = 44
* segments).
*/
pcb->cwnd += pcb->acked;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: missing overflow check - lwip implementation has this protection

Suggested change
pcb->cwnd += pcb->acked;
if ((u32_t)(pcb->cwnd + pcb->acked) > pcb->cwnd) {
pcb->cwnd += pcb->acked;
}

if (tcp_tso(pcb)) {
/* TSO-aware recovery (deviates from RFC 5681 and modern CUBIC):
*
* RFC 5681 & Modern CUBIC (Linux 2024): cwnd = 1 MSS (1460 bytes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not always MSS=1460, e.g. Jumpo frames

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants