Skip to content

Conversation

@tomerdbz
Copy link
Collaborator

Description

Change initial TCP retransmission timeout from 3 seconds to 1 second as recommended by RFC 6298 Section 2.

RFC 6298 states: "Until a round-trip time (RTT) measurement has been made... the sender SHOULD set RTO <- 1 second"

This applies to all new TCP connections until an RTT measurement is made. Once the connection is established and ACKs received, TCP calculates actual RTT and updates RTO accordingly using the Van Jacobson algorithm.

Modified functions:

  • tcp_pcb_init(): Set rto/sv to 1000ms (was 3000ms)
  • tcp_pcb_recycle(): Set rto/sv to 1000ms (was 3000ms)

This fixes incorrect SYN retransmission timing where the first retry occurred after ~3 seconds instead of ~1 second, causing TCP_USER_TIMEOUT tests to fail and slower connection failure detection.

Benefits:

  • TCP/IP standards compliance (RFC 6298)
  • Faster connection establishment failure detection
  • Matches Linux kernel TCP_TIMEOUT_INIT behavior
What

Set initial RTO to 1 second per RFC 6298

Why ?

RFC Compliance.

How ?

It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR modifies the initial TCP Retransmission Timeout (RTO) in XLIO's embedded lwIP TCP/IP stack from 3 seconds to 1 second, bringing the implementation into compliance with RFC 6298 Section 2. The change affects two functions in src/core/lwip/tcp.c: tcp_pcb_init() (which initializes new protocol control blocks) and tcp_pcb_recycle() (which resets PCBs for connection reuse). Both the pcb->rto and pcb->sv (smoothed variance) fields are updated from 3000 / slow_tmr_interval to 1000 / slow_tmr_interval. This initial RTO applies only during the connection establishment phase before the first RTT measurement; once ACKs are received, the TCP stack dynamically calculates RTO using the Van Jacobson algorithm. The XLIO library intercepts socket calls and implements its own TCP/IP stack for RDMA-accelerated networking, so this change ensures its TCP behavior matches both RFC standards and the Linux kernel's TCP_TIMEOUT_INIT constant, improving connection failure detection speed during SYN retransmissions.

Important Files Changed

Filename Score Overview
src/core/lwip/tcp.c 5/5 Updated initial RTO and smoothed variance from 3000ms to1000ms in tcp_pcb_init() and tcp_pcb_recycle() to comply with RFC 6298

Confidence score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is localized to two initialization functions, updates only well-understood timing constants, is backed by RFC 6298 standards, and the new 1-second value matches industry-standard implementations including the Linux kernel; no logic changes or side effects are introduced
  • No files require special attention

Sequence Diagram

sequenceDiagram
    participant User
    participant Application
    participant TCP as TCP Stack (XLIO)
    participant Network
    participant RemoteHost as Remote Host

    User->>Application: Connect request
    Application->>TCP: tcp_connect()
    TCP->>TCP: tcp_pcb_init()<br/>Set initial RTO=1000ms<br/>(RFC 6298 compliant)
    TCP->>TCP: tcp_enqueue_flags(TCP_SYN)
    TCP->>TCP: set_tcp_state(SYN_SENT)
    TCP->>Network: Send SYN packet
    TCP->>TCP: Start retransmission timer

    alt Connection succeeds
        Network->>RemoteHost: SYN
        RemoteHost->>Network: SYN-ACK
        Network->>TCP: Receive SYN-ACK
        TCP->>TCP: Calculate actual RTT<br/>Update RTO dynamically
        TCP->>Network: Send ACK
        TCP->>TCP: set_tcp_state(ESTABLISHED)
        TCP->>Application: connected callback
        Application->>User: Connection established
    else No response - first retry
        Note over TCP: After ~1 second (RTO)
        TCP->>TCP: tcp_slowtmr()<br/>Retransmission timeout
        TCP->>TCP: tcp_rexmit_rto()
        TCP->>Network: Retransmit SYN
    else Connection fails after max retries
        Note over TCP: After TCP_SYNMAXRTX attempts
        TCP->>TCP: tcp_slowtmr()<br/>nrtx == TCP_SYNMAXRTX
        TCP->>TCP: tcp_pcb_remove()
        TCP->>Application: errf(ERR_TIMEOUT)
        Application->>User: Connection failed (faster detection)
    end
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@galnoam galnoam requested review from pasis and removed request for AlexanderGrissik November 10, 2025 11:39
@galnoam
Copy link
Collaborator

galnoam commented Nov 10, 2025

@pasis can you check if we want this fix? Sync with @tomerdbz please.

Comment on lines 930 to 931
/* Set initial RTO to 1 second as per RFC 6298.
* This value is used until an RTT measurement is made. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it 1 line comment for better readability of the code block. If needed, shorten the 2nd sentence or remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

pcb->rto = 3000 / slow_tmr_interval;
/* Set initial RTO to 1 second as per RFC 6298.
* This value is used until an RTT measurement is made. */
pcb->rto = 1000 / slow_tmr_interval;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please handle situation when the slow_tmr_interval is too long. For example:

pcb->rto = max(1000 / slow_tmr_interval, 3);

There were scenarios where we configured TCP timer to 1 second and pcb->rto=1 can trigger the event immediately due to timer tick granularity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment, wdyt about the fix?

pcb->rto = 3000 / slow_tmr_interval;
/* Set initial RTO to 1 second as per RFC 6298.
* This value is used until an RTT measurement is made. */
pcb->rto = 1000 / slow_tmr_interval;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] Do we want to make the initial RTO a constant and move it to src/core/lwip/opt.h? I see that we have the same value in 4 places across the code. Personally I don't have a preference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@tomerdbz tomerdbz force-pushed the 4043157_initial_rto branch from 07d6699 to 0bdc812 Compare November 10, 2025 17:56
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR updates the initial TCP Retransmission Timeout (RTO) from 3 seconds to 1 second to comply with RFC 6298 Section 2, which recommends RTO <- 1 second for new connections before RTT measurement.

Key Changes:

  • Introduced get_initial_rto() helper function that calculates 1000ms in timer ticks using ceiling division: (1000 + slow_tmr_interval - 1) / slow_tmr_interval
  • Updated tcp_pcb_init() and tcp_pcb_recycle() to use the new helper for both rto and sv (smoothed variance) fields
  • Added 500ms maximum constraint on tcp_timer_resolution_msec per RFC 1122 Section 4.2.3.2 (delayed ACK timer limit), with validation in two locations
  • Updated configuration schema and documentation to reflect the 500ms timer limit

Benefits:

  • RFC 6298 compliance for initial RTO value
  • Faster detection of connection establishment failures (1s vs 3s for first SYN retransmission)
  • Consistent with Linux kernel TCP_TIMEOUT_INIT behavior
  • Fixes timing issues in TCP_USER_TIMEOUT tests

The implementation correctly converts milliseconds to timer ticks and maintains the Van Jacobson RTT algorithm for ongoing connections once RTT measurements are available.

Confidence Score: 5/5

  • This PR is safe to merge - it implements a well-defined RFC standard with correct calculations and comprehensive validation
  • The implementation correctly follows RFC 6298 by setting initial RTO to 1 second. The ceiling division in get_initial_rto() properly converts milliseconds to timer ticks. The change only affects initial connection timing before RTT measurement, with the existing Van Jacobson algorithm taking over once ACKs are received. Additional timer validation (500ms max) ensures proper granularity per RFC 1122. No logic errors or edge cases detected.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
src/core/lwip/tcp.c 5/5 Core change: Added get_initial_rto() helper and updated tcp_pcb_init() and tcp_pcb_recycle() to set RTO/sv to 1 second using proper timer tick calculation
src/core/util/sys_vars.cpp 5/5 Added validation to clamp tcp_timer_resolution_msec to 500ms maximum per RFC 1122, ensuring proper timer granularity for RTO calculations
src/core/config/descriptor_providers/xlio_config_schema.json 5/5 Updated schema to add 500ms maximum constraint for TCP timer resolution with RFC 1122 reference in description
README 5/5 Documentation update: Added RFC 1122 reference explaining 500ms maximum timer resolution requirement

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant TCP as TCP Stack (libxlio)
    participant Net as Network
    
    Note over TCP: New Connection Initialization
    App->>TCP: tcp_connect() or listen/accept
    TCP->>TCP: tcp_pcb_init() or tcp_pcb_recycle()
    Note over TCP: rto = get_initial_rto()<br/>= (1000 + slow_tmr_interval - 1) / slow_tmr_interval<br/>sv = get_initial_rto()<br/>sa = 0
    
    TCP->>Net: Send SYN
    Note over TCP: Start retransmission timer<br/>Initial RTO = 1 second (was 3 seconds)
    
    alt ACK received within RTO
        Net->>TCP: SYN-ACK or ACK
        TCP->>TCP: Measure RTT (tcp_ticks - pcb->rttest)
        Note over TCP: Van Jacobson Algorithm:<br/>m = rtt - (sa >> 3)<br/>sa += m<br/>sv += |m| - (sv >> 2)<br/>rto = (sa >> 3) + sv
        TCP->>App: Connection established
    else No ACK within 1 second
        Note over TCP: rtime >= rto (1 second)
        TCP->>TCP: Double RTO with backoff<br/>rto = ((sa >> 3) + sv) << tcp_backoff[nrtx]
        TCP->>Net: Retransmit SYN
        Note over TCP: Faster failure detection<br/>compared to 3 second initial RTO
    end
Loading

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Change initial TCP retransmission timeout from 3 seconds to
1 second as recommended by RFC 6298 Section 2, with additional
safeguards for timer granularity and RFC 1122 compliance.

RFC 6298 states: "Until a round-trip time (RTT) measurement
has been made... the sender SHOULD set RTO <- 1 second"

This applies to all new TCP connections until an RTT
measurement is made.

Changes:

1. TCP RTO Calculation (src/core/lwip/tcp.c):
   - Added get_initial_rto() helper function
   - Uses round-up division
   - Prevents division trunc that could cause premature timeouts
   - Modified tcp_pcb_init() and tcp_pcb_recycle() to use helper

2. Timer Resolution Limits (src/core/util/sys_vars.cpp):
   - Added RFC 1122 validation for tcp_timer_resolution_msec
   - Enforces maximum of 500ms per RFC 1122 Section 4.2.3.2
   - Logs warning and clamps value if exceeded
   - Applied to both environment variable and config registry paths

3. Configuration Schema (xlio_config_schema.json):
   - Added "maximum": 500 constraint to timer_msec
   - Updated description to reference RFC 1122 requirement
   - Prevents invalid configurations at schema validation level

4. Documentation (README):
   - Added RFC 1122 reference to timer_msec documentation

Rationale:
Previous implementation used simple division (1000 / slow_tmr_interval)
which could result in:
- 0 ticks when interval > 1000ms (immediate timeout)
- 1 tick when interval = 1000ms (fires on next tick)
- Insufficient granularity with large timer intervals

The new implementation provides defense-in-depth:
- Schema validation prevents misconfiguration
- Runtime validation enforces RFC 1122 limits (delayed ACK ≤ 500ms)
- Round-up division ensures minimum 1 tick without truncation

This fixes incorrect SYN retransmission timing where the
first retry could occur prematurely or immediately, causing
TCP_USER_TIMEOUT tests to fail and connection establishment
issues.

Benefits:
- TCP/IP standards compliance (RFC 6298 and RFC 1122)
- Robust handling of timer granularity edge cases
- Faster connection establishment failure detection
- Prevents problematic timer configurations
- Matches Linux kernel TCP_TIMEOUT_INIT behavior

Signed-off-by: Tomer Cabouly <[email protected]>
@tomerdbz tomerdbz force-pushed the 4043157_initial_rto branch from 0bdc812 to 7cee7e9 Compare November 18, 2025 18:43
@greptile-apps
Copy link

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Initial TCP retransmission timeout changed from 3 seconds to 1 second per RFC 6298 recommendation
  • Added 500ms maximum limit on TCP timer resolution per RFC 1122 with validation and clamping

Confidence Score: 5/5

  • This PR is safe to merge with no identified risks
  • Changes correctly implement RFC 6298 standards for initial RTO using proper ceiling division to ensure at least 1 second timeout. The helper function get_initial_rto() correctly converts milliseconds to timer ticks. Additional timer resolution validation prevents misconfigurations. All changes are well-commented with RFC references and improve TCP connection failure detection timing.
  • No files require special attention

Important Files Changed

Filename Overview
src/core/lwip/tcp.c Changed initial RTO from 3000ms to 1000ms per RFC 6298 using ceiling division helper function
src/core/util/sys_vars.cpp Added validation to clamp TCP timer resolution to 500ms maximum per RFC 1122

Sequence Diagram

sequenceDiagram
    participant App as "Application"
    participant TCP as "TCP Layer"
    participant Timer as "Timer System"
    participant Net as "Network"

    Note over TCP: "Initial State: New Connection"
    
    App->>TCP: "tcp_pcb_init() or tcp_pcb_recycle()"
    TCP->>TCP: "get_initial_rto()"
    Note over TCP: "Calculate: (1000 + slow_tmr_interval - 1) / slow_tmr_interval"
    TCP->>TCP: "Set rto = 5 ticks (1000ms with default 200ms timer)"
    TCP->>TCP: "Set sv = 5 ticks"
    
    App->>TCP: "connect()"
    TCP->>Net: "Send SYN"
    TCP->>Timer: "Start retransmission timer (rtime=0)"
    
    Timer->>TCP: "tcp_slowtmr() every 200ms"
    TCP->>TCP: "Increment rtime"
    
    alt "ACK received before timeout"
        Net->>TCP: "Receive SYN-ACK"
        TCP->>TCP: "Measure RTT, update RTO using Van Jacobson algorithm"
        Note over TCP: "rto = (sa >> 3) + sv (adaptive)"
        TCP->>Net: "Send ACK"
        TCP->>App: "Connection established"
    else "Timeout after ~1 second (5 ticks)"
        Timer->>TCP: "tcp_slowtmr() - rtime >= rto"
        TCP->>TCP: "Apply backoff: rto = rto << tcp_backoff[nrtx]"
        TCP->>Net: "Retransmit SYN"
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants