-
Notifications
You must be signed in to change notification settings - Fork 29
issue: 4043157 Set initial RTO to 1 second per RFC 6298 #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: vNext
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR modifies the initial TCP Retransmission Timeout (RTO) in XLIO's embedded lwIP TCP/IP stack from 3 seconds to 1 second, bringing the implementation into compliance with RFC 6298 Section 2. The change affects two functions in src/core/lwip/tcp.c: tcp_pcb_init() (which initializes new protocol control blocks) and tcp_pcb_recycle() (which resets PCBs for connection reuse). Both the pcb->rto and pcb->sv (smoothed variance) fields are updated from 3000 / slow_tmr_interval to 1000 / slow_tmr_interval. This initial RTO applies only during the connection establishment phase before the first RTT measurement; once ACKs are received, the TCP stack dynamically calculates RTO using the Van Jacobson algorithm. The XLIO library intercepts socket calls and implements its own TCP/IP stack for RDMA-accelerated networking, so this change ensures its TCP behavior matches both RFC standards and the Linux kernel's TCP_TIMEOUT_INIT constant, improving connection failure detection speed during SYN retransmissions.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| src/core/lwip/tcp.c | 5/5 | Updated initial RTO and smoothed variance from 3000ms to1000ms in tcp_pcb_init() and tcp_pcb_recycle() to comply with RFC 6298 |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- The change is localized to two initialization functions, updates only well-understood timing constants, is backed by RFC 6298 standards, and the new 1-second value matches industry-standard implementations including the Linux kernel; no logic changes or side effects are introduced
- No files require special attention
Sequence Diagram
sequenceDiagram
participant User
participant Application
participant TCP as TCP Stack (XLIO)
participant Network
participant RemoteHost as Remote Host
User->>Application: Connect request
Application->>TCP: tcp_connect()
TCP->>TCP: tcp_pcb_init()<br/>Set initial RTO=1000ms<br/>(RFC 6298 compliant)
TCP->>TCP: tcp_enqueue_flags(TCP_SYN)
TCP->>TCP: set_tcp_state(SYN_SENT)
TCP->>Network: Send SYN packet
TCP->>TCP: Start retransmission timer
alt Connection succeeds
Network->>RemoteHost: SYN
RemoteHost->>Network: SYN-ACK
Network->>TCP: Receive SYN-ACK
TCP->>TCP: Calculate actual RTT<br/>Update RTO dynamically
TCP->>Network: Send ACK
TCP->>TCP: set_tcp_state(ESTABLISHED)
TCP->>Application: connected callback
Application->>User: Connection established
else No response - first retry
Note over TCP: After ~1 second (RTO)
TCP->>TCP: tcp_slowtmr()<br/>Retransmission timeout
TCP->>TCP: tcp_rexmit_rto()
TCP->>Network: Retransmit SYN
else Connection fails after max retries
Note over TCP: After TCP_SYNMAXRTX attempts
TCP->>TCP: tcp_slowtmr()<br/>nrtx == TCP_SYNMAXRTX
TCP->>TCP: tcp_pcb_remove()
TCP->>Application: errf(ERR_TIMEOUT)
Application->>User: Connection failed (faster detection)
end
1 file reviewed, no comments
src/core/lwip/tcp.c
Outdated
| /* Set initial RTO to 1 second as per RFC 6298. | ||
| * This value is used until an RTT measurement is made. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make it 1 line comment for better readability of the code block. If needed, shorten the 2nd sentence or remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/core/lwip/tcp.c
Outdated
| pcb->rto = 3000 / slow_tmr_interval; | ||
| /* Set initial RTO to 1 second as per RFC 6298. | ||
| * This value is used until an RTT measurement is made. */ | ||
| pcb->rto = 1000 / slow_tmr_interval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please handle situation when the slow_tmr_interval is too long. For example:
pcb->rto = max(1000 / slow_tmr_interval, 3);There were scenarios where we configured TCP timer to 1 second and pcb->rto=1 can trigger the event immediately due to timer tick granularity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great comment, wdyt about the fix?
src/core/lwip/tcp.c
Outdated
| pcb->rto = 3000 / slow_tmr_interval; | ||
| /* Set initial RTO to 1 second as per RFC 6298. | ||
| * This value is used until an RTT measurement is made. */ | ||
| pcb->rto = 1000 / slow_tmr_interval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] Do we want to make the initial RTO a constant and move it to src/core/lwip/opt.h? I see that we have the same value in 4 places across the code. Personally I don't have a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
07d6699 to
0bdc812
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR updates the initial TCP Retransmission Timeout (RTO) from 3 seconds to 1 second to comply with RFC 6298 Section 2, which recommends RTO <- 1 second for new connections before RTT measurement.
Key Changes:
- Introduced
get_initial_rto()helper function that calculates 1000ms in timer ticks using ceiling division:(1000 + slow_tmr_interval - 1) / slow_tmr_interval - Updated
tcp_pcb_init()andtcp_pcb_recycle()to use the new helper for bothrtoandsv(smoothed variance) fields - Added 500ms maximum constraint on
tcp_timer_resolution_msecper RFC 1122 Section 4.2.3.2 (delayed ACK timer limit), with validation in two locations - Updated configuration schema and documentation to reflect the 500ms timer limit
Benefits:
- RFC 6298 compliance for initial RTO value
- Faster detection of connection establishment failures (1s vs 3s for first SYN retransmission)
- Consistent with Linux kernel TCP_TIMEOUT_INIT behavior
- Fixes timing issues in TCP_USER_TIMEOUT tests
The implementation correctly converts milliseconds to timer ticks and maintains the Van Jacobson RTT algorithm for ongoing connections once RTT measurements are available.
Confidence Score: 5/5
- This PR is safe to merge - it implements a well-defined RFC standard with correct calculations and comprehensive validation
- The implementation correctly follows RFC 6298 by setting initial RTO to 1 second. The ceiling division in
get_initial_rto()properly converts milliseconds to timer ticks. The change only affects initial connection timing before RTT measurement, with the existing Van Jacobson algorithm taking over once ACKs are received. Additional timer validation (500ms max) ensures proper granularity per RFC 1122. No logic errors or edge cases detected. - No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/core/lwip/tcp.c | 5/5 | Core change: Added get_initial_rto() helper and updated tcp_pcb_init() and tcp_pcb_recycle() to set RTO/sv to 1 second using proper timer tick calculation |
| src/core/util/sys_vars.cpp | 5/5 | Added validation to clamp tcp_timer_resolution_msec to 500ms maximum per RFC 1122, ensuring proper timer granularity for RTO calculations |
| src/core/config/descriptor_providers/xlio_config_schema.json | 5/5 | Updated schema to add 500ms maximum constraint for TCP timer resolution with RFC 1122 reference in description |
| README | 5/5 | Documentation update: Added RFC 1122 reference explaining 500ms maximum timer resolution requirement |
Sequence Diagram
sequenceDiagram
participant App as Application
participant TCP as TCP Stack (libxlio)
participant Net as Network
Note over TCP: New Connection Initialization
App->>TCP: tcp_connect() or listen/accept
TCP->>TCP: tcp_pcb_init() or tcp_pcb_recycle()
Note over TCP: rto = get_initial_rto()<br/>= (1000 + slow_tmr_interval - 1) / slow_tmr_interval<br/>sv = get_initial_rto()<br/>sa = 0
TCP->>Net: Send SYN
Note over TCP: Start retransmission timer<br/>Initial RTO = 1 second (was 3 seconds)
alt ACK received within RTO
Net->>TCP: SYN-ACK or ACK
TCP->>TCP: Measure RTT (tcp_ticks - pcb->rttest)
Note over TCP: Van Jacobson Algorithm:<br/>m = rtt - (sa >> 3)<br/>sa += m<br/>sv += |m| - (sv >> 2)<br/>rto = (sa >> 3) + sv
TCP->>App: Connection established
else No ACK within 1 second
Note over TCP: rtime >= rto (1 second)
TCP->>TCP: Double RTO with backoff<br/>rto = ((sa >> 3) + sv) << tcp_backoff[nrtx]
TCP->>Net: Retransmit SYN
Note over TCP: Faster failure detection<br/>compared to 3 second initial RTO
end
4 files reviewed, no comments
Change initial TCP retransmission timeout from 3 seconds to 1 second as recommended by RFC 6298 Section 2, with additional safeguards for timer granularity and RFC 1122 compliance. RFC 6298 states: "Until a round-trip time (RTT) measurement has been made... the sender SHOULD set RTO <- 1 second" This applies to all new TCP connections until an RTT measurement is made. Changes: 1. TCP RTO Calculation (src/core/lwip/tcp.c): - Added get_initial_rto() helper function - Uses round-up division - Prevents division trunc that could cause premature timeouts - Modified tcp_pcb_init() and tcp_pcb_recycle() to use helper 2. Timer Resolution Limits (src/core/util/sys_vars.cpp): - Added RFC 1122 validation for tcp_timer_resolution_msec - Enforces maximum of 500ms per RFC 1122 Section 4.2.3.2 - Logs warning and clamps value if exceeded - Applied to both environment variable and config registry paths 3. Configuration Schema (xlio_config_schema.json): - Added "maximum": 500 constraint to timer_msec - Updated description to reference RFC 1122 requirement - Prevents invalid configurations at schema validation level 4. Documentation (README): - Added RFC 1122 reference to timer_msec documentation Rationale: Previous implementation used simple division (1000 / slow_tmr_interval) which could result in: - 0 ticks when interval > 1000ms (immediate timeout) - 1 tick when interval = 1000ms (fires on next tick) - Insufficient granularity with large timer intervals The new implementation provides defense-in-depth: - Schema validation prevents misconfiguration - Runtime validation enforces RFC 1122 limits (delayed ACK ≤ 500ms) - Round-up division ensures minimum 1 tick without truncation This fixes incorrect SYN retransmission timing where the first retry could occur prematurely or immediately, causing TCP_USER_TIMEOUT tests to fail and connection establishment issues. Benefits: - TCP/IP standards compliance (RFC 6298 and RFC 1122) - Robust handling of timer granularity edge cases - Faster connection establishment failure detection - Prevents problematic timer configurations - Matches Linux kernel TCP_TIMEOUT_INIT behavior Signed-off-by: Tomer Cabouly <[email protected]>
0bdc812 to
7cee7e9
Compare
Greptile Summary
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant App as "Application"
participant TCP as "TCP Layer"
participant Timer as "Timer System"
participant Net as "Network"
Note over TCP: "Initial State: New Connection"
App->>TCP: "tcp_pcb_init() or tcp_pcb_recycle()"
TCP->>TCP: "get_initial_rto()"
Note over TCP: "Calculate: (1000 + slow_tmr_interval - 1) / slow_tmr_interval"
TCP->>TCP: "Set rto = 5 ticks (1000ms with default 200ms timer)"
TCP->>TCP: "Set sv = 5 ticks"
App->>TCP: "connect()"
TCP->>Net: "Send SYN"
TCP->>Timer: "Start retransmission timer (rtime=0)"
Timer->>TCP: "tcp_slowtmr() every 200ms"
TCP->>TCP: "Increment rtime"
alt "ACK received before timeout"
Net->>TCP: "Receive SYN-ACK"
TCP->>TCP: "Measure RTT, update RTO using Van Jacobson algorithm"
Note over TCP: "rto = (sa >> 3) + sv (adaptive)"
TCP->>Net: "Send ACK"
TCP->>App: "Connection established"
else "Timeout after ~1 second (5 ticks)"
Timer->>TCP: "tcp_slowtmr() - rtime >= rto"
TCP->>TCP: "Apply backoff: rto = rto << tcp_backoff[nrtx]"
TCP->>Net: "Retransmit SYN"
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, no comments
Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format
Description
Change initial TCP retransmission timeout from 3 seconds to 1 second as recommended by RFC 6298 Section 2.
RFC 6298 states: "Until a round-trip time (RTT) measurement has been made... the sender SHOULD set RTO <- 1 second"
This applies to all new TCP connections until an RTT measurement is made. Once the connection is established and ACKs received, TCP calculates actual RTT and updates RTO accordingly using the Van Jacobson algorithm.
Modified functions:
This fixes incorrect SYN retransmission timing where the first retry occurred after ~3 seconds instead of ~1 second, causing TCP_USER_TIMEOUT tests to fail and slower connection failure detection.
Benefits:
What
Set initial RTO to 1 second per RFC 6298
Why ?
RFC Compliance.
How ?
It is optional but for complex PRs please provide information about the design,
architecture, approach, etc.
Change type
What kind of change does this PR introduce?
Check list