Skip to content

Commit 7abca7a

Browse files
tomerdbzgalnoam
authored andcommitted
issue: 4728517 Fix credit prediction mismatch in hw_queue_tx
Fix assertion failure in debug builds and Send Queue waste caused by mismatch between credit prediction (credits_calculate) and actual WQE size calculation (fill_wqe). The bug manifested with multi-SGE packets where total data length was small enough to be inlined (<= 204 bytes). For example, a packet with num_sge=2 and total_data_len=142 bytes would: - credits_calculate(): See num_sge != 1, predict 2 credits (scatter-gather) - fill_wqe(): See data_len <= 204, use inline path, need 3 wqebbs - Result: Assertion failure (wqebbs=3 > credits=2) Additionally, the initial fix using a conservative upper bound (max of 4) caused severe performance degradation: 2-SGE UDP packets >204 bytes would reserve 4 WQEBBs but only use 1 WQEBB, wasting 75% of Send Queue capacity and increasing packet drops in high-PPS scenarios. The root cause was different branching logic: - credits_calculate() checked: num_sge == 1 && length <= 204 - fill_wqe() checked: total_data_len <= 204 (regardless of num_sge) Fix by making credits_calculate() match fill_wqe() logic exactly: 1. Single-SGE fast path: Check length directly, no loop - If length <= 204: use inline formula - Else: return 1 credit (non-inline single SGE) 2. Multi-SGE path: Calculate total length to distinguish inline vs scatter-gather, ensuring accurate predictions - If total <= 204: use inline formula (up to 4 WQEBBs) - Else: use scatter-gather formula (typically 1 WQEBB for 2-SGE UDP) Also fix test synchronization issues: - Add SO_RCVTIMEO to prevent indefinite blocking - Add wait_fork() to properly wait for child process Signed-off-by: Tomer Cabouly <[email protected]>
1 parent 7dbaa6c commit 7abca7a

File tree

2 files changed

+41
-14
lines changed

2 files changed

+41
-14
lines changed

src/core/dev/hw_queue_tx.h

Lines changed: 31 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -129,23 +129,40 @@ class hw_queue_tx : public xlio_ti_owner {
129129
* 3. TSO packet with inline headers
130130
*
131131
* Formulas details:
132-
* 1. WQEBB is 64 bytes, the 1st WQEBB contains ctrl segment, eth segment and 18 bytes of
133-
* inline data. So, we take the 1st WQEBB and number of WQEBBs for the packet minus 18
134-
* bytes.
135-
* 2. Data segment for each scatter-gather element is 16 bytes. Therefore, WQEBB can hold
136-
* up to 4 data segments. The 1st element fits into the 1st WQEBB after the eth segment.
137-
* So, we take the 1st WQEBB and number of WQEBBs for scatter-gather elements minus 1.
138-
* 3. Inline header starts from offset 46 in WQE (2 bytes before 16 bytes alignment).
139-
* Decrease inline header size by 2 to align it to 16 bytes boundary at the right edge.
140-
* This compensates data segments alignment. Add the 2 bytes back and length of
141-
* scatter-gather elements. Take into account that 18 bytes goes to the 1st WQEBB and
142-
* add the 1st WQEBB to the result.
132+
* 1. Inline path (≤204 bytes): 18 bytes of ETH header are inlined into the eth segment.
133+
* WQEBB is 64 bytes, the 1st WQEBB contains ctrl segment (16B), eth segment (32B with
134+
* inline header), and the remaining packet data goes into inline data segment.
135+
* Formula (length + 63 - 18) / 64 + 1 accounts for the 18 bytes already inlined.
136+
*
137+
* 2. Scatter-gather path (>204 bytes): NO inline header (inline_hdr_sz = 0).
138+
* ETH segment is just 16 bytes (1 octoword). Data segment for each scatter-gather
139+
* element is 16 bytes. Therefore, WQEBB can hold up to 4 data segments.
140+
* The 1st data segment fits into the 1st WQEBB after ctrl (1 octoword) + eth (1
141+
* octoword). Formula: (num_sge + 3 - 1) / 4 + 1 accounts for ctrl + eth + N data segments.
143142
*/
144143
if (xlio_send_wr_opcode(*p_send_wqe) != XLIO_IBV_WR_TSO) {
145-
if (p_send_wqe->num_sge == 1 && p_send_wqe->sg_list->length <= 204) {
146-
return (p_send_wqe->sg_list->length + 63U - 18U) / 64U + 1U;
144+
if (p_send_wqe->num_sge == 1) {
145+
if (p_send_wqe->sg_list->length <= 204) {
146+
// Single-SGE inline path (fast path for small packets)
147+
return (p_send_wqe->sg_list->length + 63U - 18U) / 64U + 1U;
148+
} else {
149+
// 1 data segment fits into 1 WQEBB
150+
return 1U;
151+
}
147152
} else {
148-
return (p_send_wqe->num_sge + 3U - 1U) / 4U + 1U;
153+
// Multi-SGE: Calculate total to differentiate inline vs scatter-gather
154+
uint32_t total_len = 0;
155+
for (int i = 0; i < p_send_wqe->num_sge; i++) {
156+
total_len += p_send_wqe->sg_list[i].length;
157+
}
158+
159+
if (total_len <= 204) {
160+
// Multi-SGE inline path
161+
return (total_len + 63U - 18U) / 64U + 1U;
162+
} else {
163+
// Multi-SGE scatter-gather: 1 ctrl + 1 eth + N data segments
164+
return (p_send_wqe->num_sge + 3U - 1U) / 4U + 1U;
165+
}
149166
}
150167
} else {
151168
return (((p_send_wqe->tso.hdr_sz + 15U - 2U) & ~15U) + 2U + p_send_wqe->num_sge * 16U -

tests/gtest/udp/udp_bind.cc

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -470,6 +470,13 @@ class pktinfo : public udp_base {
470470
int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP);
471471
EXPECT_GT(fd, 0) << "Socket failed to open";
472472

473+
/* Set receive timeout to prevent indefinite blocking */
474+
struct timeval timeout;
475+
timeout.tv_sec = 10;
476+
timeout.tv_usec = 0;
477+
EXPECT_EQ(setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)), 0)
478+
<< "Failed to set receive timeout";
479+
473480
sockaddr_store_t server_any_sockaddr = {
474481
.addr6 = {
475482
.sin6_family = AF_INET6,
@@ -551,5 +558,8 @@ TEST_F(pktinfo, check_recvmsg_returns_expected_pktinfo)
551558
};
552559

553560
server_func(pid, server_code_under_test);
561+
562+
/* Wait for child process to complete and validate exit status */
563+
ASSERT_EQ(0, wait_fork(pid));
554564
}
555565
}

0 commit comments

Comments
 (0)