Summary
When a DPDK RX application is started while traffic is already arriving on the NIC, DAQIRI can report sustained rx_missed_errors, and the application sees missing packets after startup as well as malformed header/data split (HDS) packets.
This was reproduced on an Ethernet RX queue configured for header/data split. The issue appears to be related to startup backlog/backpressure rather than to an application parser failure.
The same receiver software passed the same test when using the Holoscan Advanced Networking library.
Observed behavior
Starting the receiver application while TX traffic is already flowing, we observed:
- DAQIRI initializes successfully.
- RX workers start successfully.
- The stats thread repeatedly reports rx_missed_errors:
    Rx: Dropped <N> packets since last poll 500ms ago
- The application receives traffic, but with packet gaps and partial batches after startup.
- A small number of malformed split packets are also logged:
    Dropped malformed split RX packet ... expected 2 segment(s), found 1
  but their volume is tiny compared with rx_missed_errors.
In the observed run, malformed split drops were single digits, while rx_missed_errors reached tens of thousands.
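For reference, the "expected 2 segment(s), found 1" check amounts to validating the mbuf chain before the application parses it. The sketch below is not DAQIRI's implementation; `mock_mbuf`, `hds_verdict` and `check_hds_split` are hypothetical names, and the struct only borrows the field names of `struct rte_mbuf` (`next`, `nb_segs`, `pkt_len`, `data_len`) so it builds without DPDK. It walks the chain rather than trusting `nb_segs` alone, which separates a plain wrong segment count from an internally inconsistent chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mbuf. Field names follow struct rte_mbuf,
 * but this is NOT the DPDK type. */
struct mock_mbuf {
    struct mock_mbuf *next;
    uint16_t nb_segs;  /* meaningful on the first segment only */
    uint32_t pkt_len;  /* total packet length, first segment only */
    uint16_t data_len; /* bytes held by this segment */
};

/* Hypothetical verdicts, so each failure mode can get its own counter. */
enum hds_verdict {
    HDS_OK = 0,
    HDS_WRONG_SEG_COUNT,    /* e.g. "expected 2 segment(s), found 1" */
    HDS_INCONSISTENT_CHAIN, /* nb_segs/pkt_len disagree with the chain */
};

/* Walks the segment chain and compares it with nb_segs, pkt_len and the
 * expected split layout (header segment + payload segment). */
enum hds_verdict check_hds_split(const struct mock_mbuf *m,
                                 uint16_t expected_segs,
                                 uint16_t *found_segs)
{
    assert(m != NULL && found_segs != NULL);

    uint16_t segs = 0;
    uint32_t bytes = 0;
    for (const struct mock_mbuf *s = m; s != NULL; s = s->next) {
        segs++;
        bytes += s->data_len;
    }
    *found_segs = segs;

    if (segs != m->nb_segs || bytes != m->pkt_len)
        return HDS_INCONSISTENT_CHAIN;
    if (segs != expected_segs)
        return HDS_WRONG_SEG_COUNT;
    return HDS_OK;
}
```

Returning a verdict instead of only logging lets the caller count "wrong segment count" and "inconsistent chain" separately from other loss sources.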
Separately, several rte_ring_enqueue(...) calls in the DPDK RX worker paths appear to ignore the return value. rte_ring_enqueue() returns -ENOBUFS when the ring has no room, so if the application-facing ring is full during the startup backlog, DAQIRI may silently lose a burst or leak its ownership without surfacing a clear error or counter.
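A minimal sketch of the enqueue handling described above, assuming a worker that hands one burst object at a time to the application ring. To keep it buildable without DPDK, `mock_ring` and `mock_ring_enqueue` are hypothetical stand-ins that follow the documented `rte_ring_enqueue()` contract (0 on success, `-ENOBUFS` with nothing enqueued when full); `rx_burst`, `rx_queue_stats`, `publish_burst` and `simulate_backlog` are also hypothetical names, not DAQIRI code. The key point is that on failure the worker still owns the burst, so it counts the loss and frees the burst exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rte_ring. mock_ring_enqueue() mirrors the
 * rte_ring_enqueue() contract: 0 on success, -ENOBUFS if full. */
struct mock_ring {
    void **slots;
    unsigned int capacity;
    unsigned int count;
};

static int mock_ring_enqueue(struct mock_ring *r, void *obj)
{
    if (r->count == r->capacity)
        return -ENOBUFS; /* nothing enqueued; caller keeps ownership */
    r->slots[r->count++] = obj;
    return 0;
}

/* Hypothetical burst descriptor passed to the application ring. */
struct rx_burst {
    unsigned int nb_pkts;
};

/* Hypothetical per-queue counters for app-ring enqueue failures. */
struct rx_queue_stats {
    uint64_t app_ring_full_bursts;
    uint64_t app_ring_full_pkts;
    uint64_t bursts_freed;
};

static void burst_free(struct rx_burst *b, struct rx_queue_stats *st)
{
    /* A real worker would return the burst's mbufs to their mempool
     * here, e.g. with rte_pktmbuf_free_bulk(). */
    st->bursts_freed++;
    free(b);
}

/* Returns 0 if ownership moved to the ring, else -ENOBUFS. On failure
 * the burst is counted and freed once: no leak, no double free. */
static int publish_burst(struct mock_ring *app_ring, struct rx_burst *b,
                         struct rx_queue_stats *st)
{
    assert(b != NULL);
    int ret = mock_ring_enqueue(app_ring, b);
    if (ret == 0)
        return 0; /* the consumer owns b now; do not touch it again */

    st->app_ring_full_bursts++;
    st->app_ring_full_pkts += b->nb_pkts;
    burst_free(b, st);
    return ret;
}

/* Usage: push n_bursts bursts into a ring of ring_cap entries that
 * nobody drains, as during a startup backlog. Returns bursts accepted. */
unsigned int simulate_backlog(unsigned int ring_cap, unsigned int n_bursts,
                              unsigned int pkts_per_burst,
                              struct rx_queue_stats *st)
{
    void **slots = calloc(ring_cap, sizeof(*slots));
    if (slots == NULL)
        return 0;
    struct mock_ring ring = { slots, ring_cap, 0 };

    for (unsigned int i = 0; i < n_bursts; i++) {
        struct rx_burst *b = malloc(sizeof(*b));
        if (b == NULL)
            break;
        b->nb_pkts = pkts_per_burst;
        (void)publish_burst(&ring, b, st);
    }

    unsigned int accepted = ring.count;
    for (unsigned int i = 0; i < ring.count; i++)
        free(ring.slots[i]); /* stand-in for the application consuming */
    free(slots);
    return accepted;
}
```

With a 2-entry ring and 5 bursts of 32 packets, 2 bursts are accepted and the other 3 (96 packets) show up in the app-ring-full counters instead of disappearing.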
Expected behavior
DAQIRI should be able to start up under traffic and make startup-backlog loss diagnosable:
- Check and handle all RX-path rte_ring_enqueue(...) failures.
- Expose/log a specific counter for app-ring enqueue failures.
- Distinguish clearly between:
  - NIC missed packets / ring overflow
  - mbuf allocation failures
  - application ring full
  - malformed/incomplete HDS split packets
- Avoid leaking or double-freeing burst ownership on enqueue failure.
- Ideally, provide guidance or knobs for live-attach startup backlog.
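One way to keep the four loss sources from blurring together is to snapshot each counter every stats poll and log the per-poll deltas side by side. This is a sketch under assumptions: in DPDK the NIC-missed and mbuf-allocation counts would come from `rte_eth_stats_get()` (the `imissed` and `rx_nombuf` fields of `struct rte_eth_stats`), while the app-ring-full and malformed-HDS counters are hypothetical DAQIRI-side software counters. `rx_loss_snapshot`, `rx_loss_delta`, `rx_loss_format` and the log format itself are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the four loss sources. nic_missed and
 * mbuf_alloc_failed would map to rte_eth_stats.imissed / .rx_nombuf;
 * the other two are assumed software counters. */
struct rx_loss_snapshot {
    uint64_t nic_missed;
    uint64_t mbuf_alloc_failed;
    uint64_t app_ring_full;
    uint64_t malformed_hds;
};

/* Per-poll deltas. Assumes the counters are monotonic, i.e. nothing
 * resets them (e.g. rte_eth_stats_reset()) between polls. */
struct rx_loss_snapshot rx_loss_delta(const struct rx_loss_snapshot *prev,
                                      const struct rx_loss_snapshot *cur)
{
    assert(cur->nic_missed >= prev->nic_missed);
    assert(cur->mbuf_alloc_failed >= prev->mbuf_alloc_failed);
    assert(cur->app_ring_full >= prev->app_ring_full);
    assert(cur->malformed_hds >= prev->malformed_hds);

    struct rx_loss_snapshot d = {
        cur->nic_missed - prev->nic_missed,
        cur->mbuf_alloc_failed - prev->mbuf_alloc_failed,
        cur->app_ring_full - prev->app_ring_full,
        cur->malformed_hds - prev->malformed_hds,
    };
    return d;
}

/* Formats one stats-thread line attributing loss by cause; returns the
 * snprintf() result (characters that would have been written). */
int rx_loss_format(char *buf, size_t len, const struct rx_loss_snapshot *d,
                   unsigned int poll_ms)
{
    return snprintf(buf, len,
                    "Rx loss in last %ums: nic_missed=%llu mbuf_alloc=%llu "
                    "app_ring_full=%llu malformed_hds=%llu",
                    poll_ms,
                    (unsigned long long)d->nic_missed,
                    (unsigned long long)d->mbuf_alloc_failed,
                    (unsigned long long)d->app_ring_full,
                    (unsigned long long)d->malformed_hds);
}
```

A line such as "nic_missed=40000 ... malformed_hds=2" would make the imbalance reported in this issue visible at a glance, and a non-zero app_ring_full would point at backpressure rather than the NIC.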
Environment
- DAQIRI branch: fix-pr-137-accessors
- DAQIRI container: framework-dev 0.1.3
- DPDK raw Ethernet RX
- Header/data split enabled
- ConnectX-class NIC / mlx5