Description
There is a CSIT test that was failing in rls2510. I call it a "bad" test because the failure comes from a configuration mismatch between CSIT and VPP; see [0] for the messy details. The test is only executed in coverage jobs (once per release), and it will probably be fixed in the next release.
While investigating that test, I noticed that VPP behaves differently on the master branch than in the release version. Previously, packets were dropped but VPP stayed responsive; now VPP crashes. Bisecting shows the first crashing commit is [1]. The core is not always the same; one example is [2]:
#6 0x00007ffff5ea04f8 in unix_signal_handler (signum=11, si=<optimized out>, uc=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/unix/main.c:267
#7 <signal handler called>
#8 0x00007fff34914536 in ice_xmit_pkts () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#9 0x00007fff350489d7 in rte_eth_tx_burst (port_id=<optimized out>, tx_pkts=0x7fff39de0d00, nb_pkts=11, queue_id=<optimized out>) at /opt/vpp/external/x86_64/include/rte_ethdev.h:6695
#10 tx_burst_vector_internal (vm=0x7fff377af0c0, xd=0x7fff37a95780, mb=0x7fff39de0d00, n_left=30, is_shared=<error reading variable: Incompatible types on DWARF stack>, queue_id=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/plugins/dpdk/device/device.c:173
#11 dpdk_device_class_tx_fn_icl (vm=0x7fff377af0c0, node=0x7fff39e7fc00, f=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/plugins/dpdk/device/device.c:465
#12 0x00007ffff5e3969f in dispatch_node (vm=0x7fff377af0c0, node=0x7fff39e7fc00, type=VLIB_NODE_TYPE_INTERNAL, frame=0x7fff39ecd9c0, dispatch_reason=VLIB_NODE_DISPATCH_REASON_PENDING_FRAME, last_time_stamp=201703542081597715) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:938
#13 dispatch_pending_node (vm=vm@entry=0x7fff377af0c0, pending_frame_index=pending_frame_index@entry=10, last_time_stamp=201703542081597715) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:1096
#14 0x00007ffff5e3c65e in vlib_main_or_worker_loop (vm=0x7fff377af0c0, is_main=0) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:1640
#15 vlib_worker_thread_fn (arg=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:2090
GTPUsw tests are all passing, including the normal jumbo ones and the non-jumbo ones focused on packet fragmentation (even though those suffer from #3538). The fragmentation mechanism is different, though: the normal tests fragment because of an intentionally small MTU on the hardware interface, whereas this issue happens when fragmentation is caused by an unintentionally small MTU on a software interface (which the GTPUsw tests avoid).
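For readers less familiar with why a too-small MTU triggers fragmentation at all, here is a minimal sketch of the standard IPv4 fragment-count arithmetic. It is not taken from VPP or CSIT: the function name `ipv4_fragment_count` is hypothetical, the packet sizes are illustrative rather than the ones used in the failing test, and it assumes a 20-byte IPv4 header without options.

```python
import math

IPV4_HEADER_LEN = 20  # bytes; assumption: no IP options


def ipv4_fragment_count(packet_len: int, mtu: int) -> int:
    """Return how many IPv4 fragments a packet of packet_len bytes needs.

    packet_len and mtu both include the IPv4 header. Hypothetical helper
    for illustration only, not part of VPP or CSIT.
    """
    if packet_len <= mtu:
        return 1  # fits as-is, no fragmentation
    payload = packet_len - IPV4_HEADER_LEN
    # Every fragment except the last carries a payload that is a
    # multiple of 8 bytes, because the fragment offset is in 8-byte units.
    per_fragment = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(payload / per_fragment)


# Illustrative sizes only: a jumbo packet against two MTU values.
for mtu in (9000, 1500):
    print(mtu, ipv4_fragment_count(9000, mtu))
```

With these illustrative numbers, the same packet passes unfragmented when the MTU is large enough and is split into several fragments when the MTU is smaller than intended, which is the situation described above for the software interface.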
This is not a high-priority issue, and it does not affect release testing. I am opening it in case somebody sees a relation to a more important bug.
[0] FDio/csit#4117
[1] https://gerrit.fd.io/r/c/vpp/+/43754
[2] https://logs.fd.io/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/111/csit_current/0/log.html.gz#s1-s1-s1-s1-s1-t1-k3-k4-k1