VPP crash in bad GTPUhw jumbo test since 43754 #3647

@vrpolakatcisco

There is a CSIT test that was failing in rls2510. I call it a "bad" test because the failure comes from a configuration mismatch between CSIT and VPP; see [0] for the messy details. The test is only executed in coverage jobs (once per release), and it will probably be fixed in the next release.

While investigating that test, I noticed that VPP behaves differently on the master branch than in the release version. Previously, packets got dropped but VPP stayed responsive; now VPP crashes. Bisecting shows the first crashing commit is [1]. The core is not always the same; one example is [2]:

#6  0x00007ffff5ea04f8 in unix_signal_handler (signum=11, si=<optimized out>, uc=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/unix/main.c:267
#7  <signal handler called>
#8  0x00007fff34914536 in ice_xmit_pkts () from /usr/lib/x86_64-linux-gnu/vpp_plugins/dpdk_plugin.so
#9  0x00007fff350489d7 in rte_eth_tx_burst (port_id=<optimized out>, tx_pkts=0x7fff39de0d00, nb_pkts=11, queue_id=<optimized out>) at /opt/vpp/external/x86_64/include/rte_ethdev.h:6695
#10 tx_burst_vector_internal (vm=0x7fff377af0c0, xd=0x7fff37a95780, mb=0x7fff39de0d00, n_left=30, is_shared=<error reading variable: Incompatible types on DWARF stack>, queue_id=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/plugins/dpdk/device/device.c:173
#11 dpdk_device_class_tx_fn_icl (vm=0x7fff377af0c0, node=0x7fff39e7fc00, f=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/plugins/dpdk/device/device.c:465
#12 0x00007ffff5e3969f in dispatch_node (vm=0x7fff377af0c0, node=0x7fff39e7fc00, type=VLIB_NODE_TYPE_INTERNAL, frame=0x7fff39ecd9c0, dispatch_reason=VLIB_NODE_DISPATCH_REASON_PENDING_FRAME, last_time_stamp=201703542081597715) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:938
#13 dispatch_pending_node (vm=vm@entry=0x7fff377af0c0, pending_frame_index=pending_frame_index@entry=10, last_time_stamp=201703542081597715) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:1096
#14 0x00007ffff5e3c65e in vlib_main_or_worker_loop (vm=0x7fff377af0c0, is_main=0) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:1640
#15 vlib_worker_thread_fn (arg=<optimized out>) at /w/workspace/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/src/vlib/main.c:2090

GTPUsw tests are all passing, including the normal jumbo ones and the non-jumbo ones focused on packet fragmentation (even though those suffer from #3538). The fragmentation mechanism is different, though: normal tests fragment because of an intentionally small MTU on the hardware interface, whereas this issue happens when fragmentation is caused by an unintentionally small MTU on the software interface (avoided in GTPUsw tests).
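For readers unfamiliar with why a tunnel MTU matters here, below is a rough sketch of the size arithmetic. It only illustrates how GTP-U encapsulation over IPv4 can push a jumbo packet past an MTU and how many IPv4 fragments would result. It is not how VPP computes anything, and the sizes in the usage lines are hypothetical; the actual values behind the CSIT/VPP mismatch are in [0] and are not reproduced here. The header sizes are the standard minimums (no IPv4 options, no optional GTP-U fields).

```python
# Sizes in bytes. Header sizes are the standard minimums; the function
# names and the example sizes below are hypothetical, for illustration only.
OUTER_IPV4_HEADER = 20  # IPv4 header without options
UDP_HEADER = 8
GTPU_HEADER = 8         # mandatory part of the GTPv1-U header

GTPU_OVERHEAD = OUTER_IPV4_HEADER + UDP_HEADER + GTPU_HEADER


def encapsulated_size(inner_ip_packet_len: int) -> int:
    """L3 size of the packet after GTP-U encapsulation over IPv4."""
    return inner_ip_packet_len + GTPU_OVERHEAD


def needs_fragmentation(inner_ip_packet_len: int, mtu: int) -> bool:
    """True if the encapsulated packet does not fit into the given L3 MTU."""
    return encapsulated_size(inner_ip_packet_len) > mtu


def ipv4_fragment_count(ip_packet_len: int, mtu: int, header_len: int = 20) -> int:
    """Number of IPv4 fragments needed to carry the packet over the MTU."""
    if ip_packet_len <= mtu:
        return 1
    payload = ip_packet_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


if __name__ == "__main__":
    inner_len = 9000  # hypothetical jumbo inner packet
    small_mtu = 1500  # hypothetical "too small" MTU
    outer_len = encapsulated_size(inner_len)
    print(outer_len, needs_fragmentation(inner_len, small_mtu),
          ipv4_fragment_count(outer_len, small_mtu))
```

The point of the sketch is only that the 36 bytes of encapsulation overhead are added before the MTU check, so an MTU that looks sufficient for the inner packet can still force fragmentation of the outer one.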

So, this is not a high priority issue, and it does not affect release testing. I am opening it just in case somebody sees a relation to a more important bug.

[0] FDio/csit#4117
[1] https://gerrit.fd.io/r/c/vpp/+/43754
[2] https://logs.fd.io/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-ubuntu2404-x86_64-3n-icx/111/csit_current/0/log.html.gz#s1-s1-s1-s1-s1-t1-k3-k4-k1
