Description
Expected Behavior
Calico should handle UDP GSO packets traversing VXLAN overlay network without causing kernel panics, especially when BPF DSR mode is enabled for external services.
Current Behavior
Nodes experience kernel panics caused by a NULL pointer dereference in skb_segment() when processing UDP GSO packets over VXLAN. The crash occurs during packet forwarding after VXLAN encapsulation, specifically in the GSO segmentation path.
Crash backtrace:
PID: 0 TASK: ff16ba3686914000 CPU: 7 COMMAND: "swapper/7"
[exception RIP: skb_segment+558]
RIP: ffffffffaa037f7e RSP: ff3ca05846b44708 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ff16ba3aa69472c0 RDI: 0000000000000544
#8 [ff3ca05846b447e0] tcp_gso_segment at ffffffffaa1219c5
#9 [ff3ca05846b44830] inet_gso_segment at ffffffffaa136f4b
#10 [ff3ca05846b44880] skb_mac_gso_segment at ffffffffaa04cb9d
#11 [ff3ca05846b448a8] skb_udp_tunnel_segment at ffffffffaa12bbae
#12 [ff3ca05846b44908] inet_gso_segment at ffffffffaa136f4b
#13 [ff3ca05846b44958] skb_mac_gso_segment at ffffffffaa04cb9d
#14 [ff3ca05846b44980] __skb_gso_segment at ffffffffaa04ccbd
#15 [ff3ca05846b449a8] validate_xmit_skb at ffffffffaa04d15e
#16 [ff3ca05846b449e0] validate_xmit_skb_list at ffffffffaa04d396
#17 [ff3ca05846b44a10] sch_direct_xmit at ffffffffaa0b2e97
#18 [ff3ca05846b44a58] __dev_queue_xmit at ffffffffaa04e2d4
#19 [ff3ca05846b44ad8] ip_finish_output2 at ffffffffaa0f2add
#20 [ff3ca05846b44b30] ip_output at ffffffffaa0f44d0
#21 [ff3ca05846b44b88] ip_forward at ffffffffaa0f024c
#22 [ff3ca05846b44be8] ip_rcv at ffffffffaa0eecba
#23 [ff3ca05846b44c48] __netif_receive_skb_core at ffffffffaa04f808
#24 [ff3ca05846b44ce8] netif_receive_skb_internal at ffffffffaa04f9cd
#25 [ff3ca05846b44d10] napi_gro_receive at ffffffffaa050488
#26 [ff3ca05846b44d30] receive_buf at ffffffffc039a2db [virtio_net]
#27 [ff3ca05846b44e00] virtnet_poll at ffffffffc039af14 [virtio_net]
#28 [ff3ca05846b44ea8] __napi_poll at ffffffffaa050dbd
#29 [ff3ca05846b44ed8] net_rx_action at ffffffffaa051282
#30 [ff3ca05846b44f58] __softirqentry_text_start at ffffffffaa22768c
#31 [ff3ca05846b44fa8] irq_exit_rcu at ffffffffa9901106
#32 [ff3ca05846b44fb8] irq_exit at ffffffffa990111a
#33 [ff3ca05846b44fc0] do_IRQ at ffffffffaa401fdf
--- <IRQ stack> ---
#34 [ff3ca0584630fde8] ret_from_intr at ffffffffaa400b0f
[exception RIP: native_safe_halt+14]
RIP: ffffffffaa22646e RSP: ff3ca0584630fe98 RFLAGS: 00000246
RAX: ffffffffaa2262a0 RBX: 0000000000000007 RCX: 0000000000000001
RDX: 0000000000385a22 RSI: 0000000000000083 RDI: 0000000000000007
RBP: 0000000000000007 R8: 00119b5a99a85ad4 R9: 0000000000000001
R10: 0000000000000012 R11: 0000000000000299 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffffffffffff R15: ff16ba3686914000
ORIG_RAX: ffffffffffffffdc CS: 0010 SS: 0018
#35 [ff3ca0584630fe98] default_idle at ffffffffaa2262aa
#36 [ff3ca0584630fea0] default_idle_call at ffffffffaa2265f4
#37 [ff3ca0584630fec0] do_idle at ffffffffa993550a
#38 [ff3ca0584630ff10] cpu_startup_entry at ffffffffa993579f
#39 [ff3ca0584630ff30] start_secondary at ffffffffa9863ce7
#40 [ff3ca0584630ff50] secondary_startup_64_no_verify at ffffffffa9800146
crash>
The crash occurs in the packet forwarding path through VXLAN tunnel, with skb_udp_tunnel_segment() in the call stack indicating VXLAN processing.
SKB state at crash:
struct skb_shared_info {
gso_size = 1348,
gso_segs = 0,
frag_list = 0x0,
gso_type = 1027, // SKB_GSO_DODGY | SKB_GSO_TCPV4 | SKB_GSO_UDP_TUNNEL
nr_frags = 0 '\000',
dataref = { counter = 1 },
}
The gso_type = 1027 indicates a UDP tunnel (VXLAN) packet containing TCP GSO segments, with frag_list = 0x0 but non-zero gso_size, suggesting corrupted GSO state.
Possible Solution
This appears related to a known kernel bug recently patched: https://lkml.org/lkml/2025/5/29/75
The kernel patch addresses an issue where BPF datapath hooks (particularly bpf_skb_pull_data()) can corrupt SKB_GSO_FRAGLIST packets by pulling fraglist data into the head_skb, breaking the invariant that "head_skb holds protocol headers plus first gso_size only."
Calico BPF code may trigger this through:
- Unconditional bpf_skb_pull_data() on all packets (felix/bpf-gpl/skb.h:108-110):
if (bpf_skb_pull_data(ctx->skb, min_size + nh_len)) {
    CALI_DEBUG("Pull failed (min len)");
    return -1;
}
- This is called during header validation for all protocols including UDP, potentially corrupting fraglist GSO packets.
- VXLAN decapsulation preserving GSO state (felix/bpf-gpl/nat4.h:98):
ret = bpf_skb_adjust_room(skb, -extra_hdrsz, BPF_ADJ_ROOM_MAC,
BPF_F_ADJ_ROOM_FIXED_GSO);
- The BPF_F_ADJ_ROOM_FIXED_GSO flag preserves GSO state during decapsulation. If the outer VXLAN packet has corrupted GSO state, it persists.
- No special handling for UDP GSO (felix/bpf-gpl/tc.c:922):
if (!(STATE->ip_proto == IPPROTO_TCP && skb_is_gso(ctx->skb)) &&
ip_is_dnf(ip_hdr(ctx)) && vxlan_encap_too_big(ctx)) {
- The code only checks for TCP GSO and ignores UDP GSO packets, which have been supported since kernel 4.18.
Potential Calico fixes:
- Detect UDP GSO packets and skip bpf_skb_pull_data() or linearize them first
- Avoid BPF_F_ADJ_ROOM_FIXED_GSO for UDP packets during VXLAN operations
- Add UDP GSO checks similar to existing TCP GSO handling
Steps to Reproduce (for bugs)
- Deploy Calico 3.26.4 with the following configuration:
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfEnabled: true
  bpfExternalServiceMode: DSR
  vxlanVNI: 4096
- Create a LoadBalancer service (e.g., Istio ingress gateway) with cross-node endpoints:
  - Service VIP on external network (e.g., 10.70.89.79)
  - Backend pods distributed across nodes with VXLAN overlay (Pod CIDR 172.16.0.0/16)
- Ensure NIC offloads are enabled:
ethtool -K <interface> gso on gro on
- Generate sustained high-throughput UDP traffic through the LoadBalancer:
  - HTTP/3 (QUIC over UDP) traffic works well as a reproducer
  - DNS queries or media streaming are also applicable
- Observe kernel panic in skb_segment() under load, particularly during peak traffic periods
Context
- Network topology: Worker nodes in 10.70.2.0/24, Pod CIDR 172.16.0.0/16
- VXLAN overlay for cross-node communication
- Running Istio ingress gateway handling mixed HTTP/2 and HTTP/3 traffic
- Crash is reproducible under sustained UDP load but timing is non-deterministic
Your Environment
- Calico version: 3.26.4
- Orchestrator version: Kubernetes 1.28
- Operating System and version: RHEL 8.10
- Kernel version: 4.18.0-553.47.1.el8_10