ice-zc SUnreclaim slab memory leak [PF_RING 9.0.0] #1003

@Ezzahhh

Issue Description

I am running masscan with zero-copy (ZC) PF_RING, specifically the ice_zc driver. Over time, SUnreclaim slab memory keeps growing, and the growth appears tied to PF_RING usage: when I run masscan without PF_RING I see no such issue (although scan performance is of course worse). My suspicion, and it is only a hunch, is that some memory is not released when the ZC device is registered/deregistered and the VSI is rebuilt, which causes the unreclaimable slab memory to grow. The risk is that the node eventually runs out of memory: because this slab memory is unreclaimable, the OOM killer cannot recover it, and the node will crash.
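To put a number on the growth per scan, the SUnreclaim value can be read from /proc/meminfo before and after each masscan job. The following is a minimal sketch, not part of PF_RING or masscan; the helper names (`parse_meminfo`, `sunreclaim_kb`, `growth_kb`) are hypothetical and chosen only for illustration.

```python
import re


def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total)
    are plain counts and carry no unit.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def sunreclaim_kb(path="/proc/meminfo"):
    """Read the current SUnreclaim value (kB) from the given meminfo file."""
    with open(path) as handle:
        return parse_meminfo(handle.read())["SUnreclaim"]


def growth_kb(before_kb, after_kb):
    """Change in SUnreclaim (kB) between two readings."""
    return after_kb - before_kb


# Hypothetical usage around a single scan job:
#   before = sunreclaim_kb()
#   ... run one masscan job ...
#   after = sunreclaim_kb()
#   print(f"SUnreclaim grew by {growth_kb(before, after)} kB")
```

Logging this delta for runs with and without PF_RING would make the comparison described above directly measurable.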

Environment Details

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 97
model name      : AMD EPYC 4244P 6-Core Processor
stepping        : 2
microcode       : 0xa60120c
cpu MHz         : 5135.342
cache size      : 1024 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc amd_ibpb_ret arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips        : 7585.88
TLB size        : 3584 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
# cat /proc/net/pf_ring/info
PF_RING Version          : 9.0.0 (9.0.0-stable:04b07b3fa7c18fb1b60513522d88667bc52873db)
Total rings              : 0

Standard (non ZC) Options
Ring slots               : 65536
Slot version             : 21
Capture TX               : Yes [RX+TX]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 0
Cluster Fragment Discard : 0
# cat /proc/version
Linux version 6.8.0-85-generic (buildd@lcy02-amd64-106) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #85-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 18 15:26:59 UTC 2025
# sudo pf_ringcfg --list-interfaces
Name: enp8s0f0np0          Driver: ice_zc     RSS:     12   [Running ZC]
Name: enp8s0f1np1          Driver: ice_zc     RSS:     12   [Running ZC]
# sudo lshw -C network
  *-network:0
       description: Ethernet interface
       product: Ethernet Controller E810-XXV for SFP
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:08:00.0
       logical name: enp8s0f0np0
       version: 02
       serial: x
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=off broadcast=yes driver=ice_zc driverversion=1.17.8 duplex=full firmware=4.20 0x8001b91f 1.3346.0 ip=x.x.x.x latency=0 link=yes multicast=yes promiscuous=yes
       resources: iomemory:fc0-fbf iomemory:fc0-fbf irq:24 memory:fcfa000000-fcfbffffff memory:fcfe010000-fcfe01ffff memory:f6300000-f63fffff memory:fcfd000000-fcfdffffff memory:fcfe220000-fcfe41ffff
  *-network:1
       description: Ethernet interface
       product: Ethernet Controller E810-XXV for SFP
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:08:00.1
       logical name: enp8s0f1np1
       version: 02
       serial: x
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 1000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=off broadcast=yes driver=ice_zc driverversion=1.17.8 duplex=full firmware=4.20 0x8001b91f 1.3346.0 ip=192.168.0.0 latency=0 link=yes multicast=yes
       resources: iomemory:fc0-fbf iomemory:fc0-fbf irq:24 memory:fcf8000000-fcf9ffffff memory:fcfe000000-fcfe00ffff memory:f6200000-f62fffff memory:fcfc000000-fcfcffffff memory:fcfe020000-fcfe21ffff
# lstopo
Machine (31GB total)
  Package L#0
    NUMANode L#0 (P#0 31GB)
    L3 L#0 (32MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#6)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#7)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
        PU L#4 (P#2)
        PU L#5 (P#8)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
        PU L#6 (P#3)
        PU L#7 (P#9)
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
        PU L#8 (P#4)
        PU L#9 (P#10)
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
        PU L#10 (P#5)
        PU L#11 (P#11)
  HostBridge
    PCIBridge
      PCI 01:00.0 (NVMExp)
        Block(Disk) "nvme1n1"
    PCIBridge
      PCI 02:00.0 (NVMExp)
        Block(Disk) "nvme0n1"
    PCIBridge
      PCIBridge
        PCIBridge
          PCIBridge
            PCI 06:00.0 (VGA)
        PCIBridge
          PCI 07:00.0 (SATA)
        PCIBridge
          PCI 08:00.0 (Ethernet)
            Net "enp8s0f0np0"
          PCI 08:00.1 (Ethernet)
            Net "enp8s0f1np1"
        PCIBridge
          PCI 0b:00.0 (SATA)

Metrics related to memory growth

[Image: graph of node_memory_SUnreclaim_bytes over time]

Here is an example of SUnreclaim slab memory growing as masscan runs. Note that this is not one continuous masscan run but many jobs run one after another in sequence. Please ignore the gaps, which are unrelated. The large dips are node restarts, after which the memory was recovered. The metric is node_memory_SUnreclaim_bytes from node-exporter.
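The aggregate metric does not show which slab cache is growing. One way to narrow it down is to save /proc/slabinfo (readable by root) before and after a batch of jobs and diff the two snapshots. The sketch below assumes the slabinfo version 2.1 column layout (name, active_objs, num_objs, objsize, ...) and estimates each cache's footprint as `num_objs * objsize`, which ignores per-slab overhead. The function names are hypothetical.

```python
def parse_slabinfo(text):
    """Map slab cache name to an approximate size in bytes.

    Assumes the slabinfo 2.1 layout, where columns are:
    name, active_objs, num_objs, objsize, ...
    The estimate num_objs * objsize ignores per-slab overhead.
    """
    caches = {}
    for line in text.splitlines():
        # Skip the version line and the column header line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 4:
            continue
        name, num_objs, objsize = parts[0], int(parts[2]), int(parts[3])
        caches[name] = num_objs * objsize
    return caches


def top_growers(before, after, limit=10):
    """Return (cache name, byte growth) pairs, largest growth first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    grown = [(name, delta) for name, delta in deltas.items() if delta > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical usage with two snapshots saved as root, e.g.
#   cat /proc/slabinfo > before.txt   (before the jobs)
#   cat /proc/slabinfo > after.txt    (after the jobs)
#
#   with open("before.txt") as f_before, open("after.txt") as f_after:
#       before = parse_slabinfo(f_before.read())
#       after = parse_slabinfo(f_after.read())
#   for name, delta in top_growers(before, after):
#       print(f"{name}: +{delta} bytes")
```

If one cache dominates the growth, its name may help point at the allocation path that is not being freed.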
