
Conversation

@zjmletang
Member

Hello everyone — I’ve prepared a preliminary PR that adds support for the netkvm “mergeable” feature. This can effectively reduce memory usage; I’d appreciate it if you could take a moment to review it when you have time. Below are the details/explanations for this PR.

VirtIO Mergeable Receive Buffers Implementation Design

Overview

This document describes the implementation of VirtIO mergeable receive buffers (VIRTIO_NET_F_MRG_RXBUF) support for the Windows NetKVM driver. The implementation optimizes memory usage and reduces buffer allocation overhead for high-throughput network scenarios.

Background

Problem Statement

The traditional receive path allocates large buffers (up to 64KB) to accommodate maximum-sized packets. This approach:

  • Wastes memory when most packets are small (e.g., TCP ACKs, DNS queries)
  • Limits the number of available receive buffers due to memory constraints
  • Increases buffer pool exhaustion under heavy traffic

Solution: Mergeable Receive Buffers

VirtIO mergeable buffers allow:

  • Small fixed-size buffers (4KB pages) for all packets
  • Multi-buffer packet assembly for large packets
  • Better memory utilization and larger buffer pools
  • Reduced allocation overhead

Design Principles

1. Conditional Activation

Only enable mergeable buffers when BOTH features are present:

  • VIRTIO_NET_F_MRG_RXBUF (mergeable buffer support)
  • VIRTIO_F_ANY_LAYOUT (VirtIO 1.0+ combined header+data layout)

Rationale: This is a pragmatic engineering decision. While VIRTIO_NET_F_MRG_RXBUF and VIRTIO_F_ANY_LAYOUT are theoretically independent features per VirtIO specification, we require both for mergeable buffer support to:

  • Simplify implementation: Combined layout allows single scatter-gather entry per buffer
  • Reduce code paths: Eliminates need for separate header/data descriptor logic
  • Minimize testing matrix: Avoids testing mergeable mode with legacy split-descriptor layout
  • Target modern devices: VirtIO 1.0+ devices (released 2016+) support both features

Legacy VirtIO 0.95 devices lacking ANY_LAYOUT automatically fall back to the traditional non-mergeable path.
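
A minimal sketch of the activation gate, assuming an AckFeature-style negotiation helper (the helper name and the context field are placeholders for illustration, not necessarily the PR's exact code):

// Enable mergeable buffers only when the device offers BOTH features.
// AckFeature() and bUseMergedBuffers are assumed names.
BOOLEAN useMergeable = AckFeature(pContext, VIRTIO_NET_F_MRG_RXBUF) &&
                       AckFeature(pContext, VIRTIO_F_ANY_LAYOUT);

pContext->bUseMergedBuffers = useMergeable;
// Legacy devices lacking either feature keep the traditional RX path.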

2. Zero-Allocation Hot Path

All data structures for packet assembly are pre-allocated:

  • Inline arrays in merge context: Buffer references and physical page arrays use fixed-size inline storage (BufferSequence[17], PhysicalPages[18]) embedded in the context structure, eliminating per-packet heap allocation overhead.

  • Stack-based storage for buffer references: All temporary tracking uses stack variables or pre-allocated context members, avoiding dynamic memory management in the receive hot path.

  • Bounds enforced at compile time: Array sizes are compile-time constants (#define VIRTIO_NET_MAX_MRG_BUFS 17), not runtime variables. Fixed sizes keep the arrays inline in the context structure and let every bounds check compare against a constant limit, improving both performance and safety (see the sketch below).
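
A sketch of how the two compile-time constants relate, with a static_assert documenting the invariant (illustrative, not code from the PR):

#define VIRTIO_NET_MAX_MRG_BUFS   17  // max buffers per merged packet
#define MAX_MERGED_PHYSICAL_PAGES 18  // 2 (first buffer, aliased) + 16 extra

static_assert(MAX_MERGED_PHYSICAL_PAGES == VIRTIO_NET_MAX_MRG_BUFS + 1,
              "first buffer contributes two aliased logical pages, "
              "each additional buffer contributes one");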

3. Backward Compatibility

Maintain compatibility with existing code paths:

  • Physical page aliasing preserves ParaNdis_BindRxBufferToPacket logic: The existing MDL binding function always starts from PhysicalPages[1] (legacy design for traditional mode). By creating an alias where both PhysicalPages[0] and [1] point to the same physical memory, this function works correctly for mergeable buffers without modification, reducing regression risk.

  • Separate creation paths for mergeable vs. traditional buffers: Two independent buffer allocation functions (CreateMergeableRxDescriptor() for small 4KB buffers, CreateRxDescriptorOnInit() for large multi-page buffers) are selected based on feature flags. This separation keeps each mode's logic clean and isolated, avoiding complex conditional branches in shared code.

  • No changes to core packet processing: After assembly, both modes produce identical MDL chain structures. Upper-layer processing (checksum validation, RSS classification, hardware offload) operates on standard MDL chains regardless of buffer origin. This interface consistency ensures existing packet handling logic works unchanged for both modes.

Architecture

Key Data Structures

1. RxNetDescriptor Extensions

struct _tagRxNetDescriptor {
    // Modified fields:
    USHORT NumPages;                           // Logical page count (exposed to offload engines)
    USHORT NumOwnedPages;                      // Physical ownership (vs. logical pages)
    
    // New fields for mergeable support:
    tCompletePhysicalAddress *OriginalPhysicalPages;  // Saved for restoration
    USHORT MergedBufferCount;                  // Additional buffers (excluding this one)
    pRxNetDescriptor MergedBuffers[16];  // Inline storage (no allocation)
};

Design Notes:

  • NumPages: Semantic change - now represents logical page count for complete packet after assembly
    • Traditional mode: Physical page count = logical page count (same value)
    • Mergeable mode: Logical count > physical count (includes pages from additional buffers)
    • Must remain accurate for checksum/offload engines to process full packet correctly
  • NumOwnedPages: Prevents double-free during cleanup (only free owned pages)
    • Always equals the actual physical pages this descriptor owns
    • Mergeable mode: Always 2 for single-buffer descriptor
    • After assembly: Still 2 (doesn't change, only first buffer owns its pages)
  • OriginalPhysicalPages: Enables pointer restoration after merge assembly
  • MergedBuffers array: Avoids heap allocation; sized for the worst case of 16 additional buffers (a 17-buffer packet including this descriptor). A worked example follows below.
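
As a worked example of the counter semantics (hypothetical values following the rules above), consider a packet assembled from one first buffer plus three additional buffers:

// After AssembleMergedPacket() on 4 collected buffers:
//   NumPages          == 5   // 2 (first buffer, aliased) + 3 additional, logical
//   NumOwnedPages     == 2   // unchanged; only the first buffer's own pages
//   MergedBufferCount == 3   // additional buffers to hand back on reuse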

2. Merge Context Structure

struct _MergeBufferContext {
    pRxNetDescriptor BufferSequence[17];           // Buffer collection
    UINT32 BufferActualLengths[17];                // Received lengths
    UINT16 ExpectedBuffers;                        // From virtio header
    UINT16 CollectedBuffers;                       // Current count
    UINT32 TotalPacketLength;                      // Accumulated size
    tCompletePhysicalAddress PhysicalPages[18];  // Pre-allocated array
};

Design Notes:

  • VIRTIO_NET_MAX_MRG_BUFS = 17: Calculated from the maximum packet size, rounded up (65562 bytes / 4096 bytes per buffer ≈ 16.01, hence 17 buffers)
  • MAX_MERGED_PHYSICAL_PAGES = 18: First buffer (2 logical pages) + 16 additional buffers
  • All arrays are compile-time sized to avoid runtime allocation

Physical Page Aliasing Design

Challenge: ParaNdis_BindRxBufferToPacket always starts from PARANDIS_FIRST_RX_DATA_PAGE (index 1).

Solution: Create an alias where both PhysicalPages[0] and PhysicalPages[1] point to the same physical memory.

Traditional mode:                 Mergeable mode (aliasing):
PhysicalPages[0] → Header         PhysicalPages[0] ───┐
PhysicalPages[1] → Data page 1                        ├→ Same 4KB page
PhysicalPages[2] → Data page 2    PhysicalPages[1] ───┘

Safety: IsRegionInside() detects the alias during cleanup and skips freeing PhysicalPages[1].

Trade-off: +8 bytes per descriptor (1000 descriptors = 8KB overhead), acceptable for compatibility.
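
A minimal sketch of the aliasing setup, assuming the descriptor fields shown above and a placeholder page allocator:

// AllocateSharedPage() is a placeholder for the driver's DMA allocator.
tCompletePhysicalAddress page;
if (!AllocateSharedPage(&page))        // one real 4KB page
    return FALSE;

desc->PhysicalPages[0] = page;         // "header" slot
desc->PhysicalPages[1] = page;         // alias: same physical page

// ParaNdis_BindRxBufferToPacket starts at index 1 and therefore maps the
// correct page unmodified; IsRegionInside() detects the alias on cleanup
// so the page is freed exactly once.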

Implementation Details

Buffer Creation Path

CreateMergeableRxDescriptor()

Allocates simplified 4KB buffers for mergeable mode:

  1. Allocate single 4KB physical page
  2. Create 2-entry PhysicalPages array (aliasing design)
  3. Set both entries to point to the same physical page
  4. Create single scatter-gather entry (ANY_LAYOUT mode)
  5. Bind MDL starting from index 1 (compatibility)

Key Parameters:

  • NumPages = 2 (logical)
  • NumOwnedPages = 2 (same for single buffer)
  • BufferSGLength = 1 (combined header+data)
  • DataStartOffset = nVirtioHeaderSize
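
A condensed sketch of the creation path following the five steps above (the allocator helpers are placeholders; ParaNdis_BindRxBufferToPacket is the existing binding function):

pRxNetDescriptor CreateMergeableRxDescriptor()
{
    pRxNetDescriptor p = AllocateDescriptor();           // placeholder
    if (!p || !AllocateSharedPage(&p->PhysicalPages[0])) // step 1
        return NULL;

    p->PhysicalPages[1] = p->PhysicalPages[0];           // steps 2-3: alias
    p->NumPages = 2;                                     // logical pages
    p->NumOwnedPages = 2;                                // both alias entries
    p->BufferSGLength = 1;                               // step 4: ANY_LAYOUT
    p->DataStartOffset = nVirtioHeaderSize;              // header before data

    ParaNdis_BindRxBufferToPacket(p);                    // step 5: MDL at [1]
    return p;
}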

Packet Assembly Path

ProcessMergedBuffers()

Main entry point for mergeable packet handling:

  1. Read num_buffers from virtio header
  2. Validate range (1-17 buffers)
  3. Handle single-buffer case (fast path, no assembly)
  4. Initialize merge context
  5. Collect remaining buffers via CollectRemainingMergeBuffers()
  6. Assemble packet via AssembleMergedPacket()

Error Handling:

  • Invalid num_buffers: Drop packet, reuse first buffer
  • Collection failure: Drop packet, reuse all collected buffers
  • Assembly failure: Drop packet, reuse all collected buffers
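
A flow sketch of the coordinator matching the six steps and the error rules above (GetVirtioHeader and InitializeMergeContext are assumed helper names):

pRxNetDescriptor ProcessMergedBuffers(pRxNetDescriptor first, UINT32 firstLen)
{
    UINT16 numBuffers = GetVirtioHeader(first)->num_buffers;     // step 1

    if (numBuffers == 0 || numBuffers > VIRTIO_NET_MAX_MRG_BUFS) // step 2
    {
        ReuseReceiveBufferNoLock(first);   // invalid: drop, reuse first buffer
        return NULL;
    }
    if (numBuffers == 1)                   // step 3: fast path, no assembly
    {
        return first;
    }

    InitializeMergeContext(first, firstLen, numBuffers);         // step 4
    if (!CollectRemainingMergeBuffers())                         // step 5
    {
        ReuseCollectedBuffers();           // collection failure
        return NULL;
    }
    return AssembleMergedPacket();         // step 6 (NULL on failure)
}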

CollectRemainingMergeBuffers()

Retrieves buffers 2..N from virtqueue:

VirtIO Protocol Guarantee: the device makes all buffers of a merged packet available atomically, so a missing buffer during collection indicates a protocol violation.

Implementation:

  • Collect remaining buffers based on num_buffers from virtio header
  • Store actual received lengths for each buffer
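
A sketch of the collection loop; GetBuf is the queue accessor named in the error-handling section, while m_VirtQueue and the cast are assumptions, and the member names follow the merge context structure above:

bool CollectRemainingMergeBuffers()
{
    while (m_MergeContext.CollectedBuffers < m_MergeContext.ExpectedBuffers)
    {
        UINT len = 0;
        pRxNetDescriptor p = (pRxNetDescriptor)m_VirtQueue.GetBuf(&len);
        if (!p)
        {
            // Buffers of a merged packet must be available atomically,
            // so a NULL here is a protocol violation.
            return false;
        }
        UINT16 i = m_MergeContext.CollectedBuffers++;  // index 0 holds the first buffer
        m_MergeContext.BufferSequence[i] = p;
        m_MergeContext.BufferActualLengths[i] = len;
        m_MergeContext.TotalPacketLength += len;
    }
    return true;
}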

AssembleMergedPacket()

Combines multiple buffers into single packet:

  1. Save buffer references: Store additional buffers in MergedBuffers array
  2. Switch to inline PhysicalPages: Use pre-allocated PhysicalPages array from merge context
  3. Copy page references: First buffer (2 pages) + additional buffers (1 page each)
  4. Create MDLs: New MDLs for additional buffers covering full payload (no header offset)
  5. Update counts: NumPages (logical), MergedBufferCount (additional buffers)

Page Calculation:

totalPages = 2 (first buffer) + (CollectedBuffers - 1) (additional)
           = 1 + CollectedBuffers

MDL Creation: Additional buffers use PhysicalPages[PARANDIS_FIRST_RX_DATA_PAGE] (index 1, aliased to same page as [0] in mergeable mode, for consistency with ParaNdis_BindRxBufferToPacket).
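
A condensed sketch of assembly steps 1-5 (NdisAllocateMdl is the standard NDIS call; DataVirtualAddress and m_MiniportHandle are assumed names, and Holder is the descriptor's MDL field seen in the review snippets below):

pRxNetDescriptor AssembleMergedPacket()
{
    pRxNetDescriptor first = m_MergeContext.BufferSequence[0];
    UINT16 extra = m_MergeContext.CollectedBuffers - 1;

    // Steps 1-2: save the original page array, switch to the inline one
    first->OriginalPhysicalPages = first->PhysicalPages;
    first->PhysicalPages = m_MergeContext.PhysicalPages;

    // Step 3: first buffer keeps its two (aliased) entries
    m_MergeContext.PhysicalPages[0] = first->OriginalPhysicalPages[0];
    m_MergeContext.PhysicalPages[1] = first->OriginalPhysicalPages[1];

    PMDL tail = first->Holder;
    for (UINT16 i = 0; i < extra; i++)
    {
        pRxNetDescriptor more = m_MergeContext.BufferSequence[i + 1];
        first->MergedBuffers[i] = more;                            // step 1
        m_MergeContext.PhysicalPages[2 + i] =
            more->PhysicalPages[PARANDIS_FIRST_RX_DATA_PAGE];      // step 3

        // Step 4: new MDL over the full payload (no header offset)
        PMDL mdl = NdisAllocateMdl(m_MiniportHandle,
                                   more->DataVirtualAddress,
                                   m_MergeContext.BufferActualLengths[i + 1]);
        if (!mdl)
        {
            return NULL;   // caller reuses all collected buffers
        }
        NDIS_MDL_LINKAGE(tail) = mdl;
        tail = mdl;
    }

    // Step 5: logical page count and merged-buffer bookkeeping
    first->NumPages = 2 + extra;            // == 1 + CollectedBuffers
    first->MergedBufferCount = extra;
    return first;
}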

Buffer Reuse Path

ReuseReceiveBufferNoLock()

Enhanced to handle merged packets:

  1. Check MergedBufferCount > 0
  2. Recursively reuse all additional buffers
  3. Call DisassembleMergedPacket() to restore state
  4. Standard reuse logic
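
A sketch of the merged-packet branch; the recursion depth is bounded at one, because additional buffers always have MergedBufferCount == 0:

void ReuseReceiveBufferNoLock(pRxNetDescriptor p)
{
    if (p->MergedBufferCount > 0)                          // step 1
    {
        for (USHORT i = 0; i < p->MergedBufferCount; i++)
        {
            ReuseReceiveBufferNoLock(p->MergedBuffers[i]); // step 2
        }
        DisassembleMergedPacket(p);                        // step 3
    }
    // step 4: standard reuse logic (return descriptor to the virtqueue)
}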

DisassembleMergedPacket()

Inverse operation of AssembleMergedPacket():

  1. Free extended MDL chain (keep first buffer's original MDL)
  2. Restore PhysicalPages pointer from inline array to original
  3. Reset NumPages = 2, NumOwnedPages = 2
  4. Clear MergedBufferCount = 0

Result: Buffer returns to pristine single-buffer state for reuse.
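
A sketch mirroring the four restoration steps, using the Holder MDL field from the descriptor:

void DisassembleMergedPacket(pRxNetDescriptor p)
{
    // Step 1: unlink and free the extended MDL chain,
    // keeping the first buffer's original MDL
    PMDL chain = NDIS_MDL_LINKAGE(p->Holder);
    NDIS_MDL_LINKAGE(p->Holder) = NULL;
    while (chain)
    {
        PMDL next = NDIS_MDL_LINKAGE(chain);
        NdisFreeMdl(chain);
        chain = next;
    }

    p->PhysicalPages = p->OriginalPhysicalPages;  // step 2: restore pointer
    p->NumPages = 2;                              // step 3
    p->NumOwnedPages = 2;
    p->MergedBufferCount = 0;                     // step 4
}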

Memory Footprint

Per-Descriptor Structure Overhead

Mergeable mode adds new fields to RxNetDescriptor structure:

Additional fields compared to traditional mode:
   +8 bytes (OriginalPhysicalPages pointer)
   +2 bytes (NumOwnedPages)
   +2 bytes (MergedBufferCount)
   +128 bytes (MergedBuffers array, 16 * 8 bytes)
   = +140 bytes per descriptor

Per-Queue Context Overhead

Mergeable mode adds _MergeBufferContext to each RX queue:

BufferSequence[17]:         17 * 8  = 136 bytes
BufferActualLengths[17]:    17 * 4  = 68 bytes
ExpectedBuffers:            2 bytes
CollectedBuffers:           2 bytes
TotalPacketLength:          4 bytes
PhysicalPages[18]:          18 * 24 = 432 bytes
                            Total:    644 bytes per queue

Per-Buffer Physical Memory Allocation

Actual shared memory (DMA-capable) allocated per buffer:

Traditional mode:  Up to 18 physical pages per buffer (max ~72KB)
                   Layout: 1 header page + up to 17 data pages
                   Note: Must pre-allocate for worst-case packet size
                   
Mergeable mode:    1 physical page per buffer (4KB)
                   Layout: Single 4KB page (header + data combined)
                   Note: 2 logical pages via aliasing, but same physical page

Total Impact (Example: 4096 buffers/queue, 1 queue)

Additional metadata overhead:
  Descriptor structure: 4096 * 140 bytes = 560 KB
  Queue context:        1 * 644 bytes   = 0.6 KB
  Total overhead:                         ~561 KB

Shared memory savings:
  Traditional: 4096 * 18 pages * 4KB = ~288 MB (worst-case pre-allocation)
  Mergeable:   4096 * 1 page * 4KB   = ~16 MB
  Net savings:                          ~272 MB (94% reduction)

Conclusion: Minimal metadata overhead (~561 KB) enables significant shared memory savings (~272 MB).

Error Handling

Protocol Violations

  1. Invalid num_buffers (0 or >17):
    • Log error, drop packet
    • Reuse first buffer immediately
  2. Missing buffers (GetBuf returns NULL):
    • Log protocol violation error
    • Reuse all collected buffers
  3. Buffer overflow (>16 additional buffers):
    • Log critical error, drop packet
    • Reuse all collected buffers
    • Should never happen (pre-validated)

Resource Exhaustion

  1. MDL allocation failure:
    • Log error, abort packet assembly, return NULL
    • Caller reuses all collected buffers
    • Partial MDL chain is cleaned up automatically
    • Entire packet is dropped (ensures data integrity)
    • Rare in practice

Appendix: Code Changes Summary

Key Functions Added

  • CreateMergeableRxDescriptor(): Simplified buffer creation
  • ProcessMergedBuffers(): Main assembly coordinator
  • CollectRemainingMergeBuffers(): Buffer collection
  • AssembleMergedPacket(): Multi-buffer packet assembly
  • DisassembleMergedPacket(): State restoration for reuse
  • ReuseCollectedBuffers(): Batch buffer return
  • ProcessReceivedPacket(): Encapsulates packet analysis, filtering, and RSS processing

Key Functions Modified

  • ReuseReceiveBufferNoLock(): Added merged packet handling
  • CreateRxDescriptorOnInit(): Added path routing
  • ProcessRxRing(): Integrated mergeable path

Add NumOwnedPages to track descriptor-owned memory pages for cleanup. Add OriginalPhysicalPages to save the original page array pointer before merge operations. Add MergedBufferCount and a MergedBuffers array for inline storage of merged buffer pointers, eliminating dynamic allocation in the hot path. A maximum of 16 additional buffers is supported, derived from the VirtIO maximum packet size (65562 bytes / 4096 bytes per buffer, 17 buffers total).

Signed-off-by: Zhang JianMing <[email protected]>
Add _MergeBufferContext structure with pre-allocated arrays for mergeable packet assembly to avoid hot-path allocation. Includes buffer sequence tracking, actual lengths, and physical page storage (max 18 pages). Add method declarations: ProcessMergedBuffers for initial processing, CollectRemainingMergeBuffers for gathering additional buffers, AssembleMergedPacket for packet construction, ReuseCollectedBuffers for error cleanup, and DisassembleMergedPacket for buffer state restoration.

Signed-off-by: Zhang JianMing <[email protected]>
Implement CollectRemainingMergeBuffers to gather additional buffers from virtqueue based on num_buffers field. Implement AssembleMergedPacket to consolidate multiple buffers into single descriptor by expanding PhysicalPages array and creating MDL chain covering all payload data. Implement ReuseCollectedBuffers for error-path cleanup. Implement DisassembleMergedPacket to restore original buffer state and return merged buffers to pool. These methods enable zero-copy assembly of large packets split across multiple virtio buffers while maintaining backward compatibility with single-buffer mode.

Signed-off-by: Zhang JianMing <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @zjmletang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization to the NetKVM driver by adding support for VirtIO mergeable receive buffers. This change fundamentally alters how network packets are received, moving from large, potentially wasteful buffers to smaller, 4KB page-sized buffers that can be dynamically merged to form larger packets. The primary goal is to drastically reduce memory consumption and allocation overhead, particularly in scenarios with high volumes of small packets. The implementation is carefully designed to ensure a high-performance, zero-allocation hot path for packet assembly while maintaining full backward compatibility with existing driver components and targeting modern VirtIO device capabilities.

Highlights

  • Mergeable Receive Buffers Support: Implements support for VirtIO mergeable receive buffers (VIRTIO_NET_F_MRG_RXBUF) in the Windows NetKVM driver, enabling more efficient memory usage for network packet reception.
  • Memory Optimization: Significantly reduces memory footprint by allocating small 4KB pages per buffer instead of large, fixed-size buffers (up to 64KB), especially beneficial for handling numerous small packets like TCP ACKs or DNS queries.
  • Conditional Activation: The mergeable buffer feature is activated only when both VIRTIO_NET_F_MRG_RXBUF and VIRTIO_F_ANY_LAYOUT are present, simplifying the implementation and targeting modern VirtIO 1.0+ devices.
  • Zero-Allocation Hot Path: Ensures high performance by utilizing pre-allocated inline arrays and stack-based storage for packet assembly, thereby avoiding dynamic memory allocation overhead in the critical receive hot path.
  • Backward Compatibility: Maintains compatibility with existing driver logic through techniques like physical page aliasing for legacy functions and separate buffer creation paths for mergeable and traditional modes, minimizing regression risk.
  • Enhanced Data Structures and Logic: Extends the RxNetDescriptor with new fields (NumOwnedPages, OriginalPhysicalPages, MergedBufferCount, MergedBuffers array) and introduces a _MergeBufferContext for per-queue state management. New functions are added for creating mergeable descriptors, processing, assembling, and disassembling merged packets.

@YanVugenfirer
Collaborator

[Jenkins CI]: Can one of the admins verify this patch?


@gemini-code-assist bot left a comment


Code Review

This pull request introduces support for mergeable receive buffers, a significant feature to optimize memory usage. The implementation is well-designed, following the detailed plan in the PR description. Key aspects like pre-allocating context structures to avoid hot-path allocations, ensuring backward compatibility through physical page aliasing, and clear separation of logic for mergeable vs. traditional paths are well executed. The code is mostly clean and correct. I've found a few areas for improvement, mainly related to code clarity and removing redundant checks, which should enhance maintainability.

Comment on lines +486 to +506
PMDL pMDL = pBuffer->Holder;
USHORT mdlCount = 0;

while (pMDL && mdlCount < 1)
{
    pMDL = NDIS_MDL_LINKAGE(pMDL);
    mdlCount++;
}

while (pMDL)
{
    PMDL pNextMDL = NDIS_MDL_LINKAGE(pMDL);
    NdisFreeMdl(pMDL);
    pMDL = pNextMDL;
}

pMDL = pBuffer->Holder;
if (pMDL)
{
    NDIS_MDL_LINKAGE(pMDL) = NULL;
}


Severity: medium

The logic for freeing the MDL chain of merged buffers is correct but could be simplified for better readability and maintainability. The current implementation uses a while loop with a counter to skip the first MDL, then another while loop to free the rest, and finally a separate block to unlink the chain. This can be done more concisely by unlinking the chain first and then iterating through the part to be freed.

    if (pBuffer->Holder)
    {
        PMDL pMdlChainToFree = NDIS_MDL_LINKAGE(pBuffer->Holder);

        // Unlink the additional MDLs from the primary buffer's MDL
        NDIS_MDL_LINKAGE(pBuffer->Holder) = NULL;

        // Free the chain of additional MDLs
        while (pMdlChainToFree)
        {
            PMDL pNextMDL = NDIS_MDL_LINKAGE(pMdlChainToFree);
            NdisFreeMdl(pMdlChainToFree);
            pMdlChainToFree = pNextMDL;
        }
    }

Comment on lines +1126 to +1136
// CRITICAL: Prevent buffer overflow - inline array has limited capacity
if (additionalBuffers > MAX_MERGED_BUFFERS)
{
    DPrintf(0,
            "ERROR: Too many merged buffers %u (max: %u) - dropping packet",
            m_MergeContext.CollectedBuffers,
            MAX_MERGED_BUFFERS + 1);

    ReuseCollectedBuffers();
    return NULL;
}


Severity: medium

This check for additionalBuffers > MAX_MERGED_BUFFERS is redundant and represents unreachable code. The number of buffers is validated earlier in ProcessMergedBuffers against VIRTIO_NET_MAX_MRG_BUFS (17). Since additionalBuffers is m_MergeContext.CollectedBuffers - 1, and m_MergeContext.CollectedBuffers is at most 17, additionalBuffers can be at most 16. MAX_MERGED_BUFFERS is also 16, so this condition will never be true. Removing this dead code will improve clarity.

// Range: 0 (single buffer) to 16 (max merged packet)
// MergedBuffersInline: Array storing pointers to the 16 additional buffers
// (this descriptor itself is not stored in the array)
#define MAX_MERGED_BUFFERS 16


Severity: medium

The macro MAX_MERGED_BUFFERS is defined inside the _tagRxNetDescriptor struct. While syntactically valid, it's unconventional and harms readability. It's standard practice to define macros at the file scope, before the struct that uses them. Please move this definition outside and before the _tagRxNetDescriptor struct.

@kostyanf14
Member

ok to test

@ybendito
Collaborator

ybendito commented Nov 6, 2025

@zjmletang Thank you for the PR, I'll review it as soon as I have time, but not immediately.
