netkvm: Add mergeable receive buffers support #1446
base: master
Conversation
Add NumOwnedPages to track descriptor-owned memory pages for cleanup. Add OriginalPhysicalPages to save original page array pointer before merge operations. Add MergedBufferCount and MergedBuffers array for inline storage of merged buffer pointers, eliminating dynamic allocation in hot path. Maximum 16 additional buffers supported per VirtIO spec (65562 bytes / 4096 bytes per buffer). Signed-off-by: Zhang JianMing <[email protected]>
Add _MergeBufferContext structure with pre-allocated arrays for mergeable packet assembly to avoid hot-path allocation. Includes buffer sequence tracking, actual lengths, and physical page storage (max 18 pages). Add method declarations: ProcessMergedBuffers for initial processing, CollectRemainingMergeBuffers for gathering additional buffers, AssembleMergedPacket for packet construction, ReuseCollectedBuffers for error cleanup, and DisassembleMergedPacket for buffer state restoration. Signed-off-by: Zhang JianMing <[email protected]>
Implement CollectRemainingMergeBuffers to gather additional buffers from virtqueue based on num_buffers field. Implement AssembleMergedPacket to consolidate multiple buffers into single descriptor by expanding PhysicalPages array and creating MDL chain covering all payload data. Implement ReuseCollectedBuffers for error-path cleanup. Implement DisassembleMergedPacket to restore original buffer state and return merged buffers to pool. These methods enable zero-copy assembly of large packets split across multiple virtio buffers while maintaining backward compatibility with single-buffer mode. Signed-off-by: Zhang JianMing <[email protected]>
Summary of Changes

Hello @zjmletang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant optimization to the NetKVM driver by adding support for VirtIO mergeable receive buffers. This change fundamentally alters how network packets are received, moving from large, potentially wasteful buffers to smaller, 4KB page-sized buffers that can be dynamically merged to form larger packets. The primary goal is to drastically reduce memory consumption and allocation overhead, particularly in scenarios with high volumes of small packets. The implementation is carefully designed to ensure a high-performance, zero-allocation hot path for packet assembly while maintaining full backward compatibility with existing driver components and targeting modern VirtIO device capabilities.
[Jenkins CI]: Can one of the admins verify this patch?
Code Review
This pull request introduces support for mergeable receive buffers, a significant feature to optimize memory usage. The implementation is well-designed, following the detailed plan in the PR description. Key aspects like pre-allocating context structures to avoid hot-path allocations, ensuring backward compatibility through physical page aliasing, and clear separation of logic for mergeable vs. traditional paths are well executed. The code is mostly clean and correct. I've found a few areas for improvement, mainly related to code clarity and removing redundant checks, which should enhance maintainability.
PMDL pMDL = pBuffer->Holder;
USHORT mdlCount = 0;

while (pMDL && mdlCount < 1)
{
    pMDL = NDIS_MDL_LINKAGE(pMDL);
    mdlCount++;
}

while (pMDL)
{
    PMDL pNextMDL = NDIS_MDL_LINKAGE(pMDL);
    NdisFreeMdl(pMDL);
    pMDL = pNextMDL;
}

pMDL = pBuffer->Holder;
if (pMDL)
{
    NDIS_MDL_LINKAGE(pMDL) = NULL;
}
The logic for freeing the MDL chain of merged buffers is correct but could be simplified for better readability and maintainability. The current implementation uses a while loop with a counter to skip the first MDL, then another while loop to free the rest, and finally a separate block to unlink the chain. This can be done more concisely by unlinking the chain first and then iterating through the part to be freed.
if (pBuffer->Holder)
{
PMDL pMdlChainToFree = NDIS_MDL_LINKAGE(pBuffer->Holder);
// Unlink the additional MDLs from the primary buffer's MDL
NDIS_MDL_LINKAGE(pBuffer->Holder) = NULL;
// Free the chain of additional MDLs
while (pMdlChainToFree)
{
PMDL pNextMDL = NDIS_MDL_LINKAGE(pMdlChainToFree);
NdisFreeMdl(pMdlChainToFree);
pMdlChainToFree = pNextMDL;
}
}

// CRITICAL: Prevent buffer overflow - inline array has limited capacity
if (additionalBuffers > MAX_MERGED_BUFFERS)
{
    DPrintf(0,
            "ERROR: Too many merged buffers %u (max: %u) - dropping packet",
            m_MergeContext.CollectedBuffers,
            MAX_MERGED_BUFFERS + 1);

    ReuseCollectedBuffers();
    return NULL;
}
This check for additionalBuffers > MAX_MERGED_BUFFERS is redundant and represents unreachable code. The number of buffers is validated earlier in ProcessMergedBuffers against VIRTIO_NET_MAX_MRG_BUFS (17). Since additionalBuffers is m_MergeContext.CollectedBuffers - 1, and m_MergeContext.CollectedBuffers is at most 17, additionalBuffers can be at most 16. MAX_MERGED_BUFFERS is also 16, so this condition will never be true. Removing this dead code will improve clarity.
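The reviewer's arithmetic can be checked in isolation. The constants below are copied from the PR (17 total buffers, 16 additional); the helper function is a hypothetical stand-in for the expression used in the driver:

```c
#include <assert.h>

/* Constants from the PR: each mergeable buffer holds one 4096-byte page
 * and the largest merged packet is 65562 bytes, so
 * ceil(65562 / 4096) = 17 total buffers. */
#define MAX_PACKET_BYTES 65562
#define PAGE_BYTES 4096
#define VIRTIO_NET_MAX_MRG_BUFS ((MAX_PACKET_BYTES + PAGE_BYTES - 1) / PAGE_BYTES)
#define MAX_MERGED_BUFFERS 16

/* additionalBuffers is one less than the number of collected buffers,
 * which is validated earlier to be at most VIRTIO_NET_MAX_MRG_BUFS. */
static unsigned AdditionalBuffers(unsigned collectedBuffers)
{
    return collectedBuffers - 1;
}
```

Since the maximum value of `AdditionalBuffers` is 17 - 1 = 16 = MAX_MERGED_BUFFERS, the overflow branch can never be taken.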
// Range: 0 (single buffer) to 16 (max merged packet)
// MergedBuffersInline: Array storing pointers to the 16 additional buffers
// (this descriptor itself is not stored in the array)
#define MAX_MERGED_BUFFERS 16
The macro MAX_MERGED_BUFFERS is defined inside the _tagRxNetDescriptor struct. While syntactically valid, it's unconventional and harms readability. It's standard practice to define macros at the file scope, before the struct that uses them. Please move this definition outside and before the _tagRxNetDescriptor struct.
ok to test
@zjmletang Thank you for the PR, I'll review it as soon as I have time, but not immediately.
Hello everyone — I’ve prepared a preliminary PR that adds support for the netkvm “mergeable” feature. This can effectively reduce memory usage; I’d appreciate it if you could take a moment to review it when you have time. Below are the details/explanations for this PR.
VirtIO Mergeable Receive Buffers Implementation Design
Overview
This document describes the implementation of VirtIO mergeable receive buffers (VIRTIO_NET_F_MRG_RXBUF) support for the Windows NetKVM driver. The implementation optimizes memory usage and reduces buffer allocation overhead for high-throughput network scenarios.
Background
Problem Statement
The traditional receive path allocates large buffers (up to 64KB) to accommodate maximum-sized packets. This approach wastes memory and adds allocation overhead, especially under workloads dominated by small packets, because every buffer is sized for the worst case.
Solution: Mergeable Receive Buffers
VirtIO mergeable buffers allow the device to split a large packet across multiple small, page-sized receive buffers, which the driver then merges back into a single packet, so receive memory is sized to actual traffic rather than the worst case.
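Concretely, when VIRTIO_NET_F_MRG_RXBUF is negotiated the device prepends each packet with a header carrying a num_buffers count. The layout follows the VirtIO specification; the C sketch below is illustrative, not the driver's actual declaration:

```c
#include <assert.h>
#include <stdint.h>

/* Abridged virtio-net header (the fixed 10-byte prefix). */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Mergeable variant: num_buffers tells the driver how many receive
 * descriptors the device used for this packet (1 for small packets). */
struct virtio_net_hdr_mrg_rxbuf {
    struct virtio_net_hdr hdr;
    uint16_t num_buffers;
};
```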
Design Principles
1. Conditional Activation
Only enable mergeable buffers when BOTH features are present:
- VIRTIO_NET_F_MRG_RXBUF (mergeable buffer support)
- VIRTIO_F_ANY_LAYOUT (VirtIO 1.0+ combined header+data layout)

Rationale: This is a pragmatic engineering decision. While VIRTIO_NET_F_MRG_RXBUF and VIRTIO_F_ANY_LAYOUT are theoretically independent features per the VirtIO specification, we require both for mergeable buffer support so that the virtio header and data can share a single combined buffer. Legacy VirtIO 0.95 devices lacking ANY_LAYOUT automatically fall back to the traditional non-mergeable path.

2. Zero-Allocation Hot Path
All data structures for packet assembly are pre-allocated:
- Inline arrays in merge context: Buffer references and physical page arrays use fixed-size inline storage (BufferSequence[17], PhysicalPages[18]) embedded in the context structure, eliminating per-packet heap allocation overhead.
- Stack-based storage for buffer references: All temporary tracking uses stack variables or pre-allocated context members, avoiding dynamic memory management in the receive hot path.
- Bounds enforced at compile time: Array sizes are compile-time constants (#define VIRTIO_NET_MAX_MRG_BUFS 17), not runtime variables. This lets the compiler enforce bounds and removes the need for runtime validation checks, improving performance and safety.

3. Backward Compatibility
Maintain compatibility with existing code paths:
- Physical page aliasing preserves ParaNdis_BindRxBufferToPacket logic: The existing MDL binding function always starts from PhysicalPages[1] (a legacy design for traditional mode). By creating an alias where both PhysicalPages[0] and [1] point to the same physical memory, this function works correctly for mergeable buffers without modification, reducing regression risk.
- Separate creation paths for mergeable vs. traditional buffers: Two independent buffer allocation functions (CreateMergeableRxDescriptor() for small 4KB buffers, CreateRxDescriptorOnInit() for large multi-page buffers) are selected based on feature flags. This separation keeps each mode's logic clean and isolated, avoiding complex conditional branches in shared code.
- No changes to core packet processing: After assembly, both modes produce identical MDL chain structures. Upper-layer processing (checksum validation, RSS classification, hardware offload) operates on standard MDL chains regardless of buffer origin. This interface consistency ensures existing packet handling logic works unchanged for both modes.
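Design principle 1 (conditional activation) reduces to a single AND over the negotiated feature bits. A minimal sketch, with a hypothetical helper name; the bit numbers are the VirtIO specification's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers from the VirtIO specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_F_ANY_LAYOUT    27

/* Hypothetical helper: mergeable mode requires BOTH features;
 * otherwise the driver uses the traditional non-mergeable path. */
static bool UseMergeableBuffers(uint64_t features)
{
    bool mrg = (features >> VIRTIO_NET_F_MRG_RXBUF) & 1u;
    bool any = (features >> VIRTIO_F_ANY_LAYOUT) & 1u;
    return mrg && any;
}
```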
Architecture
Key Data Structures
1. RxNetDescriptor Extensions
Design Notes:
- NumPages: Semantic change - now represents the logical page count for the complete packet after assembly
- NumOwnedPages: Prevents double-free during cleanup (only owned pages are freed)
- OriginalPhysicalPages: Enables pointer restoration after merge assembly
- MergedBuffers array: Avoids heap allocation, sized for the worst case (17 buffers max)

2. Merge Context Structure
Design Notes:
- VIRTIO_NET_MAX_MRG_BUFS = 17: Calculated from the max packet size (65562 bytes / 4096 bytes per buffer)
- MAX_MERGED_PHYSICAL_PAGES = 18: First buffer (2 logical pages) + 16 additional buffers

Physical Page Aliasing Design
Challenge: ParaNdis_BindRxBufferToPacket always starts from PARANDIS_FIRST_RX_DATA_PAGE (index 1).

Solution: Create an alias where both PhysicalPages[0] and PhysicalPages[1] point to the same physical memory.

Safety: IsRegionInside() detects the alias during cleanup and skips freeing PhysicalPages[1].

Trade-off: +8 bytes per descriptor (1000 descriptors = 8KB overhead), acceptable for compatibility.
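The aliasing trick amounts to one extra pointer slot per descriptor. A toy model, with field and function names simplified from the PR:

```c
#include <assert.h>

/* Toy model of the aliasing design: PhysicalPages[0] and [1] reference
 * the same 4 KB page, so code that unconditionally starts at index 1
 * (like ParaNdis_BindRxBufferToPacket) still sees valid memory. The
 * cost is one extra pointer per descriptor (8 bytes on x64). */
typedef struct {
    void *PhysicalPages[2];
} MiniRxDescriptor;

static void AliasFirstPage(MiniRxDescriptor *d, void *page)
{
    d->PhysicalPages[0] = page; /* header+data view (index 0)         */
    d->PhysicalPages[1] = page; /* data view used by binding (index 1) */
}

/* Cleanup-side check in the spirit of IsRegionInside(): the alias must
 * be detected so the shared page is freed only once. */
static int IsAliased(const MiniRxDescriptor *d)
{
    return d->PhysicalPages[0] == d->PhysicalPages[1];
}
```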
Implementation Details
Buffer Creation Path
CreateMergeableRxDescriptor()
Allocates simplified 4KB buffers for mergeable mode:
- PhysicalPages array (aliasing design)

Key Parameters:

- NumPages = 2 (logical)
- NumOwnedPages = 2 (same for single buffer)
- BufferSGLength = 1 (combined header+data)
- DataStartOffset = nVirtioHeaderSize

Packet Assembly Path
ProcessMergedBuffers()
Main entry point for mergeable packet handling:
- Read num_buffers from the virtio header
- Gather additional buffers via CollectRemainingMergeBuffers()
- Build the packet via AssembleMergedPacket()

Error Handling:

- Invalid num_buffers: Drop packet, reuse first buffer

CollectRemainingMergeBuffers()
Retrieves buffers 2..N from virtqueue:
VirtIO Protocol Guarantee: All buffers for a merged packet are atomically available.
Implementation:
- Reads num_buffers from the virtio header and fetches the remaining buffers from the virtqueue

AssembleMergedPacket()
Combines multiple buffers into single packet:
- Record each additional buffer in the MergedBuffers array
- Expand into the PhysicalPages array from the merge context
- Update NumPages (logical) and MergedBufferCount (additional buffers)

Page Calculation: the assembled NumPages is the first buffer's 2 logical pages plus one page per additional buffer (at most 2 + 16 = 18, matching MAX_MERGED_PHYSICAL_PAGES).
MDL Creation: Additional buffers use PhysicalPages[PARANDIS_FIRST_RX_DATA_PAGE] (index 1, aliased to the same page as [0] in mergeable mode, for consistency with ParaNdis_BindRxBufferToPacket).

Buffer Reuse Path
ReuseReceiveBufferNoLock()
Enhanced to handle merged packets:
- Detects merged packets via MergedBufferCount > 0
- Calls DisassembleMergedPacket() to restore state

DisassembleMergedPacket()
Inverse operation of AssembleMergedPacket():

- Restore the PhysicalPages pointer from the inline array to the original
- Reset NumPages = 2 and NumOwnedPages = 2
- Reset MergedBufferCount = 0

Result: Buffer returns to pristine single-buffer state for reuse.
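The restore sequence is small enough to model directly. The struct below abridges the PR's descriptor to just the fields this step touches (a sketch, not the driver's type):

```c
#include <assert.h>

/* Abridged descriptor state touched by DisassembleMergedPacket. */
typedef struct {
    unsigned NumPages;          /* logical pages of the packet      */
    unsigned NumOwnedPages;     /* pages this descriptor owns/frees */
    unsigned MergedBufferCount; /* additional buffers merged in     */
} MiniDescState;

/* Sketch of the inverse of AssembleMergedPacket: put the descriptor
 * back into its pristine single-buffer state. */
static void DisassembleSketch(MiniDescState *d)
{
    d->NumPages = 2;          /* single 4 KB buffer = 2 logical pages */
    d->NumOwnedPages = 2;     /* owns only its own aliased pages      */
    d->MergedBufferCount = 0; /* no additional buffers attached       */
}
```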
Memory Footprint
Per-Descriptor Structure Overhead
Mergeable mode adds new fields to the RxNetDescriptor structure.

Per-Queue Context Overhead
Mergeable mode adds a _MergeBufferContext to each RX queue.

Per-Buffer Physical Memory Allocation
Actual shared memory (DMA-capable) allocated per buffer: a single 4KB page in mergeable mode versus a large multi-page buffer in traditional mode.
Total Impact (Example: 4096 buffers/queue, 1 queue)
Conclusion: Minimal metadata overhead (~575 KB) enables significant shared memory savings (~272 MB).
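As a back-of-envelope check of the ~272 MB figure, assume (consistent with the page counts above, but an assumption of this sketch) that a traditional descriptor reserves 18 x 4 KB = 72 KB of shared memory versus a single 4 KB page in mergeable mode:

```c
#include <assert.h>

/* Assumed per-buffer shared memory: 18 pages in traditional mode
 * (enough for a maximum-sized packet plus header), 1 page in
 * mergeable mode. Returns the saving in KB for a buffer count. */
static unsigned long long SharedMemorySavedKB(unsigned long long buffers)
{
    const unsigned long long traditionalKB = 18 * 4; /* 72 KB */
    const unsigned long long mergeableKB = 4;        /*  4 KB */
    return buffers * (traditionalKB - mergeableKB);
}
```

With the example configuration of 4096 buffers on one queue, this yields 4096 x 68 KB = 272 MB, matching the conclusion above.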
Error Handling
Protocol Violations
- Invalid num_buffers (0 or >17): Drop the packet and reuse the first buffer
- Missing buffers (GetBuf returns NULL): Return already-collected buffers via ReuseCollectedBuffers()
- Buffer overflow (>16 additional buffers): Drop the packet and return the collected buffers
Resource Exhaustion
Appendix: Code Changes Summary
Key Functions Added
- CreateMergeableRxDescriptor(): Simplified buffer creation
- ProcessMergedBuffers(): Main assembly coordinator
- CollectRemainingMergeBuffers(): Buffer collection
- AssembleMergedPacket(): Multi-buffer packet assembly
- DisassembleMergedPacket(): State restoration for reuse
- ReuseCollectedBuffers(): Batch buffer return
- ProcessReceivedPacket(): Encapsulates packet analysis, filtering, and RSS processing

Key Functions Modified
- ReuseReceiveBufferNoLock(): Added merged packet handling
- CreateRxDescriptorOnInit(): Added path routing
- ProcessRxRing(): Integrated mergeable path