
Conversation

e-ago (Contributor) commented Nov 24, 2025

Summary by CodeRabbit

  • New Features

    • Added support for GPU packet I/O with improved memory addressing and queue management.
  • Bug Fixes

    • Enhanced memory synchronization and completion signaling in network packet operations.
  • Chores

    • Upgraded DOCA runtime from 2.8.0 to 3.2.0.
    • Updated GPU networking dependencies including NVIDIA DPDK packages.
    • Refactored GPU kernel architecture for improved packet handling efficiency.
    • Reorganized CUDA compilation configuration.


Signed-off-by: eagostini <[email protected]>
greptile-apps bot (Contributor) commented Nov 24, 2025

Greptile Overview

Greptile Summary

This PR upgrades the Advanced Network Operator from DOCA 2.8 to DOCA 3.2, implementing significant architectural changes to the GPU packet processing pipeline.

Key Changes

  • Replaces semaphore-based synchronization with direct packet list structures for CPU-GPU communication
  • Migrates from doca_buf_arr API to direct memory key-based packet addressing
  • Removes DPDK queue setup (now handled internally by DOCA Flow)
  • Updates DOCA Flow configuration to use new DOCA 3.2 APIs (doca_flow_port_cfg_set_dev, RSS structure changes)
  • Disables CUDA separable compilation for compatibility

Issues Found

  • Critical: Dockerfile wget URL has a line break causing a build failure (lines 107-108)
  • Minor: Debug printf statements left in production code

The core GPU kernel logic remains sound with proper synchronization patterns, and the migration to DOCA 3.2 APIs appears architecturally correct.
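For orientation, here is a minimal sketch of the CPU-side consumer loop implied by this packet-list design. The struct and field names (adv_doca_rx_gpu_info, status, num_pkts, nbytes) mirror this PR's diff; the status constants, the addr field, and the list length are assumptions for illustration only.

```cpp
#include <cstdint>

// Assumed status values; the PR reuses DOCA semaphore status enums.
enum PktListStatus : uint32_t { PKT_LIST_FREE = 0, PKT_LIST_READY = 1 };

// Host-visible batch descriptor written by the GPU RX kernel (names from the PR).
struct adv_doca_rx_gpu_info {
  volatile uint32_t status;  // set to READY by the GPU after a release fence
  uint32_t num_pkts;         // packets in this batch
  uint32_t nbytes;           // total bytes in this batch
  uintptr_t addr;            // base address of the first packet (assumed field)
};

// Poll one slot of a circular packet list; returns true if a batch was consumed.
bool poll_packet_list(adv_doca_rx_gpu_info* list, uint32_t* idx, uint32_t list_len) {
  adv_doca_rx_gpu_info* slot = &list[*idx];
  if (slot->status != PKT_LIST_READY) return false;
  // ... hand slot->addr / slot->num_pkts / slot->nbytes to a ring buffer here ...
  slot->status = PKT_LIST_FREE;  // return the slot to the GPU kernel
  *idx = (*idx + 1) % list_len;  // circular advance, as in rx_core
  return true;
}
```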

Confidence Score: 3/5

  • This PR cannot build due to a critical Dockerfile syntax error; once fixed, the DOCA 3.2 migration is sound
  • Score reflects the broken Dockerfile wget command that will prevent Docker build from succeeding. The actual DOCA 3.2 migration code is well-structured with proper API usage, memory management, and GPU synchronization patterns. Minor style issues (debug printfs) don't affect functionality.
  • The Dockerfile requires immediate attention to fix the line break in the wget URL at lines 107-108

Important Files Changed

File Analysis

Filename | Score | Overview
operators/advanced_network/Dockerfile | 3/5 | Upgrades DOCA to 3.2.0, adds new SDK packages; wget URL has a line break causing a build failure
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu | 4/5 | Replaces DOCA 2.8 semaphore-based packet handling with the DOCA 3.2 direct packet list API and new RX/TX interfaces
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp | 4/5 | Refactors DOCA Flow initialization to use new DOCA 3.2 APIs, removes DPDK queue setup, updates RSS configuration
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp | 4/5 | Implements packet list allocation, dmabuf memory mapping, and new flow configuration for DOCA 3.2; has debug printfs

Sequence Diagram

sequenceDiagram
    participant CPU as CPU Thread
    participant GPU as GPU Kernel
    participant NIC as Network Interface Card
    participant MEM as GPU Memory
    
    Note over CPU,MEM: Initialization Phase (DOCA 3.2)
    CPU->>MEM: Allocate packet list (adv_doca_rx_gpu_info)
    CPU->>NIC: Configure DOCA Flow pipes with RSS
    CPU->>NIC: Apply queue IDs to eth_rxq
    CPU->>GPU: Launch persistent RX kernel
    
    Note over GPU,NIC: Packet Reception Loop
    loop Persistent kernel execution
        GPU->>NIC: doca_gpu_dev_eth_rxq_recv()
        NIC-->>GPU: out_first_pkt_idx, out_pkt_num
        GPU->>NIC: doca_gpu_dev_eth_rxq_get_pkt_addr()
        NIC-->>GPU: Buffer addresses
        GPU->>GPU: Process packets (parse headers, filter)
        GPU->>MEM: Update packet_stats (num_pkts, nbytes, addr)
        GPU->>MEM: doca_gpu_dev_eth_fence_release()
        GPU->>MEM: Set status = READY
        CPU->>MEM: Poll packet_stats.status
        alt Status == READY
            CPU->>MEM: Read packet metadata
            CPU->>CPU: Enqueue to ring buffer
            CPU->>MEM: Set status = FREE
            CPU->>MEM: Increment pkt_idx (circular)
        end
    end
    
    Note over CPU,NIC: Packet Transmission
    CPU->>CPU: Dequeue burst from ring
    CPU->>GPU: Launch TX kernel with packet info
    GPU->>GPU: doca_gpu_dev_eth_txq_reserve_wq_slots()
    loop For each packet
        GPU->>GPU: Calculate packet address
        GPU->>GPU: doca_gpu_dev_eth_txq_wqe_prepare_send()
    end
    GPU->>NIC: doca_gpu_dev_eth_txq_submit()
    NIC->>NIC: Transmit packets

greptile-apps bot left a comment


8 files reviewed, 3 comments


Comment on lines +107 to +108
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \

syntax: Line break splits the URL, causing wget to fail

Suggested change
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \


doca_eth_rxq_get_flow_queue_id(eth_rxq_cpu, &flow_queue_id);
rss_queues[0] = flow_queue_id;
printf("UDP pipe queue %d\n", flow_queue_id);

style: Debugging printf left in production code

doca_error_t DocaRxQueue::create_rx_packet_list() {
doca_error_t result;

printf("Alloc packet list for Rxq\n");

style: Debugging printf left in production code

coderabbitai bot commented Nov 24, 2025

Walkthrough

This PR upgrades DOCA from version 2.8.0 to 3.2.0 and refactors GPU packet handling from a semaphore-based to a packet-list-based architecture. Changes include updated build dependencies, modified CUDA kernel implementations, revised function signatures across managers, and integrated DMABuf memory mapping.

Changes

Build Configuration
  Files: operators/advanced_network/Dockerfile, operators/advanced_network/advanced_network/CMakeLists.txt, operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
  Summary: DOCA version bumped to 3.2.0 with new system dependencies (build-essential, wget, mlnx-dpdk). CUDA compilation settings (CUDA_SEPARABLE_COMPILATION, CUDA_RESOLVE_DEVICE_SYMBOLS) moved from target properties to global scope in CMake files.

GPU Kernel Interface
  Files: operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
  Summary: Public function signatures updated: receiver kernel now uses pkt_gpu_list and pkt_idx_list instead of semaphores; sender kernel accepts address/mkey and max_pkt_size instead of a buffer array. Header includes shifted from DOCA device headers to CUDA and stdint.

GPU Kernel Implementation
  Files: operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
  Summary: Major refactoring replacing semaphore-based packet passing with per-block packet info lists. Receiver kernels now use doca_gpu_dev_eth_rxq_recv with per-packet address computation. Sender kernel switched to an address-based WQE approach with doca_gpu_dev_eth_txq_wqe_prepare_send. Kernel launch signatures updated to reflect new parameter types.

Manager Header
  Files: operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h
  Summary: Added ALIGN_SIZE macro. Struct adv_doca_rx_gpu_info gains a status field. DocaRxQueue replaced semaphore methods with packet list lifecycle methods; DocaTxQueue added a pkt_mkey field. DocaMgr::init_doca_flow now accepts a DOCA device parameter.

Manager Implementation
  Files: operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp, operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
  Summary: Refactored DOCA flow port initialization to accept a device parameter. Replaced semaphore handling with packet list patterns throughout RX/TX paths. Integrated DMABuf memory mapping with fallback to nvidia-peermem. Updated flow pipe creation with new RSS configuration. Constructor signatures modified to include a doca_flow_port parameter. Added packet list lifecycle methods and host page size alignment utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Kernel refactoring logic: The replacement of semaphore-based synchronization with packet list semantics in adv_network_doca_kernels.cu involves non-trivial changes to address computation, fence semantics, and per-block state management. Verify correctness of out_first_pkt_idx calculations and packet boundary handling; see the sketch after this list.
  • Function signature propagation: Multiple interconnected signature changes across kernels, headers, and managers (e.g., doca_receiver_packet_kernel, doca_sender_packet_kernel, DocaRxQueue constructor). Ensure all call sites are correctly updated and parameters are passed in the right order and types.
  • DMABuf integration: New memory mapping paths in adv_network_doca_mgr_obj.cpp with fallback logic require validation of both dmabuf and nvidia-peermem paths, particularly around mkey conversion (htobe32) and memory range setup.
  • DOCA API version compatibility: Verify that updated DOCA 3.2.0 API calls (doca_gpu_dev_eth_rxq_recv, doca_gpu_dev_eth_txq_wqe_prepare_send) match expected signatures and behavior from the DOCA 3.2.0 documentation.
  • Flow configuration changes: Updated RSS and match configuration patterns in UDP pipe creation warrant verification against DOCA 3.2.0 flow API expectations.
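
To make the boundary conditions concrete, here is a minimal sketch of the circular index arithmetic flagged above. All names (base, first_pkt_idx, max_pkts, max_pkt_size) are illustrative assumptions; the real code obtains indices from doca_gpu_dev_eth_rxq_recv() and resolves addresses through the DOCA device API rather than by hand.

```cpp
#include <cstdint>

// Fixed-stride circular packet ring: slot i lives at base + i * max_pkt_size.
inline uintptr_t pkt_addr(uintptr_t base, uint32_t first_pkt_idx, uint32_t offset,
                          uint32_t max_pkts, uint32_t max_pkt_size) {
  uint32_t slot = (first_pkt_idx + offset) % max_pkts;  // wrap at ring end
  return base + static_cast<uintptr_t>(slot) * max_pkt_size;
}
```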

Suggested reviewers

  • bhashemian
  • mocsharp
  • jjomier

Pre-merge checks

✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: upgrading DOCA to version 3.2 and refactoring the GPUNetIO manager with corresponding API and architecture updates.




coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
operators/advanced_network/Dockerfile (1)

112-143: Update comments to reflect package changes in the apt-get install list.

Lines 116–119 reference libdoca-sdk-dma-dev which is not in the current install list (lines 124–143). The comments describe the old package configuration, not the new DOCA 3.2 setup. Update the comments to accurately describe: doca-sdk-gpunetio, doca-sdk-eth, doca-sdk-flow, and their respective -dev variants.

Apply this diff to update the comments:

 # APT installs
 # - cublas, cufft, cusolver, curand dev libs: for matx
 # - ninja-build: for cmake build
 # - pkgconf: to import dpdk and doca in CMake
 # - mlnx-dpdk-dev: for dpdk and gpunetio backends
-# - libdoca-sdk-dma-dev: for gpunetio backend (dependency of doca-gpunetio module)
 # - libdoca-sdk-gpunetio-dev: for gpunetio backend (doca-gpunetio module)
+# - doca-sdk-gpunetio: runtime library for gpunetio backend
+# - doca-sdk-eth: DOCA ethernet library
+# - doca-sdk-flow: DOCA flow library
 # - libdoca-sdk-eth-dev: for gpunetio backend (doca-eth module)
 # - libdoca-sdk-flow-dev: for gpunetio backend (doca-flow module)
 # - mlnx-ofed-kernel-utils: utilities, including ibdev2netdev used by tune_system.py
 # - ibverbs-utils: utilities
 # - python3-pyelftools: used by some mlnx tools
 # - python3-dev: for building python bindings
operators/advanced_network/advanced_network/CMakeLists.txt (1)

1-2: Update copyright header to include current year

Compliance check reports the current year is missing; please update to include 2025 (e.g., 2023-2025).

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (2)

1-2: Fix copyright header

Same as other files: include the current year (2025) in the SPDX copyright line.


62-71: Add -ldoca_gpunetio_device to link libraries

The CUDA kernels directly invoke GPUNetIO device-side functions including doca_gpu_dev_eth_rxq_recv(), doca_gpu_dev_eth_rxq_get_pkt_addr(), doca_gpu_dev_eth_fence_release(), doca_gpu_dev_eth_txq_reserve_wq_slots(), doca_gpu_dev_eth_txq_get_wqe_ptr(), doca_gpu_dev_eth_txq_wqe_prepare_send(), and doca_gpu_dev_eth_txq_submit(). Per DOCA 3.2 documentation, libdoca_gpunetio_device.a is a separate static library that must be explicitly linked when GPU device functions are required. Add -ldoca_gpunetio_device to the target_link_libraries() section in the CMakeLists.txt.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (2)

1-2: Align copyright header

Same as the other files: update the SPDX copyright line to
include the current year (e.g., 2023-2025) so the compliance
check passes.


1267-1277: Missing free for batch_gpu_list GPU allocation

You allocate batch_gpu_list / batch_cpu_list with doca_gpu_mem_alloc but never call doca_gpu_mem_free on batch_gpu_list when tearing down rx_core, while you do free eth_rxq_gpu_list, pkt_gpu_list, pkt_idx_gpu_list, and gpu_exit_condition. This leaves a DOCA GPU allocation leaked per worker.

Add a matching doca_gpu_mem_free(tparams->gdev, (void*)batch_gpu_list); before returning.

Also applies to: 1411-1416

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)

1-16: Fix copyright header to include current year for compliance.

The compliance check is failing because the header only mentions 2023; the current year (2025) must be included.

For example:

- * SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)

1-16: Update copyright header to include 2025.

Same as the C++ file, the compliance check requires the current year to be listed in the header.

- * SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

79-198: Persistent RX kernel: add fence to final flush and fix linting issues.

Verification confirms all issues. The fence omission is real: line 179 shows doca_gpu_dev_eth_fence_release() in the main batch-completion block, but line 197 (final flush after loop exit) lacks this call before setting status = READY. Without the fence, the host may observe READY before the counters are visible, violating memory ordering.

Casting and whitespace issues are also confirmed:

  • Line 94: C-style cast triggers cpplint runtime/casting warning
  • Lines 91, 105–107: Tabs instead of spaces

Required fixes:

  1. Add fence before final status write (line 197):
    if (threadIdx.x == 0) {
      DOCA_GPUNETIO_VOLATILE(packets_stats->num_pkts) = DOCA_GPUNETIO_VOLATILE(tot_pkts_batch);
      DOCA_GPUNETIO_VOLATILE(packets_stats->nbytes) = DOCA_GPUNETIO_VOLATILE(rx_pkt_bytes);
      doca_gpu_dev_eth_fence_release<DOCA_GPUNETIO_ETH_SYNC_SCOPE_SYS>();  // added fence
      DOCA_GPUNETIO_VOLATILE(packets_stats->status) = DOCA_GPU_SEMAPHORE_STATUS_READY;
    }

2. Replace C-style cast on line 94 with `reinterpret_cast` and split across lines to avoid length warning.

3. Replace tabs with spaces throughout (lines 91, 105–107, and similar).

🧹 Nitpick comments (16)
operators/advanced_network/advanced_network/CMakeLists.txt (1)

51-52: Clarify or remove commented CUDA properties

The commented CUDA_SEPARABLE_COMPILATION / CUDA_RESOLVE_DEVICE_SYMBOLS lines are no-ops here. If these properties are now configured globally, consider removing the comments to avoid confusion; otherwise, set them via set_target_properties or set_property on the relevant targets.

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1)

52-53: Remove or wire up commented CUDA properties

These commented CUDA_SEPARABLE_COMPILATION / CUDA_RESOLVE_DEVICE_SYMBOLS directives do nothing; if CUDA behavior is controlled elsewhere now, consider dropping them, otherwise enable them via proper target properties.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (3)

65-73: ALIGN_SIZE macro is side‑effectful; prefer a helper

ALIGN_SIZE(size, align) both reassigns its argument and evaluates it multiple times. That’s easy to misuse with expressions or side effects. Consider replacing it with an inline function or a safer macro pattern (e.g., do { (size) = ...; } while (0)), or using a typed helper align_up(size_t size, size_t align).
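
A minimal sketch of the suggested typed helper; the name align_up is ours, not from the PR:

```cpp
#include <cstddef>

// Round size up to the next multiple of align (align must be non-zero).
// Unlike the ALIGN_SIZE macro, this evaluates each argument exactly once
// and does not mutate its input.
inline size_t align_up(size_t size, size_t align) {
  return ((size + align - 1) / align) * align;
}
// Usage: cyclic_buffer_size = align_up(cyclic_buffer_size, get_host_page_size());
```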


127-159: Clarify semantics of pkt_list_gpu/pkt_list_cpu

DocaRxQueue now has both pkt_list_gpu and pkt_list_cpu, but only pkt_list_gpu is used from the RX core loop (the CPU side). If pkt_list_gpu actually holds a device pointer and pkt_list_cpu holds the host-mapped view, that’s a correctness bug; if both are host-accessible (e.g., due to UVA), the naming is misleading. It would be safer to (a) use the explicit host pointer in CPU code, and (b) rename/add comments to reflect which field is host vs device.


161-187: Update pkt_mkey comment and remove dead fields

pkt_mkey is a DOCA memory key, but the comment still says “GPU memory address”, which is confusing next to gpu_pkt_addr. Also, the buf_arr/buf_arr_gpu members are commented out here but likely still referenced in older code paths; once you’re sure they’re unused, you can remove them (and any supporting code) entirely.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)

18-22: Tighten CUDA includes to avoid static‑analysis issues

The header now includes both <cuda.h> and <cuda_runtime_api.h>, and clang static analysis is complaining about cuda.h not being found. Since these declarations only use cudaStream_t, <cuda_runtime_api.h> should be sufficient; you can likely drop <cuda.h> from this file (it’s already included via adv_network_doca_mgr.h anyway), which should make the header friendlier to non‑CUDA tooling.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)

347-373: Minor robustness and style issues in DOCA flow pipe setup

A few small items worth tightening:

  • fwd.rss.nr_queues = cnt_defq;; (extra semicolon) is harmless but noisy.
  • In create_default_pipe, you assume idxq == cnt_defq by construction; if config ever diverges, rss_queues and cnt_defq could get out of sync. An assertion or sanity check here would help catch that.
  • doca_flow_pipe_control_add_entry in create_root_pipe is now passed multiple nullptr parameters; consider wrapping these in a small helper or at least naming the fields in comments to make future maintenance safer.

Functionally it looks consistent with the new DOCA 3.2 APIs; these are mostly maintainability nits.

Also applies to: 855-867, 943-975


1367-1368: Minor RX logging and index handling nits

  • The debug log uses pkt_cpu_list[pkt_idx_cpu_list[ridx]], which is confusing and likely not what you want to print (it’s a pointer value, not an index). Consider logging pkt_idx_cpu_list[ridx] instead.
  • pkt_idx_cpu_list[ridx] is incremented modulo MAX_DEFAULT_SEM_X_QUEUE; make sure create_rx_packet_list() allocates at least that many adv_doca_rx_gpu_info entries, or add an assertion.

These are small but will make debugging and future changes safer.

Also applies to: 1387-1390


296-296: Address lint warnings: line length and tabs

Lint is flagging:

  • Line 296 and 600 as over 100 characters.
  • Lines 347–349 for tab characters.

These are straightforward to fix by wrapping long log lines and replacing tabs with spaces to match the project’s style and keep CI clean.

Also applies to: 347-349, 600-600

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (4)

31-43: Add proper include for sysconf and consider documenting fallback.

get_host_page_size() uses sysconf(_SC_PAGESIZE) but this file does not include <unistd.h>, which is the canonical declaration site; relying on transitive includes is brittle. Also, a brief comment that 4096 is the intended default Linux page size fallback would help future maintainers.

 #include <sys/time.h>
+#include <unistd.h>  // for sysconf(_SC_PAGESIZE)

73-96: Host-page alignment for RX buffer looks reasonable but confirm with DOCA 3.2 docs.

Calling doca_eth_rxq_estimate_packet_buf_size followed by ALIGN_SIZE(cyclic_buffer_size, get_host_page_size()); before doca_gpu_mem_alloc is consistent with the DMABuf mapping that follows, but DOCA 3.2’s expectations around page size vs. GPU page size can be subtle, especially when mixing DOCA_GPU_MEM_TYPE_GPU and DOCA_GPU_MEM_TYPE_CPU_GPU.

Please double‑check that:

  • The estimated buffer size is allowed to be rounded up to host page size.
  • The GPU_PAGE_SIZE argument to doca_gpu_mem_alloc is still valid for all mtype values you use here.

202-263: Flow queue ID handling and logging are fine, but avoid printf in favor of HOLOSCAN logging.

The new flow_queue_id plumbing (writing rss_queues[0], applying it to the RXQ, then incrementing) is consistent and easy to follow.

Two minor nits:

  • Prefer HOLOSCAN_LOG_INFO over bare printf for "UDP pipe queue %d\n" so log routing/verbosity stays consistent with the rest of Holoscan.
  • Since DOCA Flow has evolved a bit between releases, please confirm that:
    • htons(cfg.match_.udp_src_/dst_) is the expected byte order for match.outer.udp.l4_port.*.
    • fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED; with nr_queues = 1 is the right way to express “no RSS, single queue” in DOCA 3.2.

303-327: create_rx_packet_list / destroy_rx_packet_list look correct; confirm page-size choice and call sites.

The allocation/free pair is straightforward and uses DOCA_GPU_MEM_TYPE_CPU_GPU, which matches the intended “CPU‑visible from GPU” semantics.

A couple of follow‑ups to consider:

  • For DOCA_GPU_MEM_TYPE_CPU_GPU, verify in DOCA 3.2 docs that GPU_PAGE_SIZE is still the correct page size argument; some platforms expect host page size here.
  • If these helpers can be called multiple times per queue lifetime, you may want basic guards to avoid double-free or leaks (e.g., track whether pkt_list_gpu is already allocated and null it out on free); see the sketch below.
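
A minimal sketch of those guards, written as free functions for brevity. The doca_gpu_mem_alloc/doca_gpu_mem_free calls follow the DOCA GPUNetIO host API; list_size, the double-pointer plumbing, and the reuse of the project's GPU_PAGE_SIZE macro are illustrative assumptions.

```cpp
#include <doca_gpunetio.h>  // doca_gpu_mem_alloc / doca_gpu_mem_free

// Allocate the packet list once; repeated calls become no-ops.
doca_error_t create_rx_packet_list_guarded(struct doca_gpu* gdev, size_t list_size,
                                           void** pkt_list_gpu, void** pkt_list_cpu) {
  if (*pkt_list_gpu != nullptr) return DOCA_SUCCESS;  // already allocated
  return doca_gpu_mem_alloc(gdev, list_size, GPU_PAGE_SIZE,
                            DOCA_GPU_MEM_TYPE_CPU_GPU, pkt_list_gpu, pkt_list_cpu);
}

// Free the packet list and null both views so a second destroy is safe.
doca_error_t destroy_rx_packet_list_guarded(struct doca_gpu* gdev,
                                            void** pkt_list_gpu, void** pkt_list_cpu) {
  if (*pkt_list_gpu == nullptr) return DOCA_SUCCESS;  // nothing to free
  doca_error_t result = doca_gpu_mem_free(gdev, *pkt_list_gpu);
  *pkt_list_gpu = nullptr;
  *pkt_list_cpu = nullptr;
  return result;
}
```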
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (3)

210-305: Non‑persistent RX kernel: address‑of‑cast and fence pattern; consider explicit error signalling.

For receive_packets_kernel_non_persistent, the new address‑based RX path is consistent with the persistent kernel and looks fine overall.

A few suggestions:

  • Fix the cpplint runtime/casting warning similarly to the persistent kernel:

    // before
    struct adv_doca_rx_gpu_info *packets_stats =
        &((struct adv_doca_rx_gpu_info *)pkt_gpu_list[blockIdx.x])[pkt_idx_list[blockIdx.x]];

    // after
    auto* pkt_array = reinterpret_cast<adv_doca_rx_gpu_info*>(pkt_gpu_list[blockIdx.x]);
    adv_doca_rx_gpu_info* packets_stats = &pkt_array[pkt_idx_list[blockIdx.x]];

  • The fence/write‑ready sequence (nbytes/num_pkts, then doca_gpu_dev_eth_fence_release, then status = READY) matches DOCA guidance, which is good; keep this pattern consistent with the persistent kernel (see previous comment).

  • On doca_gpu_dev_eth_rxq_recv error you set out_pkt_num = 0 but do not update packets_stats->status. If host code polls only on status == READY, it may hang while this kernel has already encountered an error. Consider setting an explicit error status or signalling via a separate flag so the CPU side can handle this cleanly.


315-384: TX kernel: address computation makes sense; add basic reserve error handling and address lint issues.

The reworked send_packets_kernel that uses pkt_buff_addr / pkt_buff_mkey and max_pkt_size looks coherent:

  • addr = pkt_buff_addr + max_pkt_size * ((pkt_idx + gpu_pkt0_idx) % max_pkts) correctly walks a circular buffer.
  • Using base_wqe_idx from doca_gpu_dev_eth_txq_reserve_wq_slots and doca_gpu_dev_eth_txq_get_wqe_ptr is the right pattern.

Two recommended improvements:

  • If doca_gpu_dev_eth_txq_reserve_wq_slots can signal failure (e.g., invalid index), guard against that to avoid writing out of bounds:

    if (threadIdx.x == 0)
      base_wqe_idx = doca_gpu_dev_eth_txq_reserve_wq_slots(txq, num_pkts);
    __syncthreads();
    if (base_wqe_idx == DOCA_GPUNETIO_ETH_INVALID_WQE_IDX) {
      return;
    }

    (Adjust the sentinel according to the DOCA 3.2 API.)

  • Cpplint warnings: tabs and long debug printf lines in the #if DOCA_DEBUG_KERNEL == 2 block; replace tabs with spaces and wrap long lines as needed.

388-452: Host wrappers: new kernel signatures are consistent; no functional issues spotted.

The updated doca_receiver_packet_kernel and doca_sender_packet_kernel correctly forward the new parameters (packet list pointers, buffer base address, mkey, max packet size) to the respective kernels, and CUDA error checks remain in place.

The only minor suggestion is to keep debug-style comments (// fprintf(...)) either removed or behind a debug macro to avoid confusion.


📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09d588c324cb2e6a246572719c55ec749112cac2 and e20ad5065f270a5b0165e11ded86112673c32c10.

📒 Files selected for processing (8)

  • operators/advanced_network/Dockerfile (3 hunks)
  • operators/advanced_network/advanced_network/CMakeLists.txt (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (11 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (25 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (6 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (12 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-10-22T23:47:42.896Z

Learnt from: cdinea
Repo: nvidia-holoscan/holohub PR: 1170
File: applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt:109-111
Timestamp: 2025-10-22T23:47:42.896Z
Learning: In the video streaming server application (applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt), bundling libcudart.so.12 from the NGC operator binaries is intentional to ensure consistency with NGC binaries, even though the target links to CUDA::cudart.

Applied to files:
  • operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
  • operators/advanced_network/advanced_network/CMakeLists.txt

🧬 Code graph analysis (3)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (2)

  operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (2)
    • create_udp_pipe (202-301)
    • create_udp_pipe (202-204)

  operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (2)
    • init_doca_flow (296-378)
    • init_doca_flow (296-296)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)

  operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (4)
    • doca_receiver_packet_kernel (388-423)
    • doca_receiver_packet_kernel (388-391)
    • doca_sender_packet_kernel (425-460)
    • doca_sender_packet_kernel (425-430)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (1)

  operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)
    • doca_receiver_packet_kernel (388-423)
    • doca_receiver_packet_kernel (388-391)

🪛 Clang (14.0.6)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h

[error] 19-19: 'cuda.h' file not found (clang-diagnostic-error)
🪛 GitHub Actions: Check Compliance

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/CMakeLists.txt
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
[error] 1-1: Copyright header incomplete: current year not included in the header.
🪛 GitHub Actions: Check linting

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu

[warning] 91-91: Tab found; better to use spaces  [whitespace/tab]
[warning] 94-94: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 94-94: Are you taking an address of a cast?  This is dangerous: could be a temp var.  Take the address before doing the cast, rather than after  [runtime/casting]
[warning] 105-105: Tab found; better to use spaces  [whitespace/tab]
[warning] 106-106: Tab found; better to use spaces  [whitespace/tab]
[warning] 107-107: Tab found; better to use spaces  [whitespace/tab]
[warning] 107-107: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 160-160: Should have a space between // and comment  [whitespace/comments]
[warning] 161-161: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 185-185: Are you taking an address of a cast?  This is dangerous: could be a temp var.  Take the address before doing the cast, rather than after  [runtime/casting]
[warning] 211-211: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 223-223: Tab found; better to use spaces  [whitespace/tab]
[warning] 223-223: Lines should be <= 100 characters long  [whitespace/line_length]

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

[warning] 296-296: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 347-347: Tab found; better to use spaces  [whitespace/tab]
[warning] 348-348: Tab found; better to use spaces  [whitespace/tab]
[warning] 349-349: Tab found; better to use spaces  [whitespace/tab]
[warning] 349-349: Lines should be <= 100 characters long  [whitespace/line_length]
[warning] 600-600: Lines should be <= 100 characters long  [whitespace/line_length]
🪛 Hadolint (2.14.0)

operators/advanced_network/Dockerfile

[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input
(DL1000)

🔇 Additional comments (9)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (1)

243-281: Ensure global flow_queue_id and pipe_queues config align with DOCA expectations

init_doca_flow(struct doca_dev* dev, uint16_t port_id, uint8_t rxq_num) and the class member flow_queue_id together drive queue IDs across ports. Because doca_flow_init is guarded by a static bool flow_init, pipe_queues is set only from the first rxq_num, while flow_queue_id continues increasing as you add UDP pipes and default queues on subsequent ports. Please confirm with the DOCA 3.2 docs that:

  • pipe_queues is meant to be per-process, not per-port, and
  • the total number of queues created across all ports will not exceed the configured pipe_queues range.

If not, you may need to precompute a global RX queue count and/or reset flow_queue_id per port.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)

28-38: Kernel prototypes match implementation

The updated signatures for doca_receiver_packet_kernel and doca_sender_packet_kernel line up with the .cu implementation (added pkt_gpu_list / pkt_idx_list, and new pkt_buff_addr/pkt_buff_mkey/max_pkt_size parameters), so external callers are consistent.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (6)

264-277: Verify removal of doca_dpdk_port_probe still enables desired flow mode

doca_dpdk_port_probe(ddev[intf.port_id_], "dv_flow_en=2") is now commented out. With DOCA 3.2 and the new doca_flow_port_cfg_set_dev path, this may indeed be redundant, but if you still rely on DV/HWS mode you should confirm that it is now enabled solely via the new flow API calls.

296-378: Check DOCA flow global configuration vs per-port queue usage

init_doca_flow() now:

  • calls doca_flow_cfg_set_pipe_queues(rxq_flow_cfg, rxq_num) only on the first invocation, and
  • uses a single flow_queue_id counter across all ports when assigning queue IDs in create_default_pipe() and DocaRxQueue::create_udp_pipe().

If you have multiple ports or if the sum of (default queues + flow queues) across ports can exceed the first port's rxq_num, you might end up using queue IDs beyond the configured pipe_queues, which DOCA could reject or mis-handle. It may be safer to precompute a global RX queue count (or maximum per port) and use that when initializing DOCA Flow.

Also applies to: 786-846

569-576: Device-aware flow initialization and queue wiring look coherent

The new path:

  • creates doca_gpu handles per GPU after cudaSetDevice/cudaFree(0),
  • initializes a DOCA flow port via init_doca_flow(ddev[port], port_id, rxq_count),
  • passes that df_port into each DocaRxQueue, and
  • wires per-flow UDP pipes via create_udp_pipe(flow, rxq_pipe_default, flow_queue_id) and packet lists via create_rx_packet_list(),

is internally consistent with the adv_network_doca_mgr_obj.cpp snippet you provided. As long as the DOCA side queue ID and df_port semantics match the new 3.2 APIs, this structure looks sound.

Also applies to: 598-605, 663-670, 720-737

1489-1492: TX kernel warmup and new sender arguments look correct

The warmup call and the main doca_sender_packet_kernel invocation match the updated signature (stream, txq, gpu_pkt_addr, pkt_mkey, indices, max_pkt, max_pkt_size, lengths, set_completion). As long as pkt_mkey and gpu_pkt_addr are set appropriately in DocaTxQueue construction, this TX path aligns with the new DOCA 3.2 sender kernel.

Also applies to: 1518-1527

1024-1027: Destructor complements new CUDA host allocations

You now cudaMallocHost burst[idx].pkt_lens[0] in initialize() and free them in ~DocaMgr() with cudaFreeHost, which resolves the prior leak of these pinned buffers. That cleanup looks correct and balanced.

Also applies to: 741-757

1783-1795: Graceful shutdown logging is a nice addition

The extra logs in shutdown() ("stopping cores" and "exit gracefully") will help correlate teardown behavior with RX/TX worker thread joins in logs; no issues from a correctness perspective.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)

481-538: Mirror DMABuf fallback fix in TX path and tighten mkey error handling.

The TX queue's DMABuf logic has the same fallback issue as RX: when doca_mmap_set_dmabuf_memrange fails, dmabuf_fd stays non-negative, so the legacy doca_mmap_set_memrange path is skipped.

Updating it similarly will make the behavior robust:

-      result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
-                    dmabuf_fd,
-                    gpu_pkt_addr,
-                    0,
-                    tx_buffer_size);
-      if (result != DOCA_SUCCESS) {
-        HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
-      }
+      result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
+                                             dmabuf_fd,
+                                             gpu_pkt_addr,
+                                             0,
+                                             tx_buffer_size);
+      if (result != DOCA_SUCCESS) {
+        HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
+        dmabuf_fd = -1;  // Trigger nvidia-peermem fallback below
+      }

Additionally, doca_mmap_get_mkey failure currently just logs and leaves pkt_mkey potentially uninitialized but still used. Given that the mkey is critical for NIC access, it’d be safer to treat this as fatal (e.g., log as critical and ensure the queue is not used until creation is retried or aborted).
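
A minimal sketch of that fatal-mkey handling. doca_mmap_get_mkey is named in this review, but its exact signature is not shown in the PR, so the call shape below is an assumption:

```cpp
// Hypothetical hardening: abort queue creation when the mkey is unavailable,
// since the NIC cannot address the TX buffer without it.
doca_error_t get_tx_mkey_or_fail(struct doca_mmap* pkt_buff_mmap,
                                 struct doca_dev* ddev, uint32_t* pkt_mkey) {
  doca_error_t result = doca_mmap_get_mkey(pkt_buff_mmap, ddev, pkt_mkey);  // assumed shape
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_CRITICAL("Failed to get mkey for mmap {}", doca_error_get_descr(result));
    return result;  // do not use the queue with an uninitialized mkey
  }
  return DOCA_SUCCESS;
}
```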

Comment on lines +102 to +133
dmabuf_fd = -1;
if (mtype == DOCA_GPU_MEM_TYPE_GPU) {
  /* Map GPU memory buffer used to send packets with DMABuf */
  result = doca_gpu_dmabuf_fd(gdev, gpu_pkt_addr, cyclic_buffer_size, &(dmabuf_fd));
  if (result == DOCA_SUCCESS) {
    HOLOSCAN_LOG_INFO("Mapping send queue buffer ({} size {}B dmabuf fd {}) with dmabuf mode",
                      gpu_pkt_addr,
                      cyclic_buffer_size,
                      dmabuf_fd);

    result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
                                           dmabuf_fd,
                                           gpu_pkt_addr,
                                           0,
                                           cyclic_buffer_size);
    if (result != DOCA_SUCCESS) {
      HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
    }
  }
}
/*
} else {
  HOLOSCAN_LOG_INFO("Mapping receive queue buffer (0x{} size {}B dmabuf fd {}) with dmabuf mode",
                    gpu_pkt_addr, cyclic_buffer_size, dmabuf_fd);
  result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap, dmabuf_fd, gpu_pkt_addr, 0, cyclic_buffer_size);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_CRITICAL("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
  }
}
*/

if (dmabuf_fd == -1) {
  HOLOSCAN_LOG_INFO("Mapping send queue buffer ({} size {}B) with nvidia-peermem mode",
                    gpu_pkt_addr,
                    cyclic_buffer_size);

  /* If failed, use nvidia-peermem legacy method */
  result = doca_mmap_set_memrange(pkt_buff_mmap, gpu_pkt_addr, cyclic_buffer_size);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_ERROR("Failed to set memrange for mmap {}", doca_error_get_descr(result));
  }
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Enable peermem fallback when DMABuf set_dmabuf_memrange fails.

Right now, if doca_gpu_dmabuf_fd succeeds but doca_mmap_set_dmabuf_memrange fails, dmabuf_fd stays non‑negative, so the peermem fallback (doca_mmap_set_memrange) is skipped and the mmap is left misconfigured. That defeats the purpose of the fallback path.

Reset dmabuf_fd on failure so the legacy path is used:

-      result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
-                    dmabuf_fd,
-                    gpu_pkt_addr,
-                    0,
-                    cyclic_buffer_size);
-      if (result != DOCA_SUCCESS) {
-        HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
-      }
+      result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
+                                             dmabuf_fd,
+                                             gpu_pkt_addr,
+                                             0,
+                                             cyclic_buffer_size);
+      if (result != DOCA_SUCCESS) {
+        HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
+        dmabuf_fd = -1;  // Trigger nvidia-peermem fallback below
+      }

Also, the log messages talk about “send queue buffer” here even though this is the RX queue; consider updating the text for clarity.


I'll verify the review comment by examining the actual code file and confirming the logic flow.


---

🏁 Script executed:

#!/bin/bash
# View the file context around lines 102-133 to verify current state and surrounding code
cat operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp | head -150 | tail -80

Length of output: 2899


🏁 Script executed:

#!/bin/bash
# Search for dmabuf_fd usage throughout the file to understand control flow
rg -n "dmabuf_fd" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp -B2 -A2

Length of output: 1586


🏁 Script executed:

#!/bin/bash
# Check context around this function to determine if it's RX or TX queue
rg -n -B20 "doca_gpu_dmabuf_fd" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp | head -40

Length of output: 1621


Reset dmabuf_fd on failure to enable the peermem fallback.

The bug is confirmed: if doca_mmap_set_dmabuf_memrange fails at lines 112-120, dmabuf_fd remains non-negative, so the fallback check at line 123 (if (dmabuf_fd == -1)) evaluates false and skips the doca_mmap_set_memrange call. The mmap is left misconfigured.

Add dmabuf_fd = -1; in the error handler (after line 120) to trigger the fallback.

The same bug exists in the TX queue setup around line 499-507 and should be fixed identically.

Also, the log messages at lines 107 and 124 incorrectly say "send queue buffer" for the RX queue—update to "receive queue buffer" or "RX queue buffer" for clarity.

🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
around lines 102-133, when doca_mmap_set_dmabuf_memrange fails you must reset
dmabuf_fd = -1 in that error path so the peermem fallback executes; update the
error handler immediately after the failing call to set dmabuf_fd = -1 and keep
the existing error log. Apply the identical fix to the TX queue setup around
lines ~499-507 (reset its dmabuf_fd on doca_mmap_set_dmabuf_memrange failure so
the fallback doca_mmap_set_memrange runs). Also correct the log messages at
lines ~107 and ~124 to say "receive queue buffer" or "RX queue buffer" instead
of "send queue buffer".

Comment on lines +1169 to 1173
uintptr_t *pkt_cpu_list, *pkt_gpu_list;
uint32_t *pkt_idx_cpu_list, *pkt_idx_gpu_list;
uint32_t *batch_cpu_list, *batch_gpu_list;
uint32_t *cpu_exit_condition, *gpu_exit_condition;
// int sem_idx[MAX_NUM_RX_QUEUES] = {0};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential misuse of GPU vs CPU pointers for RX packet stats (likely crash)

In rx_core:

  • You allocate per-queue pointer arrays with doca_gpu_mem_alloc(..., &pkt_gpu_list, &pkt_cpu_list);
  • You then set pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
  • Later, on the CPU, you do packets_stats = &((adv_doca_rx_gpu_info*)(pkt_cpu_list[ridx]))[pkt_idx_cpu_list[ridx]]; and read/write packets_stats->status, num_pkts, etc.

If DocaRxQueue::pkt_list_gpu is indeed a device pointer (as the name and presence of a separate pkt_list_cpu member suggest), this is dereferencing a GPU address on the host, which is undefined behavior and will typically crash. It also leaves the host-mapped pkt_list_cpu member unused in this function.

I’d strongly recommend:

  • Using pkt_list_cpu in rx_core for the host view:
-    pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
+    pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;
  • Verifying that the GPU kernel sees the correct (device) base addresses for the same arrays via pkt_gpu_list.

Until this is confirmed/fixed, the RX packet-list path is very likely incorrect.

Also applies to: 1243-1265, 1279-1284, 1338-1390, 1399-1403, 1411-1415


I'll verify this critical issue by examining the codebase structure and the actual code patterns mentioned. Let me start with investigation scripts:


---

🏁 Script executed:

#!/bin/bash
# Find and display DocaRxQueue structure definition
rg -n "struct DocaRxQueue|class DocaRxQueue" -A 15 --type=cpp --type=h

Length of output: 2253


🏁 Script executed:

#!/bin/bash
# Extract the exact code at the flagged primary lines 1169-1173
sed -n '1165,1180p' operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

Length of output: 553


🏁 Script executed:

#!/bin/bash
# Look for the assignment pattern: pkt_cpu_list[idx] = ... pkt_list_gpu
rg -n "pkt_cpu_list\[.*\]\s*=.*pkt_list" -C 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/

Length of output: 935


🏁 Script executed:

#!/bin/bash
# Look for CPU-side dereferencing of pkt_cpu_list: packets_stats = &((...pkt_cpu_list...)[...])
rg -n "packets_stats\s*=.*pkt_cpu_list|pkt_cpu_list.*\[.*\]" -C 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/

Length of output: 3999


🏁 Script executed:

#!/bin/bash
# Check all flagged line ranges for the pattern
for range in "1243,1265" "1279,1284" "1338,1390" "1399,1403" "1411,1415"; do
  IFS=',' read start end <<< "$range"
  echo "=== Lines $start-$end ==="
  sed -n "${start},${end}p" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp | head -20
done

Length of output: 2893



---

🏁 Script executed:

#!/bin/bash
# Get full DocaRxQueue structure definition
rg -n "struct DocaRxQueue|class DocaRxQueue" -A 50 --type=h operators/advanced_network/advanced_network/managers/gpunetio/

Length of output: 7564


🏁 Script executed:

#!/bin/bash
# Search for pkt_list_cpu vs pkt_list_gpu in DocaRxQueue
rg -n "pkt_list_gpu|pkt_list_cpu" --type=cpp --type=h operators/advanced_network/advanced_network/managers/gpunetio/ -B 2 -A 2

Length of output: 3244


🏁 Script executed:

#!/bin/bash
# Look for GPU kernel code that might use pkt_gpu_list or pkt_list_gpu
rg -n "__global__|__device__" -A 20 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/ | grep -A 20 -E "pkt_|kernel"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check how pkt_list_gpu is initialized in DocaRxQueue constructor or create_rx_packet_list
rg -n "pkt_list_gpu\s*=" -B 5 -A 5 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/

Length of output: 0



---

🏁 Script executed:

#!/bin/bash
# Confirm pkt_gpu_list is passed to GPU kernel (checking kernel launch or params)
rg -n "pkt_gpu_list" -B 3 -A 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp | head -50

Length of output: 2243


🏁 Script executed:

#!/bin/bash
# Verify similar misuse at secondary lines - check line 1243 context for allocation
sed -n '1240,1275p' operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

Length of output: 1569


🏁 Script executed:

#!/bin/bash
# Double-check if pkt_list_cpu is ever used in rx_core (should be used instead of pkt_list_gpu)
rg -n "pkt_list_cpu" --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

Length of output: 0


Fix CPU-side dereferencing of GPU pointers (line 1281 and related)

The review correctly identifies a critical pointer bug. DocaRxQueue provides both pkt_list_gpu and pkt_list_cpu members from unified memory (DOCA_GPU_MEM_TYPE_CPU_GPU at adv_network_doca_mgr_obj.cpp:308). At line 1281, the code mistakenly stores pkt_list_gpu (GPU-addressable view) into the CPU-side array:

pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;

The CPU then dereferences this at lines 1338 and 1399, reading from packets_stats->status and ->num_pkts. This dereferences a GPU pointer on the host, which is undefined behavior and will crash.

Use pkt_list_cpu instead—the host-mapped view designed for CPU access:

-    pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
+    pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;

The GPU kernel correctly receives pkt_gpu_list at lines 1311 and 1324 for device-side operations.

Comment on lines +107 to +110
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get update

⚠️ Potential issue | 🔴 Critical

Fix line continuation syntax error in wget URL.

Line 107 breaks the URL without a line continuation character (\). The wget invocation is malformed: the URL ends with doca and line 108 orphans host_3.2.0.... This causes a Hadolint syntax error (DL1000).

Apply this diff to fix the URL:

-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \

If the resulting line is too long for your linter, put the repository URL in a shell variable instead of breaking the URL itself (a backslash inside the URL would split it into two arguments):

-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN DOCA_REPO=https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000 \
+    && wget ${DOCA_REPO}/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get update
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get update
🧰 Tools
🪛 Hadolint (2.14.0)

[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🤖 Prompt for AI Agents
In operators/advanced_network/Dockerfile around lines 107 to 110 the wget URL is
split across lines without a backslash, leaving "doca" on one line and orphaning
"host_3.2.0..." which breaks the command and triggers Hadolint DL1000; fix it by
joining the URL into a single continuous string or add a line-continuation
backslash at the end of the first line so wget receives the full URL, then keep
the subsequent && apt-get install -y ./doca-host_3.2.0... and && apt-get update
on following lines as before.

coderabbitai bot commented Nov 24, 2025

Walkthrough

DOCA version upgraded from 2.8.0 to 3.2.0 with corresponding dependencies updated. GPU kernel implementations refactored to transition from semaphore-based to packet-list-based packet tracking, with kernel signatures updated to use address-based buffer access and per-block packet statistics. CUDA compilation flags restructured, and DOCA flow management APIs updated accordingly.

Changes

  • Build environment and dependencies (operators/advanced_network/Dockerfile): DOCA version bumped to 3.2.0 with new host package installation step. Added build-essential and wget to common-deps layer. Expanded DOCA package list: replaced libdoca-sdk-dma-dev with doca-sdk-gpunetio-dev, doca-sdk-eth-dev, doca-sdk-flow-dev; added mlnx-dpdk and corresponding -dev variants.
  • CMake build configuration (operators/advanced_network/advanced_network/CMakeLists.txt, operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt): Moved CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS from per-target set_target_properties to global scope. Removed -ldoca_gpunetio_device static library linkage.
  • GPU kernel headers (operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h): Receiver kernel signature: replaced semaphore parameters (eth_rxq_gpu, sem_gpu, sem_idx_list) with packet-list parameters (pkt_gpu_list, pkt_idx_list). Sender kernel signature: removed buf_arr parameter, added explicit packet buffer addressing (pkt_buff_addr, pkt_buff_mkey, max_pkt_size). Updated includes from DOCA-specific headers to CUDA headers.
  • GPU kernel implementation (operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu): Transitioned receive path from semaphore-based signaling to per-block packet tracking via pkt_gpu_list/pkt_idx_list. Updated packet indexing to use out_first_pkt_idx and out_pkt_num. Replaced buffer retrieval with address-based access via doca_gpu_dev_eth_rxq_get_pkt_addr. Refactored send path to use WQE-based queue submission with pkt_buff_addr and max_pkt_size. Updated completion signaling with DOCA GPU fence/release semantics.
  • DOCA manager header (operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h): Added flow_queue_id and per-queue packet list members (pkt_list_gpu, pkt_list_cpu) to DocaRxQueue. Removed semaphore-related members from DocaRxQueue and buffer array members from DocaTxQueue. Added pkt_mkey to DocaTxQueue. Added status field to adv_doca_rx_gpu_info. Updated init_doca_flow signature to include a doca_dev parameter. Updated create_udp_pipe signature to include a flow_queue_id reference.
  • DOCA manager implementation (operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp, operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp): Updated init_doca_flow to accept a doca_dev parameter. Refactored flow port creation to use doca_flow_port_cfg_set_port_id and doca_flow_port_cfg_set_dev. Replaced semaphore-based GPU memory management with packet-list structures. Introduced host page size helper for buffer alignment. Added DMABuf mapping logic with fallback to legacy mmap. Implemented create_rx_packet_list() and destroy_rx_packet_list() methods. Updated UDP pipe flow configuration with new parser_meta and RSS structures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Areas requiring extra attention:

  • Kernel signature and implementation changes: Verify correctness of the transition from semaphore-based to packet-list-based packet tracking, especially in receive/send paths and synchronization semantics (adv_network_doca_kernels.cu/h).
  • Memory management refactor: Review new DMABuf mapping logic, page alignment with host page size, and packet list allocation/deallocation patterns (adv_network_doca_mgr_obj.cpp).
  • DOCA API version compatibility: Validate all DOCA 3.2.0 API calls, particularly flow port setup, queue configuration, and WQE-based send operations (adv_network_doca_mgr.cpp).
  • GPU synchronization and ordering: Confirm that DOCA GPU fence/release calls and per-block statistics writes maintain correct memory ordering (adv_network_doca_kernels.cu).
  • Interdependent signature changes: Cross-check kernel declarations, definitions, and all call sites match updated parameter lists throughout the codebase.

Suggested reviewers

  • jjomier
  • mocsharp
  • bhashemian

Pre-merge checks

✅ Passed checks (2 passed)
  • Description check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title 'New ANO GPUNetIO manager with DOCA 3.2' accurately captures the main change: upgrading the DOCA version from 2.8.0 to 3.2.0 and refactoring the GPUNetIO manager components accordingly.


coderabbitai bot left a comment
Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
operators/advanced_network/advanced_network/CMakeLists.txt (1)

1-1: Update copyright year in header.

The copyright header lists 2023, but pipeline compliance requires the current year. Update to include 2025.

-# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1)

1-1: Update copyright year in header.

The copyright header lists 2023, but pipeline compliance requires the current year. Update to include 2025.

-# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (2)

67-73: Ensure adv_doca_rx_gpu_info::status is initialized after allocation

status is read on the CPU side in rx_core to detect DOCA_GPU_SEMAPHORE_STATUS_READY, but the packet-list allocation path (see create_rx_packet_list in adv_network_doca_mgr_obj.cpp) does not currently initialize these entries. That risks undefined initial values occasionally comparing equal to READY, leading to bogus bursts being processed.

Please zero‑initialize the packet list (or explicitly set status = DOCA_GPU_SEMAPHORE_STATUS_FREE) immediately after allocation.


127-159: Critical: Memory leak in pkt_list_gpu/pkt_list_cpu and architectural issue with flow_queue_id scope

The destructor (~DocaRxQueue, lines 178–200 of adv_network_doca_mgr_obj.cpp) never calls destroy_rx_packet_list(), and DocaMgr::shutdown() (lines 1778–1795) does not iterate through the rx queue map to clean up individual queues. Since create_rx_packet_list() allocates GPU/CPU memory via doca_gpu_mem_alloc() (line 308), this memory leaks on every queue destruction.

Additionally, flow_queue_id is a global counter in DocaMgr (line 280) shared across all ports and incremented by both create_udp_pipe() and create_default_pipe(). However, DOCA queue IDs are local to each port/engine instance, not globally unique. Using a single counter for all ports violates DOCA's per-port semantics and will cause incorrect queue assignment.

Fixes required:

  • Call destroy_rx_packet_list() in ~DocaRxQueue() destructor.
  • Ensure DocaMgr::shutdown() cleans up all allocated queues via the rx_q_map_.
  • Replace the global flow_queue_id counter with a per-port counter to match DOCA semantics.
  • Line 132 exceeds the 100-character lint limit (currently 120 chars); wrap the create_udp_pipe declaration.
  • Update SPDX header (line 2) to include 2025: change 2023 to 2023-2025.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)

710-737: Handle create_udp_pipe / create_rx_packet_list failures

q_backend->create_udp_pipe(flow, rxq_pipe_default, flow_queue_id); and q_backend->create_rx_packet_list(); both return doca_error_t but their results are ignored. If either fails, the RX worker will later assume valid pipes and packet lists, leading to hard‑to‑diagnose runtime failures.

Please:

  • Capture and check the return values.
  • Log and abort initialization (or at least skip that queue) on error.

1398-1415: Reuse pkt_list_cpu pointer for post‑loop checks

The post‑loop flush uses the same pattern:

packets_stats =
    &((struct adv_doca_rx_gpu_info *)(pkt_cpu_list[ridx]))[pkt_idx_cpu_list[ridx]];
status = DOCA_GPUNETIO_VOLATILE(packets_stats->status);

This will be correct once pkt_cpu_list is fixed to hold pkt_list_cpu, but until then it shares the same host/GPU pointer bug. Ensure the earlier fix is applied here as well.


1158-1183: RX worker: host uses GPU pointer instead of CPU alias for packet list

In rx_core at line 1281, pkt_cpu_list is assigned the GPU pointer (pkt_list_gpu) instead of the CPU alias (pkt_list_cpu):

pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;  // WRONG

This pointer is then dereferenced on the host at lines 1338 and 1399 when accessing packets_stats. Since doca_gpu_mem_alloc allocates both GPU and CPU pointers for the same memory region (as confirmed in the allocation at adv_network_doca_mgr_obj.cpp:308), dereferencing the GPU address on the host is undefined behavior and will read from invalid memory.

Fix:

pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;

Also update the debug log at line 1367 to avoid printing the pointer as an integer.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)

1-16: Update SPDX years for compliance

This file’s SPDX header only mentions 2023; the compliance check expects the current year (2025) to be included as well (e.g., 2023-2025).

Same applies to the other touched files in this PR.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)

1-16: Update SPDX years and address lint complaints

As in other files, add 2025 to the SPDX copyright line.

Also, the lint warnings about tabs, long lines, and comment spacing (//Eth + IP ...) can be fixed mechanically to keep CI green.


210-305: Move null-check before dereferencing eth_rxq_gpu in both kernels; ensure device-side pkt_idx_gpu_list is zero-initialized

The null-check concern applies to both persistent and non-persistent kernels. In both, eth_rxq_gpu[blockIdx.x] is dereferenced before the if (eth_rxq_gpu == NULL) guard, so a warmup launch passing a null pointer would hit undefined behavior before the guard ever runs. Reorder the check before the dereference.

Regarding pkt_idx_gpu_list: The CPU view is initialized at line 1282 in rx_core, but the GPU view allocated by doca_gpu_mem_alloc() is not explicitly zeroed. Absent documentation confirming zero-initialization by the DOCA library, add an explicit cudaMemset() on the GPU memory after allocation to ensure the device-side array is initialized before kernel execution.

🧹 Nitpick comments (13)
operators/advanced_network/Dockerfile (1)

107-110: Consider parameterizing the hardcoded artifact URL and version string.

The download URL and .deb filename are tightly coupled to a specific DOCA build artifact version (3.2.0-122000-25.10). This introduces maintenance and reproducibility challenges.

Consider extracting these as build arguments or derived variables:

 ARG DOCA_VERSION=3.2.0
+ARG DOCA_BUILD_RELEASE=122000
+ARG DOCA_BUILD_DATE=25.10

-RUN wget ... doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
-    && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget -q https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-${DOCA_VERSION}/doca-host_${DOCA_VERSION}-${DOCA_BUILD_RELEASE}-${DOCA_BUILD_DATE}-ubuntu2204_amd64.deb \
+    && apt-get install -y ./doca-host_${DOCA_VERSION}-${DOCA_BUILD_RELEASE}-${DOCA_BUILD_DATE}-ubuntu2204_amd64.deb \

This allows easier bumping of DOCA releases and build identifiers without editing the RUN command inline.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)

28-38: Consider const‑qualifying read‑only kernel parameters

The updated declarations match the implementations, but several pointers (pkt_gpu_list, pkt_idx_list, batch_list, gpu_pkts_len) are not mutated by the kernels and could be marked const at the interface to better document ownership and avoid accidental writes later.
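
For illustration, a hedged sketch of how the receiver wrapper declaration could look with the read-only arrays const-qualified (the return type and trailing parameters are assumptions; the names come from this PR):

doca_error_t doca_receiver_packet_kernel(cudaStream_t stream, int rxqn,
                                         uintptr_t* eth_rxq_gpu,
                                         const uintptr_t* pkt_gpu_list,
                                         const uint32_t* pkt_idx_list,
                                         const uint32_t* batch_list,
                                         uint32_t* gpu_exit_condition,
                                         bool persistent);

Note that const uintptr_t* pkt_gpu_list only protects the array of handles; the adv_doca_rx_gpu_info entries they point to stay writable, which is what the kernels need.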

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (3)

65-65: ALIGN_SIZE macro is non‑idiomatic and assumes simple lvalues

ALIGN_SIZE(size, align) mutates size and evaluates the argument multiple times. It’s fine as currently used with plain variables, but it’s fragile if someone later passes an expression.

Consider replacing it with an inline function:

inline uint64_t align_size(uint64_t size, uint64_t align) {
  return ((size + align - 1) / align) * align;
}

and assigning explicitly at call sites.
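
For example, the call site ALIGN_SIZE(cyclic_buffer_size, get_host_page_size()); would become cyclic_buffer_size = align_size(cyclic_buffer_size, get_host_page_size());, which makes the mutation explicit and guarantees each argument is evaluated exactly once.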


161-187: Clarify semantics and use of pkt_mkey

DocaTxQueue now exposes pkt_mkey that is consumed by the send kernel. This is fine, but it’s effectively part of the public contract between host code and the CUDA kernel. Consider documenting that pkt_mkey must be an mkey in network byte order (as enforced in the .cpp), to prevent misuse if future code constructs DocaTxQueue differently.


247-281: Document ownership of flow_queue_id in DocaMgr

init_doca_flow now takes a struct doca_dev* and DocaMgr introduces a member uint16_t flow_queue_id = 0 shared across default and per‑flow pipes. Because this ID is plumbed into DOCA RSS queues via doca_eth_rxq_apply_queue_id, its expected lifetime and uniqueness constraints should be explicit (e.g., reset per port vs global across all ports).

Adding a brief comment on the intended scope of flow_queue_id would make the design clearer and help future DOCA upgrades.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (4)

296-343: DOCA 3.2 flow init logic looks sound; consider minor cleanup

The refactored init_doca_flow correctly:

  • Creates/destroys doca_flow_cfg once (via static bool flow_init).
  • Sets queues, mode args, and counters.
  • Uses doca_flow_port_cfg_create/set_port_id/set_dev/port_start and destroys port_cfg on all error paths.

Behaviorally this is fine. You may want to:

  • Use nullptr instead of NULL for C++ consistency.
  • Normalize logging to {} formatting in all branches.

No functional blockers here.


346-374: Replace tabs and long lines in flow‑port setup to satisfy lint

The new doca_flow_port_cfg_* block uses tabs and some lines exceed 100 chars, which is exactly what the lint warnings are flagging. Re‑indent with spaces and wrap the log messages/argument lists to avoid future CI failures.

No behavior change required.


598-605: init_doca_flow call wiring looks correct

Passing ddev[intf.port_id_] into init_doca_flow aligns with the updated signature and associates each DOCA flow port with its underlying DOCA device. The null check and early abort on failure are appropriate.

Nothing to change here beyond potential line‑wrapping for lint.


1794-1795: Shutdown path is clean; consider also freeing RX packet lists

DocaMgr::shutdown now logs “exit gracefully” after joining worker threads. Given the new RX packet-list allocations are owned by DocaRxQueue, this is a good place to ensure all queues have destroyed their packet lists (if not already handled elsewhere), to avoid leaks over multiple init/shutdown cycles.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (3)

73-133: RX cyclic buffer sizing and DMAbuf mapping look reasonable

The sequence:

  • doca_eth_rxq_estimate_packet_buf_size(...) into cyclic_buffer_size.
  • ALIGN_SIZE(cyclic_buffer_size, get_host_page_size());
  • doca_gpu_mem_alloc on gdev with that size.
  • Optional DMAbuf mapping via doca_gpu_dmabuf_fd and doca_mmap_set_dmabuf_memrange, with fallback to doca_mmap_set_memrange.

is consistent and has error logging on each DOCA call. No functional issues stand out here; just consider rewording the log messages (“send queue buffer”) to avoid confusion on the RX path.
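
Condensed into a sketch, the pattern looks roughly like this (the estimate step is abstracted into estimated_size, GPU_PAGE_SIZE is written out as 64 KiB, and error handling is compressed; treat it as illustrative, not as the exact code in this diff):

#include <doca_error.h>
#include <doca_gpunetio.h>
#include <doca_mmap.h>

static doca_error_t map_rx_ring(struct doca_gpu* gdev, struct doca_mmap* mmap,
                                size_t estimated_size, size_t host_page_size,
                                void** gpu_buf_out) {
  // Round the estimated cyclic buffer size up to a host page boundary.
  size_t len = ((estimated_size + host_page_size - 1) / host_page_size) * host_page_size;
  void* gpu_buf = nullptr;
  doca_error_t result = doca_gpu_mem_alloc(gdev, len, 64 * 1024 /* GPU_PAGE_SIZE */,
                                           DOCA_GPU_MEM_TYPE_GPU, &gpu_buf, nullptr);
  if (result != DOCA_SUCCESS) return result;

  // Prefer DMAbuf registration; fall back to the legacy memrange path.
  int dmabuf_fd = -1;
  if (doca_gpu_dmabuf_fd(gdev, gpu_buf, len, &dmabuf_fd) == DOCA_SUCCESS)
    result = doca_mmap_set_dmabuf_memrange(mmap, dmabuf_fd, gpu_buf, 0, len);
  else
    result = doca_mmap_set_memrange(mmap, gpu_buf, len);

  *gpu_buf_out = gpu_buf;
  return result;
}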


202-301: UDP pipe creation aligns with new DOCA flow APIs; minor logging/format nits

The updated create_udp_pipe:

  • Uses parser_meta.outer_l3_type / outer_l4_type for IPv4/UDP.
  • Applies a unique flow_queue_id via doca_eth_rxq_apply_queue_id.
  • Sets forward action to RSS over a single queue and miss to default pipe or drop.
  • Creates a pipe, adds an entry, and processes entries with timeout default_flow_timeout_usec.

Logic looks good. A few small cleanups you may want:

  • Replace printf("UDP pipe queue %d\n", flow_queue_id); with HOLOSCAN_LOG_DEBUG for consistent logging.
  • Use {} formatting instead of %s in HOLOSCAN logs for DOCA errors.
  • Consider documenting that flow_queue_id must remain in sync with the default‑pipe logic in DocaMgr.
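
For the first bullet, the drop-in replacement would be a one-liner along these lines:

HOLOSCAN_LOG_DEBUG("UDP pipe queue {}", flow_queue_id);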

408-542: TX queue buffer sizing, DMAbuf mapping, and mkey handling look good

The DocaTxQueue constructor:

  • Aligns tx_buffer_size to host page size via ALIGN_SIZE.
  • Allocates GPU/CPU memory with doca_gpu_mem_alloc.
  • Attempts DMAbuf mapping and falls back to doca_mmap_set_memrange with logging on failure.
  • Starts the mmap and retrieves pkt_mkey via doca_mmap_get_mkey, converting it to network byte order.

This is consistent with how the send kernel expects pkt_buff_addr and pkt_buff_mkey. Main follow‑ups:

  • As with RX, confirm whether DMAbuf FDs need explicit closing once the mmap is set (depends on DOCA API).
  • Optionally document that pkt_mkey is stored already in network byte order for downstream users.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (1)

315-384: send_packets_kernel is consistent with TX buffer layout

The send kernel:

  • Reserves num_pkts WQ slots once (base_wqe_idx in shared mem).
  • Computes each packet addr as pkt_buff_addr + max_pkt_size * ((pkt_idx + gpu_pkt0_idx) % max_pkts), matching the circular buffer logic in get_tx_packet_burst.
  • Prepares send WQEs with or without completion flag and finally calls doca_gpu_dev_eth_txq_submit.

This matches the host-side expectations in tx_core and DocaTxQueue. Just fix tabs/line length per lint and consider splitting the long doca_gpu_dev_eth_txq_wqe_prepare_send calls across lines for readability.
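
Isolated as a sketch, the address arithmetic is (parameter names mirror this PR's kernel signature):

__device__ __forceinline__ uintptr_t tx_pkt_addr(uintptr_t pkt_buff_addr,
                                                 uint32_t max_pkt_size, uint32_t max_pkts,
                                                 uint32_t gpu_pkt0_idx, uint32_t pkt_idx) {
  // Fixed-size slots in a circular buffer of max_pkts entries.
  return pkt_buff_addr + (uintptr_t)max_pkt_size * ((pkt_idx + gpu_pkt0_idx) % max_pkts);
}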

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09d588c and 5850b3f.

📒 Files selected for processing (8)
  • operators/advanced_network/Dockerfile (3 hunks)
  • operators/advanced_network/advanced_network/CMakeLists.txt (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (11 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (25 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (6 hunks)
  • operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T23:47:42.896Z
Learnt from: cdinea
Repo: nvidia-holoscan/holohub PR: 1170
File: applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt:109-111
Timestamp: 2025-10-22T23:47:42.896Z
Learning: In the video streaming server application (applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt), bundling libcudart.so.12 from the NGC operator binaries is intentional to ensure consistency with NGC binaries, even though the target links to CUDA::cudart.

Applied to files:

  • operators/advanced_network/advanced_network/CMakeLists.txt
  • operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
🧬 Code graph analysis (3)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (4)
  • doca_receiver_packet_kernel (388-423)
  • doca_receiver_packet_kernel (388-391)
  • doca_sender_packet_kernel (425-460)
  • doca_sender_packet_kernel (425-430)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (2)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (6)
  • create_udp_pipe (202-301)
  • create_udp_pipe (202-204)
  • create_rx_packet_list (303-315)
  • create_rx_packet_list (303-303)
  • destroy_rx_packet_list (317-327)
  • destroy_rx_packet_list (317-317)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (2)
  • init_doca_flow (296-378)
  • init_doca_flow (296-296)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)
operators/advanced_network/advanced_network/managers/rivermax/rivermax_mgr_impl/adv_network_rivermax_mgr.cpp (1)
  • cfg (95-95)
🪛 Clang (14.0.6)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h

[error] 19-19: 'cuda.h' file not found

(clang-diagnostic-error)

🪛 GitHub Actions: Check Compliance
operators/advanced_network/advanced_network/CMakeLists.txt

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt

[error] 1-1: Current year not included in the copyright header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu

[error] 1-1: Current year not included in the copyright header.

🪛 GitHub Actions: Check linting
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

[error] 296-296: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 347-347: Tab found; better to use spaces. [whitespace/tab]


[error] 348-348: Tab found; better to use spaces. [whitespace/tab]


[error] 349-349: Tab found; better to use spaces. [whitespace/tab]


[error] 360-360: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 362-362: Tab found; better to use spaces. [whitespace/tab]


[error] 363-363: Tab found; better to use spaces. [whitespace/tab]


[error] 372-372: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 384-384: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 842-842: Tab found; better to use spaces. [whitespace/tab]


[error] 843-843: Tab found; better to use spaces. [whitespace/tab]


[error] 844-844: Tab found; better to use spaces. [whitespace/tab]


[error] 845-845: Tab found; better to use spaces. [whitespace/tab]


[error] 859-859: Tab found; better to use spaces. [whitespace/tab]


[error] 860-860: Tab found; better to use spaces. [whitespace/tab]


[error] 861-861: Tab found; better to use spaces. [whitespace/tab]


[error] 862-862: Tab found; better to use spaces. [whitespace/tab]


[error] 863-863: Tab found; better to use spaces. [whitespace/tab]


[error] 864-864: Tab found; better to use spaces. [whitespace/tab]


[error] 865-865: Tab found; better to use spaces. [whitespace/tab]


[error] 866-866: Tab found; better to use spaces. [whitespace/tab]


[error] 867-867: Tab found; better to use spaces. [whitespace/tab]


[error] 868-868: Tab found; better to use spaces. [whitespace/tab]


[error] 969-969: Tab found; better to use spaces. [whitespace/tab]


[error] 970-970: Tab found; better to use spaces. [whitespace/tab]


[error] 971-971: Tab found; better to use spaces. [whitespace/tab]


[error] 972-972: Tab found; better to use spaces. [whitespace/tab]


[error] 973-973: Tab found; better to use spaces. [whitespace/tab]


[error] 974-974: Tab found; better to use spaces. [whitespace/tab]


[error] 1338-1338: Lines should be <= 100 characters long. [whitespace/line_length]

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h

[error] 132-132: Lines should be <= 100 characters long [whitespace/line_length]


[error] 156-156: Lines should be <= 100 characters long [whitespace/line_length]

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp

[error] 37-37: { should almost always be at the end of the previous line [whitespace/braces]

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu

[error] 91-91: Tab found; better to use spaces. [whitespace/tab]


[error] 94-94: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 94-94: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]


[error] 105-105: Tab found; better to use spaces. [whitespace/tab]


[error] 106-106: Tab found; better to use spaces. [whitespace/tab]


[error] 107-107: Tab found; better to use spaces. [whitespace/tab]


[error] 107-107: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 160-160: Should have a space between // and comment. [whitespace/comments]


[error] 161-161: Lines should be <= 100 characters long. [whitespace/line_length]


185-185: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]


[error] 211-211: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 221-221: Tab found; better to use spaces. [whitespace/tab]


[error] 223-223: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 223-223: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]


[error] 232-232: Tab found; better to use spaces. [whitespace/tab]


[error] 233-233: Tab found; better to use spaces. [whitespace/tab]


[error] 234-234: Tab found; better to use spaces. [whitespace/tab]


[error] 234-234: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 284-284: Should have a space between // and comment. [whitespace/comments]


[error] 285-285: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 317-317: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 336-336: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 340-340: Tab found; better to use spaces. [whitespace/tab]


[error] 340-340: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 342-342: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 347-347: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 451-451: Lines should be <= 100 characters long. [whitespace/line_length]

🪛 Hadolint (2.14.0)
operators/advanced_network/Dockerfile

[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🔇 Additional comments (12)
operators/advanced_network/advanced_network/CMakeLists.txt (1)

51-56: CUDA compilation settings are commented out—clarify intent.

Lines 51-52 show CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS commented out. The AI summary describes these as "moved to standalone lines," but they appear disabled rather than relocated. For a DOCA 3.2 upgrade with GPU kernel refactoring, these settings are typically critical for proper device code compilation.

Verify:

  • Are these settings now configured at global scope elsewhere (e.g., parent CMakeLists.txt)?
  • Should they be uncommented?
  • Or are they no longer needed with DOCA 3.2?
operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (2)

52-58: CUDA compilation settings are commented out—clarify intent.

Lines 52-53 show CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS commented out, matching the same pattern in the parent CMakeLists.txt. Given the GPU kernel refactoring mentioned in the PR summary for DOCA 3.2 (semaphore-based to packet-list-based transition), these settings are typically essential for device code separation and symbol resolution.

Verify:

  • Should these be uncommented and set at the global CMake scope?
  • Are they redundant with DOCA 3.2 defaults?
  • Is this change consistent across all manager implementations?

21-21: Verify DOCA 3.2 API compatibility for removed device library linkage.

Per the AI summary, -ldoca_gpunetio_device was removed from the target_link_libraries. This device-side static library linkage is significant, as the PR involves GPU kernel refactoring (semaphore-based to packet-list-based transitions). Confirm that this removal aligns with DOCA 3.2 API/ABI changes and doesn't break kernel symbol resolution or initialization.

Also applies to: 65-71

operators/advanced_network/Dockerfile (2)

83-89: Version bump and early dependency installation look appropriate.

The DOCA version upgrade from 2.8.0 to 3.2.0 and early installation of build-essential and wget align with the PR objectives and support the subsequent DOCA host package download step. The placement in the common-deps layer is correct.


131-138: Package list updates align with DOCA 3.2.0 upgrade.

The addition of mlnx-dpdk, doca-sdk-gpunetio, doca-sdk-eth, and doca-sdk-flow (and their -dev variants) reflects the transition from the semaphore/buffer-based DOCA 2.8.0 APIs to the packet-list and GPUNetIO architecture in DOCA 3.2.0. The removal of libdoca-sdk-dma-dev is consistent with this shift.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)

1337-1390: Packet‑list polling logic is correct but depends on proper initialization

The loop that:

  • Reads packets_stats->status.
  • Enqueues bursts when status == DOCA_GPU_SEMAPHORE_STATUS_READY.
  • Resets status to FREE and advances pkt_idx_cpu_list.

is structurally sound and matches the new packet‑list model. Its correctness hinges on two conditions:

  1. Packet‑list entries are zero‑initialized / set to FREE at creation (see earlier comment).
  2. CPU and GPU agree on index sequencing (host vs device ring index).

Once the initialization issues are addressed, this logic should behave as intended.
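
For reference, one polling step reduces to roughly the following sketch (assumes this PR's adv_network_doca_mgr.h definitions are in scope, pkt_list_cpu is the CPU alias from doca_gpu_mem_alloc(), and enqueue_burst() is a hypothetical stand-in for the pipeline hand-off):

static bool poll_rx_entry(struct adv_doca_rx_gpu_info* pkt_list_cpu, uint32_t* ring_idx) {
  struct adv_doca_rx_gpu_info* entry = &pkt_list_cpu[*ring_idx];
  if (DOCA_GPUNETIO_VOLATILE(entry->status) != DOCA_GPU_SEMAPHORE_STATUS_READY)
    return false;                                // nothing ready on this queue yet
  // enqueue_burst(entry);                       // hand the completed burst downstream
  DOCA_GPUNETIO_VOLATILE(entry->status) = DOCA_GPU_SEMAPHORE_STATUS_FREE;
  *ring_idx = (*ring_idx + 1) % MAX_DEFAULT_SEM_X_QUEUE;
  return true;
}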


1024-1027: DocaMgr destructor now frees TX host buffers – good

The added loop freeing burst[idx].pkt_lens[0] with cudaFreeHost matches the earlier cudaMallocHost in setup_pools_and_rings. That closes a prior leak for TX metadata buffers.

No further changes needed here.


1279-1284: Review comment is factually correct; GPU-side initialization concern is valid

The verification confirms the review's claims:

  1. Persistent kernel ignores pkt_idx_list: Confirmed. The kernel declares a local int pkt_idx = 0 and manages it independently, never reading the pkt_idx_list parameter.

  2. Non-persistent kernel uses pkt_idx_list: Confirmed. Line 223 of adv_network_doca_kernels.cu reads pkt_idx_list[blockIdx.x] directly, and line 293–297 logs it.

  3. Lack of explicit GPU-side initialization: Confirmed. Only the CPU-side pkt_idx_cpu_list[idx] = 0 is set (line 1282). No explicit cudaMemcpy or cudaMemset initializes the GPU copy before kernel launch.

  4. Timing concern with warmup kernel: If ADV_NETWORK_MANAGER_WARMUP_KERNEL is enabled, the warmup kernel (line 1307–1315) runs with persistent=false, invoking the non-persistent kernel that does access pkt_idx_gpu_list. This occurs after CPU initialization but without an explicit synchronization barrier.

The suggestion to explicitly zero-initialize both CPU and GPU copies is reasonable defensive programming, especially for DOCA's coherent memory model where explicit synchronization may be necessary.
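
A minimal sketch of that defensive clear, assuming pkt_idx_cpu_list/pkt_idx_gpu_list are the two views returned by one doca_gpu_mem_alloc() call with rxqn entries:

#include <cstring>
#include <cuda_runtime_api.h>

static void clear_pkt_idx_lists(uint32_t* pkt_idx_cpu_list, uint32_t* pkt_idx_gpu_list,
                                int rxqn, cudaStream_t stream) {
  memset(pkt_idx_cpu_list, 0, rxqn * sizeof(uint32_t));                   // host view
  cudaMemsetAsync(pkt_idx_gpu_list, 0, rxqn * sizeof(uint32_t), stream);  // device view
  cudaStreamSynchronize(stream);  // guarantee the clear lands before any kernel launch
}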

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)

545-572: DocaTxQueue destructor correctly unwinds DOCA resources

Stopping/destroying the ctx, destroying the eth txq, destroying the mmap, freeing GPU memory, and destroying the PE in this order looks correct and matches the allocations in the constructor.

No issues here.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (3)

90-107: Persistent RX kernel logic matches new packet‑list model

The use of out_first_pkt_idx, out_pkt_num, and doca_gpu_dev_eth_rxq_get_pkt_addr along with per‑block adv_doca_rx_gpu_info updates is consistent with the host‑side polling logic:

  • First packet’s addr/idx stored once per batch.
  • Total bytes aggregated with atomicAdd_block.
  • When accumulated tot_pkts_batch >= batch_list[blockIdx.x], stats are written, a fence is released, and status is set to READY.

Once the host‑side pointer and initialization issues are addressed, this kernel should behave correctly.
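
Condensed, the completion step has this shape (a sketch: field names other than status are assumptions, and __threadfence_system() stands in for DOCA's GPU fence/release helper):

__device__ void publish_batch(struct adv_doca_rx_gpu_info* entry,
                              uint64_t first_pkt_idx, uint32_t num_pkts, uint32_t num_bytes) {
  if (threadIdx.x == 0) {
    entry->first_pkt_idx = first_pkt_idx;  // assumed field name
    entry->num_pkts = num_pkts;            // assumed field name
    entry->num_bytes = num_bytes;          // assumed field name
    __threadfence_system();  // publish the stats to the host before the flag flips
    DOCA_GPUNETIO_VOLATILE(entry->status) = DOCA_GPU_SEMAPHORE_STATUS_READY;
  }
}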


388-423: Receiver kernel wrapper updates look correct

doca_receiver_packet_kernel:

  • Validates rxqn and gpu_exit_condition.
  • Checks for a prior CUDA error via cudaGetLastError.
  • Launches either the persistent or non‑persistent kernel with the updated parameter lists.
  • Checks for post‑launch CUDA errors.

The new pkt_gpu_list / pkt_idx_list parameters are wired correctly from the manager code. No changes needed beyond style/lint.


425-460: Sender kernel wrapper wiring matches new address‑based API

doca_sender_packet_kernel now:

  • Accepts pkt_buff_addr, pkt_buff_mkey, gpu_pkt0_idx, max_pkts, max_pkt_size.
  • Launches send_packets_kernel<<<1, CUDA_BLOCK_THREADS>>> with those parameters.
  • Checks CUDA errors before and after launch.

This matches how DocaTxQueue prepares the buffer and pkt_mkey. All good here.

Comment on lines 79 to +99
__global__ void receive_packets_kernel_persistent(int rxqn, uintptr_t* eth_rxq_gpu,
uintptr_t* sem_gpu, uint32_t* sem_idx_list,
uintptr_t* pkt_gpu_list, uint32_t* pkt_idx_list,
const uint32_t* batch_list, uint32_t* exit_cond) {
doca_error_t ret;
struct doca_gpu_buf* buf_ptr = NULL;
uintptr_t buf_addr;
uint64_t buf_idx;
struct doca_gpu_eth_rxq* rxq = (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];
struct doca_gpu_semaphore_gpu* sem = (struct doca_gpu_semaphore_gpu*)sem_gpu[blockIdx.x];
int sem_idx = 0;
__shared__ struct adv_doca_rx_gpu_info* stats_global;
#if DOCA_DEBUG_KERNEL == 1
int pkt_idx = 0;
struct eth_ip_udp_hdr* hdr;
uint8_t* payload;
#endif
__shared__ uint32_t rx_pkt_num;

__shared__ uint64_t out_first_pkt_idx;
__shared__ uint32_t out_pkt_num;
__shared__ uint32_t rx_pkt_bytes;
__shared__ uint64_t rx_buf_idx;
uint32_t pktb = 0;
uint32_t tot_pkts_batch = 0;
struct adv_doca_rx_gpu_info *packets_stats = &((struct adv_doca_rx_gpu_info *)pkt_gpu_list[blockIdx.x])[pkt_idx];

// Warmup
if (eth_rxq_gpu == NULL) return;

if (threadIdx.x == 0) {
if (threadIdx.x == 0)

⚠️ Potential issue | 🟠 Major

Guard against null eth_rxq_gpu before dereferencing in RX kernels

Both RX kernels (receive_packets_kernel_persistent and _non_persistent) currently do:

struct doca_gpu_eth_rxq* rxq =
    (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];
...
if (eth_rxq_gpu == NULL) return;

The dereference happens before the null check. In rx_core, the warmup call to doca_receiver_packet_kernel passes eth_rxq_gpu as nullptr (guarded by a macro today), which would be undefined behavior if that path is ever enabled.

Please move the null check to the very top of each kernel, before any indexing, e.g.:

if (eth_rxq_gpu == nullptr || pkt_gpu_list == nullptr) return;
struct doca_gpu_eth_rxq* rxq =
    (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];

This will make the warmup path safe even if enabled later.

🧰 Tools
🪛 GitHub Actions: Check linting

[error] 91-91: Tab found; better to use spaces. [whitespace/tab]


[error] 94-94: Lines should be <= 100 characters long. [whitespace/line_length]


[error] 94-94: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]


Comment on lines +19 to +21
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <cstdint>

⚠️ Potential issue | 🟠 Major

Drop <cuda.h> to fix clang header error and rely on <cuda_runtime_api.h>

The header only needs cudaStream_t; <cuda_runtime_api.h> already provides it. Including <cuda.h> is causing the reported clang diagnostic (“'cuda.h' file not found”) when this header is parsed without CUDA SDK available. Removing <cuda.h> here (and relying on the existing CUDA headers in the .cu translation unit) should unblock static analysis without changing behavior.

Also, the SPDX header still lists only 2023; please add the current year (2025) to satisfy the compliance check.

🧰 Tools
🪛 Clang (14.0.6)

[error] 19-19: 'cuda.h' file not found

(clang-diagnostic-error)


Comment on lines +31 to +43
/*
* Retrieve host page size
*
* @return: host page size
*/
static size_t get_host_page_size(void)
{
long ret = sysconf(_SC_PAGESIZE);
if (ret == -1)
return 4096; // 4KB, default Linux page size

return (size_t)ret;
}

⚠️ Potential issue | 🔴 Critical

Include <unistd.h> for sysconf / _SC_PAGESIZE

get_host_page_size() calls sysconf(_SC_PAGESIZE), but this file doesn’t include <unistd.h>, where both sysconf and _SC_PAGESIZE are declared. This will generally fail to compile or at least emit warnings.

Add:

#include <unistd.h>

near the top of the file.

🧰 Tools
🪛 GitHub Actions: Check linting

[error] 37-37: { should almost always be at the end of the previous line [whitespace/braces]


Comment on lines +303 to +327
doca_error_t DocaRxQueue::create_rx_packet_list() {
doca_error_t result;

printf("Alloc packet list for Rxq\n");

result = doca_gpu_mem_alloc(gdev, MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct adv_doca_rx_gpu_info), GPU_PAGE_SIZE, DOCA_GPU_MEM_TYPE_CPU_GPU, (void**)&pkt_list_gpu, (void**)&pkt_list_cpu);
if (result != DOCA_SUCCESS) {
HOLOSCAN_LOG_ERROR("Failed allocate packet list memory: {}", doca_error_get_descr(result));
return DOCA_ERROR_BAD_STATE;
}

return result;
}

doca_error_t DocaRxQueue::destroy_rx_packet_list() {
doca_error_t result;

result = doca_gpu_mem_free(gdev, pkt_list_gpu);
if (result != DOCA_SUCCESS) {
HOLOSCAN_LOG_ERROR("Failed destroy packet list memory: {}", doca_error_get_descr(result));
return DOCA_ERROR_BAD_STATE;
}

return result;
}

⚠️ Potential issue | 🟠 Major

Initialize and free RX packet-list properly

create_rx_packet_list():

result = doca_gpu_mem_alloc(
    gdev,
    MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct adv_doca_rx_gpu_info),
    GPU_PAGE_SIZE,
    DOCA_GPU_MEM_TYPE_CPU_GPU,
    (void**)&pkt_list_gpu,
    (void**)&pkt_list_cpu);

allocates the packet-list ring but:

  • Does not initialize any fields, including status, which the CPU later polls for READY. This should be cleared (e.g., memset(pkt_list_cpu, 0, ...) or an explicit loop setting status = DOCA_GPU_SEMAPHORE_STATUS_FREE).
  • Is not paired with a call to destroy_rx_packet_list() in the DocaRxQueue destructor, so the allocated memory will leak unless destroyed elsewhere.

destroy_rx_packet_list() correctly calls doca_gpu_mem_free(gdev, pkt_list_gpu) but is currently unused.

Please:

  1. Zero‑initialize the packet-list after allocation.
  2. Ensure destroy_rx_packet_list() is invoked from the DocaRxQueue destructor or from the owning manager’s shutdown path.
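
A sketch of both fixes together, reusing the allocation call from this diff (the loop labels every entry FREE explicitly rather than assuming FREE == 0):

doca_error_t DocaRxQueue::create_rx_packet_list() {
  doca_error_t result = doca_gpu_mem_alloc(
      gdev, MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct adv_doca_rx_gpu_info), GPU_PAGE_SIZE,
      DOCA_GPU_MEM_TYPE_CPU_GPU, (void**)&pkt_list_gpu, (void**)&pkt_list_cpu);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_ERROR("Failed allocate packet list memory: {}", doca_error_get_descr(result));
    return DOCA_ERROR_BAD_STATE;
  }
  // Mark every ring entry FREE so the host never observes a stale READY status.
  struct adv_doca_rx_gpu_info* list = (struct adv_doca_rx_gpu_info*)pkt_list_cpu;
  for (int i = 0; i < MAX_DEFAULT_SEM_X_QUEUE; i++)
    DOCA_GPUNETIO_VOLATILE(list[i].status) = DOCA_GPU_SEMAPHORE_STATUS_FREE;
  return DOCA_SUCCESS;
}

DocaRxQueue::~DocaRxQueue() {
  // ... existing teardown ...
  destroy_rx_packet_list();  // pairs with the doca_gpu_mem_alloc() above
}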

Comment on lines +569 to 571
cudaSetDevice(gpu_dev);
cudaFree(0);
doca_ret = doca_gpu_create(gpu_bdf, &gdev[gpu_dev]);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Check cudaSetDevice return value

cudaSetDevice(gpu_dev); is currently unchecked. If it fails (e.g., invalid index or CUDA not initialized), the subsequent cudaFree(0); and DOCA GPU create will behave unpredictably.

Recommend capturing and checking the return code, logging and aborting initialization on failure.


🏁 Script executed:

#!/bin/bash
# Locate the file and examine code around lines 569-571
fd -t f "adv_network_doca_mgr.cpp" --exec cat {} \;

Length of output: 61814


🏁 Script executed:

#!/bin/bash
# Search for all cudaSetDevice usage in the codebase
rg -n "cudaSetDevice" -A2 -B2 --type=cpp

Length of output: 2642


🏁 Script executed:

#!/bin/bash
# Check for cuda error handling patterns in the repository
rg -n "cudaError\|cudaGetErrorString\|CUDA_CHECK\|checkCudaErrors" --type=cpp | head -20

Length of output: 0



**Check `cudaSetDevice` return value**

The function `cudaSetDevice` returns `cudaError_t` and should be checked for errors. It can return `cudaErrorInitializationError`, `cudaErrorInsufficientDriver`, or `cudaErrorNoDevice`, and as of CUDA 12.0, it is very important to check the return value of `cudaSetDevice()` for initialization errors. Unchecked failures will cause subsequent CUDA operations to behave unpredictably or operate on the wrong device.

Capture and check the return code at lines 569, 1216, and 1474. Log errors and abort initialization on failure. Note: this pattern also appears in `operators/advanced_network/advanced_network/manager.cpp` at line 142.

---

🌐 Web query: "CUDA cudaSetDevice return value error handling"

💡 Result:

Short answer: cudaSetDevice returns a cudaError_t and must be checked like any CUDA Runtime API call. Possible documented return values include cudaSuccess, cudaErrorInvalidDevice and cudaErrorSetOnActiveProcess (and it may also surface earlier asynchronous errors). Use cudaGetErrorString() to get a human-readable message. [1][2]

Minimal example (C/C++):
int dev = 0;
cudaError_t err = cudaSetDevice(dev);
if (err != cudaSuccess) {
fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", dev, cudaGetErrorString(err));
/* handle/fail/cleanup */
}

Notes:

  • cudaErrorInvalidDevice means the device ID is invalid.
  • cudaErrorSetOnActiveProcess means the runtime/driver context is already active on the host thread and you cannot set the device then.
  • Many runtime calls can also return errors from prior asynchronous operations; checking and handling errors early helps diagnose failures. [1][2]

Sources:
[1] NVIDIA cudaSetDevice documentation. [2] NVIDIA guide on handling CUDA runtime errors.


Check cudaSetDevice return value at lines 569, 1216, and 1474

cudaSetDevice returns cudaError_t and must be checked. Possible documented return values include cudaSuccess, cudaErrorInvalidDevice and cudaErrorSetOnActiveProcess, and unchecked failures will cause subsequent CUDA operations to behave unpredictably.

Capture and check the return code at each call site. Use cudaGetErrorString() to get a human-readable message and abort initialization on error. This pattern also appears in operators/advanced_network/advanced_network/manager.cpp at line 142.

🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
around lines 569, 1216, and 1474, the calls to cudaSetDevice are currently
unchecked; capture the cudaError_t return for each call, test for cudaSuccess,
and on error log the device number and the human-readable string from
cudaGetErrorString(return_code) then abort initialization (return error / exit)
so subsequent CUDA calls are not executed on a failed device set; apply the same
pattern used in operators/advanced_network/advanced_network/manager.cpp line
142.

Comment on lines +107 to +110
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get update

⚠️ Potential issue | 🔴 Critical

Fix Dockerfile syntax error in wget command — line continuation is malformed.

The URL on line 107 is incomplete and continues incorrectly on line 108. The string "doca" at the end of line 107 lacks a continuation character (\), causing Hadolint to fail with "unexpected 'h'" on line 108.

Reconstructing the intended command, the URL should be fully specified on a single logical line using proper shell continuation:

-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \

Additionally, consider adding error handling and cleanup to the wget command to match Docker best practices and the pattern used elsewhere in the file:

-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget -q https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
    && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
    && apt-get update \
-    && apt-get update
+    && rm -f ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb
🧰 Tools
🪛 Hadolint (2.14.0)

[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

&& curl -fsSL ${DOCA_REPO_LINK}/GPG-KEY-Mellanox.pub | gpg --dearmor | tee ${LOCAL_GPG_KEY_PATH} > /dev/null \
&& echo "deb [signed-by=${LOCAL_GPG_KEY_PATH}] ${DOCA_REPO_LINK} ./" | tee /etc/apt/sources.list.d/mellanox.list

RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
Contributor

Shouldn't this use the version variable from above?


RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
Contributor

Same thing on version variable

* applications needs to exit
*/
printf("Receive UDP kernel error %d rxpkts %d error %d\n", ret, rx_pkt_num, ret);
printf("Receive UDP kernel error %d rxpkts %d error %d\n", ret, out_pkt_num, ret);
Contributor

Leaving a printf slows the whole kernel down even if it's not used. Probably should wrap in a macro.
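
One way to do that, sketched with a hypothetical macro name and keyed off the DOCA_DEBUG_KERNEL flag this file already uses:

#if DOCA_DEBUG_KERNEL == 1
#define ANO_KERNEL_PRINTF(...) printf(__VA_ARGS__)
#else
#define ANO_KERNEL_PRINTF(...) \
  do {                         \
  } while (0)
#endif

// At the call site:
// ANO_KERNEL_PRINTF("Receive UDP kernel error %d rxpkts %d\n", ret, out_pkt_num);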

}

create_semaphore();
// create_semaphore();
Contributor

Where did the semaphores go?

}


#if 0
Contributor

It seems that by removing the semaphores there's no longer synchronization with the CPU, so now the example assumes the CPU has no idea when packets are received and when to launch kernels?

}

create_semaphore();
// create_semaphore();
Contributor

if not needed, remove?

Comment on lines +19 to +20
#include <cuda.h>
#include <cuda_runtime_api.h>
Contributor

You mentioned there were no cuda dependencies?

Comment on lines +52 to +54
# CUDA_SEPARABLE_COMPILATION ON
# CUDA_RESOLVE_DEVICE_SYMBOLS ON

Contributor

remove

Comment on lines +51 to +52
# CUDA_SEPARABLE_COMPILATION ON
# CUDA_RESOLVE_DEVICE_SYMBOLS ON
Contributor

remove?


WORKDIR /opt

RUN apt-get update && apt-get install -y build-essential wget
Contributor

use curl for licensing reasons

libcurand-dev-12-6 \
ninja-build \
pkgconf \
mlnx-dpdk \
Contributor

probably not needed since it already has -dev below

Comment on lines +131 to 132
mlnx-dpdk \
mlnx-dpdk-dev \
Contributor

@cliffburdick will this create conflicts when using upstream dpdk? Can doca use dpdk from upstream if available instead?

Contributor

I think the mlnx version installs in /opt so it shouldn't have any issues

Comment on lines +133 to +137
doca-sdk-gpunetio \
libdoca-sdk-gpunetio-dev \
doca-sdk-eth \
libdoca-sdk-eth-dev \
doca-sdk-flow \
Contributor

the -dev don't depend on the non dev packages?
