New ANO GPUNetIO manager with DOCA 3.2 #1250
base: main
Conversation
Signed-off-by: eagostini <[email protected]>
Greptile Overview

Greptile Summary

This PR upgrades the Advanced Network Operator from DOCA 2.8 to DOCA 3.2, implementing significant architectural changes to the GPU packet processing pipeline.

Key Changes

Issues Found

The core GPU kernel logic remains sound with proper synchronization patterns, and the migration to DOCA 3.2 APIs appears architecturally correct.

Confidence Score: 3/5

Important Files Changed

File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
participant CPU as CPU Thread
participant GPU as GPU Kernel
participant NIC as Network Interface Card
participant MEM as GPU Memory
Note over CPU,MEM: Initialization Phase (DOCA 3.2)
CPU->>MEM: Allocate packet list (adv_doca_rx_gpu_info)
CPU->>NIC: Configure DOCA Flow pipes with RSS
CPU->>NIC: Apply queue IDs to eth_rxq
CPU->>GPU: Launch persistent RX kernel
Note over GPU,NIC: Packet Reception Loop
loop Persistent kernel execution
GPU->>NIC: doca_gpu_dev_eth_rxq_recv()
NIC-->>GPU: out_first_pkt_idx, out_pkt_num
GPU->>NIC: doca_gpu_dev_eth_rxq_get_pkt_addr()
NIC-->>GPU: Buffer addresses
GPU->>GPU: Process packets (parse headers, filter)
GPU->>MEM: Update packet_stats (num_pkts, nbytes, addr)
GPU->>MEM: doca_gpu_dev_eth_fence_release()
GPU->>MEM: Set status = READY
CPU->>MEM: Poll packet_stats.status
alt Status == READY
CPU->>MEM: Read packet metadata
CPU->>CPU: Enqueue to ring buffer
CPU->>MEM: Set status = FREE
CPU->>MEM: Increment pkt_idx (circular)
end
end
Note over CPU,NIC: Packet Transmission
CPU->>CPU: Dequeue burst from ring
CPU->>GPU: Launch TX kernel with packet info
GPU->>GPU: doca_gpu_dev_eth_txq_reserve_wq_slots()
loop For each packet
GPU->>GPU: Calculate packet address
GPU->>GPU: doca_gpu_dev_eth_txq_wqe_prepare_send()
end
GPU->>NIC: doca_gpu_dev_eth_txq_submit()
NIC->>NIC: Transmit packets
```
8 files reviewed, 3 comments
```dockerfile
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```
syntax: Line break splits the URL, causing wget to fail
```diff
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```
```cpp
doca_eth_rxq_get_flow_queue_id(eth_rxq_cpu, &flow_queue_id);
rss_queues[0] = flow_queue_id;
printf("UDP pipe queue %d\n", flow_queue_id);
```
style: Debugging printf left in production code
```cpp
doca_error_t DocaRxQueue::create_rx_packet_list() {
  doca_error_t result;

  printf("Alloc packet list for Rxq\n");
```
style: Debugging printf left in production code
Walkthrough

This PR upgrades DOCA from version 2.8.0 to 3.2.0 and refactors GPU packet handling from semaphore-based to packet list-based architectures. Changes include updated build dependencies, modified CUDA kernel implementations, revised function signatures across managers, and integrated DMABuf memory mapping.

Changes
Estimated code review effort: 4 (Complex) | ~60 minutes

Areas requiring extra attention:
Suggested reviewers
Pre-merge checks

✅ Passed checks (2 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
operators/advanced_network/Dockerfile (1)
112-143: Update comments to reflect package changes in the apt-get install list.

Lines 116–119 reference `libdoca-sdk-dma-dev`, which is not in the current install list (lines 124–143). The comments describe the old package configuration, not the new DOCA 3.2 setup. Update the comments to accurately describe `doca-sdk-gpunetio`, `doca-sdk-eth`, `doca-sdk-flow`, and their respective `-dev` variants. Apply this diff to update the comments:
```diff
 # APT installs
 # - cublas, cufft, cusolver, curand dev libs: for matx
 # - ninja-build: for cmake build
 # - pkgconf: to import dpdk and doca in CMake
 # - mlnx-dpdk-dev: for dpdk and gpunetio backends
-# - libdoca-sdk-dma-dev: for gpunetio backend (dependency of doca-gpunetio module)
 # - libdoca-sdk-gpunetio-dev: for gpunetio backend (doca-gpunetio module)
+# - doca-sdk-gpunetio: runtime library for gpunetio backend
+# - doca-sdk-eth: DOCA ethernet library
+# - doca-sdk-flow: DOCA flow library
 # - libdoca-sdk-eth-dev: for gpunetio backend (doca-eth module)
 # - libdoca-sdk-flow-dev: for gpunetio backend (doca-flow module)
 # - mlnx-ofed-kernel-utils: utilities, including ibdev2netdev used by tune_system.py
 # - ibverbs-utils: utilities
 # - python3-pyelftools: used by some mlnx tools
 # - python3-dev: for building python bindings
```

operators/advanced_network/advanced_network/CMakeLists.txt (1)
1-2: Update copyright header to include current year

Compliance check reports the current year is missing; please update to include 2025 (e.g., 2023-2025).

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (2)
1-2: Fix copyright header

Same as other files: include the current year (2025) in the SPDX copyright line.
62-71: Add `-ldoca_gpunetio_device` to link libraries

The CUDA kernels directly invoke GPUNetIO device-side functions including `doca_gpu_dev_eth_rxq_recv()`, `doca_gpu_dev_eth_rxq_get_pkt_addr()`, `doca_gpu_dev_eth_fence_release()`, `doca_gpu_dev_eth_txq_reserve_wq_slots()`, `doca_gpu_dev_eth_txq_get_wqe_ptr()`, `doca_gpu_dev_eth_txq_wqe_prepare_send()`, and `doca_gpu_dev_eth_txq_submit()`. Per DOCA 3.2 documentation, libdoca_gpunetio_device.a is a separate static library that must be explicitly linked when GPU device functions are required. Add `-ldoca_gpunetio_device` to the `target_link_libraries()` section in the CMakeLists.txt.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (2)
1-2: Align copyright header

Same as the other files: update the SPDX copyright line to include the current year (e.g., 2023-2025) so the compliance check passes.
1267-1277: Missing free for batch_gpu_list GPU allocation

You allocate `batch_gpu_list`/`batch_cpu_list` with `doca_gpu_mem_alloc` but never call `doca_gpu_mem_free` on `batch_gpu_list` when tearing down `rx_core`, while you do free `eth_rxq_gpu_list`, `pkt_gpu_list`, `pkt_idx_gpu_list`, and `gpu_exit_condition`. This leaves a DOCA GPU allocation leaked per worker.

Add a matching `doca_gpu_mem_free(tparams->gdev, (void*)batch_gpu_list);` before returning.

Also applies to: 1411-1416
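A one-line sketch of that matching cleanup, placed alongside the existing frees in the `rx_core` teardown (assuming the same `tparams->gdev` handle the other frees already use):

```cpp
// Mirror the eth_rxq_gpu_list / pkt_gpu_list frees so the batch list
// does not leak one DOCA GPU allocation per RX worker.
doca_gpu_mem_free(tparams->gdev, (void*)batch_gpu_list);
```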
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)
1-16: Fix copyright header to include current year for compliance.

The compliance check is failing because the header only mentions 2023; the current year (2025) must be included.

For example:

```diff
- * SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)
1-16: Update copyright header to include 2025.

Same as the C++ file, the compliance check requires the current year to be listed in the header.

```diff
- * SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
79-198: Persistent RX kernel: add fence to final flush and fix linting issues.

Verification confirms all issues. The fence omission is real: line 179 shows `doca_gpu_dev_eth_fence_release()` in the main batch-completion block, but line 197 (final flush after loop exit) lacks this call before setting `status = READY`. Without the fence, the host may observe `READY` before the counters are visible, violating memory ordering.

Casting and whitespace issues are also confirmed:
- Line 94: C-style cast triggers cpplint runtime/casting warning
- Lines 91, 105–107: Tabs instead of spaces
Required fixes:
1. Add fence before final status write (line 197):

```diff
 if (threadIdx.x == 0) {
   DOCA_GPUNETIO_VOLATILE(packets_stats->num_pkts) = DOCA_GPUNETIO_VOLATILE(tot_pkts_batch);
   DOCA_GPUNETIO_VOLATILE(packets_stats->nbytes) = DOCA_GPUNETIO_VOLATILE(rx_pkt_bytes);
+  doca_gpu_dev_eth_fence_release<DOCA_GPUNETIO_ETH_SYNC_SCOPE_SYS>();
   DOCA_GPUNETIO_VOLATILE(packets_stats->status) = DOCA_GPU_SEMAPHORE_STATUS_READY;
 }
```

2. Replace C-style cast on line 94 with `reinterpret_cast` and split across lines to avoid length warning.
3. Replace tabs with spaces throughout (lines 91, 105–107, and similar).

🧹 Nitpick comments (16)
operators/advanced_network/advanced_network/CMakeLists.txt (1)
51-52: Clarify or remove commented CUDA properties

The commented `CUDA_SEPARABLE_COMPILATION`/`CUDA_RESOLVE_DEVICE_SYMBOLS` lines are no-ops here. If these properties are now configured globally, consider removing the comments to avoid confusion; otherwise, set them via `set_target_properties` or `set_property` on the relevant targets.

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1)
52-53: Remove or wire up commented CUDA properties

These commented `CUDA_SEPARABLE_COMPILATION`/`CUDA_RESOLVE_DEVICE_SYMBOLS` directives do nothing; if CUDA behavior is controlled elsewhere now, consider dropping them, otherwise enable them via proper target properties.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (3)
65-73: ALIGN_SIZE macro is side-effectful; prefer a helper

`ALIGN_SIZE(size, align)` both reassigns its argument and evaluates it multiple times. That's easy to misuse with expressions or side effects. Consider replacing it with an inline function or a safer macro pattern (e.g., `do { (size) = ...; } while (0)`), or using a typed helper `align_up(size_t size, size_t align)`.
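A minimal sketch of such a helper, assuming the usual round-up-to-multiple semantics (`align_up` is a hypothetical name, not something in the PR):

```cpp
#include <cstddef>

// Round size up to the next multiple of align; unlike the ALIGN_SIZE macro,
// this evaluates each argument exactly once and has no side effects.
inline size_t align_up(size_t size, size_t align) {
  return ((size + align - 1) / align) * align;
}

// Callers assign explicitly instead of mutating through a macro:
//   cyclic_buffer_size = align_up(cyclic_buffer_size, get_host_page_size());
```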
127-159: Clarify semantics of pkt_list_gpu/pkt_list_cpu

`DocaRxQueue` now has both `pkt_list_gpu` and `pkt_list_cpu`, but only `pkt_list_gpu` is used from the RX core loop (the CPU side). If `pkt_list_gpu` actually holds a device pointer and `pkt_list_cpu` holds the host-mapped view, that's a correctness bug; if both are host-accessible (e.g., due to UVA), the naming is misleading. It would be safer to (a) use the explicit host pointer in CPU code, and (b) rename/add comments to reflect which field is host vs device.
161-187: Update pkt_mkey comment and remove dead fields

`pkt_mkey` is a DOCA memory key, but the comment still says "GPU memory address", which is confusing next to `gpu_pkt_addr`. Also, the `buf_arr`/`buf_arr_gpu` members are commented out here but likely still referenced in older code paths; once you're sure they're unused, you can remove them (and any supporting code) entirely.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)
18-22: Tighten CUDA includes to avoid static-analysis issues

The header now includes both `<cuda.h>` and `<cuda_runtime_api.h>`, and clang static analysis is complaining about `cuda.h` not being found. Since these declarations only use `cudaStream_t`, `<cuda_runtime_api.h>` should be sufficient; you can likely drop `<cuda.h>` from this file (it's already included via `adv_network_doca_mgr.h` anyway), which should make the header friendlier to non-CUDA tooling.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)
347-373: Minor robustness and style issues in DOCA flow pipe setup

A few small items worth tightening:

- `fwd.rss.nr_queues = cnt_defq;;` (extra semicolon) is harmless but noisy.
- In `create_default_pipe`, you assume `idxq == cnt_defq` by construction; if config ever diverges, `rss_queues` and `cnt_defq` could get out of sync. An assertion or sanity check here would help catch that.
- `doca_flow_pipe_control_add_entry` in `create_root_pipe` is now passed multiple `nullptr` parameters; consider wrapping these in a small helper or at least naming the fields in comments to make future maintenance safer.

Functionally it looks consistent with the new DOCA 3.2 APIs; these are mostly maintainability nits.
Also applies to: 855-867, 943-975
1367-1368: Minor RX logging and index handling nits
- The debug log uses `pkt_cpu_list[pkt_idx_cpu_list[ridx]]`, which is confusing and likely not what you want to print (it's a pointer value, not an index). Consider logging `pkt_idx_cpu_list[ridx]` instead.
- `pkt_idx_cpu_list[ridx]` is incremented modulo `MAX_DEFAULT_SEM_X_QUEUE`; make sure `create_rx_packet_list()` allocates at least that many `adv_doca_rx_gpu_info` entries, or add an assertion (see the sketch after this comment).

These are small but will make debugging and future changes safer.
Also applies to: 1387-1390
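A minimal sketch of the assertion idea (`pkt_list_entries` is a hypothetical count; the current code hard-codes `MAX_DEFAULT_SEM_X_QUEUE` in both places):

```cpp
#include <cassert>

// Catch a size/wrap-around mismatch at startup instead of as memory
// corruption later: the ring index wraps at MAX_DEFAULT_SEM_X_QUEUE,
// so the packet list must have at least that many entries.
assert(pkt_list_entries >= MAX_DEFAULT_SEM_X_QUEUE &&
       "create_rx_packet_list() must allocate >= MAX_DEFAULT_SEM_X_QUEUE entries");
```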
296-296: Address lint warnings: line length and tabs

Lint is flagging:
- Lines 296 and 600 as over 100 characters.
- Lines 347–349 for tab characters.
These are straightforward to fix by wrapping long log lines and replacing tabs with spaces to match the project’s style and keep CI clean.
Also applies to: 347-349, 600-600
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (4)
31-43: Add proper include for `sysconf` and consider documenting fallback.

`get_host_page_size()` uses `sysconf(_SC_PAGESIZE)` but this file does not include `<unistd.h>`, which is the canonical declaration site; relying on transitive includes is brittle. Also, a brief comment that 4096 is the intended default Linux page size fallback would help future maintainers.

```diff
 #include <sys/time.h>
+#include <unistd.h>  // for sysconf(_SC_PAGESIZE)
```
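For illustration, a minimal version of such a helper with the documented fallback (a sketch; the actual body in adv_network_doca_mgr_obj.cpp may differ):

```cpp
#include <unistd.h>
#include <cstddef>

// Host page size, falling back to 4096 (the default Linux page size)
// when sysconf() fails or reports a nonsensical value.
inline size_t get_host_page_size() {
  long page_size = sysconf(_SC_PAGESIZE);
  return page_size > 0 ? static_cast<size_t>(page_size) : 4096;
}
```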
73-96: Host-page alignment for RX buffer looks reasonable but confirm with DOCA 3.2 docs.

Calling `doca_eth_rxq_estimate_packet_buf_size` followed by `ALIGN_SIZE(cyclic_buffer_size, get_host_page_size());` before `doca_gpu_mem_alloc` is consistent with the DMABuf mapping that follows, but DOCA 3.2's expectations around page size vs. GPU page size can be subtle, especially when mixing `DOCA_GPU_MEM_TYPE_GPU` and `DOCA_GPU_MEM_TYPE_CPU_GPU`.

Please double-check that:
- The estimated buffer size is allowed to be rounded up to host page size.
- The
GPU_PAGE_SIZEargument todoca_gpu_mem_allocis still valid for allmtypevalues you use here.
202-263: Flow queue ID handling and logging are fine, but avoid `printf` in favor of HOLOSCAN logging.

The new `flow_queue_id` plumbing (writing `rss_queues[0]`, applying it to the RXQ, then incrementing) is consistent and easy to follow.
- Prefer
HOLOSCAN_LOG_INFOover bareprintffor"UDP pipe queue %d\n"so log routing/verbosity stays consistent with the rest of Holoscan.- Since DOCA Flow has evolved a bit between releases, please confirm that:
htons(cfg.match_.udp_src_/dst_)is the expected byte order formatch.outer.udp.l4_port.*.fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED;withnr_queues = 1is the right way to express “no RSS, single queue” in DOCA 3.2.
303-327:create_rx_packet_list/destroy_rx_packet_listlook correct; confirm page-size choice and call sites.The allocation/free pair is straightforward and uses
DOCA_GPU_MEM_TYPE_CPU_GPU, which matches the intended “CPU‑visible from GPU” semantics.A couple of follow‑ups to consider:
- For
DOCA_GPU_MEM_TYPE_CPU_GPU, verify in DOCA 3.2 docs thatGPU_PAGE_SIZEis still the correct page size argument; some platforms expect host page size here.- If these helpers can be called multiple times per queue lifetime, you may want basic guards to avoid double‑free or leaks (e.g., track whether
pkt_list_gpuis already allocated and null it out on free).operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (3)
210-305: Non‑persistent RX kernel: address‑of‑cast and fence pattern; consider explicit error signalling.For
receive_packets_kernel_non_persistent, the new address‑based RX path is consistent with the persistent kernel and looks fine overall.A few suggestions:
Fix the cpplint
runtime/castingwarning similarly to the persistent kernel:struct adv_doca_rx_gpu_info *packets_stats = &((struct adv_doca_rx_gpu_info *)pkt_gpu_list[blockIdx.x])[pkt_idx_list[blockIdx.x]];
- auto* pkt_array =
reinterpret_cast<adv_doca_rx_gpu_info*>(pkt_gpu_list[blockIdx.x]);- adv_doca_rx_gpu_info* packets_stats =
&pkt_array[pkt_idx_list[blockIdx.x]];
The fence/write‑ready sequence (
nbytes/num_pkts, thendoca_gpu_dev_eth_fence_release, thenstatus = READY) matches DOCA guidance, which is good; keep this pattern consistent with the persistent kernel (see previous comment).On
doca_gpu_dev_eth_rxq_recverror you setout_pkt_num = 0but do not updatepackets_stats->status. If host code polls only onstatus == READY, it may hang while this kernel has already encountered an error. Consider setting an explicit error status or signalling via a separate flag so the CPU side can handle this cleanly.
315-384: TX kernel: address computation makes sense; add basic reserve error handling and address lint issues.The reworked
send_packets_kernelthat usespkt_buff_addr/pkt_buff_mkeyandmax_pkt_sizelooks coherent:
addr = pkt_buff_addr + max_pkt_size * ((pkt_idx + gpu_pkt0_idx) % max_pkts)correctly walks a circular buffer.- Using
base_wqe_idxfromdoca_gpu_dev_eth_txq_reserve_wq_slotsanddoca_gpu_dev_eth_txq_get_wqe_ptris the right pattern.Two recommended improvements:
If
doca_gpu_dev_eth_txq_reserve_wq_slotscan signal failure (e.g., invalid index), guard against that to avoid writing out of bounds:if (threadIdx.x == 0)
base_wqe_idx = doca_gpu_dev_eth_txq_reserve_wq_slots(txq, num_pkts);
- if (threadIdx.x == 0)
- base_wqe_idx = doca_gpu_dev_eth_txq_reserve_wq_slots(txq, num_pkts);
__syncthreads();- if (base_wqe_idx == DOCA_GPUNETIO_ETH_INVALID_WQE_IDX) {
- return;
- }
(Adjust the sentinel according to the DOCA 3.2 API.) - Cpplint warnings: tabs and long debug `printf` lines in the `#if DOCA_DEBUG_KERNEL == 2` block—replace tabs with spaces and wrap long lines as needed. --- `388-452`: **Host wrappers: new kernel signatures are consistent; no functional issues spotted.** The updated `doca_receiver_packet_kernel` and `doca_sender_packet_kernel` correctly forward the new parameters (packet list pointers, buffer base address, mkey, max packet size) to the respective kernels, and CUDA error checks remain in place. The only minor suggestion is to keep debug‑style comments (`// fprintf(...)`) either removed or behind a debug macro to avoid confusion. </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: CodeRabbit UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 09d588c324cb2e6a246572719c55ec749112cac2 and e20ad5065f270a5b0165e11ded86112673c32c10. </details> <details> <summary>📒 Files selected for processing (8)</summary> * `operators/advanced_network/Dockerfile` (3 hunks) * `operators/advanced_network/advanced_network/CMakeLists.txt` (1 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt` (1 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu` (11 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h` (1 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp` (25 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h` (6 hunks) * `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp` (12 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (1)</summary> <details> <summary>📚 Learning: 2025-10-22T23:47:42.896Z</summary>Learnt from: cdinea
Repo: nvidia-holoscan/holohub PR: 1170
File: applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt:109-111
Timestamp: 2025-10-22T23:47:42.896Z
Learning: In the video streaming server application (applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt), bundling libcudart.so.12 from the NGC operator binaries is intentional to ensure consistency with NGC binaries, even though the target links to CUDA::cudart.

Applied to files:

- operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
- operators/advanced_network/advanced_network/CMakeLists.txt

🧬 Code graph analysis (3)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h

- adv_network_doca_mgr_obj.cpp: create_udp_pipe (202-301), create_udp_pipe (202-204)
- adv_network_doca_mgr.cpp: init_doca_flow (296-378), init_doca_flow (296-296)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h

- adv_network_doca_kernels.cu: doca_receiver_packet_kernel (388-423), doca_receiver_packet_kernel (388-391), doca_sender_packet_kernel (425-460), doca_sender_packet_kernel (425-430)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

- adv_network_doca_kernels.cu: doca_receiver_packet_kernel (388-423), doca_receiver_packet_kernel (388-391)

🪛 Clang (14.0.6)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h

[error] 19-19: 'cuda.h' file not found (clang-diagnostic-error)

🪛 GitHub Actions: Check Compliance

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/CMakeLists.txt
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
[error] 1-1: Copyright header incomplete: current year not included in the header.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
[error] 1-1: Copyright header incomplete: current year not included in the header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
[error] 1-1: Copyright header incomplete: current year not included in the header.

🪛 GitHub Actions: Check linting

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu

[warning] 91-91: Tab found; better to use spaces [whitespace/tab]
[warning] 94-94: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 94-94: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after [runtime/casting]
[warning] 105-107: Tab found; better to use spaces [whitespace/tab]
[warning] 107-107: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 160-160: Should have a space between // and comment [whitespace/comments]
[warning] 161-161: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 185-185: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after [runtime/casting]
[warning] 211-211: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 223-223: Tab found; better to use spaces [whitespace/tab]
[warning] 223-223: Lines should be <= 100 characters long [whitespace/line_length]

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp

[warning] 296-296: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 347-349: Tab found; better to use spaces [whitespace/tab]
[warning] 349-349: Lines should be <= 100 characters long [whitespace/line_length]
[warning] 600-600: Lines should be <= 100 characters long [whitespace/line_length]

🪛 Hadolint (2.14.0)

operators/advanced_network/Dockerfile

[error] 108-108: unexpected 'h', expecting '#', '\\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input (DL1000)

🔇 Additional comments (9)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (1)

243-281: Ensure global flow_queue_id and pipe_queues config align with DOCA expectations

`init_doca_flow(struct doca_dev* dev, uint16_t port_id, uint8_t rxq_num)` and the class member `flow_queue_id` together drive queue IDs across ports. Because `doca_flow_init` is guarded by `static bool flow_init`, `pipe_queues` is set only from the first `rxq_num`, while `flow_queue_id` continues increasing as you add UDP pipes and default queues on subsequent ports. Please confirm with DOCA 3.2 docs that:

- `pipe_queues` is meant to be per-process, not per-port, and
- the total number of queues created across all ports will not exceed the configured `pipe_queues` range.
If not, you may need to precompute a global RX queue count and/or reset `flow_queue_id` per port.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)

28-38: Kernel prototypes match implementation

The updated signatures for `doca_receiver_packet_kernel` and `doca_sender_packet_kernel` line up with the `.cu` implementation (added `pkt_gpu_list`/`pkt_idx_list`, and new `pkt_buff_addr`/`pkt_buff_mkey`/`max_pkt_size` parameters), so external callers are consistent.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (6)

264-277: Verify removal of doca_dpdk_port_probe still enables desired flow mode

`doca_dpdk_port_probe(ddev[intf.port_id_], "dv_flow_en=2")` is now commented out. With DOCA 3.2 and the new `doca_flow_port_cfg_set_dev` path, this may indeed be redundant, but if you still rely on DV/HWS mode you should confirm that it is now enabled solely via the new flow API calls.

296-378: Check DOCA flow global configuration vs per-port queue usage

`init_doca_flow()` now calls `doca_flow_cfg_set_pipe_queues(rxq_flow_cfg, rxq_num)` only on the first invocation, and uses a single `flow_queue_id` counter across all ports when assigning queue IDs in `create_default_pipe()` and `DocaRxQueue::create_udp_pipe()`. If you have multiple ports or if the sum of (default queues + flow queues) across ports can exceed the first port's `rxq_num`, you might end up using queue IDs beyond the configured `pipe_queues`, which DOCA could reject or mis-handle. It may be safer to precompute a global RX queue count (or maximum per port) and use that when initializing DOCA Flow.

Also applies to: 786-846

569-576: Device-aware flow initialization and queue wiring look coherent

The new path creates `doca_gpu` handles per GPU after `cudaSetDevice`/`cudaFree(0)`, initializes a DOCA flow port via `init_doca_flow(ddev[port], port_id, rxq_count)`, passes that `df_port` into each `DocaRxQueue`, and wires per-flow UDP pipes via `create_udp_pipe(flow, rxq_pipe_default, flow_queue_id)` and packet lists via `create_rx_packet_list()`. This is internally consistent with the adv_network_doca_mgr_obj.cpp snippet; as long as the DOCA-side queue ID and `df_port` semantics match the new 3.2 APIs, this structure looks sound.

Also applies to: 598-605, 663-670, 720-737

1489-1492: TX kernel warmup and new sender arguments look correct

The warmup call and the main `doca_sender_packet_kernel` invocation match the updated signature (stream, txq, `gpu_pkt_addr`, `pkt_mkey`, indices, `max_pkt`, `max_pkt_size`, lengths, `set_completion`). As long as `pkt_mkey` and `gpu_pkt_addr` are set appropriately in `DocaTxQueue` construction, this TX path aligns with the new DOCA 3.2 sender kernel.

Also applies to: 1518-1527

1024-1027: Destructor complements new CUDA host allocations

You now `cudaMallocHost` `burst[idx].pkt_lens[0]` in `initialize()` and free them in `~DocaMgr()` with `cudaFreeHost`, which resolves the prior leak of these pinned buffers. That cleanup looks correct and balanced.

Also applies to: 741-757

1783-1795: Graceful shutdown logging is a nice addition

The extra logs in `shutdown()` ("stopping cores" and "exit gracefully") will help correlate teardown behavior with RX/TX worker thread joins in logs; no issues from a correctness perspective.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)

481-538: Mirror DMABuf fallback fix in TX path and tighten `mkey` error handling.

The TX queue's DMABuf logic has the same fallback issue as RX: when `doca_mmap_set_dmabuf_memrange` fails, `dmabuf_fd` stays non-negative so the legacy `doca_mmap_set_memrange` path is skipped. Updating it similarly will make the behavior robust:

```diff
   result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
                                          dmabuf_fd,
                                          gpu_pkt_addr,
                                          0,
                                          tx_buffer_size);
   if (result != DOCA_SUCCESS) {
     HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
+    dmabuf_fd = -1;  // Trigger nvidia-peermem fallback below
   }
```

Additionally,
`doca_mmap_get_mkey` failure currently just logs and leaves `pkt_mkey` potentially uninitialized but still used. Given that the mkey is critical for NIC access, it'd be safer to treat this as fatal (e.g., log as critical and ensure the queue is not used until creation is retried or aborted).
```cpp
dmabuf_fd = -1;
if (mtype == DOCA_GPU_MEM_TYPE_GPU) {
  /* Map GPU memory buffer used to send packets with DMABuf */
  result = doca_gpu_dmabuf_fd(gdev, gpu_pkt_addr, cyclic_buffer_size, &(dmabuf_fd));
  if (result == DOCA_SUCCESS) {
    HOLOSCAN_LOG_INFO("Mapping send queue buffer ({} size {}B dmabuf fd {}) with dmabuf mode",
                      gpu_pkt_addr,
                      cyclic_buffer_size,
                      dmabuf_fd);

    result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
                                           dmabuf_fd,
                                           gpu_pkt_addr,
                                           0,
                                           cyclic_buffer_size);
    if (result != DOCA_SUCCESS) {
      HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}",
                         doca_error_get_descr(result));
    }
  }
}
/*
} else {
  HOLOSCAN_LOG_INFO("Mapping receive queue buffer (0x{} size {}B dmabuf fd {}) with
      dmabuf mode", gpu_pkt_addr, cyclic_buffer_size, dmabuf_fd);
  result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap, dmabuf_fd, gpu_pkt_addr, 0,
      cyclic_buffer_size);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_CRITICAL("Failed to set dmabuf memrange for mmap {}",
        doca_error_get_descr(result));
  }
}
*/

if (dmabuf_fd == -1) {
  HOLOSCAN_LOG_INFO("Mapping send queue buffer ({} size {}B) with nvidia-peermem mode",
                    gpu_pkt_addr,
                    cyclic_buffer_size);

  /* If failed, use nvidia-peermem legacy method */
  result = doca_mmap_set_memrange(pkt_buff_mmap, gpu_pkt_addr, cyclic_buffer_size);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_ERROR("Failed to set memrange for mmap {}", doca_error_get_descr(result));
  }
}
```
🧩 Analysis chain
Enable peermem fallback when DMABuf set_dmabuf_memrange fails.
Right now, if doca_gpu_dmabuf_fd succeeds but doca_mmap_set_dmabuf_memrange fails, dmabuf_fd stays non‑negative, so the peermem fallback (doca_mmap_set_memrange) is skipped and the mmap is left misconfigured. That defeats the purpose of the fallback path.
Reset dmabuf_fd on failure so the legacy path is used:
```diff
   result = doca_mmap_set_dmabuf_memrange(pkt_buff_mmap,
                                          dmabuf_fd,
                                          gpu_pkt_addr,
                                          0,
                                          cyclic_buffer_size);
   if (result != DOCA_SUCCESS) {
     HOLOSCAN_LOG_ERROR("Failed to set dmabuf memrange for mmap {}", doca_error_get_descr(result));
+    dmabuf_fd = -1;  // Trigger nvidia-peermem fallback below
   }
```

Also, the log messages talk about "send queue buffer" here even though this is the RX queue; consider updating the text for clarity.
I'll verify the review comment by examining the actual code file and confirming the logic flow.
---
🏁 Script executed:
```shell
#!/bin/bash
# View the file context around lines 102-133 to verify current state and surrounding code
cat operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp | head -150 | tail -80
```

Length of output: 2899
🏁 Script executed:
#!/bin/bash
# Search for dmabuf_fd usage throughout the file to understand control flow
rg -n "dmabuf_fd" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp -B2 -A2Length of output: 1586
🏁 Script executed:
#!/bin/bash
# Check context around this function to determine if it's RX or TX queue
rg -n -B20 "doca_gpu_dmabuf_fd" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp | head -40
```

Length of output: 1621
Reset dmabuf_fd on failure to enable the peermem fallback.
The bug is confirmed: if doca_mmap_set_dmabuf_memrange fails at lines 112-120, dmabuf_fd remains non-negative, so the fallback check at line 123 (if (dmabuf_fd == -1)) evaluates false and skips the doca_mmap_set_memrange call. The mmap is left misconfigured.
Add dmabuf_fd = -1; in the error handler (after line 120) to trigger the fallback.
The same bug exists in the TX queue setup around line 499-507 and should be fixed identically.
Also, the log messages at lines 107 and 124 incorrectly say "send queue buffer" for the RX queue—update to "receive queue buffer" or "RX queue buffer" for clarity.
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
around lines 102-133, when doca_mmap_set_dmabuf_memrange fails you must reset
dmabuf_fd = -1 in that error path so the peermem fallback executes; update the
error handler immediately after the failing call to set dmabuf_fd = -1 and keep
the existing error log. Apply the identical fix to the TX queue setup around
lines ~499-507 (reset its dmabuf_fd on doca_mmap_set_dmabuf_memrange failure so
the fallback doca_mmap_set_memrange runs). Also correct the log messages at
lines ~107 and ~124 to say "receive queue buffer" or "RX queue buffer" instead
of "send queue buffer".
```cpp
uintptr_t *pkt_cpu_list, *pkt_gpu_list;
uint32_t *pkt_idx_cpu_list, *pkt_idx_gpu_list;
uint32_t *batch_cpu_list, *batch_gpu_list;
uint32_t *cpu_exit_condition, *gpu_exit_condition;
// int sem_idx[MAX_NUM_RX_QUEUES] = {0};
```
🧩 Analysis chain
Potential misuse of GPU vs CPU pointers for RX packet stats (likely crash)
In rx_core:
- You allocate per-queue pointer arrays with `doca_gpu_mem_alloc(..., &pkt_gpu_list, &pkt_cpu_list);`
- You then set: `pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;`
- Later, on the CPU, you do `packets_stats = &((adv_doca_rx_gpu_info*)(pkt_cpu_list[ridx]))[pkt_idx_cpu_list[ridx]];` and read/write `packets_stats->status`, `num_pkts`, etc.
If DocaRxQueue::pkt_list_gpu is indeed a device pointer (as the name and presence of a separate pkt_list_cpu member suggest), this is dereferencing a GPU address on the host, which is undefined behavior and will typically crash. It also leaves the host-mapped pkt_list_cpu member unused in this function.
I’d strongly recommend:
- Using `pkt_list_cpu` in `rx_core` for the host view:

```diff
- pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
+ pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;
```

- Verifying that the GPU kernel sees the correct (device) base addresses for the same arrays via `pkt_gpu_list`.
Until this is confirmed/fixed, the RX packet-list path is very likely incorrect.
Also applies to: 1243-1265, 1279-1284, 1338-1390, 1399-1403, 1411-1415
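For context, a sketch of why the two views exist, based on the DOCA GPUNetIO allocation pattern this code uses (names mirror this review; treat the exact signature as an assumption to check against your installed DOCA 3.2 headers):

```cpp
void* pkt_list_gpu = nullptr;  // device-visible address, for kernels
void* pkt_list_cpu = nullptr;  // host-mapped alias of the same memory

// DOCA_GPU_MEM_TYPE_CPU_GPU returns both views of one allocation:
// the GPU pointer must only be dereferenced in device code, and the
// CPU pointer only on the host.
doca_error_t result = doca_gpu_mem_alloc(gdev,
                                         list_size,
                                         GPU_PAGE_SIZE,
                                         DOCA_GPU_MEM_TYPE_CPU_GPU,
                                         &pkt_list_gpu,
                                         &pkt_list_cpu);
```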
I'll verify this critical issue by examining the codebase structure and the actual code patterns mentioned. Let me start with investigation scripts:
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and display DocaRxQueue structure definition
rg -n "struct DocaRxQueue|class DocaRxQueue" -A 15 --type=cpp --type=h
```

Length of output: 2253
🏁 Script executed:
#!/bin/bash
# Extract the exact code at the flagged primary lines 1169-1173
sed -n '1165,1180p' operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
```

Length of output: 553
🏁 Script executed:
#!/bin/bash
# Look for the assignment pattern: pkt_cpu_list[idx] = ... pkt_list_gpu
rg -n "pkt_cpu_list\[.*\]\s*=.*pkt_list" -C 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/Length of output: 935
🏁 Script executed:
#!/bin/bash
# Look for CPU-side dereferencing of pkt_cpu_list: packets_stats = &((...pkt_cpu_list...)[...])
rg -n "packets_stats\s*=.*pkt_cpu_list|pkt_cpu_list.*\[.*\]" -C 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/Length of output: 3999
🏁 Script executed:
#!/bin/bash
# Check all flagged line ranges for the pattern
for range in "1243,1265" "1279,1284" "1338,1390" "1399,1403" "1411,1415"; do
IFS=',' read start end <<< "$range"
echo "=== Lines $start-$end ==="
sed -n "${start},${end}p" operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp | head -20
done
```

Length of output: 2893
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full DocaRxQueue structure definition
rg -n "struct DocaRxQueue|class DocaRxQueue" -A 50 --type=h operators/advanced_network/advanced_network/managers/gpunetio/
```

Length of output: 7564
🏁 Script executed:
#!/bin/bash
# Search for pkt_list_cpu vs pkt_list_gpu in DocaRxQueue
rg -n "pkt_list_gpu|pkt_list_cpu" --type=cpp --type=h operators/advanced_network/advanced_network/managers/gpunetio/ -B 2 -A 2Length of output: 3244
🏁 Script executed:
#!/bin/bash
# Look for GPU kernel code that might use pkt_gpu_list or pkt_list_gpu
rg -n "__global__|__device__" -A 20 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/ | grep -A 20 -E "pkt_|kernel"Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check how pkt_list_gpu is initialized in DocaRxQueue constructor or create_rx_packet_list
rg -n "pkt_list_gpu\s*=" -B 5 -A 5 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Confirm pkt_gpu_list is passed to GPU kernel (checking kernel launch or params)
rg -n "pkt_gpu_list" -B 3 -A 3 --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp | head -50
```

Length of output: 2243
🏁 Script executed:
#!/bin/bash
# Verify similar misuse at secondary lines - check line 1243 context for allocation
sed -n '1240,1275p' operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
```

Length of output: 1569
🏁 Script executed:
#!/bin/bash
# Double-check if pkt_list_cpu is ever used in rx_core (should be used instead of pkt_list_gpu)
rg -n "pkt_list_cpu" --type=cpp operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cppLength of output: 0
Fix CPU-side dereferencing of GPU pointers (lines 1281, and related)
The review correctly identifies a critical pointer bug. DocaRxQueue provides both pkt_list_gpu and pkt_list_cpu members from unified memory (DOCA_GPU_MEM_TYPE_CPU_GPU at adv_network_doca_mgr_obj.cpp:308). At line 1281, the code mistakenly stores pkt_list_gpu (GPU-addressable view) into the CPU-side array:
```cpp
pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
```

The CPU then dereferences this at lines 1338 and 1399, reading from `packets_stats->status` and `->num_pkts`. This dereferences a GPU pointer on the host, which is undefined behavior and will crash.
Use pkt_list_cpu instead—the host-mapped view designed for CPU access:
```diff
- pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;
+ pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;
```

The GPU kernel correctly receives `pkt_gpu_list` at lines 1311 and 1324 for device-side operations.
```dockerfile
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
    && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
    && apt-get update
```
Fix line continuation syntax error in wget URL.
Line 107 breaks the URL without a line continuation character (\). The wget invocation is malformed: the URL ends with doca and line 108 orphans host_3.2.0.... This causes a Hadolint syntax error (DL1000).
Apply this diff to fix the URL:
```diff
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```

If the line is too long, use a backslash escape:
```diff
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca- \
+    host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```
🧰 Tools
🪛 Hadolint (2.14.0)
[error] 108-108: unexpected 'h', expecting '#', '\\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input (DL1000)
🤖 Prompt for AI Agents
In operators/advanced_network/Dockerfile around lines 107 to 110 the wget URL is
split across lines without a backslash, leaving "doca" on one line and orphaning
"host_3.2.0..." which breaks the command and triggers Hadolint DL1000; fix it by
joining the URL into a single continuous string or add a line-continuation
backslash at the end of the first line so wget receives the full URL, then keep
the subsequent && apt-get install -y ./doca-host_3.2.0... and && apt-get update
on following lines as before.
Walkthrough

DOCA version upgraded from 2.8.0 to 3.2.0 with corresponding dependencies updated. GPU kernel implementations refactored to transition from semaphore-based to packet-list-based packet tracking, with kernel signatures updated to use address-based buffer access and per-block packet statistics. CUDA compilation flags restructured, and DOCA flow management APIs updated accordingly.

Changes
Estimated code review effort: 4 (Complex) | ~60–90 minutes

Areas requiring extra attention:
Suggested reviewers
Pre-merge checks

✅ Passed checks (2 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
operators/advanced_network/advanced_network/CMakeLists.txt (1)
1-1: Update copyright year in header.

The copyright header lists 2023, but pipeline compliance requires the current year. Update to include 2025.

```diff
-# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (1)
1-1: Update copyright year in header.

The copyright header lists 2023, but pipeline compliance requires the current year. Update to include 2025.

```diff
-# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (2)
67-73: Ensure `adv_doca_rx_gpu_info::status` is initialized after allocation

`status` is read on the CPU side in `rx_core` to detect `DOCA_GPU_SEMAPHORE_STATUS_READY`, but the packet-list allocation path (see `create_rx_packet_list` in adv_network_doca_mgr_obj.cpp) does not currently initialize these entries. That risks undefined initial values occasionally comparing equal to `READY`, leading to bogus bursts being processed.

Please zero-initialize the packet list (or explicitly set `status = DOCA_GPU_SEMAPHORE_STATUS_FREE`) immediately after allocation. A minimal sketch of the idea follows.
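For illustration only (member names follow this review's reading of the code; `num_entries` is a hypothetical count):

```cpp
// Host-visible view from doca_gpu_mem_alloc(..., DOCA_GPU_MEM_TYPE_CPU_GPU, ...),
// so the CPU can safely write the initial state before any kernel runs.
struct adv_doca_rx_gpu_info* list = pkt_list_cpu;
for (uint32_t idx = 0; idx < num_entries; idx++) {
  list[idx].num_pkts = 0;
  list[idx].nbytes = 0;
  list[idx].status = DOCA_GPU_SEMAPHORE_STATUS_FREE;  // what rx_core polls against
}
```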
127-159: Critical: Memory leak in `pkt_list_gpu`/`pkt_list_cpu` and architectural issue with `flow_queue_id` scope

The destructor (`~DocaRxQueue`, lines 178–200 of adv_network_doca_mgr_obj.cpp) never calls `destroy_rx_packet_list()`, and `DocaMgr::shutdown()` (lines 1778–1795) does not iterate through the rx queue map to clean up individual queues. Since `create_rx_packet_list()` allocates GPU/CPU memory via `doca_gpu_mem_alloc()` (line 308), this memory leaks on every queue destruction.

Additionally, `flow_queue_id` is a global counter in `DocaMgr` (line 280) shared across all ports and incremented by both `create_udp_pipe()` and `create_default_pipe()`. However, DOCA queue IDs are local to each port/engine instance, not globally unique. Using a single counter for all ports violates DOCA's per-port semantics and will cause incorrect queue assignment.

Fixes required:

- Call `destroy_rx_packet_list()` in the `~DocaRxQueue()` destructor.
- Ensure `DocaMgr::shutdown()` cleans up all allocated queues via the rx_q_map_.
- Replace the global `flow_queue_id` counter with a per-port counter to match DOCA semantics (see the sketch after this list).
- Line 132 exceeds the 100-character lint limit (currently 120 chars); wrap the `create_udp_pipe` declaration.
- Update SPDX header (line 2) to include 2025: change `2023` to `2023-2025`.
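A minimal sketch of the per-port counter idea (the map member and helper are hypothetical, not part of the PR):

```cpp
#include <cstdint>
#include <unordered_map>

// One queue-ID counter per DOCA port instead of a single global uint16_t,
// since DOCA queue IDs are local to each port.
std::unordered_map<uint16_t, uint16_t> flow_queue_id_per_port;

uint16_t next_flow_queue_id(uint16_t port_id) {
  // operator[] value-initializes to 0 on first use, so IDs start at 0 per port.
  return flow_queue_id_per_port[port_id]++;
}
```

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)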
710-737: Handle `create_udp_pipe`/`create_rx_packet_list` failures

`q_backend->create_udp_pipe(flow, rxq_pipe_default, flow_queue_id);` and `q_backend->create_rx_packet_list();` both return `doca_error_t` but their results are ignored. If either fails, the RX worker will later assume valid pipes and packet lists, leading to hard-to-diagnose runtime failures.

Please:
- Capture and check the return values.
- Log and abort initialization (or at least skip that queue) on error.
1398-1415: Reuse `pkt_list_cpu` pointer for post-loop checks

The post-loop flush uses the same pattern:

```cpp
packets_stats = &((struct adv_doca_rx_gpu_info *)(pkt_cpu_list[ridx]))[pkt_idx_cpu_list[ridx]];
status = DOCA_GPUNETIO_VOLATILE(packets_stats->status);
```

This will be correct once `pkt_cpu_list` is fixed to hold `pkt_list_cpu`, but until then it shares the same host/GPU pointer bug. Ensure the earlier fix is applied here as well.
1158-1183: RX worker: host uses GPU pointer instead of CPU alias for packet list

In `rx_core` at line 1281, `pkt_cpu_list` is assigned the GPU pointer (`pkt_list_gpu`) instead of the CPU alias (`pkt_list_cpu`):

```cpp
pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_gpu;  // WRONG
```

This pointer is then dereferenced on the host at lines 1338 and 1399 when accessing `packets_stats`. Since `doca_gpu_mem_alloc` allocates both GPU and CPU pointers for the same memory region (as confirmed in the allocation at adv_network_doca_mgr_obj.cpp:308), dereferencing the GPU address on the host is undefined behavior and will read from invalid memory.

Fix:

```cpp
pkt_cpu_list[idx] = (uintptr_t)tparams->rxqw[idx].rxq->pkt_list_cpu;
```

Also update the debug log at line 1367 to avoid printing the pointer as an integer.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)
1-16: Update SPDX years for compliance

This file's SPDX header only mentions 2023; the compliance check expects the current year (2025) to be included as well (e.g., 2023-2025). Same applies to the other touched files in this PR.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (2)
1-16: Update SPDX years and address lint complaintsAs in other files, add 2025 to the SPDX copyright line.
Also, the lint warnings about tabs, long lines, and comment spacing (
//Eth + IP ...) can be fixed mechanically to keep CI green.
210-305: Move null-check before dereferencingeth_rxq_gpuin both kernels; ensure device-sidepkt_idx_gpu_listis zero-initializedThe null-check concern applies to both persistent and non-persistent kernels. In both,
eth_rxq_gpu[blockIdx.x]is dereferenced before theif (eth_rxq_gpu == NULL)guard, creating a race condition. Reorder the check before the dereference.Regarding
pkt_idx_gpu_list: The CPU view is initialized at line 1282 in rx_core, but the GPU view allocated bydoca_gpu_mem_alloc()is not explicitly zeroed. Absent documentation confirming zero-initialization by the DOCA library, add an explicitcudaMemset()on the GPU memory after allocation to ensure the device-side array is initialized before kernel execution.
🧹 Nitpick comments (13)
operators/advanced_network/Dockerfile (1)
107-110: Consider parameterizing the hardcoded artifact URL and version string.The download URL and
.debfilename are tightly coupled to a specific DOCA build artifact version (3.2.0-122000-25.10). This introduces maintenance and reproducibility challenges.Consider extracting these as build arguments or derived variables:
ARG DOCA_VERSION=3.2.0 +ARG DOCA_BUILD_RELEASE=122000 +ARG DOCA_BUILD_DATE=25.10 -RUN wget ... doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \ - && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \ +RUN wget -q https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-${DOCA_VERSION}/doca-host_${DOCA_VERSION}-${DOCA_BUILD_RELEASE}-${DOCA_BUILD_DATE}-ubuntu2204_amd64.deb \ + && apt-get install -y ./doca-host_${DOCA_VERSION}-${DOCA_BUILD_RELEASE}-${DOCA_BUILD_DATE}-ubuntu2204_amd64.deb \This allows easier bumping of DOCA releases and build identifiers without editing the RUN command inline.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)
28-38: Consider const‑qualifying read‑only kernel parametersThe updated declarations match the implementations, but several pointers (
pkt_gpu_list,pkt_idx_list,batch_list,gpu_pkts_len) are not mutated by the kernels and could be markedconstat the interface to better document ownership and avoid accidental writes later.operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (3)
65-65: ALIGN_SIZE macro is non-idiomatic and assumes simple lvalues

`ALIGN_SIZE(size, align)` mutates `size` and evaluates the argument multiple times. It's fine as currently used with plain variables, but it's fragile if someone later passes an expression.

Consider replacing it with an inline function:

```cpp
inline uint64_t align_size(uint64_t size, uint64_t align) {
  return ((size + align - 1) / align) * align;
}
```

and assigning explicitly at call sites.
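A call site would then assign explicitly, e.g. (names taken from the RX path in this PR):

```cpp
cyclic_buffer_size = align_size(cyclic_buffer_size, get_host_page_size());
```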
161-187: Clarify semantics and use of `pkt_mkey`

`DocaTxQueue` now exposes `pkt_mkey` that is consumed by the send kernel. This is fine, but it's effectively part of the public contract between host code and the CUDA kernel. Consider documenting that `pkt_mkey` must be an mkey in network byte order (as enforced in the `.cpp`), to prevent misuse if future code constructs `DocaTxQueue` differently.
247-281: Document ownership of `flow_queue_id` in DocaMgr

`init_doca_flow` now takes a `struct doca_dev*` and `DocaMgr` introduces a member `uint16_t flow_queue_id = 0` shared across default and per-flow pipes. Because this ID is plumbed into DOCA RSS queues via `doca_eth_rxq_apply_queue_id`, its expected lifetime and uniqueness constraints should be explicit (e.g., reset per port vs global across all ports).

Adding a brief comment on the intended scope of `flow_queue_id` would make the design clearer and help future DOCA upgrades.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (4)
296-343: DOCA 3.2 flow init logic looks sound; consider minor cleanup

The refactored `init_doca_flow` correctly:

- Creates/destroys `doca_flow_cfg` once (via `static bool flow_init`).
- Sets queues, mode args, and counters.
- Uses `doca_flow_port_cfg_create`/`set_port_id`/`set_dev`/`port_start` and destroys `port_cfg` on all error paths.

Behaviorally this is fine. You may want to:

- Use `nullptr` instead of `NULL` for C++ consistency.
- Normalize logging to `{}` formatting in all branches.

No functional blockers here.
346-374: Replace tabs and long lines in flow-port setup to satisfy lint

The new `doca_flow_port_cfg_*` block uses tabs and some lines exceed 100 chars, which is exactly what the lint warnings are flagging. Re-indent with spaces and wrap the log messages/argument lists to avoid future CI failures.

No behavior change required.
598-605: init_doca_flow call wiring looks correct

Passing `ddev[intf.port_id_]` into `init_doca_flow` aligns with the updated signature and associates each DOCA flow port with its underlying DOCA device. The null check and early abort on failure are appropriate.

Nothing to change here beyond potential line-wrapping for lint.
1794-1795: Shutdown path is clean; consider also freeing RX packet lists

`DocaMgr::shutdown` now logs "exit gracefully" after joining worker threads. Given the new RX packet-list allocations are owned by `DocaRxQueue`, this is a good place to ensure all queues have destroyed their packet lists (if not already handled elsewhere), to avoid leaks over multiple init/shutdown cycles.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (3)
73-133: RX cyclic buffer sizing and DMAbuf mapping look reasonable

The sequence (sketched below):

1. `doca_eth_rxq_estimate_packet_buf_size(...)` into `cyclic_buffer_size`.
2. `ALIGN_SIZE(cyclic_buffer_size, get_host_page_size());`
3. `doca_gpu_mem_alloc` on `gdev` with that size.
4. Optional DMAbuf mapping via `doca_gpu_dmabuf_fd` and `doca_mmap_set_dmabuf_memrange`, with fallback to `doca_mmap_set_memrange`.

is consistent and has error logging on each DOCA call. No functional issues stand out here; just consider rewording the log messages ("send queue buffer") to avoid confusion on the RX path.
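A condensed sketch of that sequence; only the `doca_gpu_mem_alloc` call shape is taken from this PR, the estimate step is elided because its exact signature isn't shown here, and `gpu_ptr`/`cpu_ptr` are placeholder names:

```cpp
// 1. doca_eth_rxq_estimate_packet_buf_size(...) fills cyclic_buffer_size.
// 2. Round up to a whole number of host pages.
cyclic_buffer_size = align_size(cyclic_buffer_size, get_host_page_size());
// 3. Allocate paired GPU/CPU views on the DOCA GPU device.
result = doca_gpu_mem_alloc(gdev, cyclic_buffer_size, GPU_PAGE_SIZE,
                            DOCA_GPU_MEM_TYPE_CPU_GPU,
                            (void**)&gpu_ptr, (void**)&cpu_ptr);
// 4. Try DMAbuf mapping first; fall back to doca_mmap_set_memrange() on failure.
```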
202-301: UDP pipe creation aligns with new DOCA flow APIs; minor logging/format nits

The updated `create_udp_pipe`:

- Uses `parser_meta.outer_l3_type`/`outer_l4_type` for IPv4/UDP.
- Applies a unique `flow_queue_id` via `doca_eth_rxq_apply_queue_id`.
- Sets forward action to RSS over a single queue and miss to default pipe or drop.
- Creates a pipe, adds an entry, and processes entries with timeout `default_flow_timeout_usec`.

Logic looks good. A few small cleanups you may want:

- Replace `printf("UDP pipe queue %d\n", flow_queue_id);` with `HOLOSCAN_LOG_DEBUG` for consistent logging.
- Use `{}` formatting instead of `%s` in HOLOSCAN logs for DOCA errors.
- Consider documenting that `flow_queue_id` must remain in sync with the default-pipe logic in `DocaMgr`.
408-542: TX queue buffer sizing, DMAbuf mapping, and mkey handling look good

The DocaTxQueue constructor:

- Aligns `tx_buffer_size` to host page size via `ALIGN_SIZE`.
- Allocates GPU/CPU memory with `doca_gpu_mem_alloc`.
- Attempts DMAbuf mapping and falls back to `doca_mmap_set_memrange` with logging on failure.
- Starts the mmap and retrieves `pkt_mkey` via `doca_mmap_get_mkey`, converting it to network byte order.

This is consistent with how the send kernel expects `pkt_buff_addr` and `pkt_buff_mkey`. Main follow-ups:

- As with RX, confirm whether DMAbuf FDs need explicit closing once the mmap is set (depends on DOCA API).
- Optionally document that `pkt_mkey` is stored already in network byte order for downstream users.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (1)
315-384: send_packets_kernel is consistent with TX buffer layout

The send kernel:

- Reserves `num_pkts` WQ slots once (`base_wqe_idx` in shared mem).
- Computes each packet addr as `pkt_buff_addr + max_pkt_size * ((pkt_idx + gpu_pkt0_idx) % max_pkts)`, matching the circular buffer logic in `get_tx_packet_burst` (see the sketch below).
- Prepares send WQEs with or without completion flag and finally calls `doca_gpu_dev_eth_txq_submit`.

This matches the host-side expectations in `tx_core` and `DocaTxQueue`. Just fix tabs/line length per lint and consider splitting the long `doca_gpu_dev_eth_txq_wqe_prepare_send` calls across lines for readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- `operators/advanced_network/Dockerfile` (3 hunks)
- `operators/advanced_network/advanced_network/CMakeLists.txt` (1 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt` (1 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu` (11 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h` (1 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp` (25 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h` (6 hunks)
- `operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp` (12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-22T23:47:42.896Z
Learnt from: cdinea
Repo: nvidia-holoscan/holohub PR: 1170
File: applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt:109-111
Timestamp: 2025-10-22T23:47:42.896Z
Learning: In the video streaming server application (applications/video_streaming/video_streaming_server/cpp/CMakeLists.txt), bundling libcudart.so.12 from the NGC operator binaries is intentional to ensure consistency with NGC binaries, even though the target links to CUDA::cudart.
Applied to files:
- `operators/advanced_network/advanced_network/CMakeLists.txt`
- `operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt`
🧬 Code graph analysis (3)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h (1)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (4)
- `doca_receiver_packet_kernel` (388-423)
- `doca_receiver_packet_kernel` (388-391)
- `doca_sender_packet_kernel` (425-460)
- `doca_sender_packet_kernel` (425-430)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h (2)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (6)
- `create_udp_pipe` (202-301)
- `create_udp_pipe` (202-204)
- `create_rx_packet_list` (303-315)
- `create_rx_packet_list` (303-303)
- `destroy_rx_packet_list` (317-327)
- `destroy_rx_packet_list` (317-317)

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (2)

- `init_doca_flow` (296-378)
- `init_doca_flow` (296-296)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)
operators/advanced_network/advanced_network/managers/rivermax/rivermax_mgr_impl/adv_network_rivermax_mgr.cpp (1)
- `cfg` (95-95)
🪛 Clang (14.0.6)
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
[error] 19-19: 'cuda.h' file not found
(clang-diagnostic-error)
🪛 GitHub Actions: Check Compliance
operators/advanced_network/advanced_network/CMakeLists.txt
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt
[error] 1-1: Current year not included in the copyright header.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
[error] 1-1: Current year not included in the copyright header.
🪛 GitHub Actions: Check linting
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
[error] 296-296: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 347-347: Tab found; better to use spaces. [whitespace/tab]
[error] 348-348: Tab found; better to use spaces. [whitespace/tab]
[error] 349-349: Tab found; better to use spaces. [whitespace/tab]
[error] 360-360: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 362-362: Tab found; better to use spaces. [whitespace/tab]
[error] 363-363: Tab found; better to use spaces. [whitespace/tab]
[error] 372-372: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 384-384: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 842-842: Tab found; better to use spaces. [whitespace/tab]
[error] 843-843: Tab found; better to use spaces. [whitespace/tab]
[error] 844-844: Tab found; better to use spaces. [whitespace/tab]
[error] 845-845: Tab found; better to use spaces. [whitespace/tab]
[error] 859-859: Tab found; better to use spaces. [whitespace/tab]
[error] 860-860: Tab found; better to use spaces. [whitespace/tab]
[error] 861-861: Tab found; better to use spaces. [whitespace/tab]
[error] 862-862: Tab found; better to use spaces. [whitespace/tab]
[error] 863-863: Tab found; better to use spaces. [whitespace/tab]
[error] 864-864: Tab found; better to use spaces. [whitespace/tab]
[error] 865-865: Tab found; better to use spaces. [whitespace/tab]
[error] 866-866: Tab found; better to use spaces. [whitespace/tab]
[error] 867-867: Tab found; better to use spaces. [whitespace/tab]
[error] 868-868: Tab found; better to use spaces. [whitespace/tab]
[error] 969-969: Tab found; better to use spaces. [whitespace/tab]
[error] 970-970: Tab found; better to use spaces. [whitespace/tab]
[error] 971-971: Tab found; better to use spaces. [whitespace/tab]
[error] 972-972: Tab found; better to use spaces. [whitespace/tab]
[error] 973-973: Tab found; better to use spaces. [whitespace/tab]
[error] 974-974: Tab found; better to use spaces. [whitespace/tab]
[error] 1338-1338: Lines should be <= 100 characters long. [whitespace/line_length]
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.h
[error] 132-132: Lines should be <= 100 characters long [whitespace/line_length]
[error] 156-156: Lines should be <= 100 characters long [whitespace/line_length]
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
[error] 37-37: { should almost always be at the end of the previous line [whitespace/braces]
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
[error] 91-91: Tab found; better to use spaces. [whitespace/tab]
[error] 94-94: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 94-94: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]
[error] 105-105: Tab found; better to use spaces. [whitespace/tab]
[error] 106-106: Tab found; better to use spaces. [whitespace/tab]
[error] 107-107: Tab found; better to use spaces. [whitespace/tab]
[error] 107-107: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 160-160: Should have a space between // and comment. [whitespace/comments]
[error] 161-161: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 185-185: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]
[error] 211-211: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 221-221: Tab found; better to use spaces. [whitespace/tab]
[error] 223-223: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 223-223: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]
[error] 232-232: Tab found; better to use spaces. [whitespace/tab]
[error] 233-233: Tab found; better to use spaces. [whitespace/tab]
[error] 234-234: Tab found; better to use spaces. [whitespace/tab]
[error] 234-234: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 284-284: Should have a space between // and comment. [whitespace/comments]
[error] 285-285: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 317-317: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 336-336: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 340-340: Tab found; better to use spaces. [whitespace/tab]
[error] 340-340: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 342-342: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 347-347: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 451-451: Lines should be <= 100 characters long. [whitespace/line_length]
🪛 Hadolint (2.14.0)
operators/advanced_network/Dockerfile
[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input
(DL1000)
🔇 Additional comments (12)
operators/advanced_network/advanced_network/CMakeLists.txt (1)
51-56: CUDA compilation settings are commented out; clarify intent.

Lines 51-52 show CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS commented out. The AI summary describes these as "moved to standalone lines," but they appear disabled rather than relocated. For a DOCA 3.2 upgrade with GPU kernel refactoring, these settings are typically critical for proper device code compilation.
Verify:
- Are these settings now configured at global scope elsewhere (e.g., parent CMakeLists.txt)?
- Should they be uncommented?
- Or are they no longer needed with DOCA 3.2?
operators/advanced_network/advanced_network/managers/gpunetio/CMakeLists.txt (2)
52-58: CUDA compilation settings are commented out; clarify intent.

Lines 52-53 show CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS commented out, matching the same pattern in the parent CMakeLists.txt. Given the GPU kernel refactoring mentioned in the PR summary for DOCA 3.2 (semaphore-based to packet-list-based transition), these settings are typically essential for device code separation and symbol resolution.
Verify:
- Should these be uncommented and set at the global CMake scope?
- Are they redundant with DOCA 3.2 defaults?
- Is this change consistent across all manager implementations?
21-21: Verify DOCA 3.2 API compatibility for removed device library linkage.

Per the AI summary, `-ldoca_gpunetio_device` was removed from the target_link_libraries. This device-side static library linkage is significant, as the PR involves GPU kernel refactoring (semaphore-based to packet-list-based transitions). Confirm that this removal aligns with DOCA 3.2 API/ABI changes and doesn't break kernel symbol resolution or initialization.

Also applies to: 65-71
operators/advanced_network/Dockerfile (2)
83-89: Version bump and early dependency installation look appropriate.

The DOCA version upgrade from 2.8.0 to 3.2.0 and early installation of `build-essential` and `wget` align with the PR objectives and support the subsequent DOCA host package download step. The placement in the `common-deps` layer is correct.
131-138: Package list updates align with DOCA 3.2.0 upgrade.

The addition of `mlnx-dpdk`, `doca-sdk-gpunetio`, `doca-sdk-eth`, and `doca-sdk-flow` (and their `-dev` variants) reflects the transition from the semaphore/buffer-based DOCA 2.8.0 APIs to the packet-list and GPUNetIO architecture in DOCA 3.2.0. The removal of `libdoca-sdk-dma-dev` is consistent with this shift.

operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp (3)
1337-1390: Packet-list polling logic is correct but depends on proper initialization

The loop that:

- Reads `packets_stats->status`.
- Enqueues bursts when `status == DOCA_GPU_SEMAPHORE_STATUS_READY`.
- Resets `status` to `FREE` and advances `pkt_idx_cpu_list`.

is structurally sound and matches the new packet-list model (a sketch of the host side follows below). Its correctness hinges on two conditions:

- Packet-list entries are zero-initialized / set to `FREE` at creation (see earlier comment).
- CPU and GPU agree on index sequencing (host vs device ring index).

Once the initialization issues are addressed, this logic should behave as intended.
1024-1027: DocaMgr destructor now frees TX host buffers – good

The added loop freeing `burst[idx].pkt_lens[0]` with `cudaFreeHost` matches the earlier `cudaMallocHost` in `setup_pools_and_rings`. That closes a prior leak for TX metadata buffers.

No further changes needed here.
1279-1284: Review comment is factually correct; GPU-side initialization concern is valid

The verification confirms the review's claims:

- Persistent kernel ignores `pkt_idx_list`: Confirmed. The kernel declares a local `int pkt_idx = 0` and manages it independently, never reading the `pkt_idx_list` parameter.
- Non-persistent kernel uses `pkt_idx_list`: Confirmed. Line 223 of `adv_network_doca_kernels.cu` reads `pkt_idx_list[blockIdx.x]` directly, and lines 293-297 log it.
- Lack of explicit GPU-side initialization: Confirmed. Only the CPU-side `pkt_idx_cpu_list[idx] = 0` is set (line 1282). No explicit `cudaMemcpy` or `cudaMemset` initializes the GPU copy before kernel launch.
- Timing concern with warmup kernel: If `ADV_NETWORK_MANAGER_WARMUP_KERNEL` is enabled, the warmup kernel (lines 1307-1315) runs with `persistent=false`, invoking the non-persistent kernel that does access `pkt_idx_gpu_list`. This occurs after CPU initialization but without an explicit synchronization barrier.

The suggestion to explicitly zero-initialize both CPU and GPU copies is reasonable defensive programming, especially for DOCA's coherent memory model where explicit synchronization may be necessary.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp (1)
545-572: DocaTxQueue destructor correctly unwinds DOCA resources

Stopping/destroying the ctx, destroying the eth txq, destroying the mmap, freeing GPU memory, and destroying the PE in this order looks correct and matches the allocations in the constructor.

No issues here.
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu (3)
90-107: Persistent RX kernel logic matches new packet-list model

The use of `out_first_pkt_idx`, `out_pkt_num`, and `doca_gpu_dev_eth_rxq_get_pkt_addr` along with per-block `adv_doca_rx_gpu_info` updates is consistent with the host-side polling logic:

- First packet's addr/idx stored once per batch.
- Total bytes aggregated with `atomicAdd_block`.
- When accumulated `tot_pkts_batch >= batch_list[blockIdx.x]`, stats are written, a fence is released, and `status` is set to `READY` (sketched below).

Once the host-side pointer and initialization issues are addressed, this kernel should behave correctly.
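The device side of the same handshake, reduced to a sketch: `__threadfence_system()` stands in for the DOCA fence-release the kernel actually uses, and the field names are assumptions based on `adv_doca_rx_gpu_info` as described above:

```cpp
// Publish the completed batch to the CPU from one thread per block.
if (threadIdx.x == 0 && tot_pkts_batch >= batch_list[blockIdx.x]) {
  packets_stats->num_pkts = tot_pkts_batch;
  packets_stats->nbytes   = rx_pkt_bytes;
  __threadfence_system();  // make the stats visible before flipping status
  packets_stats->status   = DOCA_GPU_SEMAPHORE_STATUS_READY;
}
```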
388-423: Receiver kernel wrapper updates look correct

`doca_receiver_packet_kernel`:

- Validates `rxqn` and `gpu_exit_condition`.
- Checks for a prior CUDA error via `cudaGetLastError`.
- Launches either the persistent or non-persistent kernel with the updated parameter lists.
- Checks for post-launch CUDA errors.

The new `pkt_gpu_list`/`pkt_idx_list` parameters are wired correctly from the manager code. No changes needed beyond style/lint.
425-460: Sender kernel wrapper wiring matches new address-based API

`doca_sender_packet_kernel` now:

- Accepts `pkt_buff_addr`, `pkt_buff_mkey`, `gpu_pkt0_idx`, `max_pkts`, `max_pkt_size`.
- Launches `send_packets_kernel<<<1, CUDA_BLOCK_THREADS>>>` with those parameters.
- Checks CUDA errors before and after launch.

This matches how `DocaTxQueue` prepares the buffer and `pkt_mkey`. All good here.
```cpp
__global__ void receive_packets_kernel_persistent(int rxqn, uintptr_t* eth_rxq_gpu,
                                                  uintptr_t* sem_gpu, uint32_t* sem_idx_list,
                                                  uintptr_t* pkt_gpu_list, uint32_t* pkt_idx_list,
                                                  const uint32_t* batch_list, uint32_t* exit_cond) {
  doca_error_t ret;
  struct doca_gpu_buf* buf_ptr = NULL;
  uintptr_t buf_addr;
  uint64_t buf_idx;
  struct doca_gpu_eth_rxq* rxq = (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];
  struct doca_gpu_semaphore_gpu* sem = (struct doca_gpu_semaphore_gpu*)sem_gpu[blockIdx.x];
  int sem_idx = 0;
  __shared__ struct adv_doca_rx_gpu_info* stats_global;
#if DOCA_DEBUG_KERNEL == 1
  int pkt_idx = 0;
  struct eth_ip_udp_hdr* hdr;
  uint8_t* payload;
#endif
  __shared__ uint32_t rx_pkt_num;

  __shared__ uint64_t out_first_pkt_idx;
  __shared__ uint32_t out_pkt_num;
  __shared__ uint32_t rx_pkt_bytes;
  __shared__ uint64_t rx_buf_idx;
  uint32_t pktb = 0;
  uint32_t tot_pkts_batch = 0;
  struct adv_doca_rx_gpu_info *packets_stats = &((struct adv_doca_rx_gpu_info *)pkt_gpu_list[blockIdx.x])[pkt_idx];

  // Warmup
  if (eth_rxq_gpu == NULL) return;

  if (threadIdx.x == 0) {
  if (threadIdx.x == 0)
```
Guard against null eth_rxq_gpu before dereferencing in RX kernels

Both RX kernels (`receive_packets_kernel_persistent` and `_non_persistent`) currently do:

```cpp
struct doca_gpu_eth_rxq* rxq =
    (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];
...
if (eth_rxq_gpu == NULL) return;
```

The dereference happens before the null check. In rx_core, the warmup call to `doca_receiver_packet_kernel` passes `eth_rxq_gpu` as nullptr (guarded by a macro today), which would be undefined behavior if that path is ever enabled.

Please move the null check to the very top of each kernel, before any indexing, e.g.:

```cpp
if (eth_rxq_gpu == nullptr || pkt_gpu_list == nullptr) return;
struct doca_gpu_eth_rxq* rxq =
    (struct doca_gpu_eth_rxq*)eth_rxq_gpu[blockIdx.x];
```

This will make the warmup path safe even if enabled later.
🧰 Tools
🪛 GitHub Actions: Check linting
[error] 91-91: Tab found; better to use spaces. [whitespace/tab]
[error] 94-94: Lines should be <= 100 characters long. [whitespace/line_length]
[error] 94-94: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after. [runtime/casting]
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.cu
around lines 79-99, the kernel dereferences eth_rxq_gpu and pkt_gpu_list before
checking for NULL which is undefined behavior; move the null checks to the very
beginning of the kernel (before any use of eth_rxq_gpu, pkt_gpu_list, or other
pointer parameters) and return early if either is NULL, then perform the
blockIdx.x indexing to obtain rxq and packets_stats; apply the same change to
the corresponding non-persistent RX kernel so both warmup paths are safe.
```cpp
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <cstdint>
```
Drop <cuda.h> to fix clang header error and rely on <cuda_runtime_api.h>
The header only needs cudaStream_t; <cuda_runtime_api.h> already provides it. Including <cuda.h> is causing the reported clang diagnostic (“'cuda.h' file not found”) when this header is parsed without CUDA SDK available. Removing <cuda.h> here (and relying on the existing CUDA headers in the .cu translation unit) should unblock static analysis without changing behavior.
Also, the SPDX header still lists only 2023; please add the current year (2025) to satisfy the compliance check.
🧰 Tools
🪛 Clang (14.0.6)
[error] 19-19: 'cuda.h' file not found
(clang-diagnostic-error)
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_kernels.h
around lines 19 to 21, remove the #include <cuda.h> line so the header only
includes <cuda_runtime_api.h> and <cstdint> (cudaStream_t is provided by
cuda_runtime_api.h) to avoid clang "'cuda.h' file not found" diagnostics when
parsing without the CUDA SDK; additionally update the SPDX header year range to
include 2025 (e.g., change 2023 to 2023-2025) to satisfy the compliance check.
```cpp
/*
 * Retrieve host page size
 *
 * @return: host page size
 */
static size_t get_host_page_size(void)
{
  long ret = sysconf(_SC_PAGESIZE);
  if (ret == -1)
    return 4096;  // 4KB, default Linux page size

  return (size_t)ret;
}
```
Include <unistd.h> for sysconf / _SC_PAGESIZE
get_host_page_size() calls sysconf(_SC_PAGESIZE), but this file doesn’t include <unistd.h>, where both sysconf and _SC_PAGESIZE are declared. This will generally fail to compile or at least emit warnings.
Add:

```cpp
#include <unistd.h>
```

near the top of the file.
🧰 Tools
🪛 GitHub Actions: Check linting
[error] 37-37: { should almost always be at the end of the previous line [whitespace/braces]
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
around lines 31 to 43, the function get_host_page_size uses sysconf and
_SC_PAGESIZE but the file doesn't include <unistd.h>; add an #include for
<unistd.h> at the top of the file (with other system includes) so sysconf and
_SC_PAGESIZE are declared and the code compiles cleanly.
```cpp
doca_error_t DocaRxQueue::create_rx_packet_list() {
  doca_error_t result;

  printf("Alloc packet list for Rxq\n");

  result = doca_gpu_mem_alloc(gdev, MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct adv_doca_rx_gpu_info), GPU_PAGE_SIZE, DOCA_GPU_MEM_TYPE_CPU_GPU, (void**)&pkt_list_gpu, (void**)&pkt_list_cpu);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_ERROR("Failed allocate packet list memory: {}", doca_error_get_descr(result));
    return DOCA_ERROR_BAD_STATE;
  }

  return result;
}

doca_error_t DocaRxQueue::destroy_rx_packet_list() {
  doca_error_t result;

  result = doca_gpu_mem_free(gdev, pkt_list_gpu);
  if (result != DOCA_SUCCESS) {
    HOLOSCAN_LOG_ERROR("Failed destroy packet list memory: {}", doca_error_get_descr(result));
    return DOCA_ERROR_BAD_STATE;
  }

  return result;
}
```
Initialize and free RX packet-list properly

`create_rx_packet_list()`:

```cpp
result = doca_gpu_mem_alloc(
    gdev,
    MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct adv_doca_rx_gpu_info),
    GPU_PAGE_SIZE,
    DOCA_GPU_MEM_TYPE_CPU_GPU,
    (void**)&pkt_list_gpu,
    (void**)&pkt_list_cpu);
```

allocates the packet-list ring but:

- Does not initialize any fields, including `status`, which the CPU later polls for `READY`. This should be cleared (e.g., `memset(pkt_list_cpu, 0, ...)` or an explicit loop setting `status = DOCA_GPU_SEMAPHORE_STATUS_FREE`).
- Is not paired with a call to `destroy_rx_packet_list()` in the `DocaRxQueue` destructor, so the allocated memory will leak unless destroyed elsewhere.

`destroy_rx_packet_list()` correctly calls `doca_gpu_mem_free(gdev, pkt_list_gpu)` but is currently unused.

Please:

- Zero-initialize the packet-list after allocation.
- Ensure `destroy_rx_packet_list()` is invoked from the `DocaRxQueue` destructor or from the owning manager's shutdown path (a sketch of both fixes follows below).
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr_obj.cpp
around lines 303-327, the RX packet-list allocated in create_rx_packet_list() is
not initialized and the destroy_rx_packet_list() is not invoked from the
DocaRxQueue destructor, causing uninitialized-status reads and a leak; after
successful doca_gpu_mem_alloc() zero-initialize the CPU-side array (e.g.,
memset(pkt_list_cpu, 0, MAX_DEFAULT_SEM_X_QUEUE * sizeof(struct
adv_doca_rx_gpu_info)) or loop and set each entry’s status =
DOCA_GPU_SEMAPHORE_STATUS_FREE) and then ensure destroy_rx_packet_list() is
called from DocaRxQueue::~DocaRxQueue() (or the owning manager shutdown) to free
pkt_list_gpu via doca_gpu_mem_free().
```cpp
cudaSetDevice(gpu_dev);
cudaFree(0);
doca_ret = doca_gpu_create(gpu_bdf, &gdev[gpu_dev]);
```
🧩 Analysis chain
Check cudaSetDevice return value
cudaSetDevice(gpu_dev); is currently unchecked. If it fails (e.g., invalid index or CUDA not initialized), the subsequent cudaFree(0); and DOCA GPU create will behave unpredictably.
Recommend capturing and checking the return code, logging and aborting initialization on failure.
🏁 Script executed:
```bash
#!/bin/bash
# Locate the file and examine code around lines 569-571
fd -t f "adv_network_doca_mgr.cpp" --exec cat {} \;
```

Length of output: 61814
🏁 Script executed:
```bash
#!/bin/bash
# Search for all cudaSetDevice usage in the codebase
rg -n "cudaSetDevice" -A2 -B2 --type=cpp
```

Length of output: 2642
🏁 Script executed:
```bash
#!/bin/bash
# Check for cuda error handling patterns in the repository
rg -n "cudaError\|cudaGetErrorString\|CUDA_CHECK\|checkCudaErrors" --type=cpp | head -20
```

Length of output: 0
**Check `cudaSetDevice` return value**
The function `cudaSetDevice` returns `cudaError_t` and should be checked for errors. It can return `cudaErrorInitializationError`, `cudaErrorInsufficientDriver`, or `cudaErrorNoDevice`, and as of CUDA 12.0, it is very important to check the return value of `cudaSetDevice()` for initialization errors. Unchecked failures will cause subsequent CUDA operations to behave unpredictably or operate on the wrong device.
Capture and check the return code at lines 569, 1216, and 1474. Log errors and abort initialization on failure. Note: this pattern also appears in `operators/advanced_network/advanced_network/manager.cpp` at line 142.
---
🌐 Web query:
```web
CUDA cudaSetDevice return value error handling
```
💡 Result:
Short answer: cudaSetDevice returns a cudaError_t and must be checked like any CUDA Runtime API call. Possible documented return values include cudaSuccess, cudaErrorInvalidDevice and cudaErrorSetOnActiveProcess (and it may also surface earlier asynchronous errors). Use cudaGetErrorString() to get a human-readable message. [1][2]
Minimal example (C/C++):
```cpp
int dev = 0;
cudaError_t err = cudaSetDevice(dev);
if (err != cudaSuccess) {
  fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", dev, cudaGetErrorString(err));
  /* handle/fail/cleanup */
}
```
Notes:
- cudaErrorInvalidDevice means the device ID is invalid.
- cudaErrorSetOnActiveProcess means the runtime/driver context is already active on the host thread and you cannot set the device then.
- Many runtime calls can also return errors from prior asynchronous operations; checking and handling errors early helps diagnose failures. [1][2]
Sources:
[1] NVIDIA cudaSetDevice documentation. [2] NVIDIA guide on handling CUDA runtime errors.
Check cudaSetDevice return value at lines 569, 1216, and 1474
cudaSetDevice returns cudaError_t and must be checked. Possible documented return values include cudaSuccess, cudaErrorInvalidDevice and cudaErrorSetOnActiveProcess, and unchecked failures will cause subsequent CUDA operations to behave unpredictably.
Capture and check the return code at each call site. Use cudaGetErrorString() to get a human-readable message and abort initialization on error. This pattern also appears in operators/advanced_network/advanced_network/manager.cpp at line 142.
🤖 Prompt for AI Agents
In
operators/advanced_network/advanced_network/managers/gpunetio/adv_network_doca_mgr.cpp
around lines 569, 1216, and 1474, the calls to cudaSetDevice are currently
unchecked; capture the cudaError_t return for each call, test for cudaSuccess,
and on error log the device number and the human-readable string from
cudaGetErrorString(return_code) then abort initialization (return error / exit)
so subsequent CUDA calls are not executed on a failed device set; apply the same
pattern used in operators/advanced_network/advanced_network/manager.cpp line
142.
```dockerfile
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
  && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
  && apt-get update
```
Fix Dockerfile syntax error in wget command: the line continuation is malformed.

The URL on line 107 is incomplete and continues incorrectly on line 108. The string "doca" at the end of line 107 lacks a continuation character (\), causing Hadolint to fail with "unexpected 'h'" on line 108.
Reconstructing the intended command, the URL should be fully specified on a single logical line using proper shell continuation:

```diff
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+RUN wget -q https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
&& apt-get update \
- && apt-get update
+ && rm -f ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
-  && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
-  && apt-get update
+RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+  && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
+  && apt-get update
```
🧰 Tools
🪛 Hadolint (2.14.0)
[error] 108-108: unexpected 'h'
expecting '#', '\', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input
(DL1000)
```dockerfile
  && curl -fsSL ${DOCA_REPO_LINK}/GPG-KEY-Mellanox.pub | gpg --dearmor | tee ${LOCAL_GPG_KEY_PATH} > /dev/null \
  && echo "deb [signed-by=${LOCAL_GPG_KEY_PATH}] ${DOCA_REPO_LINK} ./" | tee /etc/apt/sources.list.d/mellanox.list

RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
```
Shouldn't this use the version variable from above?
```dockerfile
RUN wget https://urm.nvidia.com/artifactory/sw-nbu-int-generic-local/doca-repo-3.2.0/doca-repo-3.2.0-122000/doca
host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
  && apt-get install -y ./doca-host_3.2.0-122000-25.10-ubuntu2204_amd64.deb \
```
Same thing on version variable
```cpp
 * applications needs to exit
 */
printf("Receive UDP kernel error %d rxpkts %d error %d\n", ret, rx_pkt_num, ret);
printf("Receive UDP kernel error %d rxpkts %d error %d\n", ret, out_pkt_num, ret);
```
Leaving a printf slows the whole kernel down even if it's not used. Probably should wrap in a macro.
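One common pattern, sketched here with a hypothetical macro name, compiles the printf out entirely unless a debug flag is defined:

```cpp
// Expands to nothing in release builds, so no device printf code is emitted.
#ifdef ADV_NET_KERNEL_DEBUG
#define KERNEL_DEBUG_PRINTF(...) printf(__VA_ARGS__)
#else
#define KERNEL_DEBUG_PRINTF(...) ((void)0)
#endif

// Usage inside the kernel:
KERNEL_DEBUG_PRINTF("Receive UDP kernel error %d rxpkts %d\n", ret, out_pkt_num);
```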
```cpp
}

create_semaphore();
// create_semaphore();
```
Where did the semaphores go?
```cpp
}

#if 0
```
It seems that by removing the semaphores there's no longer synchronization with the CPU, so now the example assumes the CPU has no idea when packets are received and when to launch kernels?
```cpp
}

create_semaphore();
// create_semaphore();
```
if not needed, remove?
```cpp
#include <cuda.h>
#include <cuda_runtime_api.h>
```
You mentioned there were no cuda dependencies?
```cmake
# CUDA_SEPARABLE_COMPILATION ON
# CUDA_RESOLVE_DEVICE_SYMBOLS ON
```
remove
```cmake
# CUDA_SEPARABLE_COMPILATION ON
# CUDA_RESOLVE_DEVICE_SYMBOLS ON
```
remove?
```dockerfile
WORKDIR /opt

RUN apt-get update && apt-get install -y build-essential wget
```
use curl for licensing reasons
```dockerfile
    libcurand-dev-12-6 \
    ninja-build \
    pkgconf \
    mlnx-dpdk \
```
probably not needed since it already has -dev below
```dockerfile
    mlnx-dpdk \
    mlnx-dpdk-dev \
```
@cliffburdick will this create conflicts when using upstream dpdk? Can doca use dpdk from upstream if available instead?
I think the mlnx version installs in /opt so it shouldn't have any issues
```dockerfile
    doca-sdk-gpunetio \
    libdoca-sdk-gpunetio-dev \
    doca-sdk-eth \
    libdoca-sdk-eth-dev \
    doca-sdk-flow \
```
the -dev don't depend on the non dev packages?
Summary by CodeRabbit
New Features
Bug Fixes
Chores