feat: Transition to FullyContiguous Host and Disk layouts #3090

oandreeva-nv · 2025-09-17T17:12:19Z

Overview:

Fixes DIS-546

Summary by CodeRabbit

New Features
- Configure memory layout per device, host, and disk via Python and builder APIs.
- Inter-layout transfers supported (contiguous ↔ layer-separated) with validation and debug assertions.
- Layout verification utilities and reports, including address/size checks and optional checksums.
Tests
- Extensive coverage for inter-layout copies (CPU/GPU), alignment scenarios, and GDS-compatible disk operations.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Configure memory layout per device/host/disk from Python (choose FullyContiguous or LayerSeparate) when creating workers or registering KV caches.
- Automatic device layout detection when unspecified, with safe defaults.
- Seamless data transfers between contiguous and layer-separated layouts across host, device, and disk.
- New layout verification utilities to validate region addresses, sizes, and compatibility, providing clearer diagnostics for misconfigurations.

copy-pr-bot · 2025-09-17T17:12:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

lib/llm/src/block_manager/block/transfer/cuda.rs

lib/llm/src/block_manager/block/transfer/nixl.rs

coderabbitai · 2025-09-17T17:50:21Z

Walkthrough

Introduces per-edge layout types (device/host/disk) across workers and Python bindings; replaces a boolean layout flag with explicit LayoutType. Adds auto-detection for device layout during KV-cache registration. Implements inter-layout CUDA copy paths (FullyContiguous ↔ LayerSeparate). Expands layout utilities with verification APIs and worker-side verifiers. Adds extensive tests and a trailing newline to a Dockerfile.

Changes

Cohort / File(s)	Summary
Python bindings: distributed worker + re-exports `lib/bindings/python/rust/llm/block_manager/distributed/worker.rs`, `lib/bindings/python/rust/llm/block_manager/distributed.rs`	Adds PyLayoutType enum and conversions to LayoutType; extends KvbmWorker::new and PyO3 signatures with optional device/host/disk layout params; re-exports PyLayoutType.
vLLM connector workers `lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs`, `lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs`	Extends register_kv_caches with optional per-edge LayoutType; adds device layout auto-detection from first tensor; replaces is_fully_contiguous_layout with device/host/disk layout setters; updates Python bindings.
Distributed worker core `lib/llm/src/block_manager/distributed/worker.rs`	Replaces boolean layout flag with device_layout_type, host_layout_type, disk_layout_type; infers device layout based on selected LayoutType; applies host/disk layouts independently.
Layout core and utilities `lib/llm/src/block_manager/layout.rs`, `lib/llm/src/block_manager/layout/utils.rs`	Adjusts FullyContiguous region sizing; adds public verification APIs (verify_memory_regions, expected_memory_address, etc.) for both layouts; introduces LayerSeparateConfig (crate-private); exposes public utils and worker_verification module with WorkerLayoutVerifier, results/stats, and compatibility checks; refactors alignment helpers.
CUDA transfer paths `lib/llm/src/block_manager/block/transfer/cuda.rs`	Adds copy paths for FullyContiguous↔LayerSeparate transfers with assertions; retains FC↔FC memcpy; introduces layout transfer tests.
Offload tests (GDS and cross-layout) `lib/llm/src/block_manager/offload.rs`	Updates existing test assertions; adds GDS-focused tests, helpers, and integrity checks across layouts; includes ignored and diagnostic tests.
Container `container/Dockerfile.vllm`	Appends trailing newline.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Py as Python Caller
  participant PyW as PyKvConnectorWorker
  participant Conn as KvConnectorWorker
  participant BM as KvbmWorkerConfig Builder
  participant L as LayoutType

  Py->>PyW: register_kv_caches(..., device_layout_type?, host_layout_type?, disk_layout_type?)
  Note right of PyW: Map PyLayoutType -> LayoutType (optional)
  PyW->>Conn: register_kv_caches(..., dev?, host?, disk?)

  alt device_layout_type provided
    Conn->>BM: device_layout_type(dev)
  else auto-detect
    Conn->>Conn: capture first tensor shape
    Conn->>L: layer_separate_auto(shape, num_device_blocks) or default
    Conn->>BM: device_layout_type(detected or default)
  end

  alt host_layout_type provided
    Conn->>BM: host_layout_type(host)
  else
    Conn->>BM: host_layout_type(FullyContiguous)
  end

  alt disk_layout_type provided
    Conn->>BM: disk_layout_type(disk)
  else
    Conn->>BM: disk_layout_type(FullyContiguous)
  end

  BM-->>Py: configured worker ready
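
The resolution order in the first diagram can be summarized in a small, dependency-free Rust sketch. LayoutKind, EdgeLayouts, and resolve_layouts are hypothetical names, not the PR's API. The detection step is passed in as a closure, and using FullyContiguous as the device fallback is an assumption, since the summary only promises "safe defaults".

```rust
/// Hypothetical stand-in for the crate's LayoutType; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayoutKind {
    FullyContiguous,
    LayerSeparate { outer_first: bool },
}

/// Resolved layout for each edge of the worker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EdgeLayouts {
    pub device: LayoutKind,
    pub host: LayoutKind,
    pub disk: LayoutKind,
}

/// Follows the diagram: an explicit device layout wins; otherwise detection runs
/// on the first tensor shape (if any); if that yields nothing, FullyContiguous is
/// assumed as the fallback. Host and disk default to FullyContiguous when unset.
pub fn resolve_layouts<F>(
    device: Option<LayoutKind>,
    host: Option<LayoutKind>,
    disk: Option<LayoutKind>,
    first_shape: Option<&[usize]>,
    detect: F,
) -> EdgeLayouts
where
    F: Fn(&[usize]) -> Option<LayoutKind>,
{
    let device = device
        .or_else(|| first_shape.and_then(|shape| detect(shape)))
        .unwrap_or(LayoutKind::FullyContiguous);
    EdgeLayouts {
        device,
        host: host.unwrap_or(LayoutKind::FullyContiguous),
        disk: disk.unwrap_or(LayoutKind::FullyContiguous),
    }
}
```

Passing detection in as a closure keeps the precedence rules testable without any tensor or CUDA state.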

sequenceDiagram
  autonumber
  participant Src as Source Layout
  participant Cpy as copy_block (CUDA)
  participant Dst as Destination Layout

  Src->>Cpy: request copy (S_layout, D_layout)
  alt FC -> FC
    Cpy->>Dst: single memcpy
  else FC -> LayerSeparate
    loop layers x outer
      Cpy->>Dst: scatter copy chunks into per-layer regions
    end
    Note right of Cpy: assert src_offset == src_size
  else LayerSeparate -> FC
    loop layers x outer
      Cpy->>Dst: gather copy chunks into contiguous buffer
    end
    Note right of Cpy: assert dst_offset == dst_size
  end
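
The scatter/gather loops in the second diagram can be illustrated on host memory, with plain slice copies standing in for CUDA memcpy calls. This sketch relies on two assumptions. First, a fully-contiguous block is ordered by layer, then by outer dimension. Second, each layer's LayerSeparate buffer stores its chunks block-major. The helper names are hypothetical, and the real layouts may add alignment padding.

```rust
/// Assumed LayerSeparate addressing for this sketch: inside each layer's buffer,
/// chunks are stored block-major, so (block, outer) starts at (block * outer_dim + outer) * chunk.
fn ls_offset(block: usize, outer: usize, outer_dim: usize, chunk: usize) -> usize {
    (block * outer_dim + outer) * chunk
}

/// FC -> LayerSeparate: scatter one contiguous block (ordered layer, then outer)
/// into the per-layer buffers at position `block`.
pub fn scatter_fc_to_ls(fc: &[u8], layers: &mut [Vec<u8>], block: usize, outer_dim: usize, chunk: usize) {
    let mut src_offset = 0;
    for layer in layers.iter_mut() {
        for outer in 0..outer_dim {
            let dst = ls_offset(block, outer, outer_dim, chunk);
            layer[dst..dst + chunk].copy_from_slice(&fc[src_offset..src_offset + chunk]);
            src_offset += chunk;
        }
    }
    // Mirrors the PR's check: the whole source block was consumed.
    assert_eq!(src_offset, fc.len(), "src_offset must equal src_size");
}

/// LayerSeparate -> FC: gather the per-layer chunks of `block` into one contiguous buffer.
pub fn gather_ls_to_fc(layers: &[Vec<u8>], fc: &mut [u8], block: usize, outer_dim: usize, chunk: usize) {
    let mut dst_offset = 0;
    for layer in layers {
        for outer in 0..outer_dim {
            let src = ls_offset(block, outer, outer_dim, chunk);
            fc[dst_offset..dst_offset + chunk].copy_from_slice(&layer[src..src + chunk]);
            dst_offset += chunk;
        }
    }
    // Mirrors the PR's check: the whole destination block was filled.
    assert_eq!(dst_offset, fc.len(), "dst_offset must equal dst_size");
}
```

The closing assertions play the role of the src_offset == src_size and dst_offset == dst_size checks noted in the diagram. They fail whenever the loop bounds and the buffer size disagree.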

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: kvbm + connector #2258 — Introduces and wires initial layout-type handling in distributed and vLLM connector workers; this PR extends and refines that pathway with per-edge types and auto-detection.

Poem

Hop hop, I lay out the lanes,
Pages and layers in tidy chains.
Scatter, gather—whoosh on GPU,
Host and disk get a layout, too.
I thump with joy on verified ground,
Checksums crisp, no bytes unsound.
Carrots aligned at 4K—profound! 🥕

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The description includes the required template headings but leaves the Details and Where should the reviewer start sections empty and retains placeholder text in Related Issues, while the Overview only states “Fixes DIS-546” without describing the changes.	Please expand the Overview to summarize the PR’s purpose and key changes, populate the Details section with a description of what was modified, specify which files or modules reviewers should focus on, and update Related Issues with the actual issue number.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title concisely highlights a real aspect of the changeset by focusing on transitioning host and disk layouts to FullyContiguous, though it omits mention of device layout configuration and inter-layout transfer support; it remains specific and not misleading.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5e2be2 and 953f70b.

📒 Files selected for processing (5)

container/Dockerfile.vllm (1 hunks)
lib/bindings/python/rust/llm/block_manager/distributed.rs (1 hunks)
lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (4 hunks)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2 hunks)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (9 hunks)

✅ Files skipped from review due to trivial changes (1)

container/Dockerfile.vllm

🔇 Additional comments (14)

lib/bindings/python/rust/llm/block_manager/distributed.rs (1)

12-12: LGTM! Clean re-export of PyLayoutType.

The addition of PyLayoutType to the public re-export list aligns with the PR's goal to expose layout configuration to Python users.

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2)

25-25: LGTM! Necessary import for layout type specification.

The LayoutType import enables the use of specific layout types in the worker configuration.

138-140: LGTM! Explicit layout type configuration replaces boolean flag.

The transition from the boolean is_fully_contiguous_layout flag to per-component layout configuration provides more granular control and is consistently applied across device, host, and disk layouts. The use of LayoutType::FullyContiguous maintains the existing behavior.

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (4)

14-14: LGTM! Necessary import for layout type conversion.

The LayoutType import is required for the From<PyLayoutType> conversion implementation.

16-49: Well-designed PyLayoutType enum with proper Python integration.

The implementation includes:

Clear enum variants (FullyContiguous, LayerSeparate)

Proper PyO3 attributes for Python bindings (#[pyclass(eq, eq_int)])

Standard Python protocols (__str__, __repr__)

Clean conversion logic to Rust's LayoutType

Using layer_separate_auto_default() for the LayerSeparate variant, together with the comment about auto-detection, is appropriate.
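
Here is a plain-Rust sketch of the conversion and string formatting. The PyO3 attributes are omitted so the code compiles without Python. PyLayoutKind and LayoutKind are hypothetical stand-ins for PyLayoutType and LayoutType. Mapping LayerSeparate to an auto-detected variant only approximates what layer_separate_auto_default() does.

```rust
use std::fmt;

/// Hypothetical Rust-side layout type (stand-in for LayoutType).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayoutKind {
    FullyContiguous,
    /// Outer-dimension arrangement left to auto-detection (assumed semantics).
    LayerSeparateAuto,
}

/// Stand-in for the Python-facing enum; the real one carries #[pyclass(eq, eq_int)].
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PyLayoutKind {
    FullyContiguous,
    LayerSeparate,
}

impl From<PyLayoutKind> for LayoutKind {
    fn from(py: PyLayoutKind) -> Self {
        match py {
            PyLayoutKind::FullyContiguous => LayoutKind::FullyContiguous,
            PyLayoutKind::LayerSeparate => LayoutKind::LayerSeparateAuto,
        }
    }
}

/// Backs a __str__-style method: prints the variant name.
impl fmt::Display for PyLayoutKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            PyLayoutKind::FullyContiguous => "FullyContiguous",
            PyLayoutKind::LayerSeparate => "LayerSeparate",
        };
        f.write_str(name)
    }
}

/// Optional Python argument -> Rust layout, defaulting to FullyContiguous.
pub fn to_layout_or_default(py: Option<PyLayoutKind>) -> LayoutKind {
    py.map(Into::into).unwrap_or(LayoutKind::FullyContiguous)
}
```

Once From is implemented, an optional argument converts with map(Into::into), which is the same simplification proposed in the nitpicks for this file.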

146-157: LGTM! Well-structured Python binding signature.

The PyO3 signature follows best practices with optional parameters defaulting to None, maintaining backward compatibility while extending functionality.

184-194: LGTM! Clean layout type configuration with proper defaults.

The implementation:

Uses safe Option::map() with Into conversion

Provides sensible defaults (LayoutType::FullyContiguous)

Maintains consistent patterns across all three layout types

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (7)

26-27: LGTM! Necessary imports for layout functionality.

Both LayoutType and PyLayoutType imports are required for the layout auto-detection and Python binding functionality.

38-41: LGTM! Clean trait extension for layout parameters.

The addition of optional layout type parameters to the Worker trait maintains backward compatibility while enabling per-component layout configuration.

158-167: LGTM! Shape capture for layout auto-detection.

The first tensor shape capture is well-implemented:

Uses Option<Vec<usize>> for safe handling

Captures only the first tensor shape as needed for detection

Clear variable naming and comments

177-197: Excellent auto-detection implementation with proper error handling.

The layout auto-detection logic is robust:

Attempts auto-detection only when device layout type is not provided

Uses proper error handling with fallback to default

Includes informative tracing for both success and failure cases

Handles the edge case of no tensors gracefully
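
Here is a sketch of detection with a logged fallback. The heuristic compares the first two shape dimensions against the number of device blocks. It is an assumption for illustration and is not the PR's layer_separate_auto logic. eprintln! stands in for tracing to keep the example dependency-free.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayoutKind {
    FullyContiguous,
    LayerSeparate { outer_first: bool },
}

/// Hypothetical detection from a per-layer KV tensor shape.
pub fn detect_layout(shape: &[usize], num_blocks: usize) -> Result<LayoutKind, String> {
    match shape {
        // Assumed shape like [num_blocks, outer, ...]: block dimension first.
        [first, ..] if *first == num_blocks => Ok(LayoutKind::LayerSeparate { outer_first: false }),
        // Assumed shape like [outer, num_blocks, ...]: outer dimension first.
        [_, second, ..] if *second == num_blocks => Ok(LayoutKind::LayerSeparate { outer_first: true }),
        _ => Err(format!("cannot infer layout from shape {shape:?} with {num_blocks} blocks")),
    }
}

/// Try detection, log on failure, and fall back to `default`.
pub fn detect_or_default(shape: Option<&[usize]>, num_blocks: usize, default: LayoutKind) -> LayoutKind {
    match shape.map(|s| detect_layout(s, num_blocks)) {
        Some(Ok(layout)) => layout,
        Some(Err(err)) => {
            // Stand-in for tracing::warn!.
            eprintln!("layout detection failed: {err}; falling back to {default:?}");
            default
        }
        // No tensors were registered, so there is nothing to inspect.
        None => default,
    }
}
```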

208-210: LGTM! Consistent layout type application.

The configuration properly uses the detected device layout type and applies sensible defaults for host and disk layouts.

459-471: LGTM! Proper PyO3 signature with layout parameters.

The Python binding signature follows PyO3 best practices with optional parameters, maintaining backward compatibility while extending functionality.

487-489: LGTM! Clean Python to Rust layout type conversion.

The conversion uses safe Option::map() with the implemented Into trait, properly handling optional parameters from Python.


coderabbitai

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)

lib/llm/src/block_manager/block/data/local.rs (1)

123-130: Guard against overflow when computing fully-contiguous block size

Multiplying mr.size() by num_layers() and num_outer_dims() can overflow usize. In release builds the product wraps silently (debug builds panic). Use checked_mul and surface a BlockError instead.

Apply this diff:
-            let size = mr.size() * self.num_layers() * self.num_outer_dims();
+            let per_region = mr.size();
+            let size = per_region
+                .checked_mul(self.num_layers())
+                .and_then(|v| v.checked_mul(self.num_outer_dims()))
+                .ok_or_else(|| BlockError::InvalidState("block size overflow".to_string()))?;
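
The same checked arithmetic can be written as a standalone function so that the overflow path can be exercised in a unit test. BlockError here is a minimal local stand-in for the crate's error type.

```rust
/// Minimal stand-in for the crate's BlockError.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockError {
    InvalidState(String),
}

/// Fully-contiguous block size = region size x layers x outer dims,
/// with overflow reported as an error instead of wrapping.
pub fn fully_contiguous_block_size(
    region_size: usize,
    num_layers: usize,
    num_outer_dims: usize,
) -> Result<usize, BlockError> {
    region_size
        .checked_mul(num_layers)
        .and_then(|v| v.checked_mul(num_outer_dims))
        .ok_or_else(|| BlockError::InvalidState("block size overflow".to_string()))
}
```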

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)

353-356: Don’t panic in runtime hot path when a slot is missing

A panic takes down the entire process. Log and continue (or return an error to be handled higher up) to keep the worker resilient.

Apply this diff:

-            } else {
-                // made this condition more strict slot existence checks were added as a prerequesite
-                // to be added to the maybe_finished_offloading set.
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished offloading set");
-            }
+            } else {
+                // stricter existence checks are in place, but avoid crashing the process
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_offloading"
+                );
+                continue;
+            }

382-383: Avoid panic in onboarding check as well

Mirror the non-panicking handling here.

Apply this diff:

-            } else {
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished onboarding set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_onboarding"
+                );
+                continue;
+            }
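
Here is a dependency-free sketch of the non-panicking pattern. A missing slot is logged and skipped with let-else and continue. Slot and collect_finished are hypothetical names, and eprintln! stands in for tracing::error!.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a request slot.
pub struct Slot {
    pub finished: bool,
}

/// Collect ids whose slots report completion; a missing slot is logged and skipped.
pub fn collect_finished(maybe_finished: &HashSet<String>, slots: &HashMap<String, Slot>) -> Vec<String> {
    let mut finished = Vec::new();
    for request_id in maybe_finished {
        let Some(slot) = slots.get(request_id) else {
            // Stand-in for tracing::error!; the worker keeps running.
            eprintln!("request slot missing for {request_id}; skipping");
            continue;
        };
        if slot.finished {
            finished.push(request_id.clone());
        }
    }
    // HashSet iteration order is unspecified; sort for deterministic output.
    finished.sort();
    finished
}
```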

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2)

311-323: Avoid panic on offloading slot miss

Same resilience concern as the vLLM connector; don’t panic in production paths.

Apply this diff:

-            } else {
-                // made this condition more strict slot existence checks were added as a prerequesite
-                // to be added to the maybe_finished_offloading set.
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished offloading set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_offloading"
+                );
+                continue;
+            }

349-350: Avoid panic on onboarding slot miss

Mirror non-panicking handling here.

Apply this diff:

-            } else {
-                panic!("request slot missing for {request_id}; however, it was present when added to the maybe finished onboarding set");
-            }
+            } else {
+                tracing::error!(
+                    request_id,
+                    "request slot missing; was present when added to maybe_finished_onboarding"
+                );
+                continue;
+            }

lib/llm/src/block_manager/distributed/transfer.rs (1)

104-115: Bounds-check indices from the request before indexing into pools.

source_pool_list[idx] and target_pool_list[idx] may panic if a request carries out-of-range indices. Validate and error out.

+        // Validate indices before indexing
+        let src_len = source_pool_list.len();
+        let tgt_len = target_pool_list.len();
+        for (from, to) in request.blocks() {
+            if *from >= src_len || *to >= tgt_len {
+                return Err(anyhow::anyhow!(
+                    "Transfer index out of bounds: from={}, to={}, src_len={}, tgt_len={}",
+                    from, to, src_len, tgt_len
+                ));
+            }
+        }
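
The bounds check can be isolated in a helper that validates every (from, to) pair before any indexing happens. The function name is hypothetical, and a String error replaces anyhow to keep the example self-contained.

```rust
/// Reject any (from, to) pair that would index past the source or target pool.
pub fn validate_transfer_indices(
    blocks: &[(usize, usize)],
    src_len: usize,
    tgt_len: usize,
) -> Result<(), String> {
    for &(from, to) in blocks {
        if from >= src_len || to >= tgt_len {
            return Err(format!(
                "Transfer index out of bounds: from={from}, to={to}, src_len={src_len}, tgt_len={tgt_len}"
            ));
        }
    }
    Ok(())
}
```

An alternative is to index with get() and turn the None case into an error at each use site. Validating up front keeps the error message in one place.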

lib/llm/src/block_manager/distributed/worker.rs (1)

581-589: Avoid unwraps in production path (stream creation).

Propagate errors instead of panicking:

-        let transfer_context = Arc::new(TransferContext::new(
-            Arc::new(Some(agent)),
-            DeviceAllocator::new(config.device_id)
-                .unwrap()
-                .ctx()
-                .new_stream()
-                .unwrap(),
+        let stream = DeviceAllocator::new(config.device_id)?
+            .ctx()
+            .new_stream()?;
+        let transfer_context = Arc::new(TransferContext::new(
+            Arc::new(Some(agent)),
+            stream,
             Handle::current(),
         ));

🧹 Nitpick comments (17)

lib/llm/src/block_manager/block/transfer/nixl.rs (1)

55-56: Early return is fine, but keep assertions in debug

The explicit Ok(()) is fine; ensure any preconditions remain covered by debug assertions elsewhere.

lib/llm/src/block_manager/block/transfer/cuda.rs (3)

246-253: Incorrect debug message for overlap check

H2D path message says “D2D copy”; adjust to avoid confusion or remove overlap check entirely for H2D.

Apply this diff:
-        "Source and destination device memory regions must not overlap for D2D copy"
+        "Source and destination memory regions must not overlap"

270-276: Same nit: overlap message mentions D2D in D2H path

Mirror the wording fix here.

Apply this diff:
-        "Source and destination device memory regions must not overlap for D2D copy"
+        "Source and destination memory regions must not overlap"

309-739: Large CUDA tests: keep but ensure they don’t run by default

The tests are cfg(test, feature = "testing-cuda"), which is good. Consider marking the perf test with #[ignore] so it does not cause accidentally long runs in CI tiers that do enable the feature.

lib/llm/src/block_manager/distributed/transfer.rs (1)

21-23: Remove unused import.

layout::BlockLayoutConfig isn’t used here. Drop it to keep warnings clean.

lib/llm/src/block_manager/distributed/worker.rs (1)

217-219: Use checked math for bytes_per_block to avoid usize overflow.

Guard against large shapes:
-        let bytes_per_block =
-            num_layers * outer_dim * config.page_size * inner_dim * config.dtype_width_bytes;
+        let bytes_per_block = num_layers
+            .checked_mul(outer_dim).and_then(|v| v.checked_mul(config.page_size))
+            .and_then(|v| v.checked_mul(inner_dim)).and_then(|v| v.checked_mul(config.dtype_width_bytes))
+            .ok_or_else(|| anyhow::anyhow!("bytes_per_block overflow"))?;

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (1)

171-181: Minor: simplify Option mapping and let rustfmt reflow.

Use map(Into::into) to reduce verbosity.
-            .device_layout_type(device_layout_type.map(|py_layout| py_layout.into()).unwrap_or(LayoutType::FullyContiguous))
+            .device_layout_type(device_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous))
             .host_layout_type(
-                host_layout_type
-                    .map(|py_layout| py_layout.into())
-                    .unwrap_or(LayoutType::FullyContiguous)
+                host_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous)
             )
             .disk_layout_type(
-                disk_layout_type
-                    .map(|py_layout| py_layout.into())
-                    .unwrap_or(LayoutType::FullyContiguous)
+                disk_layout_type.map(Into::into).unwrap_or(LayoutType::FullyContiguous)
             )

lib/llm/src/block_manager/layout.rs (3)

338-345: Confirm semantics: memory_region_size now equals one page.

Setting memory_region_size = outer_dim_stride_in_bytes aligns LocalMemoryRegion::size to a single [page_size × inner_dim × dtype] region. This is consistent with region addressing used elsewhere. Ensure all callers (e.g., transfer validators) expect per-page size, not per-layer size.
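
To make the per-page sizing concrete, here is the address arithmetic under two assumptions: regions are ordered block, then layer, then outer dimension, and there is no alignment padding. FcConfig and its methods are hypothetical; the field names mirror those used in the review.

```rust
/// Hypothetical FullyContiguous addressing (no alignment padding assumed).
pub struct FcConfig {
    pub num_layers: usize,
    pub outer_dim: usize,
    pub page_size: usize,
    pub inner_dim: usize,
    pub dtype_width_bytes: usize,
}

impl FcConfig {
    /// One region = one page: page_size x inner_dim x dtype width.
    pub fn region_size(&self) -> usize {
        self.page_size * self.inner_dim * self.dtype_width_bytes
    }

    /// One block holds num_layers x outer_dim regions.
    pub fn block_size(&self) -> usize {
        self.num_layers * self.outer_dim * self.region_size()
    }

    /// Start address of region (block, layer, outer), ordered block -> layer -> outer.
    pub fn region_addr(&self, base: usize, block: usize, layer: usize, outer: usize) -> usize {
        let index = (block * self.num_layers + layer) * self.outer_dim + outer;
        base + index * self.region_size()
    }
}
```

With page_size = 16, inner_dim = 128, and 2-byte elements, each region is 4096 bytes. Region (block 1, layer 1, outer 0) in a 2-layer, 2-outer layout therefore starts 6 × 4096 bytes after the base.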

861-943: LayerSeparate verification and alignment checks are solid.

Good coverage: region math, alignment, and capacity checks per-layer. Minor: consider early-returning on first mismatch to speed failing tests.
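
The early-return suggestion could look like the following sketch, which stops at the first address that differs from the expected one. The expected-address function is passed in as a closure. Mismatch and verify_first_mismatch are hypothetical names.

```rust
/// First region whose actual address differs from the expected one.
#[derive(Debug, PartialEq, Eq)]
pub struct Mismatch {
    pub index: usize,
    pub expected: usize,
    pub actual: usize,
}

/// Walk regions in order and return as soon as one address is wrong.
pub fn verify_first_mismatch(actual: &[usize], expected: impl Fn(usize) -> usize) -> Result<(), Mismatch> {
    for (index, &addr) in actual.iter().enumerate() {
        let exp = expected(index);
        if addr != exp {
            return Err(Mismatch { index, expected: exp, actual: addr });
        }
    }
    Ok(())
}
```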

1621-1904: Comprehensive tests are valuable; minor nits.

The no-overlap/alignment/stride tests add strong regression guarantees. Consider parameterizing constants to reduce duplication across FC/LS tests.

lib/llm/src/block_manager/layout/utils.rs (7)

19-25: align_up: guard against overflow and enforce precondition in release builds

value + alignment - 1 can overflow; debug_assert! disappears in release. Use a checked add and a hard assert (or return Result) to make it safe.
 #[inline(always)]
 pub fn align_up(value: usize, alignment: usize) -> usize {
-    debug_assert!(alignment.is_power_of_two(), "Alignment must be a power of 2");
-    (value + alignment - 1) & !(alignment - 1)
+    assert!(alignment.is_power_of_two() && alignment > 0, "Alignment must be a power of 2");
+    let add = alignment - 1;
+    let sum = value.checked_add(add).expect("overflow in align_up");
+    sum & !add
 }
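
If a Result-style API is preferred over asserting, a non-panicking variant could return Option. The sketch uses a hypothetical name. Note that usize::is_power_of_two() already returns false for 0, so the extra `alignment > 0` test in the suggested diff is redundant but harmless.

```rust
/// Round `value` up to a multiple of `alignment` without panicking.
/// Returns None if `alignment` is not a power of two (this includes 0)
/// or if rounding up would overflow usize.
pub fn try_align_up(value: usize, alignment: usize) -> Option<usize> {
    if !alignment.is_power_of_two() {
        return None;
    }
    let mask = alignment - 1;
    value.checked_add(mask).map(|sum| sum & !mask)
}
```
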
27-36: Don’t leak validator::ValidationError from low-level utils

This module otherwise uses LayoutError. Exposing validator::ValidationError here couples layers unnecessarily.
-/// Validates that the given value is a power of 2.
-pub fn validate_power_of_2(alignment: usize) -> Result<(), validator::ValidationError> {
-    if alignment.is_power_of_two() {
-        Ok(())
-    } else {
-        Err(validator::ValidationError::new(
-            "Alignment must be a power of 2",
-        ))
-    }
-}
+/// Validates that the given value is a power of 2.
+pub fn validate_power_of_2(alignment: usize) -> Result<(), LayoutError> {
+    if alignment.is_power_of_two() {
+        Ok(())
+    } else {
+        Err(LayoutError::InvalidConfig("Alignment must be a power of 2".into()))
+    }
+}
If external callers truly need validator::ValidationError, add a thin adapter outside this module.

167-190: Pre‑allocate results to avoid reallocations

The total number of regions is known up front.
-            let mut results = Vec::new();
+            let mut results = Vec::with_capacity(layout.num_blocks() * layout.num_layers() * layout.outer_dim());

266-289: Stats should ignore “unknown” address state if you adopt tri‑state addr_matches

If you switch to Option<bool>, update counters to only include Some(false) as mismatches and Some(true) in successes.
-            if !result.addr_matches {
+            if matches!(result.addr_matches, Some(false)) {
                 self.stats.addr_mismatches += 1;
             }
 ...
-            if result.addr_matches && result.size_matches {
+            if result.size_matches && matches!(result.addr_matches, Some(true) | None) {
                 self.stats.successful_verifications += 1;
             }
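
Here is a minimal sketch of the tri-state counting. It assumes addr_matches becomes Option<bool>, where None means the address could not be checked. The struct names are stand-ins; the counter names follow the suggested diff.

```rust
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Stats {
    pub addr_mismatches: usize,
    pub size_mismatches: usize,
    pub successful_verifications: usize,
}

/// One verification result; `addr_matches` is None when the address was not checked.
pub struct RegionResult {
    pub addr_matches: Option<bool>,
    pub size_matches: bool,
}

impl Stats {
    pub fn record(&mut self, r: &RegionResult) {
        // Only a definite address mismatch counts against the layout.
        if matches!(r.addr_matches, Some(false)) {
            self.addr_mismatches += 1;
        }
        if !r.size_matches {
            self.size_mismatches += 1;
        }
        // An unknown address (None) does not block success.
        if r.size_matches && matches!(r.addr_matches, Some(true) | None) {
            self.successful_verifications += 1;
        }
    }
}
```
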
297-300: Critical mismatches should include checksum when verify_data=true

Currently only size mismatches are considered critical. If data verification is opted in, checksum mismatches should elevate to critical as well.
-        pub fn has_critical_mismatches(&self) -> bool {
-            // Only check size mismatches since address verification is layout-specific
-            self.stats.size_mismatches > 0
-        }
+        pub fn has_critical_mismatches(&self) -> bool {
+            self.stats.size_mismatches > 0 || self.stats.checksum_mismatches > 0
+        }

303-328: Avoid intermediate allocations in report building

Use string literals with push_str.
-            report.push_str(&format!("Layout Verification Report\n"));
-            report.push_str(&format!("========================\n"));
+            report.push_str("Layout Verification Report\n");
+            report.push_str("========================\n");
(Apply similarly to the other lines.)
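
For lines that do need formatting, writeln! through std::fmt::Write formats directly into the existing String instead of allocating a temporary with format!. The report labels in this sketch are hypothetical.

```rust
use std::fmt::Write;

/// Build the report in a single String without per-line temporaries.
pub fn build_report(total: usize, size_mismatches: usize, addr_mismatches: usize) -> String {
    let mut report = String::new();
    // Fixed text: push_str copies the literal directly.
    report.push_str("Layout Verification Report\n");
    report.push_str("========================\n");
    // Dynamic text: writeln! formats straight into `report`.
    // Writing into a String never fails, so these expects cannot fire.
    writeln!(report, "Total regions: {total}").expect("writing to a String cannot fail");
    writeln!(report, "Size mismatches: {size_mismatches}").expect("writing to a String cannot fail");
    writeln!(report, "Address mismatches: {addr_mismatches}").expect("writing to a String cannot fail");
    report
}
```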

331-396: Compatibility should also check alignment; consider returning a rich report

Without an alignment check, the function can green-light incompatible GDS paths.

Returning bool hides the reason; a structured report improves UX.

Add alignment check:
         if source_config.dtype_width_bytes != dest_config.dtype_width_bytes {
             tracing::error!("Data type width mismatch: {} vs {}",
                 source_config.dtype_width_bytes, dest_config.dtype_width_bytes);
             return Ok(false);
         }
+
+        if source_config.alignment != dest_config.alignment {
+            tracing::error!("Alignment mismatch: {} vs {}",
+                source_config.alignment, dest_config.alignment);
+            return Ok(false);
+        }
Follow‑up (non‑blocking): return Result<(), Incompatibility> collecting all mismatches instead of the first-fail boolean.
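
The non-blocking follow-up could be sketched as a function that collects every mismatched field instead of returning false at the first one. LayoutParams, Incompatibility, and check_compatibility are hypothetical names. The field names mirror those mentioned in the review, including the proposed alignment check.

```rust
/// Hypothetical subset of a layout config relevant to compatibility.
#[derive(Debug, Clone, Copy)]
pub struct LayoutParams {
    pub num_blocks: usize,
    pub num_layers: usize,
    pub outer_dim: usize,
    pub page_size: usize,
    pub inner_dim: usize,
    pub dtype_width_bytes: usize,
    pub alignment: usize,
}

/// One mismatched field with both values, for a readable diagnostic.
#[derive(Debug, PartialEq, Eq)]
pub struct Incompatibility {
    pub field: &'static str,
    pub source: usize,
    pub dest: usize,
}

/// Compare every field and report all mismatches at once.
pub fn check_compatibility(src: &LayoutParams, dst: &LayoutParams) -> Result<(), Vec<Incompatibility>> {
    let fields = [
        ("num_blocks", src.num_blocks, dst.num_blocks),
        ("num_layers", src.num_layers, dst.num_layers),
        ("outer_dim", src.outer_dim, dst.outer_dim),
        ("page_size", src.page_size, dst.page_size),
        ("inner_dim", src.inner_dim, dst.inner_dim),
        ("dtype_width_bytes", src.dtype_width_bytes, dst.dtype_width_bytes),
        ("alignment", src.alignment, dst.alignment),
    ];
    let mismatches: Vec<Incompatibility> = fields
        .iter()
        .filter(|(_, s, d)| s != d)
        .map(|&(field, source, dest)| Incompatibility { field, source, dest })
        .collect();
    if mismatches.is_empty() {
        Ok(())
    } else {
        Err(mismatches)
    }
}
```

Returning the full list lets the caller log every problem in one pass rather than fixing configurations one mismatch at a time.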

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67ff181 and c5e2be2.

📒 Files selected for processing (11)

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (4 hunks)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (2 hunks)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2 hunks)
lib/llm/src/block_manager/block/data/local.rs (2 hunks)
lib/llm/src/block_manager/block/transfer/cuda.rs (2 hunks)
lib/llm/src/block_manager/block/transfer/nixl.rs (2 hunks)
lib/llm/src/block_manager/distributed/transfer.rs (3 hunks)
lib/llm/src/block_manager/distributed/worker.rs (4 hunks)
lib/llm/src/block_manager/layout.rs (6 hunks)
lib/llm/src/block_manager/layout/utils.rs (3 hunks)
lib/llm/src/block_manager/offload.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (8)

lib/llm/src/block_manager/offload.rs (4)

lib/llm/src/block_manager/layout/utils.rs (1)

verify_layout_compatibility (332-395)

lib/llm/src/block_manager/storage.rs (8)

storage_type (171-171)

storage_type (389-391)

storage_type (469-471)

storage_type (510-512)

size (177-177)

size (397-399)

size (477-479)

size (518-520)

lib/llm/src/block_manager/block/data.rs (4)

storage_type (26-26)

layer_view (49-58)

num_layers (32-32)

block_view (73-78)

lib/llm/src/block_manager/storage/disk.rs (3)

storage_type (123-125)

fd (90-92)

size (131-133)

lib/llm/src/block_manager/block/transfer/nixl.rs (3)

lib/llm/src/block_manager/layout.rs (1)

num_layers (235-237)

lib/llm/src/block_manager/block.rs (2)

num_layers (377-379)

size (1049-1051)

lib/llm/src/block_manager/storage/nixl.rs (7)

size (114-116)

size (281-283)

size (299-301)

size (323-325)

size (348-350)

size (373-375)

size (412-414)

lib/llm/src/block_manager/block/transfer/cuda.rs (4)

lib/llm/src/block_manager/block/data/local.rs (1)

num_layers (74-76)

lib/llm/src/block_manager/layout.rs (9)

num_layers (235-237)

storage (194-194)

storage (505-507)

storage (842-844)

config (206-206)

config (519-521)

config (804-806)

allocate (470-495)

allocate (764-796)

lib/llm/src/block_manager/block/data.rs (1)

num_layers (32-32)

lib/llm/src/block_manager/storage/cuda.rs (9)

size (226-228)

size (388-390)

default (284-288)

default (445-449)

ctx (460-462)

allocate (301-303)

allocate (466-468)

addr (222-224)

addr (384-386)

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (1)

lib/llm/src/block_manager/block.rs (1)

layout (1242-1244)

lib/llm/src/block_manager/distributed/transfer.rs (3)

lib/llm/src/block_manager/block.rs (3)

layout (1242-1244)

num_layers (377-379)

size (1049-1051)

lib/llm/src/block_manager/block/data/local.rs (1)

num_layers (74-76)

lib/llm/src/block_manager/block/data.rs (1)

num_layers (32-32)

lib/llm/src/block_manager/distributed/worker.rs (1)

lib/llm/src/block_manager/layout.rs (11)

builder (296-298)

layout_type (191-191)

layout_type (501-503)

layout_type (836-840)

num_layers (235-237)

outer_dim (243-245)

inner_dim (253-255)

config (206-206)

config (519-521)

config (804-806)

page_size (248-250)

lib/llm/src/block_manager/layout.rs (2)

lib/llm/src/block_manager/storage.rs (8)

addr (174-174)

addr (393-395)

addr (473-475)

addr (514-516)

size (177-177)

size (397-399)

size (477-479)

size (518-520)

lib/llm/src/block_manager/storage/nixl.rs (8)

addr (277-279)

size (114-116)

size (281-283)

size (299-301)

size (323-325)

size (348-350)

size (373-375)

size (412-414)

lib/llm/src/block_manager/layout/utils.rs (2)

lib/llm/src/block_manager/block.rs (1)

layout (1242-1244)

lib/llm/src/block_manager/layout.rs (1)

outer_dim (243-245)

🪛 GitHub Actions: Rust pre-merge checks

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs

[error] 10-10: Rustfmt: formatting issue detected by cargo fmt -- --check. Run 'cargo fmt' to fix.

[error] 28-28: Rustfmt: multi-line struct initialization formatting changed. Run 'cargo fmt'.

[error] 168-168: Rustfmt: break device_layout_type call into multi-line layout.

[error] 175-175: Rustfmt: adjust formatting around unwrap_or in device layout chain.

[error] 179-179: Rustfmt: adjust formatting in device layout chain (map/unwrap).

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs

[error] 20-20: Rustfmt: import LayoutType inserted/moved; reformat to match style.

[error] 169-169: Rustfmt: break device_layout_type call into multi-line layout.

[error] 175-175: Rustfmt: trailing comma added and unwrap formatting adjusted.

[error] 179-179: Rustfmt: unwrap/map formatting adjusted in device layout chain.

[error] 26-26: Rustfmt: import LayoutType removed or reordered to satisfy formatting.

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs

[error] 19-19: Rustfmt: import LayoutType inserted/moved; reformat to match style.

[error] 25-25: Rustfmt: import LayoutType adjusted to satisfy formatting.

🪛 GitHub Actions: NVIDIA Dynamo Github Validation

lib/llm/src/block_manager/block/transfer/nixl.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/block/transfer/cuda.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/distributed/transfer.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

lib/llm/src/block_manager/layout/utils.rs

[error] 1-999: Rust formatting check failed. Run 'cargo fmt' to format code (cargo fmt -- --check).

🔇 Additional comments (15)

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)

26-26: Fix rustfmt import ordering

Sandbox formatting check failed (cargo fmt error: "Could not locate working directory: no /proc/self/exe available"); run cargo fmt locally and commit the resulting changes or reorder imports in lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (around line 26) to satisfy rustfmt.

169-176: Format chained builder calls (rustfmt)

Per-edge layout config is correct; format the chained builder calls to satisfy rustfmt/Clippy.
File: lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (lines 169–176)
Run locally: cargo fmt && cargo clippy -q --no-deps

lib/llm/src/block_manager/block/transfer/nixl.rs (1)

1-999: Run cargo fmt (repo-wide)

Formatter failures are blocking CI on this file; run cargo fmt --all and commit the changes. Automated verification failed in the sandbox with: "Could not locate working directory.: no /proc/self/exe available" — run locally or in CI and re-request verification.

lib/llm/src/block_manager/block/transfer/cuda.rs (1)

1-999: Run cargo fmt (repo-wide)

Formatter failures are blocking CI on this file — run cargo fmt --all and commit the resulting changes. Automated verification here failed (error: "Could not locate working directory: no /proc/self/exe available"); run locally or in CI to confirm formatting is fixed.

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (1)

25-25: Fix rustfmt import ordering — run cargo fmt

rustfmt is failing for the added LayoutType import in lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_worker.rs (around line 25). Run cargo fmt locally and commit the formatted changes. (Automated attempt failed with: "Could not locate working directory: no /proc/self/exe available".)

lib/llm/src/block_manager/distributed/transfer.rs (1)

129-151: Run cargo fmt to fix formatting.

The pipeline is failing on rustfmt. Please run cargo fmt and commit the diff.

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (2)

26-36: Enum mapping LGTM.

PyLayoutType → LayoutType conversion is correct and clear.
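For reference, a Python-facing enum is commonly mapped onto its core counterpart with an exhaustive `From` impl. With an exhaustive match, adding a variant on either side causes a compile error instead of a silent fallthrough. The sketch below is minimal and self-contained and omits the PyO3 attributes. The two variant names come from the PR summary (FullyContiguous, LayerSeparate), but the real `LayoutType` may carry extra per-variant data, so both enum definitions here are assumptions.

```rust
// Hypothetical stand-ins for the Python-facing and core layout enums.
// The real types live in the bindings crate and in block_manager::layout
// and may differ (for example, by carrying extra per-variant fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PyLayoutType {
    FullyContiguous,
    LayerSeparate,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayoutType {
    FullyContiguous,
    LayerSeparate,
}

impl From<PyLayoutType> for LayoutType {
    fn from(value: PyLayoutType) -> Self {
        // Exhaustive match: a new variant on either enum forces this
        // conversion to be revisited at compile time.
        match value {
            PyLayoutType::FullyContiguous => LayoutType::FullyContiguous,
            PyLayoutType::LayerSeparate => LayoutType::LayerSeparate,
        }
    }
}
```

With this in place, `let core: LayoutType = PyLayoutType::LayerSeparate.into();` yields `LayoutType::LayerSeparate`.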

1-1: Run cargo fmt.

Rustfmt errors are blocking. Please format this file.

lib/llm/src/block_manager/offload.rs (1)

1404-1408: Nice: index-based verification after bulk onboard.

The revised assertions compare each disk→device pair by index and check contents; this is more robust than counting blocks.
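A small sketch shows why pairing by index is stronger than counting. Each block is modelled here as a plain byte vector, and `verify_pairs` is a hypothetical helper, not the function used in `offload.rs`. It fails when the two sides differ in length or when any source block differs from the destination block at the same index. A count-only check would miss the second case, for example two blocks that landed in swapped slots.

```rust
/// Hypothetical helper: verify that each source block (e.g. on disk) was
/// copied into the destination block (e.g. on device) at the same index.
/// Blocks are modelled as byte vectors purely for illustration.
pub fn verify_pairs(src: &[Vec<u8>], dst: &[Vec<u8>]) -> Result<(), String> {
    if src.len() != dst.len() {
        return Err(format!(
            "block count mismatch: {} source vs {} destination",
            src.len(),
            dst.len()
        ));
    }
    // Compare contents pairwise, reporting the first index that differs.
    for (idx, (s, d)) in src.iter().zip(dst.iter()).enumerate() {
        if s != d {
            return Err(format!("contents differ at block index {idx}"));
        }
    }
    Ok(())
}
```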

lib/llm/src/block_manager/layout.rs (2)

113-118: Public utils module exposure LGTM.

Making utils public and reusing it across verifiers is consistent with added verification APIs.

558-610: Verification APIs for FullyContiguous look good.

verify_memory_regions, expected_memory_address, and verify_memory_region are clean, use the shared verifier, and provide actionable logs.
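As an illustration of what an expected-address check for a fully contiguous layout can look like, the sketch below makes several assumptions. It assumes that blocks are stored back to back, that each block holds `num_layers × outer_dim` regions of `region_size` bytes with no padding, and that regions are ordered block-major, then by layer, then by outer dimension. The `FcLayout` struct and its field names are hypothetical, and the real `expected_memory_address` in `layout.rs` may use different strides or alignment.

```rust
/// Hypothetical description of a fully contiguous (FC) layout.
#[derive(Debug, Clone, Copy)]
pub struct FcLayout {
    pub base_addr: usize,
    pub num_blocks: usize,
    pub num_layers: usize,
    pub outer_dim: usize,
    /// Size in bytes of one (block, layer, outer) region.
    pub region_size: usize,
}

impl FcLayout {
    /// Expected start address of a region, assuming block-major ordering
    /// and no padding. Returns None if any index is out of range.
    pub fn expected_memory_address(&self, block: usize, layer: usize, outer: usize) -> Option<usize> {
        if block >= self.num_blocks || layer >= self.num_layers || outer >= self.outer_dim {
            return None;
        }
        let region_index = (block * self.num_layers + layer) * self.outer_dim + outer;
        Some(self.base_addr + region_index * self.region_size)
    }

    /// Compare an observed region address against the expected one.
    pub fn verify_memory_region(&self, block: usize, layer: usize, outer: usize, observed: usize) -> bool {
        self.expected_memory_address(block, layer, outer) == Some(observed)
    }
}
```

Under these assumptions, with `num_layers = 3` and `outer_dim = 2`, region (block 1, layer 0, outer 0) is the 7th region (index 6), so it starts at `base_addr + 6 * region_size`.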

lib/llm/src/block_manager/layout/utils.rs (4)

83-104: Bounds checks look good

Indexes validated against config with clear, specific errors. LGTM.
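The "clear, specific errors" pattern can be sketched with a dedicated error enum that records which index failed, its value, and its bound. `IndexError` and `validate_indices` are hypothetical names used for illustration. The actual checks in `utils.rs` may use a different error type and ordering.

```rust
use std::fmt;

/// Hypothetical error type that names which index was out of range.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexError {
    Block { idx: usize, max: usize },
    Layer { idx: usize, max: usize },
    Outer { idx: usize, max: usize },
}

impl fmt::Display for IndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IndexError::Block { idx, max } => {
                write!(f, "block index {idx} out of range (num_blocks = {max})")
            }
            IndexError::Layer { idx, max } => {
                write!(f, "layer index {idx} out of range (num_layers = {max})")
            }
            IndexError::Outer { idx, max } => {
                write!(f, "outer index {idx} out of range (outer_dim = {max})")
            }
        }
    }
}

impl std::error::Error for IndexError {}

/// Check each index against its configured bound and report the first
/// violation with both the offending index and the limit.
pub fn validate_indices(
    block: usize,
    layer: usize,
    outer: usize,
    num_blocks: usize,
    num_layers: usize,
    outer_dim: usize,
) -> Result<(), IndexError> {
    if block >= num_blocks {
        return Err(IndexError::Block { idx: block, max: num_blocks });
    }
    if layer >= num_layers {
        return Err(IndexError::Layer { idx: layer, max: num_layers });
    }
    if outer >= outer_dim {
        return Err(IndexError::Outer { idx: outer, max: outer_dim });
    }
    Ok(())
}
```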

403-447: Nice: cross‑layout compatibility test

Good coverage of FC↔LS size compatibility without touching device memory.

448-476: Incompatible layouts test exercises the right surface

The differing blocks case is a good negative check.
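The positive and negative compatibility cases above can be sketched without touching device memory by comparing layout shapes alone. This sketch assumes that a FullyContiguous layout and a LayerSeparate layout built from the same configuration differ only in where each region lives, not in region count or size. Under that assumption, an FC ↔ LS copy is valid when the shapes match. `LayoutShape` and `check_compatible` are hypothetical names and do not reflect the real `utils.rs` API.

```rust
/// Hypothetical shape summary that applies to both FC and LS layouts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LayoutShape {
    pub num_blocks: usize,
    pub num_layers: usize,
    pub outer_dim: usize,
    /// Size in bytes of one (block, layer, outer) region.
    pub region_size: usize,
}

/// Report the first shape field that differs between two layouts.
pub fn check_compatible(a: &LayoutShape, b: &LayoutShape) -> Result<(), String> {
    let fields = [
        ("num_blocks", a.num_blocks, b.num_blocks),
        ("num_layers", a.num_layers, b.num_layers),
        ("outer_dim", a.outer_dim, b.outer_dim),
        ("region_size", a.region_size, b.region_size),
    ];
    for (name, x, y) in fields {
        if x != y {
            return Err(format!("{name} mismatch: {x} vs {y}"));
        }
    }
    Ok(())
}
```

A layout with 4 blocks checked against one with 8 blocks fails with `num_blocks mismatch: 4 vs 8`, which mirrors the differing-blocks negative test.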

105-477: Formatting check couldn't be executed in sandbox — run rustfmt locally and re-run formatting check

Sandbox error returned: "Unable to proceed. Could not locate working directory.: no /proc/self/exe available. Is /proc mounted?"

Run locally and re-run CI:
cargo fmt && cargo fmt -- --check

Signed-off-by: Olga Andreeva <[email protected]>

oandreeva-nv · 2025-09-26T22:39:34Z

@coderabbitai review

coderabbitai · 2025-09-26T22:39:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Signed-off-by: Olga Andreeva <[email protected]>

oandreeva-nv · 2025-09-26T22:59:21Z

/ok to test c1afc69

oandreeva-nv · 2025-09-26T23:16:28Z

/ok to test a8c411a

Signed-off-by: Olga Andreeva <[email protected]>

oandreeva-nv · 2025-10-01T19:30:06Z

/ok to test 7066c88

oandreeva-nv · 2025-10-01T21:12:54Z

/ok to test b76ef8a

Signed-off-by: Olga Andreeva <[email protected]> Signed-off-by: Olga Andreeva <[email protected]> Co-authored-by: oandreeva-nv <[email protected]>

pull-request-size bot added the size/XXL label Sep 17, 2025

oandreeva-nv commented Sep 17, 2025

lib/llm/src/block_manager/block/transfer/cuda.rs (outdated, resolved)

oandreeva-nv commented Sep 17, 2025

lib/llm/src/block_manager/block/transfer/nixl.rs (outdated, resolved)

oandreeva-nv marked this pull request as ready for review September 17, 2025 17:36

oandreeva-nv requested a review from a team as a code owner September 17, 2025 17:36

coderabbitai bot reviewed Sep 17, 2025

oandreeva-nv changed the title from "Transition to FullyContiguous Host and Disk layouts" to "feature: Transition to FullyContiguous Host and Disk layouts" Sep 17, 2025

oandreeva-nv changed the title from "feature: Transition to FullyContiguous Host and Disk layouts" to "feat: Transition to FullyContiguous Host and Disk layouts" Sep 17, 2025

github-actions bot added the feat label Sep 17, 2025

richardhuo-nv reviewed Sep 17, 2025

lib/llm/src/block_manager/block/transfer/nixl.rs (outdated, resolved)

ziqifan617 reviewed Sep 18, 2025

oandreeva-nv force-pushed the oandreeva_fc_g2_g3 branch from bea4304 to bc4bc09 September 26, 2025 22:18

oandreeva-nv requested review from a team as code owners September 26, 2025 22:18

oandreeva-nv and others added 12 commits September 26, 2025 15:25

FC layouts for host and disk

de16be8

Signed-off-by: Olga Andreeva <[email protected]>

docker related changes

bd04549

Signed-off-by: Olga Andreeva <[email protected]>

local and layout fixes

2fcd2af

Signed-off-by: Olga Andreeva <[email protected]>

tests

181eeb4

Signed-off-by: Olga Andreeva <[email protected]>

Restoring docker files

aeee6bd

Signed-off-by: Olga Andreeva <[email protected]>

Removing unnecessary if else

aa985b1

Signed-off-by: Olga Andreeva <[email protected]>

rebase on top of main

937fcb1

Signed-off-by: Olga Andreeva <[email protected]>

clean up

0834740

Signed-off-by: Olga Andreeva <[email protected]>

clean up

82f2cc5

Signed-off-by: Olga Andreeva <[email protected]>

exposing layout fully to python frontend

3e27022

Signed-off-by: Olga Andreeva <[email protected]>

adding auto detection for outer contiguous

2581083

Signed-off-by: Olga Andreeva <[email protected]>

Refined logic

953f70b

Signed-off-by: Olga Andreeva <[email protected]>

oandreeva-nv force-pushed the oandreeva_fc_g2_g3 branch from bc4bc09 to 953f70b September 26, 2025 22:25

oandreeva-nv requested review from richardhuo-nv and ziqifan617 September 26, 2025 22:47

cargo fmt + cargo clippy

c1afc69

Signed-off-by: Olga Andreeva <[email protected]>

Merge branch 'main' into oandreeva_fc_g2_g3

a8c411a

nv-kmcgill53 reviewed Sep 26, 2025

lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (resolved)

ziqifan617 reviewed Sep 30, 2025

lib/llm/src/block_manager/offload.rs (resolved)

ziqifan617 reviewed Sep 30, 2025

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (resolved)

ziqifan617 approved these changes Sep 30, 2025

Merge branch 'main' into oandreeva_fc_g2_g3

7066c88

Signed-off-by: Olga Andreeva <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 19:30 Inactive

Merge branch 'main' into oandreeva_fc_g2_g3

b76ef8a

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 21:13 Inactive

oandreeva-nv merged commit d2e3b66 into main Oct 1, 2025
21 of 26 checks passed

oandreeva-nv deleted the oandreeva_fc_g2_g3 branch October 1, 2025 23:26
