@cdinea commented Nov 21, 2025

cuslide2: GPU-Accelerated decoding via nvImageCodec v0.6.0

Overview

The cucim.kit.cuslide2 plugin implements GPU-accelerated whole-slide imaging (WSI) decoding using nvImageCodec v0.6.0, replacing the CPU-based decoders (libjpeg-turbo, OpenJPEG, libtiff) with GPU equivalents.

Vendor Support:

  • ✅ Aperio SVS (JPEG, JPEG2000)
  • ✅ Philips TIFF (JPEG, multi-resolution pyramids)
  • ✅ Generic TIFF (all compression types)

Key Features

  • Direct ROI Decoding: nvImageCodec decodes regions without loading full tiles
  • GPU Memory Path: Direct CUDA allocation, no CPU→GPU copy overhead
  • Metadata Parsing: Enhanced Aperio/Philips XML metadata extraction with nvImageCodec v0.6.0 workarounds
  • Backward Compatible: Maintains the same API as cucim.kit.cuslide

Build Instructions

Create Conda Environment:

```bash
# Create the conda environment with all dependencies for your CUDA version:
# use all_cuda-130 for CUDA 13 and all_cuda-129 for CUDA 12
conda env create -n cucim -f ./conda/environments/all_cuda-130_arch-x86_64.yaml

# Activate the environment
conda activate cucim
```

Build cuslide2 Plugin:

```bash
# Set the CUDA compiler (uses conda's CUDA toolkit)
export CUDACXX=$CONDA_PREFIX/pkgs/cuda-toolkit/bin/nvcc

# Build libcucim and the cuslide2 plugin
./run build_local all release $CONDA_PREFIX
```

Install Python Package:

```bash
# Install in editable mode for development
python -m pip install --editable python/cucim
```

Testing

Unit Tests (26 tests)

```bash
cd python/cucim
python -m pytest tests/unit/clara/test_tiff_read_region.py -v
```

Expected Output:

```text
================================== test session starts ==================================
platform linux -- Python 3.13.9, pytest-8.4.2, pluggy-1.6.0 -- /home/cdinea/miniconda3/envs/cucimcuda/bin/python
cachedir: .pytest_cache
rootdir: /home/cdinea/Downloads/cucim_pr2/branchremote/cucim/python/cucim
configfile: pyproject.toml
plugins: cov-7.0.0, xdist-3.8.0, lazy-fixtures-1.4.0
collected 26 items

tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_inner[testimg_tiff_stripe_32x24_16_jpeg] PASSED [ 3%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_boundary[testimg_tiff_stripe_32x24_16_jpeg] PASSED [ 7%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_outside[testimg_tiff_stripe_32x24_16_jpeg] PASSED [ 11%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_inner[testimg_tiff_stripe_32x24_16_deflate] PASSED [ 15%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_boundary[testimg_tiff_stripe_32x24_16_deflate] PASSED [ 19%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_outside[testimg_tiff_stripe_32x24_16_deflate] PASSED [ 23%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_inner[testimg_tiff_stripe_32x24_16_raw] PASSED [ 26%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_boundary[testimg_tiff_stripe_32x24_16_raw] PASSED [ 30%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_outside[testimg_tiff_stripe_32x24_16_raw] PASSED [ 34%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_outside_of_resolution_level[testimg_tiff_stripe_4096x4096_256_jpeg] PASSED [ 38%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_multiresolution[testimg_tiff_stripe_4096x4096_256_jpeg] PASSED [ 42%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_level_data[testimg_tiff_stripe_4096x4096_256_jpeg] PASSED [ 46%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_dtype[testimg_tiff_stripe_4096x4096_256_jpeg] PASSED [ 50%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_iterator[testimg_tiff_stripe_4096x4096_256_jpeg] PASSED [ 53%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_outside_of_resolution_level[testimg_tiff_stripe_4096x4096_256_deflate] PASSED [ 57%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_multiresolution[testimg_tiff_stripe_4096x4096_256_deflate] PASSED [ 61%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_level_data[testimg_tiff_stripe_4096x4096_256_deflate] PASSED [ 65%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_dtype[testimg_tiff_stripe_4096x4096_256_deflate] PASSED [ 69%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_iterator[testimg_tiff_stripe_4096x4096_256_deflate] PASSED [ 73%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_outside_of_resolution_level[testimg_tiff_stripe_4096x4096_256_raw] PASSED [ 76%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_stripe_multiresolution[testimg_tiff_stripe_4096x4096_256_raw] PASSED [ 80%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_level_data[testimg_tiff_stripe_4096x4096_256_raw] PASSED [ 84%]
tests/unit/clara/test_tiff_read_region.py::test_region_image_dtype[testimg_tiff_stripe_4096x4096_256_raw] PASSED [ 88%]
tests/unit/clara/test_tiff_read_region.py::test_tiff_iterator[testimg_tiff_stripe_4096x4096_256_raw] PASSED [ 92%]
tests/unit/clara/test_tiff_read_region.py::test_array_interface_support PASSED [ 96%]
tests/unit/clara/test_tiff_read_region.py::test_cuda_array_interface_support PASSED [100%]

================================== 26 passed in 4.04s ===================================
```

Aperio SVS Validation

```bash
python scripts/test_aperio_svs.py --download
```

Tests:

  • Multi-level pyramid (3 levels expected)
  • GPU decode correctness (1.90× speedup on 2048×2048 tiles)
  • CPU/GPU output comparison
  • ImageDescription XML metadata parsing
  • Associated images (label, macro, thumbnail)
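The CPU/GPU output comparison in the list above is naturally a tolerance check rather than an exact-equality check, because JPEG decoders are not required to produce bit-identical output. A minimal sketch, assuming both decodes are available as flat sequences of 8-bit samples; the helper names and the default tolerance of 1 are hypothetical and not taken from test_aperio_svs.py:

```python
from typing import Sequence


def max_abs_diff(a: Sequence[int], b: Sequence[int]) -> int:
    """Largest per-sample difference between two decoded buffers."""
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


def outputs_match(cpu: Sequence[int], gpu: Sequence[int], tolerance: int = 1) -> bool:
    """True when every sample differs by at most `tolerance` (hypothetical default)."""
    return max_abs_diff(cpu, gpu) <= tolerance


cpu_pixels = [240, 241, 239, 10]
gpu_pixels = [240, 242, 239, 10]
print(outputs_match(cpu_pixels, gpu_pixels))  # True: the largest difference is 1
```

A size mismatch raises instead of returning False, so a wrong region size is reported as a bug rather than as a decoder discrepancy.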

Key Results:

  • ✅ File loads in 0.380s
  • ✅ GPU decode: 0.0092s (2048×2048 tile)
  • ✅ CPU decode: 0.0174s (GPU→CPU copy)
  • ✅ 1.90× GPU speedup demonstrated

```text
📥 Downloading Aperio SVS test file...
✅ Test file already exists: /tmp/CMU-1-Small-Region.svs
✅ Plugin configuration: /tmp/.cucim_aperio_test.json
🔬 Testing cuslide2 plugin with Aperio SVS

📁 File: /tmp/CMU-1-Small-Region.svs
📂 Loading SVS file...
🔧 Creating IFD[0] from nvImageCodec metadata
Dimensions: 2220x2967, 3 channels, 8 bits/sample
Codec: jpeg (compression=7)
✅ IFD[0] initialization complete
🔧 Creating IFD[1] from nvImageCodec metadata
Dimensions: 574x768, 3 channels, 8 bits/sample
✅ IFD[1] initialization complete
🔧 Creating IFD[2] from nvImageCodec metadata
Dimensions: 387x463, 3 channels, 8 bits/sample
✅ IFD[2] initialization complete
🔧 Creating IFD[3] from nvImageCodec metadata
Dimensions: 1280x431, 3 channels, 8 bits/sample
✅ IFD[3] initialization complete
✅ Loaded in 0.380s
📊 Image Information:
Dimensions: [2967, 2220, 3]
Levels: 3
Device: cpu
🔍 Resolution Levels:
Level 0: 2220x2967 (downsample: 1.0x)
Level 1: 1280x431 (downsample: 4.3x)
Level 2: 387x463 (downsample: 6.1x)
🚀 Testing GPU decode (nvImageCodec)...
✅ GPU decode successful!
Time: 0.5288s
Shape: [512, 512, 3]
🖥️ Testing CPU decode (baseline)...
✅ CPU decode successful!
Time: 0.0029s
📏 Testing larger tile (2048x2048)...
GPU: 0.0092s
CPU: 0.0174s
🎯 Speedup: 1.90x
✅ Test completed successfully!
```
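The downsample factors printed for the Aperio file (4.3x and 6.1x) are not the width ratio or the height ratio alone; they match the average of the two ratios against level 0, which is the convention OpenSlide uses. A minimal sketch that reproduces the logged values; that cuslide2 computes them exactly this way is an assumption, and `downsample` is a hypothetical helper:

```python
def downsample(level0: tuple[int, int], level: tuple[int, int]) -> float:
    """Average of the width and height reduction ratios (OpenSlide-style convention)."""
    (w0, h0), (w, h) = level0, level
    return (w0 / w + h0 / h) / 2


aperio_level0 = (2220, 2967)  # width x height, from the log above
for dims in [(2220, 2967), (1280, 431), (387, 463)]:
    # prints 1.0x, 4.3x and 6.1x respectively
    print(f"{dims[0]}x{dims[1]}: {downsample(aperio_level0, dims):.1f}x")
```

For the 1280x431 level the two ratios are about 1.73 and 6.88, so the averaged 4.3x hides a large aspect-ratio mismatch; for a true pyramid level such as Philips level 7 (352x280 against 45056x35840) both ratios are exactly 128.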

Philips TIFF Validation

```bash
python scripts/test_philips_tiff.py /tmp/Philips-1.tiff
```

Tests:

  • Multi-level pyramid (8 levels expected)
  • GPU decode correctness
  • Philips XML metadata (DPUfsImport format)
  • Associated images (label, macro, thumbnail)
  • DICOM_PIXEL_SPACING extraction

Key Results:

  • ✅ 8 resolution levels detected
  • ✅ 22 metadata entries extracted
  • ✅ DICOM pixel spacing: 0.2269 × 0.2269 μm/pixel
  • ✅ Multi-level reads: ≤0.001s per level (512×512 regions, levels 0 to 2)
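DICOM_PIXEL_SPACING follows the DICOM Pixel Spacing convention of millimetres per pixel, row spacing first, so the μm/pixel figure above is the raw value multiplied by 1000. A minimal sketch of that conversion, assuming the attribute arrives as two whitespace-separated, optionally quoted numbers as in Philips TIFF XML (the exact raw string cuslide2 receives is an assumption); `parse_pixel_spacing_um` is a hypothetical helper:

```python
def parse_pixel_spacing_um(raw: str) -> tuple[float, float]:
    """Convert a DICOM_PIXEL_SPACING string (mm) to (y, x) micrometres per pixel.

    DICOM Pixel Spacing lists the row (vertical) spacing first, then the column spacing.
    """
    values = [float(tok.strip('"')) for tok in raw.split()]
    if len(values) != 2:
        raise ValueError(f"expected two spacing values, got {raw!r}")
    row_mm, col_mm = values
    return row_mm * 1000.0, col_mm * 1000.0  # mm -> um


mpp_y, mpp_x = parse_pixel_spacing_um('"0.000226891" "0.000226907"')
print(f"{mpp_y:.4f} x {mpp_x:.4f} um/pixel")  # 0.2269 x 0.2269 um/pixel
```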

```text
✅ Plugin configuration: /tmp/.cucim_philips_test.json

🔬 Testing Philips TIFF with cuslide2

📁 File: /tmp/Philips-1.tiff
📂 Loading Philips TIFF file...
🔧 Creating IFD[0] from nvImageCodec metadata
Dimensions: 45056x35840, 3 channels, 8 bits/sample
✅ IFD[0] initialization complete
[... 7 more IFDs ...]
✅ Loaded in 0.379s
📊 Image Information:
Format: Philips TIFF
Dimensions: [35840, 45056, 3]
Levels: 8
🔍 Resolution Levels:
Level 0: 45056x35840 (downsample: 1.0x)
Level 1: 22528x17920 (downsample: 2.0x)
Level 2: 11264x8960 (downsample: 4.0x)
Level 3: 5632x4480 (downsample: 8.0x)
Level 4: 2816x2240 (downsample: 16.0x)
Level 5: 1408x1120 (downsample: 32.0x)
Level 6: 704x560 (downsample: 64.0x)
Level 7: 352x280 (downsample: 128.0x)
📋 Philips Metadata:
✅ Found 22 Philips metadata entries
DICOM_PIXEL_SPACING: [0.000226891, 0.000226907]
DICOM_MANUFACTURER: Hamamatsu
PIM_DP_IMAGE_TYPE: WSI
... and 16 more entries
📏 Pixel Spacing:
DICOM Pixel Spacing: 0.2269 x 0.2269 μm/pixel
🚀 Testing GPU decode (nvImageCodec)...
✅ GPU decode successful!
Time: 0.5250s
Shape: [512, 512, 3]
🖥️ Testing CPU decode...
✅ CPU decode successful:
Time: 0.0014s
Pixel sum: 189181125, mean: 240.56
📏 Testing larger tile (2048x2048)...
✅ GPU: 0.0168s
🔀 Testing multi-level reads...
✅ Level 0: 0.0010s ([512, 512, 3])
✅ Level 1: 0.0009s ([512, 512, 3])
✅ Level 2: 0.0007s ([512, 512, 3])
✅ Philips TIFF test completed!
```

Technical Implementation

Decoding Pipeline:

  1. Parse TIFF: NvImageCodecTiffParser extracts IFD metadata using nvimgcodecCodeStreamCreateFromFile
  2. Create Sub-Streams: Per-IFD code streams via nvimgcodecCodeStreamCreateFromCodeStreamByIndex
  3. ROI Decode: nvimgcodecDecoderDecode with region parameters (x, y, width, height)
  4. Memory Management: Direct cudaMalloc for GPU or cucim_malloc for CPU
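Step 3 takes the region as (x, y, width, height). A minimal sketch of turning that request into a clipped start/end box in (row, column) order before handing it to a decoder; the start/end layout is assumed to resemble the region struct in the nvImageCodec header (check its definition), and `DecodeRegion` and `to_decode_region` are hypothetical names. This sketch simply clips partially outside requests and returns None for fully outside ones; how cuslide2 fills out-of-bounds pixels is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecodeRegion:
    start: tuple[int, int]  # (row, col), inclusive
    end: tuple[int, int]    # (row, col), exclusive


def to_decode_region(x: int, y: int, width: int, height: int,
                     image_width: int, image_height: int) -> Optional[DecodeRegion]:
    """Clip an (x, y, w, h) request to the image and return a (row, col) start/end box."""
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + width, image_width), min(y + height, image_height)
    if x0 >= x1 or y0 >= y1:
        return None  # request lies entirely outside this resolution level
    return DecodeRegion(start=(y0, x0), end=(y1, x1))


# 512x512 request near the bottom-right corner of the 2220x2967 Aperio level 0
print(to_decode_region(2000, 2800, 512, 512, 2220, 2967))
# DecodeRegion(start=(2800, 2000), end=(2967, 2220))
```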

Metadata Workarounds (nvImageCodec v0.6.0):

  • Aperio: Detect from ImageDescription starting with "Aperio " OR metadata blob kind=1 (MED_APERIO)
  • Philips: Detect from ImageDescription XML (<DataObject ObjectType="DPUfsImport">) OR metadata blob kind=2 (MED_PHILIPS)
  • Known limitation: v0.6.0 misclassifies Aperio as Leica (kind=3) and Philips as Ventana (kind=4); the ImageDescription checks above work around this.
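The detection order described above can be sketched as: trust the ImageDescription content first and fall back to the metadata blob kind, so that the kinds v0.6.0 misreports (3 and 4) never override a recognizable description. In this Python sketch the numeric kind values are copied from this PR text rather than from the nvImageCodec header, and `MetadataKind`, `is_philips_xml` and `detect_vendor` are hypothetical stand-ins for the C++ code, which should use the header's named enum constants.

```python
import xml.etree.ElementTree as ET
from enum import IntEnum


class MetadataKind(IntEnum):
    # Values as stated in this PR description; confirm against the nvImageCodec header.
    MED_APERIO = 1
    MED_PHILIPS = 2


def is_philips_xml(description: str) -> bool:
    """True if the description's root element is <DataObject ObjectType="DPUfsImport">."""
    try:
        root = ET.fromstring(description)
    except ET.ParseError:
        return False  # not XML at all (e.g. an Aperio text description or an empty string)
    return root.tag == "DataObject" and root.get("ObjectType") == "DPUfsImport"


def detect_vendor(image_description: str, metadata_kind: int) -> str:
    # 1. ImageDescription content wins: it is unaffected by the v0.6.0 misclassification.
    if image_description.startswith("Aperio "):
        return "aperio"
    if is_philips_xml(image_description):
        return "philips"
    # 2. Fall back to the metadata blob kind reported by nvImageCodec.
    if metadata_kind == MetadataKind.MED_APERIO:
        return "aperio"
    if metadata_kind == MetadataKind.MED_PHILIPS:
        return "philips"
    return "generic"


print(detect_vendor("Aperio Image Library v10.0.51", 3))  # aperio (kind 3 misreported)
print(detect_vendor('<DataObject ObjectType="DPUfsImport"></DataObject>', 4))  # philips
```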

Known Limitations

  • nvImageCodec v0.6.0 doesn't expose individual TIFF tags (SOFTWARE, etc.), so these values must be recovered by inspecting the metadata blob
  • Associated images with strip-based layout require full decode (no ROI support for strips)
  • Base64-encoded associated images (macro/label in XML metadata) not yet supported

Migration Guide

No code changes required for existing cuCIM users:

```python
# Same API works with cuslide2
from cucim import CuImage

img = CuImage("slide.svs")
region = img.read_region(location=(0, 0), size=(1024, 1024), device="cuda")
```

cdinea and others added 30 commits November 19, 2025 16:01
- Fix libjpeg-turbo cmake configuration for both cuslide and cuslide2
- Update nvimgcodec cmake dependency configuration
- Update examples CMakeLists
- Update build scripts and documentation
@cdinea cdinea added non-breaking Introduces a non-breaking change and removed breaking Introduces a breaking change labels Nov 25, 2025
@cdinea cdinea added this to the v25.12.00 milestone Nov 25, 2025
fmt::print(" ℹ️ Using CPU buffer for ROI decoding\n");
#endif
}
else if (gpu_available)
@mkepa-nv commented Nov 26, 2025

Can the GPU be unavailable here? Is this a valid case if the target is the GPU? Maybe you should raise an error instead: the user is probably expecting a device buffer in this case, and returning a host buffer can break things.

Author

Thank you for catching this, @mkepa-nv. I addressed this bug in this commit.
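The behaviour the reviewer asks for amounts to a small allocation-target check: when the caller requested a CUDA buffer and no GPU is usable, fail loudly instead of silently returning host memory. This Python sketch is illustrative only; `choose_output_device` is a hypothetical name, and the actual fix lives in the C++ decoder.

```python
def choose_output_device(requested: str, gpu_available: bool) -> str:
    """Return the device the decoded buffer will live on, or raise if the request cannot be honoured."""
    device = requested.split(":", 1)[0].lower()  # accept "cuda" as well as "cuda:0"
    if device == "cpu":
        return "cpu"
    if device == "cuda":
        if not gpu_available:
            # A silent host fallback would hand a CPU pointer to code expecting device memory.
            raise RuntimeError("CUDA output requested but no GPU is available")
        return "cuda"
    raise ValueError(f"unsupported device: {requested!r}")
```

Raising keeps the contract of `device="cuda"` strict: callers either get device memory or an explicit error they can handle.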

#endif
return true; // roi_stream, image, decode_future cleaned up by RAII
}
catch (const std::exception& e)
@mkepa-nv commented Nov 26, 2025

This catch block, and the others in the nvimgcodec/ directory, are probably not needed. What can throw here? The CUDA and nvImageCodec APIs certainly will not throw.

Author

Thank you for the feedback, @mkepa-nv. CUDA and nvImageCodec APIs don't throw; they return error codes. However, the catch blocks might still be necessary because:

  • we throw std::runtime_error for error conditions (e.g., the GPU availability check)

  • fmt::format() can throw on formatting errors or memory allocation failures

  • string operations, vector resizing, etc. can throw std::bad_alloc

{
switch (kind)
{
case 1: // NVIMGCODEC_METADATA_KIND_MED_APERIO


I would use the named enum values from the nvImageCodec header.

Author

Already addressed, @mkepa-nv: I've replaced all hardcoded integer values with the proper enum constants.

{
loader->enqueue(std::move(decode_func),
cucim::loader::TileInfo{ location_index, index, tiledata_offset, tiledata_size });
fmt::print("🔍 Executing decode_func directly (FORCED SINGLE-THREADED)\n");


Are all those new prints temporary?

Author

Thank you for the feedback, @jantonguirao. I wrapped the debug prints in #ifdef DEBUG guards in this commit.

Comment on lines +225 to +240
if (TARGET CUDA::nvjpeg_static)
target_link_libraries(${CUCIM_PLUGIN_NAME}
PRIVATE
# Add nvjpeg before cudart so that nvjpeg.h in static library takes precedence.
CUDA::nvjpeg_static
# Add CUDA::culibos to link necessary methods for 'deps::nvjpeg_static'
CUDA::culibos
CUDA::cudart
)
else()
target_link_libraries(${CUCIM_PLUGIN_NAME}
PRIVATE
CUDA::nvjpeg
CUDA::cudart
)
endif()


nvImageCodec uses dlopen to load the nvjpeg shared-object library from the system. Are we using nvjpeg directly anywhere?

Author

Thank you for the feedback, @jantonguirao; this is a good observation. cuslide2 currently reuses ifd.cpp from the original cuslide plugin, which uses NvJpegProcessor for batch JPEG decoding, so the nvjpeg linking is still needed. We could refactor ifd.cpp to use nvImageCodec exclusively and remove the nvjpeg dependency.

@@ -0,0 +1,256 @@
/*
* Copyright (c) 2020-2022, NVIDIA CORPORATION.


nit:

Suggested change
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
* Copyright (c) 2020-2025, NVIDIA CORPORATION.

Author

Thank you, @jantonguirao. I updated the year in the header in this commit.

uint32_t width, uint32_t height,
uint8_t** output_buffer,
const cucim::io::Device& out_device);
#endif // CUCIM_HAS_NVIMGCODEC


You defined a dummy version of decode_ifd_region_nvimgcodec for the case where CUCIM_HAS_NVIMGCODEC is false (nvimgcodec_decoder.cpp:379). Shouldn't you expose this declaration unconditionally, then?

Author

Thank you for the feedback, @jantonguirao. Good catch; I addressed this feedback in this commit.

@@ -0,0 +1,95 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.


2025

Author

Thank you for the feedback, @jantonguirao. I updated the year in the copyright header in this commit.


uint32_t ThreadBatchDataLoader::request(uint32_t load_size)
{
#ifdef DEBUG


If those debug messages are meant to stay, I'd recommend a preprocessor macro that wraps fmt::print in an #ifdef DEBUG condition. This way we avoid cluttering the codebase with many #ifdef DEBUG blocks.

Author

This is a good point, @jantonguirao. I made a note to address this in a subsequent PR, since it might affect the original cuslide plugin and needs to be tested properly.

plugin_lib = repo_root / "cpp/plugins/cucim.kit.cuslide2/build-release/lib"

if not plugin_lib.exists():
plugin_lib = repo_root / "install/lib"


nit:

Suggested change
plugin_lib = repo_root / "install/lib"
plugin_lib = repo_root / "install" / "lib"

Author

Thank you, @jantonguirao. Good suggestion; addressed in this commit.

}
}

TEST_CASE("Verify raw tiff read", "[test_read_rawtiff.cpp]")


I would like to see some comment explaining what this test does.

Author

Thank you, @jantonguirao. I added a comment in [this commit](https://github.com//pull/978/commits/bab2aaadaa32032d42c8c9327efb838b489d6c00). @gigony FYI

@cdinea commented Nov 29, 2025

/ok to test bab2aaa

@cdinea commented Dec 1, 2025

/ok to test 9a0e8f5


Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change


5 participants