[BUG] multithreaded read_region() with CuPy asarray() throws CUDARuntimeError, segfaults, or core dumps #884

@jakebytes

Description

Describe the bug

Multiple threads each call cupy.asarray(CuImage.read_region(...)). This causes intermittent crashes ranging from a CUDARuntimeError to segfaults and core dumps. The behavior does not reproduce with a single thread, and I confirmed it happens with multiple different input images. The symptoms suggest a memory-safety bug or a race condition.

The code below reproduces the issue fairly reliably, but may need to be run multiple times before a crash is observed. The image referenced is available from TCGA.

Steps/Code to reproduce bug

import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import cupy as cp
from cucim import CuImage

# BUG does not reproduce with N_THREADS=1
N_THREADS = 64
slide_path = "34344dfc-b64e-439f-bd15-279cf6c74401/TCGA-BP-5195-01Z-00-DX1.910fae7d-503e-4758-bb45-7c039ff9d179.svs"


def extract_patch(
    coord: tuple[int, int],
    slide: CuImage,
    level: int,
    size: tuple[int, int],
):
    try:
        img: CuImage = slide.read_region(
            location=coord,
            level=level,
            size=size,
        )
        # BUG occurs here
        return cp.asarray(img)
    except Exception:
        print(f"(x, y): {coord} level: {level} size: {size}")
        raise


with ThreadPoolExecutor(max_workers=N_THREADS) as thread_pool:
    # init CuImage object
    slide = CuImage(slide_path)
    width_px, height_px = slide.size("XY")
    # hardcoded for this image, obtained via openslide
    target_mpp = 1.0
    patch_size = 512
    level = 1
    level_downsample = 4.000057352603808
    base_mpp = 0.2498
    # downsample factor to reach the target MPP
    downsample_factor = target_mpp / base_mpp
    scale_factor = downsample_factor / level_downsample
    adjusted_width = math.ceil(patch_size * scale_factor)
    adjusted_height = math.ceil(patch_size * scale_factor)
    # stride in pixels at level 0
    stride = round(patch_size * downsample_factor)
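    # with the hardcoded values above: downsample_factor ~= 4.0032 and
    # scale_factor ~= 1.0008, so each patch is read as 513x513 level-1
    # pixels with a level-0 stride of 2050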
    rows = math.floor(height_px / stride)
    cols = math.floor(width_px / stride)
    # xy coordinates of every patch
    coords = [
        (col_idx * stride, row_idx * stride)
        for row_idx in range(rows)
        for col_idx in range(cols)
    ]
    # sanity check xy coordinates for given image size
    for x, y in coords:
        assert x >= 0 and x + stride < width_px
        assert y >= 0 and y + stride < height_px

    fn = partial(
        extract_patch,
        slide=slide,
        level=level,
        size=(adjusted_width, adjusted_height),
    )
    # do read_region() in parallel using multithreading
    patches = list(thread_pool.map(fn, coords))

Expected behavior

I expect multithreaded use of cupy.asarray() with read_region() not to throw sporadic errors or crash the process.
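
A temporary mitigation I am considering is to serialize just the CuImage-to-CuPy conversion across threads, since the traceback points at the host-to-device copy inside cp.asarray(). Below is a minimal sketch of that workaround; the lock placement is my assumption, and I have not verified that it eliminates the crash in all cases.

import threading

import cupy as cp

# global lock: serialize only the host-to-device copy done by cp.asarray()
# (assumption: the race is in that copy path, per the traceback below)
asarray_lock = threading.Lock()


def extract_patch_serialized(coord, slide, level, size):
    # read_region() itself still runs concurrently in every worker thread
    img = slide.read_region(location=coord, level=level, size=size)
    with asarray_lock:
        # only one thread at a time performs the device copy
        return cp.asarray(img)

This gives up the parallelism of the copy itself, so it is a diagnostic stopgap rather than a fix.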

Environment details (please complete the following information):

  • Environment location: DGX-H200 node
  • Method of cuCIM install: pip

Python package versions

cucim-cu12                               25.4.0
cupy-cuda12x                             13.4.1
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu12                         12.4.127

Linux kernel: 5.15.0
CUDA Version: 12.2
Driver Version: 535.216.03

Additional context

Most common crash output (the failure surfaces in CuPy's host-to-device copy, MemoryPointer.copy_from_host_async):

File "/home/cucim_bug_repro.py", line 25, in extract_patch
return cp.asarray(img)
^^^^^^^^^^^^^^^
File "/home/venv/lib/python3.12/site-packages/cupy/_creation/from_data.py", line 88, in asarray
return _core.array(a, dtype, False, order, blocking=blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "cupy/_core/core.pyx", line 2455, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2482, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2647, in cupy._core.core._array_default
File "cupy/cuda/memory.pyx", line 488, in cupy.cuda.memory.MemoryPointer.copy_from_host_async
File "cupy_backends/cuda/api/runtime.pyx", line 607, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidValue: invalid argument

Rarer crash output:

The futex facility returned an unexpected error code.
Aborted (core dumped)
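
Since the invalid-argument error comes out of cudaMemcpyAsync during copy_from_host_async, my guess is that the host buffer metadata CuPy consumes is being corrupted or freed concurrently. A quick way to inspect what cp.asarray() sees, assuming CuImage exposes __array_interface__ for host-resident regions (which is the path the traceback suggests):

img = slide.read_region(location=(0, 0), level=1, size=(513, 513))
# the host pointer, shape, and typestr that cupy.asarray() uses for the memcpy
print(img.__array_interface__)

If the pointer or shape differed across threads for the same region, that would point at shared mutable state inside CuImage.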

CC @gigony
