Description
Describe the bug
When multiple threads each call cupy.asarray(CuImage.read_region(...)), the process crashes intermittently. The failures range from CUDARuntimeError exceptions to segfaults and core dumps. The crash does not occur with a single thread, and I have confirmed it on several different input images. The symptoms point to a memory-safety bug or a race condition.
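One way to test the race-condition hypothesis is to let the pool keep its 64 threads but allow only one of them at a time into a suspect step. The sketch below assumes nothing about cuCIM or CuPy internals. It is a generic lock wrapper, and the names `serialized` and `to_device` are hypothetical. If the crashes stop when only `cp.asarray` is wrapped, that is consistent with (but does not prove) a race in the host-to-device copy. Wrapping the whole `extract_patch` instead would also cover `read_region`.

```python
import threading
from functools import wraps


def serialized(fn):
    """Return a wrapper that lets only one thread run fn at a time.

    Diagnostic aid, not a fix: it removes concurrency from one step so
    that step can be ruled in or out as the source of the crash.
    """
    lock = threading.Lock()

    @wraps(fn)
    def wrapper(*args, **kwargs):
        with lock:  # other threads block here until fn returns or raises
            return fn(*args, **kwargs)

    return wrapper


# Hypothetical use in the reproduction script (requires cuCIM and CuPy):
#   to_device = serialized(cp.asarray)
#   ...
#   return to_device(img)   # instead of cp.asarray(img)
```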
The code below reproduces the issue fairly reliably, but may need to be run multiple times before a crash is observed. The image referenced is available from TCGA.
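The script hardcodes the patch geometry for this slide. That arithmetic can be checked on its own, without cuCIM or a GPU. The snippet below repeats the script's calculation with the same constants: the level-0 stride between patches, and the region size requested at level 1 so that each patch covers `patch_size` pixels at the target MPP.

```python
import math

# Constants copied from the reproduction script (slide-specific).
target_mpp = 1.0
patch_size = 512
level_downsample = 4.000057352603808  # level 1 relative to level 0
base_mpp = 0.2498                     # microns per pixel at level 0

# Level-0 pixels per output pixel at the target MPP.
downsample_factor = target_mpp / base_mpp
# Level-1 pixels per output pixel.
scale_factor = downsample_factor / level_downsample
# Region size requested from read_region() at level 1.
adjusted_size = math.ceil(patch_size * scale_factor)
# Step between neighbouring patch origins, in level-0 pixels.
stride = round(patch_size * downsample_factor)

print(adjusted_size, stride)  # 513 2050
```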
Steps/Code to reproduce bug
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import cupy as cp
from cucim import CuImage

# BUG does not reproduce with N_THREADS=1
N_THREADS = 64

slide_path = "34344dfc-b64e-439f-bd15-279cf6c74401/TCGA-BP-5195-01Z-00-DX1.910fae7d-503e-4758-bb45-7c039ff9d179.svs"


def extract_patch(
    coord: tuple[int, int],
    slide: CuImage,
    level: int,
    size: tuple[int, int],
):
    try:
        img: CuImage = slide.read_region(
            location=coord,
            level=level,
            size=size,
        )
        # BUG occurs here
        return cp.asarray(img)
    except Exception as e:
        print(f"(x, y): {coord} level: {level} size: {size}")
        raise e


with ThreadPoolExecutor(max_workers=N_THREADS) as thread_pool:
    # init CuImage object
    slide = CuImage(slide_path)
    width_px, height_px = slide.size("XY")

    # hardcoded for this image, obtained via openslide
    target_mpp = 1.0
    patch_size = 512
    level = 1
    level_downsample = 4.000057352603808
    base_mpp = 0.2498

    # downsample factor to reach the target MPP
    downsample_factor = target_mpp / base_mpp
    scale_factor = downsample_factor / level_downsample
    adjusted_width = math.ceil(patch_size * scale_factor)
    adjusted_height = math.ceil(patch_size * scale_factor)

    # stride in pixels at level 0
    stride = round((patch_size) * downsample_factor)
    rows = math.floor(height_px / stride)
    cols = math.floor(width_px / stride)

    # xy coordinates of every patch
    coords = [
        (col_idx * stride, row_idx * stride)
        for row_idx in range(rows)
        for col_idx in range(cols)
    ]

    # sanity check xy coordinates for given image size
    for x, y in coords:
        assert x >= 0 and x + stride < width_px
        assert y >= 0 and y + stride < height_px

    fn = partial(
        extract_patch,
        slide=slide,
        level=level,
        size=(adjusted_width, adjusted_height),
    )

    # do read_region() in parallel using multithreading
    patches = list(thread_pool.map(fn, coords))

Expected behavior
I expect concurrent calls to read_region() followed by cupy.asarray() to complete without sporadic errors, just as they do with a single thread.
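Stated as a check, the expected behavior is that a threaded run returns the same patches as a serial run over the same coordinates. The harness below is a minimal, library-agnostic sketch of that comparison. The function name and its `equal` parameter are hypothetical. For CuPy arrays, something like `equal=lambda a, b: bool(cp.array_equal(a, b))` would be an assumed choice.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_serial_and_threaded(fn, items, n_threads, equal=lambda a, b: a == b):
    """Run fn over items serially, then on a thread pool, and return the
    indices whose results differ. items must be a sequence (it is iterated
    twice); an exception raised in either run propagates to the caller."""
    serial = [fn(item) for item in items]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        threaded = list(pool.map(fn, items))  # map preserves input order
    return [i for i, (a, b) in enumerate(zip(serial, threaded)) if not equal(a, b)]


# Hypothetical use with the reproduction script's objects:
#   bad = compare_serial_and_threaded(
#       fn, coords, N_THREADS, equal=lambda a, b: bool(cp.array_equal(a, b)))
```

An empty list means the two runs agreed. A crash or exception in the threaded half, but not in the serial half, matches the behavior reported here.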
Environment details:
- Environment location: DGX-H200 node
- Method of cuCIM install: pip
Python package versions
cucim-cu12 25.4.0
cupy-cuda12x 13.4.1
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
Linux kernel: 5.15.0
CUDA Version: 12.2
Driver Version: 535.216.03
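The Python package list above can be regenerated on another machine with the standard library's `importlib.metadata`. This is a minimal sketch. The name prefixes are an assumption chosen to match the packages listed in this report, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_versions(prefixes=("cucim", "cupy", "nvidia-")):
    """Return {distribution name: version} for installed packages whose
    lower-cased, dash-normalized name starts with one of the prefixes."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and name.lower().replace("_", "-").startswith(prefixes):
            found[name] = dist.version
    return dict(sorted(found.items()))


for name, version in installed_versions().items():
    print(name, version)
```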
Additional context
Most common crash output:
  File "/home/cucim_bug_repro.py", line 25, in extract_patch
    return cp.asarray(img)
           ^^^^^^^^^^^^^^^
  File "/home/venv/lib/python3.12/site-packages/cupy/_creation/from_data.py", line 88, in asarray
    return _core.array(a, dtype, False, order, blocking=blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cupy/_core/core.pyx", line 2455, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2482, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2647, in cupy._core.core._array_default
  File "cupy/cuda/memory.pyx", line 488, in cupy.cuda.memory.MemoryPointer.copy_from_host_async
  File "cupy_backends/cuda/api/runtime.pyx", line 607, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidValue: invalid argument
Less common crash output:
The futex facility returned an unexpected error code.
Aborted (core dumped)
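The segfault and abort cases print no Python traceback. Python's standard-library `faulthandler` module can show which line each thread was executing when the fatal signal arrived. A minimal sketch follows, assuming it is placed at the top of the reproduction script. The same effect is available without code changes via `python -X faulthandler` or the `PYTHONFAULTHANDLER=1` environment variable.

```python
import faulthandler

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python traceback
# of every thread (all_threads defaults to True) to sys.stderr before the
# process dies.
faulthandler.enable()
```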
CC @gigony