Add deadlock warnings to Stream.add_callback and Stream.async_done docstrings (#321)

kkraus14 · web-flow · commit 43184d73b7b7 · 2025-07-18T22:23:00.000+01:00
* Add deadlock warnings to Stream.add_callback and Stream.async_done docstrings

- Warn about a potential deadlock when using libraries that call CUDA functions without releasing the GIL
- The deadlock can occur when a callback function attempts to acquire the GIL while another thread holds it and makes CUDA calls
- Recommend using libraries that properly release the GIL around CUDA operations

* Update docstring warnings to clarify lock ordering issue

- Clarify that the deadlock is due to a lock ordering issue between the GIL and the CUDA driver lock
- Remove the reference to 'another thread attempting to make CUDA calls', as this is not required
- Focus on the core issue: the callback acquiring the GIL while the CUDA driver lock is held
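
The lock ordering issue described above can be reproduced without CUDA. The following is a minimal sketch, assuming two plain `threading.Lock` objects as stand-ins for the GIL and the CUDA driver lock; it does not touch the real GIL or the CUDA driver. The function name `callback_gets_gil` and its parameters are hypothetical, and the timeout exists only so that the demonstration terminates instead of hanging.

```python
import threading


def callback_gets_gil(release_gil_around_driver_call: bool,
                      timeout: float = 0.5) -> bool:
    """Return True if the simulated callback manages to take the 'GIL'."""
    gil = threading.Lock()          # stand-in for the GIL (assumption)
    driver_lock = threading.Lock()  # stand-in for the CUDA driver lock
    gil_held = threading.Event()
    driver_held = threading.Event()
    outcome = {}

    def callback_thread():
        # Models a stream callback: it runs while the driver lock is
        # held and then needs the GIL to execute Python code.
        with driver_lock:
            driver_held.set()
            gil_held.wait()
            got = gil.acquire(timeout=timeout)
            if got:
                gil.release()
            outcome["got_gil"] = got

    def library_thread():
        # Models Python code calling into a library that makes a CUDA
        # call, which needs the driver lock.
        gil.acquire()
        try:
            gil_held.set()
            driver_held.wait()
            if release_gil_around_driver_call:
                # Well-behaved library: drop the GIL before blocking.
                gil.release()
                try:
                    with driver_lock:
                        pass
                finally:
                    gil.acquire()
            else:
                # Problematic library: block on the driver lock while
                # still holding the GIL.
                with driver_lock:
                    pass
        finally:
            gil.release()

    threads = [threading.Thread(target=callback_thread),
               threading.Thread(target=library_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["got_gil"]


if __name__ == "__main__":
    print("GIL held during driver call:", callback_gets_gil(False, 0.2))
    print("GIL released around driver call:", callback_gets_gil(True))
```

In the first case the simulated callback gives up only because of the timeout; with real, untimed locks both threads would wait on each other indefinitely. In the second case the library thread drops the stand-in GIL before blocking, so the callback can proceed.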
diff --git a/numba_cuda/numba/cuda/cudadrv/driver.py b/numba_cuda/numba/cuda/cudadrv/driver.py
@@ -2391,6 +2391,16 @@ def add_callback(self, callback, arg=None):
         callback will block later work in the stream and may block other
         callbacks from being executed.
 
+        .. warning::
+            There is a potential for deadlock due to a lock ordering issue
+            between the GIL and the CUDA driver lock when using libraries
+            that call CUDA functions without releasing the GIL. This can
+            occur when the callback function, which holds the CUDA driver lock,
+            attempts to acquire the GIL while another thread that holds the GIL
+            is waiting for the CUDA driver lock. Consider using libraries that
+            properly release the GIL around CUDA operations or restructuring
+            your code to avoid this situation.
+
         Note: The driver function underlying this method is marked for
         eventual deprecation and may be replaced in a future CUDA release.
 
@@ -2425,6 +2435,16 @@ def async_done(self) -> asyncio.futures.Future:
         """
         Return an awaitable that resolves once all preceding stream operations
         are complete. The result of the awaitable is the current stream.
+
+        .. warning::
+            There is a potential for deadlock due to a lock ordering issue
+            between the GIL and the CUDA driver lock when using libraries
+            that call CUDA functions without releasing the GIL. This can
+            occur when the callback function (internally used by this method),
+            which holds the CUDA driver lock, attempts to acquire the GIL while
+            another thread that holds the GIL is waiting for the CUDA driver
+            lock. Consider using libraries that properly release the GIL around
+            CUDA operations or restructuring your code to avoid this situation.
         """
         loop = asyncio.get_running_loop()
         future = loop.create_future()