Add rapids-doctor basic health checks #21928
ncclementi wants to merge 3 commits into rapidsai:main
)

got_a = res["a"].to_pandas().tolist()
expected_a = [1, 2]
Do we guarantee group ordering in cudf to be sorted and/or stable? Might want to check how other cudf tests are handling this.
Good catch, I'll update this. I forgot that DataFrame.equals existed.
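One way to make the assertion independent of group order is to pair each key with its value and compare the sorted pairs. The sketch below works on the plain lists that `.to_pandas().tolist()` already yields. The helper name `assert_same_groups`, the second column `got_b`, and the sample values are illustrative assumptions, not code from this PR.

```python
def assert_same_groups(got_keys, got_values, expected):
    """Compare groupby output to expected {key: value} pairs, ignoring row order."""
    got_pairs = sorted(zip(got_keys, got_values))
    expected_pairs = sorted(expected.items())
    if got_pairs != expected_pairs:
        raise AssertionError(
            f"groupby mismatch: got {got_pairs}, expected {expected_pairs}"
        )


# Hypothetical result columns, e.g. res["a"].to_pandas().tolist() and an
# aggregated column; here the groups came back in a different order.
got_a = [2, 1]
got_b = [30, 10]
assert_same_groups(got_a, got_b, {1: 10, 2: 30})
```

Sorting the frame by its key column before calling an equality helper would achieve the same thing directly on the DataFrame.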
try:
    out = s.apply(lambda x: len(x))
except OSError as e:
    if _is_libnvvm_missing_error(e):
Do you know how users get into this scenario? Is there a specific common mistake during installation?
Yes, this usually happens when you are missing the CUDA runtime. One common scenario is starting from a Docker container like
docker run --gpus all -it nvidia/cuda:12.8.0-runtime-ubuntu24.04
In this scenario, at least in the past (this is an older snippet I have saved), the following ran into an issue:
# Then, in the container...
apt update && apt install python3 python3-pip
pip install --break-system-packages cudf-cu12
python3
# Then, this will work
>>> import cudf
>>> cudf.Series([1, 2, 3])
# but this will not
>>> cudf.Series([1, 2, 3]).apply(lambda x: len(x))
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/numba_cuda/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
inst.driver = open_cudalib('nvvm')
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/numba_cuda/numba/cuda/cudadrv/libs.py", line 84, in open_cudalib
return ctypes.CDLL(path)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libnvvm.so: cannot open shared object file: No such file or directory
I'm not sure if this will still happen from now on, given CuPy 14 and the pip wheels bringing in all of the CTK.
# SPDX-License-Identifier: Apache-2.0
#

"""cuDF health checks for rapids doctor."""
This should probably link to https://github.com/rapidsai/rapids-cli. I had to dig for a while to find that repo, couldn't find it by searching "rapids doctor."
0f2fb71 to f8368f7
I updated the code according to the review and answered the question regarding
Description
Adding some basic health checks to be discovered by rapids-doctor when this is installed.
Notes:
I'm not sure whether functional_numba_check is the right approach to test this, or if we should modify it. It was added based on my experience in certain scenarios I encountered, but this might not happen any more. Feedback welcome.
Checklist
cc: @jacobtomlinson for visibility