
Add rapids-doctor basic health checks #21928

Open
ncclementi wants to merge 3 commits into rapidsai:main from ncclementi:rapids-doctor-checks



@ncclementi (Contributor) commented Mar 25, 2026

Description

This PR adds some basic health checks that rapids-doctor discovers when cuDF is installed.

Notes:

  • I'd like feedback on whether functional_numba_check is the right way to test this, or whether we should modify it. I added it based on scenarios I encountered in the past, which might not happen anymore.
  • I tested the usage and discoverability of the checks locally.
  • We can start with these checks and add more as needed.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes. (N/A)
  • The documentation is up to date with these changes. (N/A)

cc: @jacobtomlinson for visibility

ncclementi requested review from a team as code owners March 25, 2026 14:53
github-actions bot added the Python (Affects Python cuDF API.) label Mar 25, 2026
GPUtester moved this to In Progress in cuDF Python Mar 25, 2026
bdice added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Mar 25, 2026
)

got_a = res["a"].to_pandas().tolist()
expected_a = [1, 2]
Contributor


Do we guarantee that group ordering in cuDF is sorted and/or stable? You might want to check how other cuDF tests handle this.

Contributor Author


Good catch, I'll modify this. I forgot that DataFrame.equals existed; I'll update the check to use it.
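On the ordering question: cuDF's DataFrame.groupby defaults to sort=False (pandas defaults to sort=True), so a check should not rely on the order in which groups come back. Passing sort=True, or sorting the result by its keys, makes the order deterministic. An equals-style comparison is positional (as in pandas), so it still needs one of those steps first. Another option is to compare results keyed by group instead of by position; the sketch below does that on plain Python lists, e.g. obtained via .to_pandas().tolist(). The helper groups_match is hypothetical and not part of this PR.

```python
from typing import Hashable, Iterable, Mapping


def groups_match(keys: Iterable[Hashable], values: Iterable, expected: Mapping) -> bool:
    """Compare groupby output with expected per-group values, ignoring row order.

    `keys` are the group keys of the result and `values` the aggregated
    column, in whatever order the groupby returned them.
    """
    keys, values = list(keys), list(values)
    # Unequal lengths or duplicate keys would make dict(zip(...)) silently lossy.
    if len(keys) != len(values) or len(set(keys)) != len(keys):
        return False
    return dict(zip(keys, values)) == dict(expected)
```

For example, groups_match([2, 1], [20, 10], {1: 10, 2: 20}) is True even though the groups arrive in descending key order.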

try:
    out = s.apply(lambda x: len(x))
except OSError as e:
    if _is_libnvvm_missing_error(e):
Contributor


Do you know how users get into this scenario? Is there a specific common mistake during installation?

Contributor Author


Yes, this usually happens when parts of the CUDA toolkit are missing (here, libnvvm, which Numba needs to compile UDFs). One common scenario is starting from a CUDA runtime Docker container such as

docker run --gpus all -it nvidia/cuda:12.8.0-runtime-ubuntu24.04

Then, at least in the past (this is an older snippet I saved), the following failed:

# Then, in the container...
apt update && apt install python3 python3-pip
pip install --break-system-packages cudf-cu12
python3

# Then this works:
>>> import cudf
>>> cudf.Series([1, 2, 3])

# ...but this does not:
>>> cudf.Series([1, 2, 3]).apply(lambda x: len(x))
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/numba_cuda/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
    inst.driver = open_cudalib('nvvm')
                  ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/numba_cuda/numba/cuda/cudadrv/libs.py", line 84, in open_cudalib
    return ctypes.CDLL(path)
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libnvvm.so: cannot open shared object file: No such file or directory

I'm not sure whether this still happens, given CuPy 14 and the pip wheels now bringing in the whole CUDA Toolkit (CTK).

# SPDX-License-Identifier: Apache-2.0
#

"""cuDF health checks for rapids doctor."""
Contributor


This should probably link to https://github.com/rapidsai/rapids-cli. I had to dig for a while to find that repo; I couldn't find it by searching for "rapids doctor".

ncclementi force-pushed the rapids-doctor-checks branch from 0f2fb71 to f8368f7 on March 30, 2026 17:33
@ncclementi (Contributor Author)

I updated the code according to the review and answered the question about _is_libnvvm_missing_error. I'm seeing a few errors in CI; I rebased against main a few times and still see failures, but they seem unrelated to this PR.


Labels

feature request (New feature or request), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)

Projects

Status: In Progress


3 participants