Log GPU timing in cudf-polars traces #21970

TomAugspurger wants to merge 8 commits into rapidsai:main
Conversation
Proof: this is currently hanging on some queries. I don't know why yet, but it's related to recording the completion event on …

Proof 2: even after "fixing" that (by recording the completion event on another stream with …) I'm getting intermittent hangs. Maybe this suggests the approach of using …
    trace_event_id: str | None = None
    query_id_str: str | None = None
I think query_id should be included through context variables?
    from __future__ import annotations

    import ctypes
ctypes probably isn't the way to do this long term, but might be OK as a POC.
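As a host-side sketch of what the ctypes plumbing can look like (the actual `cudaLaunchHostFunc` call is omitted so this runs without CUDA; `HOST_FN`, `on_complete`, and the queue handling are illustrative, not the PR's actual code):

```python
import ctypes
import queue

# Completion tokens land here; a background thread would drain this queue
# and do the actual logging, keeping the CUDA-invoked callback trivial.
completions: "queue.Queue[int]" = queue.Queue()

# cudaLaunchHostFunc expects a `void (*fn)(void *userData)` callback.
HOST_FN = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

@HOST_FN
def on_complete(user_data):
    # Keep this minimal: just record the token. No logging, no locks.
    completions.put(int(user_data or 0))

# In the real code this callback would be passed to cudaLaunchHostFunc on a
# stream; here we invoke it directly to show the data flow.
on_complete(ctypes.c_void_p(42))
print(completions.get())  # 42
```

A longer-term option would be a small Cython or pylibcudf-level binding instead of ctypes, but the callback shape stays the same.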
/ok to test 1eb332d
Description
The tracing in cudf-polars is currently based around *host* times. We record a `start` and `stop` based on Python's `monotonic_ns` around a call to `IR.do_evaluate`. But inside of that `IR.do_evaluate` we call some non-blocking, asynchronous pylibcudf calls whose runtime extends past the end of `stop`, running on the GPU. To measure the *actual* runtime of GPU operations associated with some IR node, we need to measure when that sequence of GPU operations actually finishes. There are several ways to do this, but I've opted for CUDA Events and `cudaLaunchHostFunc`.

Using `cudaLaunchHostFunc` to call a *Python* function can be fraught (deadlocks, often related to the GIL, are apparently a risk), so we deliberately keep the work done inside this function (and so, on the thread calling it) simple: just putting an integer completion token on a Queue. The actual logging is done by a background thread.
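The completion-token flow described above can be sketched host-side (the CUDA event and `cudaLaunchHostFunc` pieces are replaced by a plain function call, and all names are illustrative, not the PR's code):

```python
import queue
import threading

completions: "queue.Queue[int | None]" = queue.Queue()
logged: list[str] = []

def logger() -> None:
    # Background thread: all real logging work happens here, off the
    # thread that CUDA uses to run the host function.
    while (token := completions.get()) is not None:
        logged.append(f"gpu work for event {token} finished")

t = threading.Thread(target=logger, daemon=True)
t.start()

def host_fn(token: int) -> None:
    # Stand-in for the cudaLaunchHostFunc callback: do the bare minimum,
    # since blocking or re-entering Python heavily here risks deadlocks.
    completions.put(token)

host_fn(1)
host_fn(2)
completions.put(None)  # sentinel: shut the logger down
t.join()
print(logged)
```

Because the queue is the only shared state, the callback never takes locks or does I/O, which is the property the PR relies on to keep `cudaLaunchHostFunc` safe.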
When enabled, we'll emit two traces per `IR.do_evaluate`:

- `scope=evaluate_ir_node` containing host information
- `scope=evaluate_ir_node_gpu` containing GPU information

We include a `trace_event_id` (a UUID generated on the fly) so that consumers can correlate GPU traces with host traces.
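Correlating the two trace records on `trace_event_id` might look like this on the consumer side (the record fields other than `scope` and `trace_event_id` are made up for illustration):

```python
import uuid
from collections import defaultdict

event_id = str(uuid.uuid4())

# One host trace and one GPU trace emitted for the same IR node.
traces = [
    {"scope": "evaluate_ir_node", "trace_event_id": event_id, "host_ns": 1200},
    {"scope": "evaluate_ir_node_gpu", "trace_event_id": event_id, "gpu_ns": 5400},
]

# A consumer can join host and GPU information on the shared id.
by_event: dict[str, dict] = defaultdict(dict)
for rec in traces:
    by_event[rec["trace_event_id"]][rec["scope"]] = rec

merged = by_event[event_id]
print(sorted(merged))  # ['evaluate_ir_node', 'evaluate_ir_node_gpu']
```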