[BUG]: RuntimeError in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk when changing device in cudf-polars #20608

@TomAugspurger

Describe the bug

The GPU engine for polars lets you specify a device. Changing the device between queries in the same process apparently doesn't mix well with some process-global state used in read_parquet.

Steps/Code to reproduce bug

import polars as pl

fname = 'test.parquet'
data = {"a": [1, 2], "b": [3, 4]}
pl.DataFrame(data).write_parquet(fname)


pl.scan_parquet(fname).collect(engine=pl.GPUEngine(device=0))
print("Device 0 success")
pl.scan_parquet(fname).collect(engine=pl.GPUEngine(device=1))
print("Device 1 success")

This raises:

Device 0 success
Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/bug.py", line 12, in <module>
    pl.scan_parquet(fname).collect(engine=pl.GPUEngine(device=1))
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/opt_flags.py", line 330, in wrapper
    return function(*args, **kwargs)
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/lazyframe/frame.py", line 2407, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/polars/_utils/scan.py", line 27, in _execute_from_rust
    return function(with_columns, *args)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/callback.py", line 263, in _callback
    return evaluate_streaming(ir, config_options)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/parallel.py", line 284, in evaluate_streaming
    return get_scheduler(config_options)(graph, key).to_polars()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/scheduler.py", line 149, in synchronous_scheduler
    cache[k] = _execute_task(graph[k], cache)
               ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/experimental/scheduler.py", line 51, in _execute_task
    return arg[0](*(_execute_task(a, cache) for a in arg[1:]))
           ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/dsl/ir.py", line 788, in do_evaluate
    chunk = reader.read_chunk()
  File "pylibcudf/io/parquet.pyx", line 366, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
  File "pylibcudf/io/parquet.pyx", line 385, in pylibcudf.io.parquet.ChunkedParquetReader.read_chunk
RuntimeError: after determining tmp storage requirements for exclusive_scan: cudaErrorInvalidDevice: invalid device ordinal

Expected behavior

No error.

Additional context

Initially reported at pola-rs/polars#25262 (cc @danielcrane)

The error seems to be in pylibcudf. Just using pl.LazyFrame(data).collect() with the two devices works fine.

Labels: bug, cudf-polars

Status: In Progress
