
@kingcrimsontianyu
Contributor

@kingcrimsontianyu kingcrimsontianyu commented Mar 14, 2025

This PR:

  • Implements `ThreadPoolSimple`, a thread pool that executes tasks with less synchronization overhead, and adds a benchmark comparing it against the BS pool.
  • Adds a baseline "static task" benchmark with the least synchronization overhead.
  • Adds explanations of the time metrics `Time` and `CPU` to the benchmark results.
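
The "static task" baseline is not spelled out here. As a minimal sketch, assuming it pre-assigns a fixed contiguous range of tasks to each thread so that workers never touch a shared queue, the pattern looks like this in Python. The helpers `partition` and `run_static` are hypothetical, and because of Python's GIL this illustrates only the scheduling structure, not the C++ benchmark's performance.

```python
import threading
from typing import Callable, List, Optional, Tuple


def partition(num_tasks: int, num_threads: int) -> List[Tuple[int, int]]:
    """Split [0, num_tasks) into num_threads contiguous [begin, end) ranges."""
    base, remainder = divmod(num_tasks, num_threads)
    ranges = []
    begin = 0
    for t in range(num_threads):
        # The first `remainder` threads take one extra task each.
        size = base + (1 if t < remainder else 0)
        ranges.append((begin, begin + size))
        begin += size
    return ranges


def run_static(tasks: List[Callable[[], object]], num_threads: int) -> List[Optional[object]]:
    """Run each task exactly once; each thread owns a fixed slice of `tasks`."""
    results: List[Optional[object]] = [None] * len(tasks)

    def worker(begin: int, end: int) -> None:
        # No locks or queue: each thread writes only to its own indices.
        for i in range(begin, end):
            results[i] = tasks[i]()

    threads = [
        threading.Thread(target=worker, args=r)
        for r in partition(len(tasks), num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A queue-based pool instead has workers repeatedly pop from a shared, lock-protected queue. The static split avoids that per-task synchronization at the cost of load balancing when task durations vary.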

@copy-pr-bot

copy-pr-bot bot commented Mar 14, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@kingcrimsontianyu kingcrimsontianyu added feature request New feature or request non-breaking Introduces a non-breaking change c++ Affects the C++ API of KvikIO DO NOT MERGE labels Mar 14, 2025
@kingcrimsontianyu kingcrimsontianyu force-pushed the bench-2 branch 2 times, most recently from bf6bd2c to 5dbb605 March 18, 2025 03:01
@copy-pr-bot

copy-pr-bot bot commented Mar 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@kingcrimsontianyu
Contributor Author

kingcrimsontianyu commented Mar 19, 2025

On an AMD EPYC 7642 48-core processor, scalability starts to drop beyond 16 threads, and the BS pool deteriorates more quickly. In this benchmark, each compute task takes ~30 microseconds.
[Figure: thread_scale_EPYC]

@kingcrimsontianyu kingcrimsontianyu changed the base branch from branch-25.04 to branch-25.06 March 19, 2025 14:50
TomAugspurger and others added 28 commits September 26, 2025 22:04
`S3Endpoint` takes optional parameters for the AWS region, access key ID, etc. If these aren't set, they're looked up from the environment.

Previously, the only way to specify these from Python was via environment variables. This adds named parameters to `kvikio.RemoteFile.open_s3` so that users can specify the credentials programmatically. The default behavior is unchanged: environment variables are used when the parameters are not specified.
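
The lookup order described above (explicit argument first, then the environment) can be sketched as follows. This is a minimal illustration, not KvikIO's code: the helper `resolve_s3_option` is hypothetical, and the environment variable names are assumed to be the standard AWS ones (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_DEFAULT_REGION`).

```python
import os
from typing import Mapping, Optional


def resolve_s3_option(
    value: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return `value` if given, else `environ[env_var]`; raise if neither is set."""
    if value is not None:
        return value
    try:
        return environ[env_var]
    except KeyError:
        raise ValueError(
            f"S3 option not given and environment variable {env_var} is not set"
        ) from None


# Hypothetical usage: an explicit argument wins over the environment.
env = {"AWS_DEFAULT_REGION": "us-east-1"}
print(resolve_s3_option(None, "AWS_DEFAULT_REGION", env))         # us-east-1
print(resolve_s3_option("eu-west-1", "AWS_DEFAULT_REGION", env))  # eu-west-1
```

This treats every option as required; in practice an optional credential such as a session token would fall back to "not set" instead of raising.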

Here's a test snippet against an S3 bucket:

```python
import sys
import boto3
import kvikio
import rmm


bucket, access_key_id, secret_access_key, session_token, default_region = sys.argv[1:]

client = boto3.client(
    's3',
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    aws_session_token=session_token,
    region_name=default_region,
)

key = "test/date-2025-09-16"
client.put_object(Bucket=bucket, Key=key, Body=b'Hello, world!')
client.head_object(Bucket=bucket, Key=key)

buf = rmm.DeviceBuffer(size=13)
f = kvikio.RemoteFile.open_s3(bucket, key, access_key_id=access_key_id, secret_access_key=secret_access_key, session_token=session_token, region_name=default_region)
f.read(buf)
print(buf.tobytes())
```

I've stored the credentials in `_`-prefixed environment variables and pass them on the command line. When run, that prints

```
❯ python debug.py kvikiobench-33622 $_AWS_ACCESS_KEY_ID $_AWS_SECRET_ACCESS_KEY $_AWS_SESSION_TOKEN $_AWS_DEFAULT_REGION
b'Hello, world!'
```

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

URL: rapidsai#846
…th issue (rapidsai#844)

## Background

### Initial problem
There is currently an unresolved problem in libcurl, which is mislabeled as merged/solved in curl/curl#13754. For AWS S3 objects that require credentials, if the object key name contains `=`, libcurl fails with an HTTP 403 response. This problem does not occur with public S3 objects. It can be reproduced using the `curl` program:

```bash
#!/usr/bin/env bash

# version: curl 8.15.0-DEV
curl_bin=<my_curl_program_loc>

# ..........S3 private..........
region=$(aws configure get region)
user_password=$(aws configure get aws_access_key_id):$(aws configure get aws_secret_access_key)
# curl can handle this. The object key name does not contain =
url="https://<private-bucket>.s3.<region>.amazonaws.com/witcher/2MiB.bin"
# curl cannot handle this. The object key name contains =
url="https://<private-bucket>.s3.<region>.amazonaws.com/witcher/key=value_2MiB.bin"

$curl_bin -s $url \
--aws-sigv4 "aws:amz:$region:s3" \
--user "$user_password" \
-o /dev/null -w "%{http_code}\n" -v

# ..........S3 public..........
# curl can handle both
url="https://<public-bucket>.s3.<region>.amazonaws.com/witcher/2MiB.bin"
url="https://<public-bucket>.s3.<region>.amazonaws.com/witcher/key=value_2MiB.bin"
$curl_bin -s $url \
-o /dev/null -w "%{http_code}\n" -v
```

### Additional problem
It has also been found that, beyond `=`, other special characters such as `!*'()` in a private S3 object key name cause a libcurl error. In addition, some characters such as `+` in a public S3 object key name cause the same error.

## This PR

This PR addresses this problem by handling special characters listed in the [AWS object key naming guidelines](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines), for both private and public S3 object names. The KvikIO-specific object key naming guidelines are added to the remote file documentation.

Specifically, this PR introduces the utility classes `UrlBuilder` (to complement the existing `UrlParser`), which builds a URL from user-provided components, and `UrlEncoder`, which uses a compile-time percent-encoding lookup table to encode selected characters.
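
The lookup-table approach can be illustrated with a short Python sketch. It is not KvikIO's C++ `UrlEncoder`/`UrlBuilder`: the character set in `CHARS_TO_ENCODE` is an assumption based on the characters mentioned above (`=`, `!*'()`, `+`) plus a few others from the AWS guidelines, and `encode_key` and `build_s3_url` are hypothetical names. The table is built once (standing in for the compile-time table), and each byte of the UTF-8 key is then encoded with a single lookup.

```python
from typing import List, Optional

# Assumed set of printable ASCII characters to percent-encode in an object key.
# "/" is deliberately left out so that it keeps acting as a path separator.
CHARS_TO_ENCODE = "=!*'()+ &$@;:,?"


def make_lookup_table(chars: str) -> List[Optional[str]]:
    """Map each byte value to its %XX escape, or None if it passes through."""
    table: List[Optional[str]] = [None] * 256
    for b in range(256):
        # Control characters, DEL and all non-ASCII bytes are always encoded.
        if b < 0x20 or b >= 0x7F:
            table[b] = f"%{b:02X}"
    for c in chars:
        table[ord(c)] = f"%{ord(c):02X}"
    return table


_TABLE = make_lookup_table(CHARS_TO_ENCODE)


def encode_key(key: str) -> str:
    """Percent-encode selected bytes of the UTF-8 encoded key."""
    out = []
    for b in key.encode("utf-8"):
        escaped = _TABLE[b]
        out.append(escaped if escaped is not None else chr(b))
    return "".join(out)


def build_s3_url(bucket: str, region: str, key: str) -> str:
    """Assemble a virtual-hosted-style S3 URL from its components."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encode_key(key)}"


print(build_s3_url("my-bucket", "us-east-1", "witcher/key=value_2MiB.bin"))
# https://my-bucket.s3.us-east-1.amazonaws.com/witcher/key%3Dvalue_2MiB.bin
```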





Closes rapidsai#823

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#844
…i#848)

This small PR adds the `aws_` prefix to the parameter `session_token` (renaming it to `aws_session_token`) to make parameter names more consistent across the S3 utility functions.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Tom Augspurger (https://github.com/TomAugspurger)

URL: rapidsai#848
Forward-merge branch-25.10 into branch-25.12
…sai#853)

This small PR fixes an out-of-bounds memory access that happens when the file open flags consist of a single character (e.g. `"r"` or `"w"` without the `"+"` suffix).
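
The fix amounts to checking the length of the flag string before reading a second character. Below is a minimal Python sketch, assuming fopen(3)-style semantics for `r`, `w` and `a` with an optional `+`. KvikIO's exact mapping may differ, and `parse_open_flags` is a hypothetical name.

```python
import os


def parse_open_flags(flags: str) -> int:
    """Translate an fopen-style mode string ("r", "w+", ...) into os.O_* flags."""
    if not flags or len(flags) > 2:
        raise ValueError(f"invalid open flags: {flags!r}")
    mode = flags[0]
    # Only look at flags[1] when it exists; a single-character string such as
    # "r" has no second character to read.
    plus = len(flags) == 2 and flags[1] == "+"
    if len(flags) == 2 and not plus:
        raise ValueError(f"invalid open flags: {flags!r}")

    if mode == "r":
        extra = 0
    elif mode == "w":
        extra = os.O_CREAT | os.O_TRUNC
    elif mode == "a":
        extra = os.O_CREAT | os.O_APPEND
    else:
        raise ValueError(f"invalid open flags: {flags!r}")

    if plus:
        access = os.O_RDWR
    elif mode == "r":
        access = os.O_RDONLY
    else:
        access = os.O_WRONLY
    return access | extra
```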

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#853
Supports the rollout of the new branching strategy. https://docs.rapids.ai/notices/rsn0047/

xref: rapidsai/build-planning#224

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: rapidsai#855
…l namespace (rapidsai#851)

This PR is part of the effort to minimize transitive includes in the KvikIO shared library. It moves the NVTX-related code from the public headers to the `detail` namespace.
As a result, the files `parallel_operation.hpp` and `posix_io.hpp` have also been moved to the `detail` namespace.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#851
Contributes to rapidsai/build-planning#224

## Notes for Reviewers

This is safe to admin-merge because the change is a no-op: the configs on those two branches are identical.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Nate Rock (https://github.com/rockhowse)

URL: rapidsai#857
)

With this PR, KvikIO supports username-based authentication for WebHDFS via the environment variable `KVIKIO_WEBHDFS_USERNAME`.

Note: `libcudf` uses KvikIO's utility function `open(url)` to infer the endpoint type, and on that path the access credentials can currently only be specified via environment variables, not programmatically as function parameters. We will address this limitation in the future.

This PR is breaking in that:
- It moves the S3 endpoint's utility function `unwrap_or_default` to the `detail` namespace, since this utility function is meant to be an implementation detail.
- It adds a `username` parameter to one of the two WebHDFS endpoint constructors for completeness (the other constructor already has `username` as a parameter).
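
For context, WebHDFS's simple (pseudo) authentication identifies the user through the `user.name` query parameter. The sketch below shows how a username could be taken from `KVIKIO_WEBHDFS_USERNAME` when no explicit value is given and appended to a request URL. The function `webhdfs_open_url` and its precedence rule are assumptions for illustration, not KvikIO's implementation.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode


def webhdfs_open_url(
    host: str,
    port: int,
    path: str,
    username: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Build a WebHDFS OPEN URL, falling back to KVIKIO_WEBHDFS_USERNAME."""
    if username is None:
        username = environ.get("KVIKIO_WEBHDFS_USERNAME")
    params = {"op": "OPEN"}
    if username:
        # Simple authentication: the user is identified by `user.name`.
        params["user.name"] = username
    return f"http://{host}:{port}/webhdfs/v1{path}?{urlencode(params)}"


print(webhdfs_open_url("namenode", 9870, "/data/file.bin",
                       environ={"KVIKIO_WEBHDFS_USERNAME": "alice"}))
# http://namenode:9870/webhdfs/v1/data/file.bin?op=OPEN&user.name=alice
```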

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: rapidsai#859
…e configs (rapidsai#862)

This uses `RAPIDS_BRANCH` in style checks where we reference rapids-cmake configs for `cmake-format`.

xref: rapidsai/build-planning#224

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: rapidsai#862
Ruff does not yet support Cython, so this restores isort for Cython files only.

Issue: rapidsai/build-planning#130

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - https://github.com/jakirkham

URL: rapidsai#864
…rapidsai#867)

This fixes a conda environment creation command to support both `x86_64` and `aarch64` systems.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Matthew Murray (https://github.com/Matt711)

URL: rapidsai#867
This PR disables building benchmarks by default, consistent with other RAPIDS projects such as cuDF and RAFT. It also updates the CI build script to ensure that benchmark builds are still tested in CI. This change helps address the issue in cuDF where KvikIO benchmarks are built unnecessarily.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#866
…ai#868)

This PR supports handling the new main branch strategy outlined below:

* [RSN 47 - Changes to RAPIDS branching strategy in 25.12](https://docs.rapids.ai/notices/rsn0047/)

The `update-version.sh` script now supports two modes, controlled via a CLI argument or an environment variable:

- CLI argument: `--run-context=main|release`
- Environment variable: `RAPIDS_RUN_CONTEXT=main|release`
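
The two ways of choosing the mode could be resolved as sketched below. This is a hypothetical Python illustration, not the actual `update-version.sh` logic: it assumes the CLI argument takes precedence over `RAPIDS_RUN_CONTEXT`, and that an error is raised when neither supplies a valid value.

```python
import argparse
from typing import List, Mapping

VALID_CONTEXTS = ("main", "release")


def resolve_run_context(argv: List[str], environ: Mapping[str, str]) -> str:
    """Pick the run context from --run-context, else from RAPIDS_RUN_CONTEXT."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-context", choices=VALID_CONTEXTS)
    # Ignore other script arguments (e.g. the new version number).
    args, _unknown = parser.parse_known_args(argv)
    context = args.run_context or environ.get("RAPIDS_RUN_CONTEXT")
    if context not in VALID_CONTEXTS:
        raise ValueError(f"run context must be one of {VALID_CONTEXTS}, got {context!r}")
    return context
```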

xref: rapidsai/build-planning#224

Authors:
  - Nate Rock (https://github.com/rockhowse)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#868
Cython 3.2.0 was recently released, and we have seen a few subtle issues building with it. While we work out these issues, this pins to Cython 3.1, which we know to be working for us.

Similarly, PyTest 9 was recently released, but we have run into some issues with it as well, so this pins to PyTest 8 while we work through the PyTest 9 issues.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#869
This PR significantly improves POSIX I/O write performance, as well as cold-page-cache read performance, by opportunistically using Direct I/O. The speedup for sequential writes is approximately 3-4x.

The opportunistic POSIX Direct I/O feature can be controlled in two ways:
- Environment variables:
  - `"KVIKIO_AUTO_DIRECT_IO_READ"`: defaults to `false`.
  - `"KVIKIO_AUTO_DIRECT_IO_WRITE"`: defaults to `true`.

- C++/Python API
  - `defaults::set_auto_direct_io_read(bool flag)`/`kvikio.defaults.set("posix_auto_direct_io_read", flag)`
  - `defaults::set_auto_direct_io_write(bool flag)`/`kvikio.defaults.set("posix_auto_direct_io_write", flag)`
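
Direct I/O (`O_DIRECT`) generally requires the file offset, transfer size and buffer address to be aligned, typically to the logical block or page size, which is why it can only be used opportunistically. As a rough sketch of one way to apply it, assuming a 4096-byte alignment and a hypothetical helper `split_for_direct_io`, a request can be split into an unaligned head and tail served with buffered I/O and an aligned middle eligible for Direct I/O. This handles file offsets only and does not claim to reproduce KvikIO's actual policy.

```python
from typing import List, Tuple

ALIGNMENT = 4096  # assumed block/page size


def split_for_direct_io(
    offset: int, size: int, alignment: int = ALIGNMENT
) -> List[Tuple[int, int, bool]]:
    """Split [offset, offset + size) into (offset, size, use_direct_io) segments."""
    end = offset + size
    aligned_begin = -(-offset // alignment) * alignment  # round up
    aligned_end = (end // alignment) * alignment  # round down
    if aligned_begin >= aligned_end:
        # No whole aligned block inside the range: use buffered I/O throughout.
        return [(offset, size, False)] if size > 0 else []
    segments = []
    if offset < aligned_begin:
        segments.append((offset, aligned_begin - offset, False))  # unaligned head
    segments.append((aligned_begin, aligned_end - aligned_begin, True))
    if aligned_end < end:
        segments.append((aligned_end, end - aligned_end, False))  # unaligned tail
    return segments


print(split_for_direct_io(100, 10000))
# [(100, 3996, False), (4096, 4096, True), (8192, 1908, False)]
```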

In addition, this PR refactors the bounce buffer class. To improve clarity, relevant classes and variables have been renamed and many comments have been added. The bounce buffer class is now templated on the allocator to accommodate different use cases:
- `PageAlignedBounceBufferPool`: used for Direct I/O to/from unaligned host buffer. Does not require CUDA context.
- `CudaPinnedBounceBufferPool`: used for buffered I/O to/from device buffer. Requires CUDA context. This is the original implementation on the main branch.
- `CudaPageAlignedPinnedBounceBufferPool`: used for Direct I/O to/from device buffer. Requires CUDA context.
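
Templating the pool on an allocator can be sketched in Python by passing the allocation function in. The class below is a hypothetical illustration, not KvikIO's bounce buffer pool: buffers of a fixed size are reused instead of reallocated, and the page-aligned variant relies on anonymous `mmap` regions starting on a page boundary. Pinned CUDA allocations are omitted because they need a CUDA context.

```python
import mmap
import threading
from typing import Callable, Generic, List, TypeVar

Buffer = TypeVar("Buffer")


class BounceBufferPool(Generic[Buffer]):
    """Thread-safe pool of fixed-size buffers created by a pluggable allocator."""

    def __init__(self, allocator: Callable[[int], Buffer], buffer_size: int) -> None:
        self._allocator = allocator
        self._buffer_size = buffer_size
        self._free: List[Buffer] = []
        self._lock = threading.Lock()
        self.num_allocations = 0

    def acquire(self) -> Buffer:
        with self._lock:
            if self._free:
                return self._free.pop()  # reuse a previously released buffer
            self.num_allocations += 1
        return self._allocator(self._buffer_size)

    def release(self, buf: Buffer) -> None:
        with self._lock:
            self._free.append(buf)


def page_aligned_allocator(size: int) -> mmap.mmap:
    # Anonymous mappings start on a page boundary, as Direct I/O needs.
    return mmap.mmap(-1, size)


host_pool = BounceBufferPool(bytearray, 1 << 20)
aligned_pool = BounceBufferPool(page_aligned_allocator, 1 << 20)
```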


## Performance results
See rapidsai#863 (comment)

## Goal
- Addresses most of rapidsai#761
- Addresses the reported write performance issue in cudf

## Non-goal
- This PR does not add opportunistic Direct I/O as file handle's function parameters. This will be revisited in a future PR.
- This PR does not address one of the objectives in rapidsai#520, which is to unify the implementation of the bounce buffer in POSIX IO and in Remote IO. This will be revisited in a future PR.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#863
Forward-merge release/25.12 into main
…#865)

RAPIDS has deployed an autoscaling cloud build cluster that can be used to accelerate building large RAPIDS projects. 

This PR updates the conda and wheel builds to use the build cluster.

This contributes to rapidsai/build-planning#228.

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Nate Rock (https://github.com/rockhowse)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#865
Forward-merge release/25.12 into main
@kingcrimsontianyu
Contributor Author

Superseded by #878
