Skip to content

Conversation

vinaydes
Copy link
Contributor

No description provided.

Copy link

copy-pr-bot bot commented Jul 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Jul 24, 2025
@tfeher tfeher added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jul 25, 2025
@tfeher tfeher self-requested a review July 25, 2025 11:16
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vinay for the PR! It is great to have these improvements for he kmeans training.

Apart from fixing the selection function, it would be great to check how this improves ivf-pq build times for different datasets.

template <typename MathT, typename IdxT, typename LabelT>
bool use_fused(IdxT m, IdxT n, IdxT k)
{
#if __CUDA_ARCH__ > 800
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work, the macro is not defined in host code. Try using the helpers from raft/util/arch.cuh

// Tensor operations from `mma.h` are guarded with archicteture
// __CUDA_ARCH__ >= 700. Since RAFT supports compilation for ARCH 600,
// we need to ensure that `local_join_kernel` (which uses tensor) operations
// is not only not compiled, but also a runtime error is presented to the user
auto kernel = preprocess_data_kernel<input_t>;
void* kernel_ptr = reinterpret_cast<void*>(kernel);
auto runtime_arch = raft::util::arch::kernel_virtual_arch(kernel_ptr);
auto wmma_range =
raft::util::arch::SM_range(raft::util::arch::SM_70(), raft::util::arch::SM_future());
if (wmma_range.contains(runtime_arch)) {
local_join(stream, dist_epilogue);
} else {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO item for this, thanks. Marking it resolved for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just a request to use existing helpers. We are in host code, the CUDA_ARCH macro is not defined. Currently the function always returns true.

@vinaydes vinaydes requested a review from a team as a code owner July 31, 2025 13:05
@github-actions github-actions bot added the CMake label Jul 31, 2025
@vinaydes vinaydes changed the base branch from branch-25.08 to branch-25.10 July 31, 2025 13:06
@vinaydes vinaydes marked this pull request as draft July 31, 2025 13:09
@cjnolet
Copy link
Member

cjnolet commented Jul 31, 2025

/ok to test 80ae8c1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change Waiting for author
Development

Successfully merging this pull request may close these issues.

3 participants