Adding unfused GEMM + reduction path to L2 NN #1175
base: branch-25.10
Thanks, Vinay, for the PR! It is great to have these improvements for the kmeans training.
Apart from fixing the selection function, it would be great to check how this improves ivf-pq build times for different datasets.
template <typename MathT, typename IdxT, typename LabelT>
bool use_fused(IdxT m, IdxT n, IdxT k)
{
#if __CUDA_ARCH__ > 800
This will not work: the macro is not defined in host code. Try using the helpers from raft/util/arch.cuh.
cuvs/cpp/src/neighbors/detail/nn_descent.cuh
Lines 1210 to 1222 in 1155a3a
// Tensor operations from `mma.h` are guarded with archicteture
// __CUDA_ARCH__ >= 700. Since RAFT supports compilation for ARCH 600,
// we need to ensure that `local_join_kernel` (which uses tensor) operations
// is not only not compiled, but also a runtime error is presented to the user
auto kernel = preprocess_data_kernel<input_t>;
void* kernel_ptr = reinterpret_cast<void*>(kernel);
auto runtime_arch = raft::util::arch::kernel_virtual_arch(kernel_ptr);
auto wmma_range =
  raft::util::arch::SM_range(raft::util::arch::SM_70(), raft::util::arch::SM_future());
if (wmma_range.contains(runtime_arch)) {
  local_join(stream, dist_epilogue);
} else {
Added a TODO item for this, thanks. Marking it resolved for now.
This is not just a request to use existing helpers. We are in host code, where the __CUDA_ARCH__ macro is not defined. Currently the function always returns true.
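To illustrate the point, here is a minimal host-only sketch, not the code from this PR. In a preprocessor `#if`, an undefined identifier evaluates to 0, so the `> 800` branch is never taken in host code. The function names, the return values of each branch, and the `sm_arch` parameter are assumptions made for illustration; a real fix would more likely obtain the architecture from the raft/util/arch.cuh helpers, as in the nn_descent.cuh snippet quoted above.

```cpp
// Hypothetical stand-in for the selection function under review.
// Assumption: the #else branch returns true, which matches the observation
// that the function currently always returns true.
bool use_fused_compile_time()
{
#if __CUDA_ARCH__ > 800  // undefined in host code, so this is evaluated as `0 > 800`
  return false;
#else
  return true;
#endif
}

// Hypothetical runtime variant: the caller passes the architecture using the
// same encoding as __CUDA_ARCH__ (e.g. 800 for SM 8.0, 900 for SM 9.0).
// The threshold of 800 only mirrors the snippet above; it is not a tuned value.
bool use_fused_runtime(int sm_arch)
{
  return sm_arch <= 800;
}
```

With the runtime variant, the decision depends on the value supplied by the caller, so `use_fused_runtime(900)` is false while `use_fused_compile_time()` is true on every host build.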
/ok to test 80ae8c1