Adding unfused GEMM + reduction path to L2 NN #1175

vinaydes · 2025-07-24T17:32:35Z

No description provided.

copy-pr-bot · 2025-07-24T17:32:39Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cpp/src/distance/unfused_distance_nn.cuh

tfeher

Thanks Vinay for the PR! It is great to have these improvements for the kmeans training.

Apart from fixing the selection function, it would be great to check how this improves ivf-pq build times for different datasets.

tfeher · 2025-07-29T20:25:22Z

cpp/src/cluster/detail/kmeans_balanced.cuh

+template <typename MathT, typename IdxT, typename LabelT>
+bool use_fused(IdxT m, IdxT n, IdxT k)
+{
+#if __CUDA_ARCH__ > 800


This will not work: the macro is not defined in host code. Try using the helpers from raft/util/arch.cuh.

cuvs/cpp/src/neighbors/detail/nn_descent.cuh

Lines 1210 to 1222 in 1155a3a

    // Tensor operations from `mma.h` are guarded with archicteture
    // __CUDA_ARCH__ >= 700. Since RAFT supports compilation for ARCH 600,
    // we need to ensure that `local_join_kernel` (which uses tensor) operations
    // is not only not compiled, but also a runtime error is presented to the user
    auto kernel = preprocess_data_kernel<input_t>;
    void* kernel_ptr = reinterpret_cast<void*>(kernel);
    auto runtime_arch = raft::util::arch::kernel_virtual_arch(kernel_ptr);
    auto wmma_range =
      raft::util::arch::SM_range(raft::util::arch::SM_70(), raft::util::arch::SM_future());

    if (wmma_range.contains(runtime_arch)) {
      local_join(stream, dist_epilogue);
    } else {

vinaydes · 2025-07-31

Added a TODO item for this, thanks. Marking it resolved for now.

tfeher · 2025-07-31

This is not just a request to use existing helpers. We are in host code, where the __CUDA_ARCH__ macro is not defined. Currently the function always returns true.
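To illustrate the point above, here is a minimal host-only sketch. It is not the code from this PR: the function names, the `compute_capability` struct, and the branch bodies are hypothetical, and only the `#if` condition mirrors the diff. It assumes the device's compute capability has already been queried at run time (for example with `cudaDeviceGetAttribute`, or through the `raft::util::arch` helpers quoted earlier in this thread), so that it builds without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical stand-in for the check under review. The branch bodies are
// invented for illustration; only the #if condition mirrors the diff.
inline bool use_fused_compile_time()
{
#if __CUDA_ARCH__ > 800
  return false;  // never reached when this is compiled as host code
#else
  // In host code __CUDA_ARCH__ is undefined, so the preprocessor treats it
  // as 0 and this branch is selected regardless of the GPU in the machine.
  return true;
#endif
}

// Compute capability of the target device, obtained at run time by the
// caller (assumption: e.g. via cudaDeviceGetAttribute; not called here).
struct compute_capability {
  int sm_major;
  int sm_minor;
};

// Same numbering scheme as __CUDA_ARCH__: sm_80 -> 800, sm_86 -> 860.
constexpr int arch_value(compute_capability cc)
{
  return cc.sm_major * 100 + cc.sm_minor * 10;
}

// Run-time version of the same decision: the result now depends on the GPU.
inline bool use_fused_runtime(compute_capability cc)
{
  return !(arch_value(cc) > 800);
}

// Documents the expected behaviour of the two variants.
inline void check_examples()
{
  assert(use_fused_compile_time());     // constant on the host
  assert(use_fused_runtime({8, 0}));    // 800 is not greater than 800
  assert(!use_fused_runtime({9, 0}));   // 900 is greater than 800
}
```

Calling `check_examples()` from any host function exercises both variants. The compile-time version returns the same value on every machine, while the run-time version changes with the queried architecture, which is the behaviour the review asks for.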

cpp/src/distance/unfused_distance_nn.cuh

cjnolet · 2025-07-31T16:32:46Z

/ok to test 80ae8c1

vinaydes requested a review from a team as a code owner July 24, 2025 17:32

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Jul 24, 2025

github-project-automation bot moved this to Todo in Vector Search, ML, & Data Mining Release Board Jul 24, 2025

github-actions bot added the cpp label Jul 24, 2025

vinaydes commented on cpp/src/distance/unfused_distance_nn.cuh (resolved) Jul 24, 2025

tfeher added the improvement and non-breaking labels Jul 25, 2025

tfeher self-requested a review July 25, 2025 11:16

tfeher requested changes Jul 29, 2025


cjnolet added the Waiting for review and Waiting for author labels and removed the Waiting for review label Jul 30, 2025

vinaydes requested a review from a team as a code owner July 31, 2025 13:05

github-actions bot added the CMake label Jul 31, 2025

vinaydes changed the base branch from branch-25.08 to branch-25.10 July 31, 2025 13:06

vinaydes added 10 commits July 31, 2025 14:08

Adding unfused GEMM + reduction path to L2 NN

fc5328e

Removing inline checks

cbefed0

Clang-format changes

7585ccf

Undoing test changes

ec93893

Renaming for clarity

705816c

Using RAFT error checking for cuBLAS call

e018dc4

Trimming headers

292fd7c

Passing raft handle instead of cublas handle

6e7564a

Adding unit test for fused/unfused NN api

7c1a703

Several updates to the unit tests

bfd6f06

vinaydes force-pushed the unfused-l2-nn branch from 64829fa to bfd6f06 Compare July 31, 2025 13:08

vinaydes marked this pull request as draft July 31, 2025 13:09

Adding TODOs from PR comments

80ae8c1

vinaydes added 3 commits July 31, 2025 20:57

Pre-commit related changes

9af519d

Pre-commit related changes

371dfbb

Removing the benchmark for now

239c073
