-
Notifications
You must be signed in to change notification settings - Fork 146
Spectral Clustering #1425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spectral Clustering #1425
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Update: Update: |
|
Update: Update: |
| @@ -0,0 +1,38 @@ | |||
| /* | |||
| * Copyright (c) 2025, NVIDIA CORPORATION. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we name this file spectral.hpp instead of spectral_clustering. The clustering is already in the namespace, so it's redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in f1ab56a
| * @} | ||
| */ | ||
|
|
||
| void create_connectivity_graph(raft::resources const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not part of the standard "fit()" "predict()" APIs that one would normally use day to day, so please add it to the cuvs::preprocessing::spectral_embedding::helpers namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in f1ab56a
f364560 to
f377524
Compare
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
viclafargue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, LGTM! It looks like the connectivity graph uses a int as its nnz type. I guess this is intentional and temporary?
cpp/src/cluster/spectral.cu
Outdated
| raft::linalg::transpose(handle, | ||
| embedding_col_major.data_handle(), | ||
| embedding_row_major.data_handle(), | ||
| n_samples, | ||
| config.n_components, | ||
| raft::resource::get_cuda_stream(handle)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that it is necessary to double VRAM usage and perform a transposition here, but would be great if we could find a workaround in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Yes, we have to switch from col to row major since spectral embedding needs col major and kmeans needs row major. Yes, we are using int as nnz type currently until we resolve the scale up work in spectral embedding.
| int n_components; | ||
| int n_init; | ||
| int n_neighbors; | ||
| uint64_t seed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expose the rng_state instead of the seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in d9d4bc8
|
|
||
| void fit_predict(raft::resources const& handle, | ||
| params config, | ||
| raft::device_coo_matrix_view<float, int, int, int> connectivity_graph, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raft::device_coo_matrix_view<double, int, int, int> is not supported while cuGraph has templates for both float and double. Can we add support for the latter as well. Otherwise, i get the below error
error: no suitable user-defined conversion from "raft::device_coo_matrix_view<double, int, int, int>" to "raft::device_coo_matrix_view<float, int, int, int>" exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added double support in 5c608f3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want int here? Specifically for the nnz, we probably want an int64 for that. Otherwise, you are limited to very small graphs where the total number of nonzeros is < 2B (which is very small).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be able to support int64_t once the spectral scale up work is resolved rapidsai/cuml#7225
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a cuvs issue specifically for this API please, so that it oesn't get left behind once we're able to shift to the int_64 in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please link the issue in the code here in a todo that we're going to be changing the ints to int_64 soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 40ac5c2
|
/merge |
jnke2016
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the changes I requested and it looks good to me. I tested this PR in cuGraph and all tests succeeded. The cuGraph PR will be merged once this one is.
Resolves #1201
This PR introduces the Spectral Clustering algorithm which builds on top of Spectral Embedding.
This PR will enable us to remove the legacy and inaccurate spectral clustering algorithm called
partitionThis PR also adds
doubleprecision support for spectral embedding and spectral clustering. The new templates require a refactor of the source files to.cuhand the template instantiations in.cuGtests have been added to this PR with synthetic data in a similar fashion as kmeans gtests.
I have tested this PR through python bindings and by comparing the plots with sklearn's SpectralClustering.
Here is a plot of sklearn vs ours.
