
Commit fcc8111

Update KMeans init docstring to include 'random' option (#1020)
The docstring for the dask_ml.cluster.KMeans class currently omits 'random' as a valid option for the init parameter. However, this option is fully implemented in the underlying k_init function and serves as a critical scalable alternative to the default 'k-means||', which can overwhelm the scheduler on large datasets.
1 parent 8e2fbb2 commit fcc8111
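The commit message presents `'random'` as the lightweight choice. As a rough illustration of what that initialization amounts to, the pure-Python sketch below draws `n_clusters` distinct rows uniformly at random and copies them as the starting centroids. `random_init` is a hypothetical helper written for this note. dask_ml's own `k_init` works on Dask arrays, and this commit does not show its exact sampling code.

```python
import random
from typing import List, Optional, Sequence


def random_init(X: Sequence[Sequence[float]], n_clusters: int,
                seed: Optional[int] = None) -> List[List[float]]:
    """Hypothetical sketch of 'random' init: n_clusters distinct rows of X."""
    if not 0 < n_clusters <= len(X):
        raise ValueError("n_clusters must be between 1 and len(X)")
    rng = random.Random(seed)
    # Draw row indices without replacement, so no row is picked twice.
    idx = sorted(rng.sample(range(len(X)), n_clusters))
    # Copy the chosen rows as the starting centroids; no distances are computed.
    return [list(X[i]) for i in idx]


X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.5]]
centers = random_init(X, n_clusters=3, seed=0)
print(centers)
```

In this sketch, choosing the starts requires no distance computations at all. The trade-off is that the starts ignore the data's geometry, which is why the new docstring entry pairs `'random'` with `n_init`.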


1 file changed (+8, -3 lines)


dask_ml/cluster/k_means.py

Lines changed: 8 additions & 3 deletions
@@ -36,19 +36,24 @@ class KMeans(TransformerMixin, BaseEstimator):
 ----------
 n_clusters : int, default 8
 Number of clusters to end up with
-init : {'k-means||', 'k-means++' or ndarray}
+init : {'k-means||', 'k-means++', 'random' or ndarray}
 Method for center initialization, defaults to 'k-means||'.
 
-'k-means||' : selects the the gg
+'k-means||' : Selects initial cluster centers using a scalable
+variant of k-means++. See the notes for more details.
 
-'k-means++' : selects the initial cluster centers in a smart way
+'k-means++' : Selects the initial cluster centers in a smart way
 to speed up convergence. Uses scikit-learn's implementation.
 
 .. warning::
 
 If using ``'k-means++'``, the entire dataset will be read into
 memory at once.
 
+'random' : Selects `n_clusters` random rows from the input data for
+the initial centroids. Use `n_init` to run multiple random
+initializations for more robust results.
+
 An array of shape (n_clusters, n_features) can be used to give
 an explicit starting point
 
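The new `'random'` entry recommends `n_init` to offset unlucky starts. To illustrate that idea, the pure-Python sketch below draws `n_init` random starts, refines each with plain Lloyd iterations, and keeps the run with the lowest inertia (the sum of squared distances from each point to its nearest center). It is not dask_ml's implementation, which operates on Dask arrays. The names `lloyd`, `nearest`, and `kmeans_random_restarts` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], centers: Sequence[Point]) -> int:
    """Index of the center closest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def lloyd(X: Sequence[Point], centers: Sequence[Point],
          max_iter: int = 100) -> Tuple[List[Point], float]:
    """Plain Lloyd iterations from the given starts; returns (centers, inertia)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: label every point with its nearest center.
        labels = [nearest(x, centers) for x in X]
        # Update step: move each center to the mean of its points,
        # keeping the old center if it lost all of its points.
        new_centers = []
        for k, c in enumerate(centers):
            members = [x for x, lab in zip(X, labels) if lab == k]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else c
            )
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    inertia = sum(sq_dist(x, centers[nearest(x, centers)]) for x in X)
    return centers, inertia


def kmeans_random_restarts(X: Sequence[Point], n_clusters: int, n_init: int = 10,
                           seed: Optional[int] = None) -> Tuple[List[Point], float]:
    """Hypothetical sketch: n_init 'random' starts, keep the lowest-inertia run."""
    rng = random.Random(seed)
    best: Optional[Tuple[List[Point], float]] = None
    for _ in range(n_init):
        # 'random' init: n_clusters distinct rows drawn without replacement.
        idx = rng.sample(range(len(X)), n_clusters)
        centers, inertia = lloyd(X, [X[i] for i in idx])
        if best is None or inertia < best[1]:
            best = (centers, inertia)
    return best


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
best_centers, best_inertia = kmeans_random_restarts(X, n_clusters=2, n_init=5, seed=0)
print(best_centers, best_inertia)
```

With a fixed seed, the first random draw is the same for any `n_init`. Adding restarts in this sketch can therefore only lower the reported inertia, never raise it.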
