Closed
Labels
bug, ci, dependency-break (issue is related to an upstream breaking change)
Description
Summary
All conda-python-tests-singlegpu test jobs are experiencing widespread failures (34 tests) across two components: IVF-based nearest neighbors (segmentation faults) and TSNE sparse input with specific distance metrics (CUDA device function errors).
Failing tests/components:
- test_nearest_neighbors.py - IVF-Flat and IVF-PQ tests (27 failures)
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input (7 failures)
- test_pickle.py - IVF pickle tests (2 failures)
Failure observed in:
- https://github.com/rapidsai/cuml/actions/runs/18914598319/job/53999305120
- All conda-python-tests-singlegpu jobs across different configurations
Environment
Environment independent.
Test Details
Category 1: IVF Nearest Neighbors - Segmentation Faults (27 failures)
Affected tests:
- test_nearest_neighbors.py::test_ann_distances_metrics[ivfflat-*] (5 tests)
- test_nearest_neighbors.py::test_ann_distances_metrics[ivfpq-*] (4 tests)
- test_nearest_neighbors.py::test_neighborhood_predictions[ivfflat-*] (4 tests)
- test_nearest_neighbors.py::test_neighborhood_predictions[ivfpq-*] (4 tests)
- test_nearest_neighbors.py::test_ivfflat_pred[*] (3 tests)
- test_nearest_neighbors.py::test_ivfpq_pred[*] (2 tests)
- test_nearest_neighbors.py::test_knn_graph_algorithm[ivfpq]
- test_nearest_neighbors.py::test_nearest_neighbors_rbc[*]
- test_pickle.py::test_nearest_neighbors_pickle[ivfflat]
- test_pickle.py::test_nearest_neighbors_pickle[ivfpq]
Error:
Fatal Python error: Segmentation fault
Current thread (most recent call first):
File "cuml/internals/api_decorators.py", line 200 in wrapper
File "test_nearest_neighbors.py", line 241 in test_ivfpq_pred
Multiple pytest workers (gw2-gw24) crashed during execution with memory corruption errors.
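Because a segmentation fault kills the pytest worker outright, it can help to rerun a suspect test in its own interpreter and inspect the exit status. The following is a minimal sketch, not part of the cuML test suite: `run_isolated` and `crashed_with_segfault` are hypothetical helper names, and the check assumes a POSIX system, where a child killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(py_args, timeout=120.0):
    """Run `python -X faulthandler <py_args>` in a fresh interpreter.

    faulthandler prints the Python traceback to stderr before the
    process dies, which matches the "Fatal Python error" output above.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", *py_args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def crashed_with_segfault(result):
    """Assumes POSIX: a child killed by SIGSEGV has returncode == -SIGSEGV."""
    return result.returncode == -signal.SIGSEGV


# Self-check with a child that deliberately sends itself SIGSEGV.
demo = run_isolated(
    ["-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)
print(crashed_with_segfault(demo))
```

To isolate one of the failing tests, the same helper could be called as `run_isolated(["-m", "pytest", "test_nearest_neighbors.py::test_ivfpq_pred"])`, run from the directory containing the test file and without pytest-xdist so that the crash surfaces in the child's return code.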
Category 2: TSNE Sparse Input - CUDA Device Function Error (7 failures)
Affected tests:
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[euclidean-exact]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[cityblock-fft]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[cityblock-barnes_hut]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[cityblock-exact]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[l1-fft]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[l1-barnes_hut]
- test_tsne.py::test_tsne_distance_metrics_on_sparse_input[l1-exact]
Error:
RuntimeError: transform: failed inside CUB: cudaErrorInvalidDeviceFunction: invalid device function
File "test_tsne.py", line 422, in test_tsne_distance_metrics_on_sparse_input
cuml_embedding = cuml_tsne.fit_transform(data_sparse)
File "cuml/manifold/t_sne.pyx", line 654, in cuml.manifold.t_sne.TSNE.fit
Failures occur only with sparse input. The l1 and cityblock metrics fail with all three TSNE methods (fft, barnes_hut, exact), while euclidean fails only with the exact method.
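The pattern across metrics and methods can be checked by grouping the failing parametrization IDs listed above. This is a small triage sketch using only the standard library; `methods_by_metric` is a hypothetical helper, and it assumes each ID has the form `<metric>-<method>` with no hyphen inside the metric name.

```python
from collections import defaultdict

# Parametrization IDs copied from the failing tests in this report.
FAILING_IDS = [
    "euclidean-exact",
    "cityblock-fft",
    "cityblock-barnes_hut",
    "cityblock-exact",
    "l1-fft",
    "l1-barnes_hut",
    "l1-exact",
]


def methods_by_metric(param_ids):
    """Group failing TSNE methods under the distance metric they ran with."""
    grouped = defaultdict(set)
    for param_id in param_ids:
        # Split on the first hyphen only: "barnes_hut" contains no hyphen.
        metric, method = param_id.split("-", 1)
        grouped[metric].add(method)
    return dict(grouped)


for metric, methods in methods_by_metric(FAILING_IDS).items():
    print(metric, sorted(methods))
```

Grouping this way makes it easy to see which metric and method combinations are affected when comparing against later CI runs.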