Fix rank_genes_groups with groups parameter by Intron7 · Pull Request #651 · scverse/rapids-singlecell

Intron7 · 2026-05-06T14:32:04Z

This PR fixes #650. I also added regression tests so this won't happen again.

Intron7 · 2026-05-06T14:32:20Z

coderabbitai · 2026-05-06T14:32:26Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-06T14:41:04Z

📝 Walkthrough


This PR fixes issue #650, where passing the groups parameter to rank_genes_groups produced zero or NaN scores/statistics on GPU. The core change reworks _basic_stats to aggregate all categories in a single GPU Aggregate call, then slice to the user-selected groups and compute the rest statistics from the aggregated sums, ensuring consistency with CPU/Scanpy behavior.

Changes

Bug Fix: GPU Aggregation for Grouped Rank Genes

Layer / File(s)	Summary
Core GPU Aggregation Logic `src/rapids_singlecell/tools/_rank_genes_groups/_core.py`	Reworked `_basic_stats` GPU path: single Aggregate call computes per-category sums and sum-of-squares for all groups; results are sliced to user-selected groups and rest statistics are computed from aggregated sums rather than directly from sliced data.
T-Test Regression Tests `tests/test_rank_genes_groups_ttest.py`	Added `test_rank_genes_groups_ttest_subset_matches_scanpy` with parametrization over groups, reference, and method to validate GPU results match Scanpy across subset configurations.
Wilcoxon Regression Tests `tests/test_rank_genes_groups_wilcoxon.py`	Added `test_rank_genes_groups_wilcoxon_subset_matches_scanpy` with parametrization over groups and reference to ensure subset handling produces results consistent with CPU/Scanpy for tie-corrected and non-corrected paths.
Wilcoxon Binned Test Enhancement `tests/test_rank_genes_groups_wilcoxon_binned.py`	Extended group-subset matching tests with two new reference-group scenarios (`rest_single_group` and `rest_group_subset`); strengthened validation by checking multiple score fields (scores, logfoldchanges, pvals, pvals_adj) across all relevant groups.
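The aggregate-then-slice approach described in the walkthrough can be sketched on the CPU with NumPy. This is a minimal illustration under stated assumptions, not the rapids-singlecell code: `aggregate_by_category`, `mean_var` and `group_vs_rest_stats` are hypothetical helpers, a dense NumPy matrix stands in for the GPU `Aggregate` call, variances use ddof=1, and "rest" is assumed to mean every cell outside the group, including cells in groups the caller did not select.

```python
import numpy as np


def aggregate_by_category(X, labels, categories):
    """One pass over all cells: per-category counts, sums and sums of squares."""
    index = {c: i for i, c in enumerate(categories)}
    codes = np.array([index[lab] for lab in labels])
    n_cat, n_genes = len(categories), X.shape[1]
    counts = np.bincount(codes, minlength=n_cat).astype(float)
    sums = np.zeros((n_cat, n_genes))
    sumsq = np.zeros((n_cat, n_genes))
    np.add.at(sums, codes, X)  # sums[k] += X[i] for every cell i in category k
    np.add.at(sumsq, codes, X**2)
    return counts, sums, sumsq


def mean_var(n, s, ss):
    """Mean and unbiased (ddof=1) variance recovered from n, sum and sum of squares."""
    mean = s / n
    var = (ss - n * mean**2) / (n - 1)
    return mean, var


def group_vs_rest_stats(X, labels, categories, selected):
    # Aggregate over *all* categories first, then slice to the selected groups.
    counts, sums, sumsq = aggregate_by_category(X, labels, categories)
    n_tot, s_tot, ss_tot = counts.sum(), sums.sum(axis=0), sumsq.sum(axis=0)
    stats = {}
    for g in selected:
        k = categories.index(g)
        group = mean_var(counts[k], sums[k], sumsq[k])
        # "rest" is derived by subtracting the group from the all-category totals
        rest = mean_var(n_tot - counts[k], s_tot - sums[k], ss_tot - sumsq[k])
        stats[g] = (group, rest)
    return stats


rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 5)).astype(float)
labels = np.repeat(["0", "1", "2"], 20)
categories = ["0", "1", "2"]

subset = group_vs_rest_stats(X, labels, categories, selected=["1"])
full = group_vs_rest_stats(X, labels, categories, selected=categories)
```

Because the rest totals in this sketch always come from every category, selecting one group or all of them yields the same rest statistics for that group, which matches the CPU/Scanpy consistency the walkthrough describes.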

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

scverse/rapids-singlecell#570: Refactors group selection and group_sizes usage in _core.py's GPU path alongside this PR's Aggregate-based aggregation rework.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description references issue `#650` and mentions adding tests, which are related to the changeset, but lacks implementation details about the fix itself.
Linked Issues check	✅ Passed	The PR addresses the core bug reported in `#650`: fixing rank_genes_groups to return correct results when called with groups parameter for t-test and wilcoxon methods, and adds regression tests to prevent recurrence.
Out of Scope Changes check	✅ Passed	All changes are in-scope: core fix in _rank_genes_groups/_core.py for GPU aggregation path, plus targeted regression tests for t-test, wilcoxon, and wilcoxon_binned methods.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Title check	✅ Passed	The title 'Fix rank_genes_groups with groups parameter' directly addresses the main change: fixing a bug in rank_genes_groups when the groups parameter is used, as documented in issue `#650`.


coderabbitai

🧹 Nitpick comments (3)

tests/test_rank_genes_groups_wilcoxon_binned.py (1)

304-312: ⚡ Quick win

Assert the ranked gene names here too.

The new loop only compares numeric arrays. If the subset path returns the same scores/p-values under a different gene order, this can still pass on tied values. Add a names equality check before the numeric-field assertions.

Suggested change

         for group in result_sub["names"].dtype.names:
+            assert tuple(result_all["names"][group]) == tuple(
+                result_sub["names"][group]
+            )
             for field in ("scores", "logfoldchanges", "pvals", "pvals_adj"):
                 np.testing.assert_allclose(
                     np.asarray(result_all[field][group], dtype=float),
                     np.asarray(result_sub[field][group], dtype=float),
                     rtol=1e-10,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_rank_genes_groups_wilcoxon_binned.py` around lines 304 - 312, The
test currently only compares numeric fields and can miss reordered gene names;
before the numeric assertions in the loop over group in
result_sub["names"].dtype.names add an explicit equality check of the ranked
gene names between result_all and result_sub (e.g., compare
result_all["names"][group] to result_sub["names"][group] using an
array-equality/assertion helper) so that name order is verified prior to
comparing "scores", "logfoldchanges", "pvals", and "pvals_adj".
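To see why the names check matters, consider two genes with tied scores. The sketch below assumes results are stored as per-group structured arrays, similar to `adata.uns["rank_genes_groups"]`; `one_group` and `assert_results_match` are hypothetical helpers, and the gene names and scores are made up for illustration.

```python
import numpy as np


def one_group(group, names, scores):
    """Hypothetical helper: per-group structured arrays, one field per group."""
    names_arr = np.zeros(len(names), dtype=[(group, "U10")])
    names_arr[group] = names
    scores_arr = np.zeros(len(scores), dtype=[(group, "f8")])
    scores_arr[group] = scores
    return {"names": names_arr, "scores": scores_arr}


def assert_results_match(result_all, result_sub, fields=("scores",), check_names=True):
    for group in result_sub["names"].dtype.names:
        if check_names:
            # The ranked gene order must agree before numeric fields are compared
            assert tuple(result_all["names"][group]) == tuple(result_sub["names"][group]), group
        for field in fields:
            np.testing.assert_allclose(
                np.asarray(result_all[field][group], dtype=float),
                np.asarray(result_sub[field][group], dtype=float),
                rtol=1e-10,
            )


# GeneA and GeneB are tied, so swapping them leaves the score array unchanged
all_res = one_group("1", ["GeneA", "GeneB", "GeneC"], [3.0, 3.0, 1.0])
sub_res = one_group("1", ["GeneB", "GeneA", "GeneC"], [3.0, 3.0, 1.0])

assert_results_match(all_res, sub_res, check_names=False)  # passes: the swap goes unnoticed
try:
    assert_results_match(all_res, sub_res)
    names_check_caught_swap = False
except AssertionError:
    names_check_caught_swap = True
```

With only numeric fields compared, the reordered result passes; the names assertion is what catches it.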

tests/test_rank_genes_groups_wilcoxon.py (1)

151-194: ⚡ Quick win

Exercise the new Aggregate branch here as well.

With method="wilcoxon" and the default pre_load=False, this test stays on the chunked path and never hits the _basic_stats GPU aggregation branch changed in _core.py. Please parameterize pre_load (or move X to GPU explicitly) so this regression actually covers the new code path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_rank_genes_groups_wilcoxon.py` around lines 151 - 194, The test
test_rank_genes_groups_wilcoxon_subset_matches_scanpy never exercises the GPU
Aggregate/_basic_stats branch because it always runs with pre_load=False; update
the test to parameterize pre_load (e.g. add
`@pytest.mark.parametrize`("pre_load",[False,True]) and pass pre_load=pre_load
into the rsc.tl.rank_genes_groups call) so the GPU pre-loaded/aggregate code
path in _core.py/_basic_stats is executed (alternatively you can explicitly move
the adata_gpu.X to the GPU before calling rsc.tl.rank_genes_groups, but prefer
parametrize to keep coverage for both paths).

tests/test_rank_genes_groups_ttest.py (1)

162-207: ⚡ Quick win

Pin this regression against an rsc full-run baseline too.

This validates the subset path against Scanpy, but issue #650 was specifically a divergence between groups=[...] and rsc’s own all-groups result. One extra all-groups rsc.tl.rank_genes_groups(...) call here would lock in that exact contract and make future subset-only regressions much harder to miss.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_rank_genes_groups_ttest.py` around lines 162 - 207, Add an
additional full-run rsc baseline call inside
test_rank_genes_groups_ttest_subset_matches_scanpy: after creating adata_gpu
(and before or after the subset rsc.tl.rank_genes_groups call) run
rsc.tl.rank_genes_groups(adata_gpu_full, "blobs", method=method,
reference=reference, use_raw=False) (or reuse adata_gpu copy) with no groups
argument to produce the rsc all-groups result, then capture
adata_gpu_full.uns["rank_genes_groups"] and assert that the entries for the
groups being tested match the corresponding entries in that full-run result
(compare fields "names", "scores", "logfoldchanges", "pvals", "pvals_adj" the
same way you compare to cpu_result) so regressions between subset and rsc
full-run outputs are detected.
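The full-run baseline contract can be expressed as a small helper that runs once with every group and once with the subset, then requires the tested groups to agree. This is a sketch under stated assumptions: `welch_t_scores` is a hypothetical NumPy stand-in for a group-vs-rest Welch t-test (not rsc's or Scanpy's code), and `check_subset_matches_full` plays the role of the extra all-groups `rsc.tl.rank_genes_groups` call.

```python
import numpy as np


def welch_t_scores(X, labels, groups):
    """Hypothetical stand-in: group-vs-rest Welch t statistic per gene."""
    out = {}
    for g in groups:
        in_g = labels == g
        a, b = X[in_g], X[~in_g]  # rest = every cell outside g, selected or not
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        out[g] = (a.mean(axis=0) - b.mean(axis=0)) / se
    return out


def check_subset_matches_full(rank_fn, X, labels, subset):
    """Run with all groups and with `subset`; scores for the subset must agree."""
    full = rank_fn(X, labels, list(np.unique(labels)))
    sub = rank_fn(X, labels, subset)
    for g in subset:
        np.testing.assert_allclose(sub[g], full[g], rtol=1e-10)
    return True


rng = np.random.default_rng(1)
labels = np.repeat(np.array(["0", "1", "2"]), 15)
X = rng.normal(size=(45, 4)) + (labels == "1")[:, None] * 2.0
ok = check_subset_matches_full(welch_t_scores, X, labels, subset=["1"])
```

In the real test the callable would wrap `rsc.tl.rank_genes_groups` and read `adata.uns["rank_genes_groups"]`, comparing "names", "scores", "logfoldchanges", "pvals" and "pvals_adj" as suggested above; that wiring is omitted here because it needs a GPU.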


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a76aae1a-7d79-42af-9121-6a3769492e83

📥 Commits

Reviewing files that changed from the base of the PR and between 4aeefb6 and 97ef3ef.

📒 Files selected for processing (4)

src/rapids_singlecell/tools/_rank_genes_groups/_core.py
tests/test_rank_genes_groups_ttest.py
tests/test_rank_genes_groups_wilcoxon.py
tests/test_rank_genes_groups_wilcoxon_binned.py

codecov-commenter · 2026-05-06T14:49:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.12%. Comparing base (4aeefb6) to head (3493bc4).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #651      +/-   ##
==========================================
+ Coverage   88.03%   88.12%   +0.09%     
==========================================
  Files          96       96              
  Lines        7035     7039       +4     
==========================================
+ Hits         6193     6203      +10     
+ Misses        842      836       -6

Files with missing lines	Coverage Δ
...apids_singlecell/tools/_rank_genes_groups/_core.py	`91.63% <100.00%> (+0.14%)`	⬆️

... and 1 file with indirect coverage changes

fix ttest groups

97ef3ef

Intron7 added the run-gpu-ci label May 6, 2026

github-actions Bot removed the run-gpu-ci label May 6, 2026

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Intron7 changed the title ~~fix ttest groups~~ Fix rank_genes_groups with groups parameter May 6, 2026

Intron7 added 2 commits May 6, 2026 17:14

add release note

5cb20ed

add more tests

3493bc4

Intron7 added the run-gpu-ci label May 6, 2026

Intron7 enabled auto-merge (squash) May 6, 2026 15:41

github-actions Bot removed the run-gpu-ci label May 6, 2026

Intron7 mentioned this pull request May 6, 2026

[BUG] Passing 'groups' parameter to rapids_singlecell.tl.rank_genes_groups gives all 0 scores/NA LFCs #650

Closed

Intron7 merged commit 5fd8aa6 into main May 6, 2026
22 of 25 checks passed

Intron7 deleted the fix-groups-for-ttest branch May 6, 2026 15:58

Intron7 merged 3 commits into main from fix-groups-for-ttest

Intron7 commented May 6, 2026

Uh oh!

Intron7 commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Uh oh!

codecov-commenter commented May 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Intron7 commented May 6, 2026

Uh oh!

Intron7 commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 6, 2026 •

edited

Loading

codecov-commenter commented May 6, 2026 •

edited

Loading