Fix rank_genes_groups with groups parameter#651
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR fixes issue ChangesBug Fix: GPU Aggregation for Grouped Rank Genes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/test_rank_genes_groups_wilcoxon_binned.py (1)
304-312: ⚡ Quick winAssert the ranked gene names here too.
The new loop only compares numeric arrays. If the subset path returns the same scores/p-values under a different gene order, this can still pass on tied values. Add a
namesequality check before the numeric-field assertions.Suggested change
for group in result_sub["names"].dtype.names: + assert tuple(result_all["names"][group]) == tuple( + result_sub["names"][group] + ) for field in ("scores", "logfoldchanges", "pvals", "pvals_adj"): np.testing.assert_allclose( np.asarray(result_all[field][group], dtype=float), np.asarray(result_sub[field][group], dtype=float), rtol=1e-10,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_rank_genes_groups_wilcoxon_binned.py` around lines 304 - 312, The test currently only compares numeric fields and can miss reordered gene names; before the numeric assertions in the loop over group in result_sub["names"].dtype.names add an explicit equality check of the ranked gene names between result_all and result_sub (e.g., compare result_all["names"][group] to result_sub["names"][group] using an array-equality/assertion helper) so that name order is verified prior to comparing "scores", "logfoldchanges", "pvals", and "pvals_adj".tests/test_rank_genes_groups_wilcoxon.py (1)
151-194: ⚡ Quick winExercise the new Aggregate branch here as well.
With
method="wilcoxon"and the defaultpre_load=False, this test stays on the chunked path and never hits the_basic_statsGPU aggregation branch changed in_core.py. Please parameterizepre_load(or moveXto GPU explicitly) so this regression actually covers the new code path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_rank_genes_groups_wilcoxon.py` around lines 151 - 194, The test test_rank_genes_groups_wilcoxon_subset_matches_scanpy never exercises the GPU Aggregate/_basic_stats branch because it always runs with pre_load=False; update the test to parameterize pre_load (e.g. add `@pytest.mark.parametrize`("pre_load",[False,True]) and pass pre_load=pre_load into the rsc.tl.rank_genes_groups call) so the GPU pre-loaded/aggregate code path in _core.py/_basic_stats is executed (alternatively you can explicitly move the adata_gpu.X to the GPU before calling rsc.tl.rank_genes_groups, but prefer parametrize to keep coverage for both paths).tests/test_rank_genes_groups_ttest.py (1)
162-207: ⚡ Quick winPin this regression against an
rscfull-run baseline too.This validates the subset path against Scanpy, but issue
#650was specifically a divergence betweengroups=[...]andrsc’s own all-groups result. One extra all-groupsrsc.tl.rank_genes_groups(...)call here would lock in that exact contract and make future subset-only regressions much harder to miss.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_rank_genes_groups_ttest.py` around lines 162 - 207, Add an additional full-run rsc baseline call inside test_rank_genes_groups_ttest_subset_matches_scanpy: after creating adata_gpu (and before or after the subset rsc.tl.rank_genes_groups call) run rsc.tl.rank_genes_groups(adata_gpu_full, "blobs", method=method, reference=reference, use_raw=False) (or reuse adata_gpu copy) with no groups argument to produce the rsc all-groups result, then capture adata_gpu_full.uns["rank_genes_groups"] and assert that the entries for the groups being tested match the corresponding entries in that full-run result (compare fields "names", "scores", "logfoldchanges", "pvals", "pvals_adj" the same way you compare to cpu_result) so regressions between subset and rsc full-run outputs are detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_rank_genes_groups_ttest.py`:
- Around line 162-207: Add an additional full-run rsc baseline call inside
test_rank_genes_groups_ttest_subset_matches_scanpy: after creating adata_gpu
(and before or after the subset rsc.tl.rank_genes_groups call) run
rsc.tl.rank_genes_groups(adata_gpu_full, "blobs", method=method,
reference=reference, use_raw=False) (or reuse adata_gpu copy) with no groups
argument to produce the rsc all-groups result, then capture
adata_gpu_full.uns["rank_genes_groups"] and assert that the entries for the
groups being tested match the corresponding entries in that full-run result
(compare fields "names", "scores", "logfoldchanges", "pvals", "pvals_adj" the
same way you compare to cpu_result) so regressions between subset and rsc
full-run outputs are detected.
In `@tests/test_rank_genes_groups_wilcoxon_binned.py`:
- Around line 304-312: The test currently only compares numeric fields and can
miss reordered gene names; before the numeric assertions in the loop over group
in result_sub["names"].dtype.names add an explicit equality check of the ranked
gene names between result_all and result_sub (e.g., compare
result_all["names"][group] to result_sub["names"][group] using an
array-equality/assertion helper) so that name order is verified prior to
comparing "scores", "logfoldchanges", "pvals", and "pvals_adj".
In `@tests/test_rank_genes_groups_wilcoxon.py`:
- Around line 151-194: The test
test_rank_genes_groups_wilcoxon_subset_matches_scanpy never exercises the GPU
Aggregate/_basic_stats branch because it always runs with pre_load=False; update
the test to parameterize pre_load (e.g. add
`@pytest.mark.parametrize`("pre_load",[False,True]) and pass pre_load=pre_load
into the rsc.tl.rank_genes_groups call) so the GPU pre-loaded/aggregate code
path in _core.py/_basic_stats is executed (alternatively you can explicitly move
the adata_gpu.X to the GPU before calling rsc.tl.rank_genes_groups, but prefer
parametrize to keep coverage for both paths).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a76aae1a-7d79-42af-9121-6a3769492e83
📒 Files selected for processing (4)
src/rapids_singlecell/tools/_rank_genes_groups/_core.pytests/test_rank_genes_groups_ttest.pytests/test_rank_genes_groups_wilcoxon.pytests/test_rank_genes_groups_wilcoxon_binned.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #651 +/- ##
==========================================
+ Coverage 88.03% 88.12% +0.09%
==========================================
Files 96 96
Lines 7035 7039 +4
==========================================
+ Hits 6193 6203 +10
+ Misses 842 836 -6
|
This PR fixes #650. I also added tests so this wont happen again.