Describe the bug
When calling rsc.tl.rank_genes_groups with the groups=[...] parameter and method='t-test' or method='wilcoxon', the resulting scores, p-values, and log fold changes are all zero or NaN (scores are okay for wilcoxon). Calling the same function on the same AnnData without the groups= parameter (computing all groups) returns correct, non-degenerate values for every group including the one that was previously broken. The method='wilcoxon' path is partially affected: scores compute correctly with groups=, but pvals, pvals_adj, and logfoldchanges are still all zero / NaN.
scanpy's sc.tl.rank_genes_groups on the identical AnnData with the same groupby and groups= arguments returns correct values, so the data itself is valid.
Steps/Code to reproduce bug
import cupy as cp
import cupyx.scipy.sparse as cusp
import anndata as ad
import numpy as np
import pandas as pd
import rapids_singlecell as rsc
import scanpy as sc
n_cells, n_genes = 5000, 1000
X = cusp.random(n_cells, n_genes, density=0.1, dtype=cp.float32, format='csr')
obs = pd.DataFrame({
'group': pd.Categorical(np.random.choice(list('ABCDE'), n_cells))
})
adata = ad.AnnData(X=X, obs=obs)
rsc.pp.normalize_total(adata)
rsc.pp.log1p(adata)
# Works: scores, pvals, lfc all populated correctly for every group
rsc.tl.rank_genes_groups(adata, groupby='group', method='t-test', use_raw=False)
print("All groups (no `groups=`):")
print(pd.DataFrame(adata.uns['rank_genes_groups']['scores']).head())
print(pd.DataFrame(adata.uns['rank_genes_groups']['pvals']).head())
print(pd.DataFrame(adata.uns['rank_genes_groups']['logfoldchanges']).head())
# Broken: scores all 0, pvals all 0, logfoldchanges all NaN
rsc.tl.rank_genes_groups(adata, groupby='group', groups=['A'],
method='t-test', use_raw=False)
print("\nWith `groups=['A']`:")
print(pd.DataFrame(adata.uns['rank_genes_groups']['scores']).head())
print(pd.DataFrame(adata.uns['rank_genes_groups']['pvals']).head())
print(pd.DataFrame(adata.uns['rank_genes_groups']['logfoldchanges']).head())
# scanpy's equivalent on the same data works fine for comparison
adata_cpu = adata.copy()
rsc.get.anndata_to_CPU(adata_cpu)
sc.tl.rank_genes_groups(adata_cpu, groupby='group', groups=['A'],
method='t-test', use_raw=False)
print("\nscanpy with `groups=['A']`:")
print(pd.DataFrame(adata_cpu.uns['rank_genes_groups']['scores']).head())
Expected behavior
Calling rsc.tl.rank_genes_groups with groups=[X] should produce the same scores, p-values, adjusted p-values, and log fold changes for group X as calling it without the groups= argument and indexing into the result for group X. Currently the two code paths disagree:
- Without
groups=: correct values for all groups.
- With
groups=[X]: scores 0, pvals 0, logfoldchanges NaN (t-test); scores correct but pvals/lfc still 0/NaN (wilcoxon).
Environment details (please complete the following information):
- Environment location: Cloud (Lambda Labs / Lambda Cloud GPU VM)
- Linux Distro/Architecture: Ubuntu 22.04 amd64
- GPU Model/Driver: NVIDIA H100 PCIe/Driver Version: 580.105.08
- CUDA: cu13
- Method of Rapids install: pip
pip install 'rapids-singlecell[rapids-cu12]' --extra-index-url=https://pypi.nvidia.com
- Also tried
pip install 'rapids-singlecell[rapids-cu13]' --extra-index-url=https://pypi.nvidia.com which gives a similar output
Describe the bug
When calling
rsc.tl.rank_genes_groupswith thegroups=[...]parameter andmethod='t-test'ormethod='wilcoxon', the resulting scores, p-values, and log fold changes are all zero or NaN (scores are okay for wilcoxon). Calling the same function on the same AnnData without thegroups=parameter (computing all groups) returns correct, non-degenerate values for every group including the one that was previously broken. Themethod='wilcoxon'path is partially affected: scores compute correctly withgroups=, butpvals,pvals_adj, andlogfoldchangesare still all zero / NaN.scanpy's
sc.tl.rank_genes_groupson the identical AnnData with the samegroupbyandgroups=arguments returns correct values, so the data itself is valid.Steps/Code to reproduce bug
Expected behavior
Calling
rsc.tl.rank_genes_groupswithgroups=[X]should produce the same scores, p-values, adjusted p-values, and log fold changes for groupXas calling it without thegroups=argument and indexing into the result for groupX. Currently the two code paths disagree:groups=: correct values for all groups.groups=[X]: scores 0, pvals 0, logfoldchanges NaN (t-test); scores correct but pvals/lfc still 0/NaN (wilcoxon).Environment details (please complete the following information):
pip install 'rapids-singlecell[rapids-cu12]' --extra-index-url=https://pypi.nvidia.compip install 'rapids-singlecell[rapids-cu13]' --extra-index-url=https://pypi.nvidia.comwhich gives a similar output