Show gene name in DE table for CELLxGENE-style AnnData files (SCP-5859) #372
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           development    #372   +/-   ##
============================================
  Coverage       75.70%   75.71%
============================================
  Files              30       30
  Lines            4400     4418    +18
============================================
+ Hits             3331     3345    +14
- Misses           1069     1073     +4
bistline
left a comment
Looks good, modulo the actions comment and a question about a minor change that seems obtuse to me.
# push: # Uncomment, update branches to develop / debug
# branches:
# jb-metadata-boolean
push: # Uncomment, update branches to develop / debug
You'll want to re-comment this out
Thanks! This is my first time dealing with minify_ontologies.yml. I didn't realize I'd need to do that.
eweitz
left a comment
Code looks good, thanks for the helpful context in SCP demo!
fix error in uniquefied name splitting
BACKGROUND & CHANGES
This update modifies DE output to use feature names as the first column, so gene names are displayed rather than gene IDs (i.e., Ensembl identifiers). This PR also moves raw counts from adata.raw into the X slot of a new AnnData object, so data fed into DE is consistently preprocessed in the same manner before rank_genes_groups is called. Additionally, an SCP-convention-compliant version of tests/data/anndata/cellxgene.human_liver_b_cells.h5ad is used for AnnData DE tests (test_de_process_h5ad). Versions of trimmed_compliant_pbmc3K.h5ad used in other AnnData tests may eventually need to be replaced due to changes in AnnData (specifically AnnData's DataFrame and categorical array specifications).
MANUAL TESTING
Option 1
Using local instance with development branch plus docker image built from this PR:
HTAPP-330-SMP-1082_compliant.h5ad: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_compliant.h5ad
Delayed::Job
gcr.io/broad-singlecellportal-staging/scp-ingest-jlc_show_gene_name:2b1f21b
HTAPP-330-SMP-1082_compliant.h5ad for ingest with raw counts set to Yes
Option 2
Using ingest_pipeline CLI: This requires an AnnData file and cluster & metadata AnnData fragments extracted from the AnnData file.
HTAPP-330-SMP-1082_compliant.h5ad: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_compliant.h5ad
HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz (Note: file will lose the .gz suffix BUT will still need to be decompressed.)
HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz (Note: file will lose the .gz suffix BUT will still need to be decompressed.)
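Since the two fragment files lose their .gz suffix but remain compressed, a small helper can take the guesswork out of decompressing them. This is only a sketch using the Python standard library: the function names and the output file names are hypothetical, and it detects gzip by the two-byte magic number rather than by file name.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def decompress_if_gzipped(src, dest):
    """Write a plain copy of src to dest, decompressing only if needed.

    Returns True when decompression happened, False when src was
    already uncompressed and was copied as-is.
    """
    src, dest = Path(src), Path(dest)
    if src.resolve() == dest.resolve():
        raise ValueError("src and dest must be different files")
    if is_gzipped(src):
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        return True
    shutil.copyfile(src, dest)
    return False
```

For example, `decompress_if_gzipped("downloaded_fragment", "fragment.tsv")` (both names are placeholders) leaves the download untouched and writes a readable TSV next to it.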
export BYPASS_MONGO_WRITES='yes'
python ingest_pipeline.py --study-id addedfeed000000000000000 --study-file-id dec0dedfeed1111111111111 differential_expression --annotation-name leiden --annotation-type group --annotation-scope study --annotation-file HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz --cluster-file HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz --cluster-name umap --matrix-file-path HTAPP-330-SMP-1082_compliant.h5ad --matrix-file-type h5ad --study-accession SCPdev --differential-expression