Skip to content

Conversation

@jlchang
Copy link
Contributor

@jlchang jlchang commented Dec 3, 2024

BACKGROUND & CHANGES

This update modifies DE output to use feature names as the first column of DE output so gene names are displayed rather than gene IDs (ie. Ensembl identifiers). This PR also moves raw counts from the adata.raw into the X slot of a new AnnData object so data fed into DE is consistently preprocessed in the same manner before rank_genes_groups is called. Additionally, an SCP convention compliant version of tests/data/anndata/cellxgene.human_liver_b_cells.h5ad is used for AnnData DE tests (test_de_process_h5ad), versions of trimmed_compliant_pbmc3K.h5ad used in other AnnData tests may eventually need to be replaced due to changes in AnnData (specifically AnnData's Dataframe and Categorical array specifications )

MANUAL TESTING

Option 1

Using local instance with development branch plus docker image built from this PR:

  1. Download the following file
    HTAPP-330-SMP-1082_compliant.h5ad: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_compliant.h5ad
  2. Initialize local environment as normal, boot all services including Delayed::Job
  3. Set the ingest docker image for your local instance to:
    gcr.io/broad-singlecellportal-staging/scp-ingest-jlc_show_gene_name:2b1f21b
  4. Set up HTAPP-330-SMP-1082_compliant.h5ad for ingest with raw counts set to Yes
  5. Confirm that the DE table shows gene name and not gene id
  6. [Optional] Compare results in portal to the reference output file, HTAPP-330-SMP-1082/umap--leiden--0--study--wilcoxon.tsv: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/umap--leiden--0--study--wilcoxon.tsv
Option 2

Using ingest_pipeline CLI: This requires an AnnData file and cluster & metadata AnnData fragments extracted from the AnnData file.

  1. Download the following files
    HTAPP-330-SMP-1082_compliant.h5ad: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_compliant.h5ad
    HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz (Note: file will lose the .gz suffix BUT will still need to be decompressed :(
    HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz (Note: file will lose the .gz suffix BUT will still need to be decompressed :(
  2. Initialize your dev environment and export BYPASS_MONGO_WRITES='yes'
  3. Run the following ingest pipeline job:
    python ingest_pipeline.py --study-id addedfeed000000000000000 --study-file-id dec0dedfeed1111111111111 differential_expression --annotation-name leiden --annotation-type group --annotation-scope study --annotation-file HTAPP-330-SMP-1082_h5ad_frag.metadata.tsv.gz --cluster-file HTAPP-330-SMP-1082_h5ad_frag.cluster.X_umap.tsv.gz --cluster-name umap --matrix-file-path HTAPP-330-SMP-1082_compliant.h5ad --matrix-file-type h5ad --study-accession SCPdev --differential-expression
  4. Confirm that the first non-index column in the resulting DE files is gene name and not gene id.
  5. [Optional] Compare your output to the reference output file HTAPP-330-SMP-1082/umap--leiden--0--study--wilcoxon.tsv: gs://fc-2f8ef4c0-b7eb-44b1-96fe-a07f0ea9a982/test_Data/differential_expression/HTAPP-330-SMP-1082/umap--leiden--0--study--wilcoxon.tsv

@jlchang jlchang requested review from bistline and eweitz December 3, 2024 13:07
@codecov
Copy link

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.71%. Comparing base (79a6800) to head (bfb9260).
Report is 9 commits behind head on development.

Files with missing lines Patch % Lines
ingest/de.py 79.31% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##           development     #372   +/-   ##
============================================
  Coverage        75.70%   75.71%           
============================================
  Files               30       30           
  Lines             4400     4418   +18     
============================================
+ Hits              3331     3345   +14     
- Misses            1069     1073    +4     
Files with missing lines Coverage Δ
ingest/de.py 86.23% <79.31%> (-0.67%) ⬇️

Copy link
Contributor

@bistline bistline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, modulo the actions comment and a question about a minor change that seems obtuse to me.

# push: # Uncomment, update branches to develop / debug
# branches:
# jb-metadata-boolean
push: # Uncomment, update branches to develop / debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to re-comment this out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is my first time dealing with minify_ontologies.yml. I didn't realize I'd need to do that.

Copy link
Member

@eweitz eweitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, thanks for the helpful context in SCP demo!

fix error in uniquefied name splitting
@jlchang jlchang merged commit 9652c86 into development Dec 3, 2024
4 checks passed
@jlchang jlchang deleted the jlc_show_gene_name branch December 3, 2024 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants