Step 0 - Download all available molecular profile IDs and extract mRNA expression molecular profile IDs
This script is located here.
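#!/bin/bash
# Download every molecular profile known to cBioPortal as one JSON array.
output_dir="/data/shared/repos/bioxpress/downloads/cbio"
url="https://www.cbioportal.org/api/molecular-profiles?projection=SUMMARY&pageSize=100000&pageNumber=0&direction=ASC"
# --fail makes curl exit non-zero on an HTTP error instead of saving the error body.
curl --fail -G "${url}" \
  -H "accept: application/json" \
  -o "${output_dir}/all_molecular_profiles.json"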
Only molecular profile IDs with MRNA_EXPRESSION as molecularAlterationType are relevant. They can be found here.
Example of relevant data:
{
"molecularAlterationType": "MRNA_EXPRESSION",
"datatype": "Z-SCORE",
"name": "mRNA expression z-scores relative to normal samples (log RNA Seq V2 RSEM)",
"description": "Expression z-scores of tumor samples compared to the expression distribution of all log-transformed mRNA expression of adjacent normal samples in the cohort.",
"showProfileInAnalysisTab": true,
"patientLevel": false,
"molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
"studyId": "brca_tcga_pan_can_atlas_2018"
}
Not every MRNA_EXPRESSION profile is relevant, though. We are only interested in z-scores computed relative to matched normal samples, so the description and molecularProfileId fields should contain terms such as "normal" or "diploid".
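The filtering described above could be sketched roughly as follows. This is only an illustration, not the script used in this repo: the function name filter_mrna_profiles is hypothetical, jq (1.5 or later) is assumed to be installed, and the keyword list "normal|diploid" as well as the restriction to datatype Z-SCORE are assumptions taken from the example record above.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of mRNA expression z-score profiles
# whose molecularProfileId or description mentions "normal" or "diploid".
# Assumes jq >= 1.5 is available; the keyword list is only an example.
filter_mrna_profiles() {
  local json_file="$1"
  jq -r '
    .[]
    | select(.molecularAlterationType == "MRNA_EXPRESSION")
    | select(.datatype == "Z-SCORE")
    | select(((.molecularProfileId // "") + " " + (.description // ""))
             | test("normal|diploid"; "i"))
    | .molecularProfileId
  ' "${json_file}"
}

# Example call on the file downloaded in Step 0:
# filter_mrna_profiles "${output_dir}/all_molecular_profiles.json" > mrna_profile_ids.txt
```

The case-insensitive match is deliberately loose, so the resulting list should still be checked by eye before it is used downstream.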
Step 1 - Fetch Sample List IDs
Go to Sample Lists -> GET /api/studies/{studyId}/sample-lists and find sample list IDs of interest.
In our example, studyId is brca_tcga_pan_can_atlas_2018.
This sample list seems relevant:
{
"category": "all_cases_with_mrna_rnaseq_data",
"name": "Samples with mRNA data (RNA Seq V2)",
"description": "Samples with mRNA expression data (1082 samples)",
"sampleListId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna",
"studyId": "brca_tcga_pan_can_atlas_2018"
}
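The same lookup can also be done from the command line instead of the API page. The sketch below assumes the same base URL as in Step 0; the helper names sample_lists_url and fetch_sample_lists and the output file name are hypothetical.

```shell
#!/bin/bash
api_base="https://www.cbioportal.org/api"

# Hypothetical helper: build the sample-lists URL for one study.
sample_lists_url() {
  local study_id="$1"
  printf '%s/studies/%s/sample-lists' "${api_base}" "${study_id}"
}

# Hypothetical helper: download all sample lists of a study to a file.
fetch_sample_lists() {
  local study_id="$1" out_file="$2"
  curl --fail --silent --show-error \
    -H "accept: application/json" \
    -o "${out_file}" \
    "$(sample_lists_url "${study_id}")"
}

# Example (performs a network request):
# fetch_sample_lists "brca_tcga_pan_can_atlas_2018" "sample_lists.json"
```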
Step 2 - Fetch molecular data
Go to Molecular Data -> GET /api/molecular-profiles/{molecularProfileId}/molecular-data
We need to provide molecularProfileId, sampleListId and entrezGeneId. For example, this is the response for
molecularProfileId=brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores
sampleListId=brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna
entrezGeneId=1
[
{
"uniqueSampleKey": "VENHQS0zQy1BQUFVLTAxOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
"uniquePatientKey": "VENHQS0zQy1BQUFVOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
"entrezGeneId": 1,
"molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
"sampleId": "TCGA-3C-AAAU-01",
"patientId": "TCGA-3C-AAAU",
"studyId": "brca_tcga_pan_can_atlas_2018",
"value": 1.7607
},
{
"uniqueSampleKey": "VENHQS0zQy1BQUxJLTAxOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
"uniquePatientKey": "VENHQS0zQy1BQUxJOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
"entrezGeneId": 1,
"molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
"sampleId": "TCGA-3C-AALI-01",
"patientId": "TCGA-3C-AALI",
"studyId": "brca_tcga_pan_can_atlas_2018",
"value": 2.039
},
<...>
]
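A request like the one above could be scripted along these lines. This is a sketch under assumptions: sampleListId and entrezGeneId are passed as query parameters, the helper names are hypothetical, and the TSV conversion assumes jq is installed and that each record carries the sampleId and value fields shown in the response.

```shell
#!/bin/bash
api_base="https://www.cbioportal.org/api"

# Hypothetical helper: build the molecular-data URL for one profile,
# one sample list and one gene (assumes query-parameter passing).
molecular_data_url() {
  local profile_id="$1" sample_list_id="$2" entrez_gene_id="$3"
  printf '%s/molecular-profiles/%s/molecular-data?sampleListId=%s&entrezGeneId=%s' \
    "${api_base}" "${profile_id}" "${sample_list_id}" "${entrez_gene_id}"
}

# Hypothetical helper: download the molecular data to a file.
fetch_molecular_data() {
  local profile_id="$1" sample_list_id="$2" entrez_gene_id="$3" out_file="$4"
  curl --fail --silent --show-error \
    -H "accept: application/json" \
    -o "${out_file}" \
    "$(molecular_data_url "${profile_id}" "${sample_list_id}" "${entrez_gene_id}")"
}

# Hypothetical helper: reduce a response file to "sampleId<TAB>value" lines.
molecular_data_to_tsv() {
  jq -r '.[] | [.sampleId, .value] | @tsv' "$1"
}

# Example (performs a network request):
# fetch_molecular_data \
#   "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores" \
#   "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna" 1 "gene_1.json"
# molecular_data_to_tsv "gene_1.json"
```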