cBioPortal mRNA expression Z-scores API endpoints #7

@mariacuria

cBioPortal Swagger API page

Step 0 - Download all available molecular profile IDs and extract mRNA expression molecular profile IDs

This script is located here.

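#!/bin/bash
# Download every molecular profile known to cBioPortal into a single JSON file.

output_dir="/data/shared/repos/bioxpress/downloads/cbio"
url="https://www.cbioportal.org/api/molecular-profiles?projection=SUMMARY&pageSize=100000&pageNumber=0&direction=ASC"
mkdir -p "${output_dir}"
# --fail makes curl exit non-zero on an HTTP error instead of saving the error body.
curl --fail "${url}" \
     -H "accept: application/json" \
     -o "${output_dir}/all_molecular_profiles.json"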

Only molecular profile IDs with MRNA_EXPRESSION as molecularAlterationType are relevant. They can be found here.

Example of relevant data:

{
    "molecularAlterationType": "MRNA_EXPRESSION",
    "datatype": "Z-SCORE",
    "name": "mRNA expression z-scores relative to normal samples (log RNA Seq V2 RSEM)",
    "description": "Expression z-scores of tumor samples compared to the expression distribution of all log-transformed mRNA expression of adjacent normal samples in the cohort.",
    "showProfileInAnalysisTab": true,
    "patientLevel": false,
    "molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
    "studyId": "brca_tcga_pan_can_atlas_2018"
}

Moreover, not all MRNA_EXPRESSION profiles are relevant: we are only interested in the Z-scores relative to matched normal samples, i.e. the description and molecularProfileId fields should contain terms like "normal", "diploid", etc.
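
The two filters described in Step 0 could be applied to the downloaded file roughly as follows. This is only a sketch: it assumes jq is installed, the function name filter_profiles is made up for illustration, and the extra check on datatype being "Z-SCORE" is an assumption based on the example record above.

```shell
#!/bin/bash
# Sketch: list mRNA expression Z-score profiles whose description or
# molecularProfileId mentions "normal" or "diploid" (case-insensitive).
# Assumes jq is installed; filter_profiles is a hypothetical helper name.

# filter_profiles FILE -> prints one molecularProfileId per line
filter_profiles() {
    jq -r '
        .[]
        | select(.molecularAlterationType == "MRNA_EXPRESSION")
        | select(.datatype == "Z-SCORE")   # assumption: taken from the example record
        | select(((.description // "") + " " + .molecularProfileId)
                 | test("normal|diploid"; "i"))
        | .molecularProfileId
    ' "$1"
}

# Usage (path taken from the download script above):
# filter_profiles "/data/shared/repos/bioxpress/downloads/cbio/all_molecular_profiles.json"
```

Matching on keywords is a heuristic, so the resulting list is probably worth a manual look before it is used downstream.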

Step 1 - Fetch Sample List IDs

Go to Sample Lists -> GET /api/studies/{studyId}/sample-lists and find sample list IDs of interest.
In our example, studyId is brca_tcga_pan_can_atlas_2018.
This sample list seems relevant:

{
    "category": "all_cases_with_mrna_rnaseq_data",
    "name": "Samples with mRNA data (RNA Seq V2)",
    "description": "Samples with mRNA expression data (1082 samples)",
    "sampleListId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna",
    "studyId": "brca_tcga_pan_can_atlas_2018"
}
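
Step 1 could also be scripted instead of browsing the Swagger page. The sketch below builds the sample-lists URL for a study and picks sample lists whose category mentions "mrna". The helper names are hypothetical, jq is assumed to be installed, and matching on the category field is an assumption drawn from the single example above.

```shell
#!/bin/bash
# Sketch: fetch the sample lists of one study and keep the mRNA-related ones.
# Helper names are hypothetical; assumes curl and jq are installed.

api_base="https://www.cbioportal.org/api"

# sample_lists_url STUDY_ID -> prints the endpoint URL for that study
sample_lists_url() {
    printf '%s/studies/%s/sample-lists' "${api_base}" "$1"
}

# fetch_sample_lists STUDY_ID -> prints the JSON array of sample lists
fetch_sample_lists() {
    curl --fail -sS "$(sample_lists_url "$1")" -H "accept: application/json"
}

# mrna_sample_list_ids -> reads the JSON array on stdin, prints matching sampleListId values
mrna_sample_list_ids() {
    jq -r '.[] | select((.category // "") | test("mrna")) | .sampleListId'
}

# Usage:
# fetch_sample_lists "brca_tcga_pan_can_atlas_2018" | mrna_sample_list_ids
```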

Step 2 - Fetch molecular data

Go to Molecular Data -> GET /api/molecular-profiles/{molecularProfileId}/molecular-data
We need to provide molecularProfileId, sampleListId and entrezGeneId. For example, this is the response for:

  • molecularProfileId = brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores
  • sampleListId = brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna
  • entrezGeneId = 1

[
  {
    "uniqueSampleKey": "VENHQS0zQy1BQUFVLTAxOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
    "uniquePatientKey": "VENHQS0zQy1BQUFVOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
    "entrezGeneId": 1,
    "molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
    "sampleId": "TCGA-3C-AAAU-01",
    "patientId": "TCGA-3C-AAAU",
    "studyId": "brca_tcga_pan_can_atlas_2018",
    "value": 1.7607
  },
  {
    "uniqueSampleKey": "VENHQS0zQy1BQUxJLTAxOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
    "uniquePatientKey": "VENHQS0zQy1BQUxJOmJyY2FfdGNnYV9wYW5fY2FuX2F0bGFzXzIwMTg",
    "entrezGeneId": 1,
    "molecularProfileId": "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores",
    "sampleId": "TCGA-3C-AALI-01",
    "patientId": "TCGA-3C-AALI",
    "studyId": "brca_tcga_pan_can_atlas_2018",
    "value": 2.039
  },
<...>
]
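
The same request can be made from the command line. The following sketch builds the Step 2 URL from the three parameters and reduces the response to sampleId/value pairs. The helper names are hypothetical, jq is assumed to be installed, and the tab-separated output is just one possible choice of format.

```shell
#!/bin/bash
# Sketch: fetch Z-scores for one gene in one molecular profile and sample list.
# Helper names are hypothetical; assumes curl and jq are installed.

api_base="https://www.cbioportal.org/api"

# molecular_data_url PROFILE_ID SAMPLE_LIST_ID ENTREZ_GENE_ID -> prints the request URL
molecular_data_url() {
    printf '%s/molecular-profiles/%s/molecular-data?sampleListId=%s&entrezGeneId=%s' \
        "${api_base}" "$1" "$2" "$3"
}

# fetch_molecular_data PROFILE_ID SAMPLE_LIST_ID ENTREZ_GENE_ID -> prints the JSON response
fetch_molecular_data() {
    curl --fail -sS "$(molecular_data_url "$@")" -H "accept: application/json"
}

# to_tsv -> reads the JSON array on stdin, prints "sampleId<TAB>value" lines
to_tsv() {
    jq -r '.[] | [.sampleId, .value] | @tsv'
}

# Usage:
# fetch_molecular_data \
#     "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores" \
#     "brca_tcga_pan_can_atlas_2018_rna_seq_v2_mrna" \
#     1 | to_tsv
```

Looping this over many genes would mean one request per gene, so a real pipeline may want to pause between calls.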
