Commit 95ed069

jamesqoi-am-leslie
authored and committed
Update FAQ for CDA studies (#11671)
1 parent ebed192 commit 95ed069

1 file changed

+4
-4
lines changed

docs/user-guide/faq.md

Lines changed: 4 additions & 4 deletions
@@ -127,7 +127,7 @@ The cBioPortal is an exploratory analysis tool for exploring large-scale cancer
127127

128128
By contrast, the [Genomic Data Commons (GDC)](https://gdc.cancer.gov/) aims to be the definitive place for full-download and access to all data generated by TCGA and TARGET. If you want to download raw mRNA expression files or full segmented copy number files, the GDC is probably where you want to start.
129129

130-
As of August 2024, the public cBioPortal contains datasets sourced from the GDC through [ISB-CGC BigQuery](https://bq-search.isb-cgc.org/search?status=current). Currently TCGA and CPTAC are supported, with more programs coming in the future. For an explanation of how these studies differ from their non-GDC counterparts, [see below](#how-do-the-different-tcga-datasets-compare).
130+
As of August 2024, the public cBioPortal contains datasets sourced from the GDC through the [Cancer Data Aggregator](https://cda.readthedocs.io/en/latest/). Currently TCGA, CPTAC, and TARGET are supported, with more programs coming in the future. For an explanation of how these studies differ from their non-GDC counterparts, [see below](#how-do-the-different-tcga-datasets-compare).
131131
#### Does the cBioPortal provide a Web Service API? R interface? MATLAB interface?
132132
Yes, the cBioPortal provides a [Swagger API](https://www.cbioportal.org/api/swagger-ui.html), and [R/MATLAB interfaces](/web-API-and-Clients.md#r-client).
133133
#### Can I use cBioPortal with my own data?
@@ -171,7 +171,7 @@ Check out the [Data Sets Page](https://www.cbioportal.org/datasets) for the comp
171171
#### Which resources are integrated for variant annotation?
172172
cBioPortal supports the annotation of variants from several different databases. These databases provide information about the recurrence of, or prior knowledge about, specific amino acid changes. For each variant, the number of occurrences of mutations at the same amino acid position present in the COSMIC database are reported. Furthermore, variants are annotated as “hotspots” if the amino acid positions were found to be recurrent linear hotspots, as defined by the Cancer Hotspots method ([cancerhotspots.org](https://www.cancerhotspots.org/)), or three-dimensional hotspots, as defined by 3D Hotspots ([3dhotspots.org](https://www.3dhotspots.org/)). Prior knowledge about variants, including clinical actionability information, is provided from three different sources: OncoKB ([www.oncokb.org](https://www.oncokb.org/)), CIViC ([civicdb.org](https://civicdb.org/)), as well as My Cancer Genome ([mycancergenome.org](https://www.mycancergenome.org/)). For OncoKB, exact levels of clinical actionability are displayed in cBioPortal, as defined by [the OncoKB paper](https://ascopubs.org/doi/full/10.1200/PO.17.00011).
173173
#### What version of the human reference genome is being used in cBioPortal?
174-
The [public cBioPortal](https://www.cbioportal.org) largely uses hg19/GRCh37. However, there are studies that use the hg38/GRCh38 reference genome, including datasets sourced from the GDC through ISB-CGC BigQuery.
174+
The [public cBioPortal](https://www.cbioportal.org) largely uses hg19/GRCh37. However, there are studies that use the hg38/GRCh38 reference genome, including datasets sourced from the GDC through the Cancer Data Aggregator.
175175
#### How does cBioPortal handle duplicate samples or sample IDs across different studies?
176176
The cBioPortal generally assumes that samples or patients that have the same ID are actually the same. This is important for cross-cancer queries, where each sample should only be counted once. If a sample is part of multiple cancer cohorts, its alterations are only counted once in the Mutations tab (it will be listed multiple times in the table, but is only counted once in the lollipop plot). However, other tabs (including OncoPrint and Cancer Types Summary) will count the sample twice - for this reason, we advise against querying multiple studies that contain the same samples (e.g., TCGA PanCancer Atlas and TCGA Firehose Legacy).
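The counting behaviour described above can be illustrated with a small, self-contained sketch. It is not cBioPortal code: the study names, sample IDs, and helper functions below are hypothetical, and it only shows how counting each sample ID once differs from counting it once per study.

```python
from collections import defaultdict


def count_unique_samples(study_samples):
    """Count each sample ID once, however many studies contain it."""
    seen = set()
    for samples in study_samples.values():
        seen.update(samples)
    return len(seen)


def find_shared_samples(study_samples):
    """Map each sample ID that occurs in more than one study to those studies."""
    studies_by_sample = defaultdict(set)
    for study_id, samples in study_samples.items():
        for sample_id in samples:
            studies_by_sample[sample_id].add(study_id)
    return {
        sample_id: sorted(studies)
        for sample_id, studies in studies_by_sample.items()
        if len(studies) > 1
    }


# Hypothetical cohorts: two studies that overlap on two samples.
cohorts = {
    "study_a": ["S-01", "S-02", "S-03"],
    "study_b": ["S-02", "S-03", "S-04"],
}

naive_total = sum(len(samples) for samples in cohorts.values())  # counts overlaps twice
unique_total = count_unique_samples(cohorts)                     # counts each ID once
shared = find_shared_samples(cohorts)

print(naive_total, unique_total, shared)
```

Running a check like `find_shared_samples` on sample lists before a multi-study query is one way to spot overlapping cohorts that would be double-counted in per-study summaries.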
177177
#### Are there any normal tissue samples available through cBioPortal?
@@ -189,7 +189,7 @@ Data from AACR Project GENIE are provided in a [dedicated instance of cBioPortal
189189

190190
### TCGA
191191
#### What are the TCGA studies sourced from the Genomic Data Commons (GDC)?
192-
The GDC TCGA studies mirror the [Cancer Gateway in the Cloud (ISB-CGC)](https://bq-search.isb-cgc.org/search?status=current) that is hosted on Google BigQuery, which in turn pulls data from GDC. Our [NCI-CRDC pipeline](https://github.com/cBioPortal/nci-crdc-pipeline) pulls data from ISB-CGC and transforms it into cBioPortal-formatted files. The resulting studies are intended to be a pure reflection of what is available inside ISB-CGC; we do not augment them with data from our other TCGA studies. For more information on how ISB-CGC handles GDC data, see [Programs and Data Sets](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/Hosted-Data.html) and [GDC Overview](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/data/GDC_top.html).
192+
The GDC TCGA studies mirror the [Cancer Data Aggregator (CDA)](https://cda.readthedocs.io/en/latest/) that pulls data from CRDC. Our ETL pipeline pulls GDC data from CDA and transforms it into cBioPortal-formatted files. The resulting studies are intended to be a pure reflection of what is available inside CDA / GDC; we do not augment them with data from our other TCGA studies. For more information on the CDA and how it handles data from the CRDC, see [here](https://cda.readthedocs.io/en/latest/about_us/).
193193

194194
#### How do the different TCGA datasets compare?
195195
The Firehose Legacy dataset (formerly Provisional datasets) for each TCGA cancer type contains all data available from the Broad Firehose. The publication datasets reflect the data that were used for each of the publications. The samples in a published dataset are usually a subset of the Firehose Legacy dataset, since manuscripts were often written before TCGA completed their goal of sequencing 500 tumors.
@@ -200,7 +200,7 @@ The TCGA PanCancer Atlas datasets derive from an effort to unify TCGA data acros
200200

201201
TCGA studies not sourced from GDC have the original mutation data generated by the individual TCGA sequencing centers. The source of the data is the Broad Firehose (or the publication pages for data that matches a specific manuscript). These data are usually a combination of two mutation callers, e.g. a variant caller like MuTect plus an indel caller. Note that the specific tools used and the overall process for identifying mutations can vary between centers and may have changed over time.
202202

203-
TCGA studies sourced from GDC use a newer version of the human reference genome, GRCh38 instead of GRCh37. For more information about the GDC data processing pipeline, see [GDC Data Processing](https://gdc.cancer.gov/about-data/gdc-data-processing). Transformations specific to our NCI-CRDC pipeline are documented in the [cBioPortal Datahub](https://github.com/cBioPortal/datahub/tree/master/crdc).
203+
TCGA studies sourced from GDC use a newer version of the human reference genome, GRCh38 instead of GRCh37. For more information about the GDC data processing pipeline, see [GDC Data Processing](https://gdc.cancer.gov/about-data/gdc-data-processing). Transformations specific to our CDA pipeline are documented in the [cBioPortal Datahub](https://github.com/cBioPortal/datahub/tree/master/crdc).
204204
#### What happened to TCGA Provisional datasets?
205205
We renamed TCGA Provisional datasets to TCGA Firehose Legacy to better reflect that this data comes from a legacy processing pipeline. The exact same data is now available in TCGA Firehose Legacy studies.
206206
#### Where do the thresholded copy number call in TCGA Firehose Legacy data come from?
