bioconductor v3.9.0 TCGAbiolinks

The aim of TCGAbiolinks is : i) facilitate the GDC open-access

Summary

Functions

BRCA_rnaseqv2

Download GDC data

Prepare GDC data

Parsing clinical xml files

Query GDC data

Retrieve open access ATAC-seq files from GDC server

Retrieve open access maf files from GDC server

Get GDC clinical data

GeneSplitRegulon

GenesCutID

Retrieve table with TCGA molecular subtypes

Creates a volcano plot for DNA methylation or expression

Retrieve molecular subtypes for given TCGA barcodes

Hierarchical cluster analysis

Differential expression analysis (DEA) using edgeR or limma package.

Differential expression analysis (DEA) using limma package.

Differentially methylated regions Analysis

Enrichment analysis of a gene-set with GO [BP,MF,CC] and pathways.

Enrichment analysis for Gene Ontology (GO) [BP,MF,CC] and Pathways

Filtering mRNA transcripts and miRNA selecting a threshold.

Adding information related to DEGs genes from DEA as mean values in two conditions.

Normalization of mRNA transcripts and miRNA using EDASeq package.

Generate pathview graph

Array Array Intensity correlation (AAIC) and correlation boxplot to define outlier

Generate Stemness Score based on RNASeq (mRNAsi stemness index) Malta et al., Cell, 2018

Survival analysis (SA) univariate with Kaplan-Meier (KM) method.

Generate network

Infer gene regulatory networks

Creates survival analysis

Batch correction using ComBat and Voom transformation using limma package.

The aim of TCGAbiolinks is : i) facilitate the TCGA open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) allow the user to download a specific version of the data and thus to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Prepare CEL files into an AffyBatch.

Retrieve multiple tissue types from the same patients.

Retrieve multiple tissue types not from the same patients.

Query gene counts of TCGA and GTEx data from the Recount2 project

Retrieve molecular subtypes for a given tumor

Filters TCGA barcodes according to purity parameters

Barplot of subtypes and clinical info in groups of gene expression clustered.

barPlot for a complete Enrichment Analysis

Heatmap with more sensible behavior using heatmap.plus

Principal components analysis (PCA) plot

Survival analysis with univariate Cox regression package (dnet)

Mean methylation boxplot

Creating an oncoprint

Create starburst plot

TCGA samples with their Pam50 subtypes

TCGA samples with their Tumor Purity measures

Use raw count from the DataPrep object which genes are removed by normalization and filtering steps.

TCGA batch information from Biospecimen Metadata Browser

Calculate pvalues

TCGA CHOL MAF transformed to maftools object

Clinical data TCGA BRCA

A list of data frames with clinical data parsed from XML (code in vignettes)

Create samples information matrix for GDC samples

TCGA data matrix BRCA

TCGA data matrix BRCA DEGs

TCGA data SummarizedExperiment READ

TCGA data matrix READ

Calculate diffmean methylation between two groups

Creates a plot for GAIA output (all significant aberrant regions).

A RangedSummarizedExperiment two samples with gene expression data from vignette aligned against hg38

A RangedSummarizedExperiment two samples with gene expression data from vignette aligned against hg19

geneInfo for normalization of RNAseq data

geneInfoHT for normalization of HTseq data

Get a matrix of interactions of genes from biogrid

Create a Summary table for each sample in a project saying if it contains or not files for a certain data category

Check GDC server status

Retrieve all GDC projects

Get hg19 or hg38 information from biomaRt

Download GISTIC data from firehose

Get a Manifest from GDCquery output that can be used with GDC-client

Get the results table from query

Retrieve summary of files per sample in a project

getTSS to fetch GENCODE gene annotation (transcripts level) from Bioconductor package biomaRt If upstream and downstream are specified in TSS list, promoter regions of GENCODE gene will be generated.

Extract information from TCGA barcodes.

Biplot for Principal Components using ggplot2

Check GDC server status is OK

Get GDC samples with both DNA methylation (HM450K) and Gene expression data from GDC database

A DNA methylation RangedSummarizedExperiment for 8 samples (only first 20 probes) aligned against hg19

MSI data for two samples

A data frame with all TCGA molecular subtypes

tabSurvKMcompleteDEGs

Functions

BRCA_rnaseqv2()

BRCA_rnaseqv2

Description

BRCA_rnaseqv2

Format

A data frame with 200 rows (genes) and 1172 variables (samples)

GDCdownload()

Download GDC data

Description

Uses the GDC API or the GDC transfer tool to download GDC data. The user passes the result of GDCquery through the query argument. The data from the query will be saved in a folder: project/data.category

Usage

GDCdownload(query, token.file, method = "api", directory = "GDCdata",
  files.per.chunk = NULL)

Arguments

| Argument | Description |
| --- | --- |
| query | A query from the GDCquery function |
| token.file | Token file to download controlled data (only for method = "client") |
| method | Uses the API (POST method) or the GDC client tool. Options: "api", "client". The API is faster, but the data might get corrupted during the download, in which case the download may need to be executed again |
| directory | Directory/Folder where the data will be downloaded. Default: GDCdata |
| files.per.chunk | Makes the API method download only n (files.per.chunk) files at a time. This may reduce download problems when the data size is too large. Expects an integer (example: files.per.chunk = 6) |
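
As a rough illustration of what files.per.chunk implies, the sketch below splits a list of files into groups of at most n entries, each of which would be requested separately. It is a minimal base R sketch, not the package's implementation; the file identifiers and the variable names are hypothetical.

```r
# Hypothetical file identifiers standing in for the files listed in a query.
file.ids <- sprintf("file-%02d", 1:14)
files.per.chunk <- 6

# Assign each file to a chunk index: 1,1,1,1,1,1,2,2,...
chunk.index <- ceiling(seq_along(file.ids) / files.per.chunk)
chunks <- split(file.ids, chunk.index)

# Each element of `chunks` would be requested in a separate download call.
lengths(chunks)
```

Smaller chunks mean more requests but less data to repeat if one request fails.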

Value

Shows the output from the GDC transfer tools

Examples

query <- GDCquery(project = "TCGA-ACC",
data.category =  "Copy number variation",
legacy = TRUE,
file.type = "hg19.seg",
barcode = c("TCGA-OR-A5LR-01A-11D-A29H-01", "TCGA-OR-A5LJ-10A-01D-A29K-01"))
# data will be saved in  GDCdata/TCGA-ACC/legacy/Copy_number_variation/Copy_number_segmentation
GDCdownload(query, method = "api")
# Download clinical data from XML
query <- GDCquery(project = "TCGA-COAD", data.category = "Clinical")
GDCdownload(query, files.per.chunk = 200)
query <- GDCquery(project = "TARGET-AML",
data.category = "Transcriptome Profiling",
data.type = "miRNA Expression Quantification",
workflow.type = "BCGSC miRNA Profiling",
barcode = c("TARGET-20-PARUDL-03A-01R","TARGET-20-PASRRB-03A-01R"))
# data will be saved in:
# example_data_dir/TARGET-AML/harmonized/Transcriptome_Profiling/miRNA_Expression_Quantification
GDCdownload(query, method = "client", directory = "example_data_dir")
acc.gbm <- GDCquery(project =  c("TCGA-ACC","TCGA-GBM"),
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts")
GDCdownload(acc.gbm, method = "api", directory = "example", files.per.chunk = 50)

GDCprepare()

Prepare GDC data

Description

Reads the downloaded data and prepares it as an R object

Usage

GDCprepare(query, save = FALSE, save.filename, directory = "GDCdata",
  summarizedExperiment = TRUE, remove.files.prepared = FALSE,
  add.gistic2.mut = NULL, mut.pipeline = "mutect2",
  mutant_variant_classification = c("Frame_Shift_Del", "Frame_Shift_Ins",
  "Missense_Mutation", "Nonsense_Mutation", "Splice_Site", "In_Frame_Del",
  "In_Frame_Ins", "Translation_Start_Site", "Nonstop_Mutation"))

Arguments

| Argument | Description |
| --- | --- |
| query | A query from the GDCquery function |
| save | Save result as RData object? |
| save.filename | Name of the file to be saved. If empty, a name will be created automatically |
| directory | Directory/Folder where the data was downloaded. Default: GDCdata |
| summarizedExperiment | Create a summarizedExperiment? Default: TRUE (if possible) |
| remove.files.prepared | Remove the files read? Default: FALSE. This argument is considered only if the save argument is set to TRUE |
| add.gistic2.mut | If a list of genes (gene symbols) is given, columns with GISTIC2 results from GDAC Firehose (hg19) and a column indicating whether or not there is a mutation in that gene (hg38) (TRUE or FALSE; use the MAF file for more information) will be added to the sample matrix in the SummarizedExperiment object |
| mut.pipeline | Taken into consideration only if add.gistic2.mut is not NULL. Four separate variant calling pipelines are implemented for GDC data harmonization. Options: muse, varscan2, somaticsniper, mutect2. For more information: https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/ |
| mutant_variant_classification | List of variant classifications used to decide whether a sample is considered mutant or not. Default: "Frame_Shift_Del", "Frame_Shift_Ins", "Missense_Mutation", "Nonsense_Mutation", "Splice_Site", "In_Frame_Del", "In_Frame_Ins", "Translation_Start_Site", "Nonstop_Mutation" |
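
To make the role of mutant_variant_classification concrete, the following base R sketch flags a sample as mutant for a gene when at least one of its variants falls in the listed classes. It is only an illustration under assumptions: the toy MAF-like table, the sample names and the helper is.mutant are hypothetical, and the package may derive the column differently.

```r
# Toy MAF-like table; column names follow the usual MAF convention.
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "KRAS", "KRAS"),
  Tumor_Sample_Barcode   = c("S1", "S2", "S1", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Intron", "Nonsense_Mutation"),
  stringsAsFactors = FALSE
)

# Classes that count as "mutant" (the default list from the Usage section).
mutant.classes <- c("Frame_Shift_Del", "Frame_Shift_Ins", "Missense_Mutation",
                    "Nonsense_Mutation", "Splice_Site", "In_Frame_Del",
                    "In_Frame_Ins", "Translation_Start_Site", "Nonstop_Mutation")

samples <- c("S1", "S2", "S3")

# Hypothetical helper: TRUE for samples with a qualifying variant in `gene`.
is.mutant <- function(gene) {
  hits <- maf$Tumor_Sample_Barcode[maf$Hugo_Symbol == gene &
                                   maf$Variant_Classification %in% mutant.classes]
  samples %in% hits
}

mut.table <- data.frame(sample   = samples,
                        mut_TP53 = is.mutant("TP53"),
                        mut_KRAS = is.mutant("KRAS"))
mut.table
```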

Value

A summarizedExperiment or a data.frame

Examples

query <- GDCquery(project = "TCGA-KIRP",
data.category = "Simple Nucleotide Variation",
data.type = "Masked Somatic Mutation",
workflow.type = "MuSE Variant Aggregation and Masking")
GDCdownload(query, method = "api", directory = "maf")
maf <- GDCprepare(query, directory = "maf")

# Get GISTIC values
gistic.query <- GDCquery(project = "TCGA-ACC",
data.category = "Copy Number Variation",
data.type = "Gene Level Copy Number Scores",
access="open")
GDCdownload(gistic.query)
gistic <- GDCprepare(gistic.query)

GDCprepare_clinic()

Parsing clinical xml files

Description

This function receives the query argument and parses the clinical xml files based on the desired information

Usage

GDCprepare_clinic(query, clinical.info, directory = "GDCdata")

Arguments

| Argument | Description |
| --- | --- |
| query | Result from GDCquery, with data.category set to Clinical |
| clinical.info | Which information should be retrieved. Options for Clinical: drug, admin, follow_up, radiation, patient, stage_event or new_tumor_event. Options for Biospecimen: protocol, admin, aliquot, analyte, bio_patient, sample, portion, slide |
| directory | Directory/Folder where the data was downloaded. Default: GDCdata |

Value

A data frame with the parsed values from the XML

Examples

query <- GDCquery(project = "TCGA-COAD",
data.category = "Clinical",
file.type = "xml",
barcode = c("TCGA-RU-A8FL","TCGA-AA-3972"))
GDCdownload(query)
clinical <- GDCprepare_clinic(query,"patient")
clinical.drug <- GDCprepare_clinic(query,"drug")
clinical.radiation <- GDCprepare_clinic(query,"radiation")
clinical.admin <- GDCprepare_clinic(query,"admin")
query <- GDCquery(project = "TCGA-COAD",
data.category = "Biospecimen",
file.type = "xml",
data.type = "Biospecimen Supplement",
barcode = c("TCGA-RU-A8FL","TCGA-AA-3972"))
GDCdownload(query)
biospecimen.admin <- GDCprepare_clinic(query,"admin")
biospecimen.sample <- GDCprepare_clinic(query,"sample")
biospecimen.portion <- GDCprepare_clinic(query,"portion")
biospecimen.slide <- GDCprepare_clinic(query,"slide")

GDCquery()

Query GDC data

Description

Uses the GDC API to search for data; it searches both controlled and open-access data. For GDC data, the arguments project, data.category, data.type and workflow.type should be used. For the legacy data, the arguments project, data.category, platform and/or file.type should be used. Please see the vignette for a table with the possibilities.

Usage

GDCquery(project, data.category, data.type, workflow.type,
  legacy = FALSE, access, platform, file.type, barcode,
  experimental.strategy, sample.type)

Arguments

| Argument | Description |
| --- | --- |
| project | A list of valid projects (see list with TCGAbiolinks:::getGDCprojects()$project_id) |
| data.category | A valid data category (see list with TCGAbiolinks:::getProjectSummary(project)) |
| data.type | A data type to filter the files to download |
| workflow.type | GDC workflow type |
| legacy | Search in the legacy repository |
| access | Filter by access type. Possible values: controlled, open |
| platform | Examples: CGH-1x1M_G4447A, IlluminaGA_RNASeqV2, AgilentG4502A_07, IlluminaGA_mRNA_DGE, Human1MDuo, HumanMethylation450, HG-CGH-415K_G4124A, IlluminaGA_miRNASeq, HumanHap550, IlluminaHiSeq_miRNASeq, ABI, H-miRNA_8x15K, HG-CGH-244A, SOLiD_DNASeq, IlluminaDNAMethylation_OMA003_CPI, IlluminaGA_DNASeq_automated, IlluminaDNAMethylation_OMA002_CPI, HG-U133_Plus_2, HuEx-1_0-st-v2, Mixed_DNASeq, H-miRNA_8x15Kv2, IlluminaGA_DNASeq_curated, MDA_RPPA_Core, IlluminaHiSeq_TotalRNASeqV2, HT_HG-U133A, IlluminaHiSeq_DNASeq_automated, diagnostic_images, microsat_i, IlluminaHiSeq_RNASeq, SOLiD_DNASeq_curated, IlluminaHiSeq_DNASeqC, Mixed_DNASeq_curated, IlluminaGA_RNASeq, IlluminaGA_DNASeq_Cont_automated, IlluminaGA_DNASeq, IlluminaHiSeq_WGBS, pathology_reports, IlluminaHiSeq_DNASeq_Cont_automated, Genome_Wide_SNP_6, bio, tissue_images, Mixed_DNASeq_automated, HumanMethylation27, Mixed_DNASeq_Cont_curated, IlluminaHiSeq_RNASeqV2, Mixed_DNASeq_Cont |
| file.type | To be used in the legacy database for some platforms, to define which file types to use |
| barcode | A list of barcodes to filter the files to download |
| experimental.strategy | Filter by experimental strategy. Harmonized: WXS, RNA-Seq, miRNA-Seq, Genotyping Array. Legacy: WXS, RNA-Seq, miRNA-Seq, Genotyping Array, DNA-Seq, Methylation array, Protein expression array, CGH array, VALIDATION, Gene expression array, WGS, MSI-Mono-Dinucleotide Assay, miRNA expression array, Mixed strategies, AMPLICON, Exon array, Total RNA-Seq, Capillary sequencing, Bisulfite-Seq |
| sample.type | A sample type to filter the files to download |

Value

A data frame with the results and the parameters used

Examples

query <- GDCquery(project = "TCGA-ACC",
data.category = "Copy Number Variation",
data.type = "Copy Number Segment")
query <- GDCquery(project = "TARGET-AML",
data.category = "Transcriptome Profiling",
data.type = "miRNA Expression Quantification",
workflow.type = "BCGSC miRNA Profiling",
barcode = c("TARGET-20-PARUDL-03A-01R","TARGET-20-PASRRB-03A-01R"))
query <- GDCquery(project = "TARGET-AML",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts",
barcode = c("TARGET-20-PADZCG-04A-01R","TARGET-20-PARJCR-09A-01R"))
query <- GDCquery(project = "TCGA-ACC",
data.category =  "Copy Number Variation",
data.type = "Masked Copy Number Segment",
sample.type = c("Primary solid Tumor"))
query.met <- GDCquery(project = c("TCGA-GBM","TCGA-LGG"),
legacy = TRUE,
data.category = "DNA methylation",
platform = "Illumina Human Methylation 450")
query <- GDCquery(project = "TCGA-ACC",
data.category =  "Copy number variation",
legacy = TRUE,
file.type = "hg19.seg",
barcode = c("TCGA-OR-A5LR-01A-11D-A29H-01"))

GDCquery_ATAC_seq()

Retrieve open access ATAC-seq files from GDC server

Description

Retrieve open access ATAC-seq files from GDC server https://gdc.cancer.gov/about-data/publications/ATACseq-AWG Manifest available at: https://gdc.cancer.gov/files/public/file/ATACseq-AWG_Open_GDC-Manifest.txt

Usage

GDCquery_ATAC_seq(tumor = NULL, file.type = NULL)

Arguments

| Argument | Description |
| --- | --- |
| tumor | A valid tumor |
| file.type | Type of file to retrieve (the examples below use "txt" and "bigWigs") |

Value

A data frame with the ATAC-seq file information

Examples

query <- GDCquery_ATAC_seq(file.type = "txt")
GDCdownload(query)
query <- GDCquery_ATAC_seq(file.type = "bigWigs")
GDCdownload(query)

GDCquery_Maf()

Retrieve open access maf files from GDC server

Description

GDCquery_Maf uses the following guide to download maf files https://gdc-docs.nci.nih.gov/Data/Release_Notes/Data_Release_Notes/

Usage

GDCquery_Maf(tumor, save.csv = FALSE, directory = "GDCdata",
  pipelines = NULL)

Arguments

| Argument | Description |
| --- | --- |
| tumor | A valid tumor |
| save.csv | Write maf file into a csv document |
| directory | Directory/Folder where the data will be downloaded. Default: GDCdata |
| pipelines | Four separate variant calling pipelines are implemented for GDC data harmonization. Options: muse, varscan2, somaticsniper, mutect2. For more information: https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/ |

Value

A data frame with the maf file information

Examples

acc.muse.maf <- GDCquery_Maf("ACC", pipelines = "muse")
acc.varscan2.maf <- GDCquery_Maf("ACC", pipelines = "varscan2")
acc.somaticsniper.maf <- GDCquery_Maf("ACC", pipelines = "somaticsniper")
acc.mutect.maf <- GDCquery_Maf("ACC", pipelines = "mutect2")

GDCquery_clinic()

Get GDC clinical data

Description

GDCquery_clinic downloads all clinical information from the API, the same data obtained by using the download button for each project

Usage

GDCquery_clinic(project, type = "clinical", save.csv = FALSE)

Arguments

| Argument | Description |
| --- | --- |
| project | A valid project (see list with getGDCprojects()$project_id) |
| type | A valid type. Options: "clinical", "biospecimen" |
| save.csv | Write clinical information into a csv document |

Value

A data frame with the clinical information

Examples

clin <- GDCquery_clinic("TCGA-ACC", type = "clinical", save.csv = TRUE)
clin <- GDCquery_clinic("TCGA-ACC", type = "biospecimen", save.csv = TRUE)

GeneSplitRegulon()

GeneSplitRegulon

Description

GeneSplitRegulon

Usage

GeneSplitRegulon(Genelist, Sep)

Arguments

| Argument | Description |
| --- | --- |
| Genelist | Genelist |
| Sep | Sep |

Value

GeneSplitRegulon

GenesCutID()

GenesCutID

Description

GenesCutID

Usage

GenesCutID(GeneList)

Arguments

| Argument | Description |
| --- | --- |
| GeneList | GeneList |

Value

list of gene symbol without IDs

PanCancerAtlas_subtypes()

Retrieve table with TCGA molecular subtypes

Description

PanCancerAtlas_subtypes is a curated table with molecular subtypes for 24 TCGA cancer types

Usage

PanCancerAtlas_subtypes()

Value

a data.frame with barcode and molecular subtypes for 24 cancer types

Examples

molecular.subtypes <- PanCancerAtlas_subtypes()

TCGAVisualize_volcano()

Creates a volcano plot for DNA methylation or expression

Description

Creates a volcano plot from the expression and methylation analysis.

Usage

TCGAVisualize_volcano(x, y, filename = "volcano.pdf",
  ylab = expression(paste(-Log[10], " (FDR corrected -P values)")),
  xlab = NULL, title = "Volcano plot", legend = NULL, label = NULL,
  xlim = NULL, ylim = NULL, color = c("black", "red", "green"),
  names = NULL, names.fill = TRUE, show.names = "significant",
  x.cut = 0, y.cut = 0.01, height = 5, width = 10,
  highlight = NULL, highlight.color = "orange", names.size = 4,
  dpi = 300)

Arguments

| Argument | Description |
| --- | --- |
| x | x-axis data |
| y | y-axis data |
| filename | Filename, e.g. volcano.pdf, volcano.svg, volcano.png. Default: volcano.pdf |
| ylab | y axis text |
| xlab | x axis text |
| title | Main title. If not specified it will be "Volcano plot (group1 vs group2)" |
| legend | Legend title |
| label | Vector of labels to be used in the figure. Example: c("Not Significant","Hypermethylated in group1", "Hypomethylated in group1") |
| xlim | x limits to cut image |
| ylim | y limits to cut image |
| color | Vector of colors to be used in graph |
| names | Names to be plotted if significant. Should be the same size as x and y |
| names.fill | Should names be filled in a color box? Default: TRUE |
| show.names | Which names will be shown? Possibilities: "both", "significant", "highlighted" |
| x.cut | x-axis threshold. Default: 0.0. If you give only one number (e.g. 0.2) the cut-offs will be -0.2 and 0.2. You can also give different cut-offs as a vector (e.g. c(-0.3,0.4)) |
| y.cut | p-values threshold |
| height | Figure height |
| width | Figure width |
| highlight | List of genes/probes to be highlighted. It should be in the names argument |
| highlight.color | Color of the highlighted points |
| names.size | Size of the names text |
| dpi | Figure dpi |

Details

Creates a volcano plot from the expression and methylation analysis. Please see the vignette for more information. Note: this function is called automatically by TCGAanalyze_DMR.
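
The way x.cut and y.cut partition the points can be sketched in base R as follows. This is an assumption-laden illustration rather than the function's actual code: the helper classify_points and its labels are hypothetical, and the strictness of the comparisons (< versus <=) is a guess.

```r
# Hypothetical helper: label points the way a volcano plot would color them.
classify_points <- function(x, y, x.cut = 0, y.cut = 0.01) {
  # A single number is expanded to a symmetric pair, e.g. 0.2 -> c(-0.2, 0.2).
  if (length(x.cut) == 1) x.cut <- c(-abs(x.cut), abs(x.cut))
  status <- rep("Not significant", length(x))
  status[y < y.cut & x > x.cut[2]] <- "Up"
  status[y < y.cut & x < x.cut[1]] <- "Down"
  status
}

x <- c(-0.5, -0.1, 0.1, 0.5, 0.5)      # effect sizes (x-axis)
y <- c(0.001, 0.001, 0.001, 0.001, 0.5) # adjusted p-values (y-axis)

classify_points(x, y, x.cut = 0.2)
classify_points(x, y, x.cut = c(-0.3, 0.4))
```

Only points that pass both the p-value threshold and one of the two x-axis cut-offs are labeled as changed.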

Value

Saves the volcano plot in the current folder

Examples

x <- runif(200, -1, 1)
y <- runif(200, 0.01, 1)
TCGAVisualize_volcano(x,y)
TCGAVisualize_volcano(x,y,filename = NULL,y.cut = 10000000,x.cut=0.8,
names = rep("AAAA",length(x)), legend = "Status",
names.fill = FALSE)
TCGAVisualize_volcano(x,y,filename = NULL,y.cut = 10000000,x.cut=0.8,
names = as.character(1:length(x)), legend = "Status",
names.fill = TRUE, highlight = c("1","2"), show.names = "both")
TCGAVisualize_volcano(x,y,filename = NULL,y.cut = 10000000,x.cut=c(-0.3,0.8),
names = as.character(1:length(x)), legend = "Status",
names.fill = TRUE, highlight = c("1","2"), show.names = "both")
while (!(is.null(dev.list()["RStudioGD"]))){dev.off()}

TCGA_MolecularSubtype()

Retrieve molecular subtypes for given TCGA barcodes

Description

TCGA_MolecularSubtype Retrieve molecular subtypes from TCGA consortium for a given set of barcodes

Usage

TCGA_MolecularSubtype(barcodes)

Arguments

| Argument | Description |
| --- | --- |
| barcodes | A vector of TCGA barcodes |

Value

List with $subtypes attribute as a dataframe with barcodes, samples, subtypes, and colors. The $filtered attribute is returned as filtered samples with no subtype info

Examples

TCGA_MolecularSubtype("TCGA-60-2721-01A-01R-0851-07")

TCGAanalyze_Clustering()

Hierarchical cluster analysis

Description

Hierarchical cluster analysis using several methods such as "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

Usage

TCGAanalyze_Clustering(tabDF, method, methodHC = "ward.D2")

Arguments

| Argument | Description |
| --- | --- |
| tabDF | A dataframe or numeric matrix coming from TCGAPrepare; each row represents a gene, each column represents a sample |
| method | Method to be used for generic clustering, such as 'hclust' or 'consensus' |
| methodHC | Method to be used for hierarchical clustering |

Value

Object of class hclust if the method selected is 'hclust'. If the method selected is 'consensus', returns a list of length maxK (maximum cluster number to evaluate). Each element is a list containing consensusMatrix (numerical matrix), consensusTree (hclust), consensusClass (consensus class assignments). ConsensusClusterPlus also produces images.
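
For orientation, the 'hclust' branch resembles the following base R sketch on a simulated genes-by-samples matrix. Whether TCGAanalyze_Clustering clusters samples or genes, and which distance it uses, is not stated here, so clustering samples on Euclidean distance is an assumption.

```r
set.seed(42)

# Simulated expression matrix: 50 genes (rows) by 8 samples (columns).
tabDF <- matrix(rnorm(50 * 8), nrow = 50,
                dimnames = list(paste0("gene", 1:50), paste0("sample", 1:8)))

# Assumption: cluster samples, so transpose before computing distances.
sample.dist <- dist(t(tabDF))

# Same linkage name as the methodHC default in the Usage section.
hc <- hclust(sample.dist, method = "ward.D2")

# Cut the tree into two groups of samples.
clusters <- cutree(hc, k = 2)
clusters
```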

TCGAanalyze_DEA()

Differential expression analysis (DEA) using edgeR or limma package.

Description

TCGAanalyze_DEA allows the user to perform differential expression analysis (DEA), using the edgeR or limma package, to identify differentially expressed genes (DEGs). It is possible to do a two-class analysis.

TCGAanalyze_DEA performs DEA using the following functions from edgeR:

  • edgeR::DGEList converts the count matrix into an edgeR object.

  • edgeR::estimateCommonDisp assigns each gene the same dispersion estimate.

  • edgeR::exactTest performs pair-wise tests for differential expression between two groups.

  • edgeR::topTags takes the output from exactTest(), adjusts the raw p-values using the False Discovery Rate (FDR) correction, and returns the top differentially expressed genes.

TCGAanalyze_DEA performs DEA using the following functions from limma:

  • limma::makeContrasts constructs the matrix of custom contrasts.

  • limma::lmFit fits a linear model for each gene given a series of arrays.

  • limma::contrasts.fit, given a linear model fit to microarray data, computes estimated coefficients and standard errors for a given set of contrasts.

  • limma::eBayes, given a microarray linear model fit, computes moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.

  • limma::topTable extracts a table of the top-ranked genes from a linear model fit.
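
The edgeR branch listed above can be sketched on simulated counts as follows. This is a minimal sketch, not the body of TCGAanalyze_DEA: the count matrix, group labels and variable names are made up, and the edgeR calls run only if the edgeR package is installed.

```r
set.seed(1)

# Simulated raw counts: 200 genes (rows) by 6 samples (columns).
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"))

if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts, group = group)   # build the edgeR object
  dge <- edgeR::estimateCommonDisp(dge)                   # one dispersion for all genes
  et  <- edgeR::exactTest(dge, pair = c("Normal", "Tumor")) # Tumor vs Normal
  top <- edgeR::topTags(et, n = nrow(counts), adjust.method = "BH")
  head(top$table)  # columns: logFC, logCPM, PValue, FDR
}
```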

Usage

TCGAanalyze_DEA(mat1, mat2, metadata = TRUE, Cond1type, Cond2type,
  pipeline = "edgeR", method = "exactTest", fdr.cut = 1,
  logFC.cut = 0, elementsRatio = 30000, batch.factors = NULL,
  ClinicalDF = data.frame(), paired = FALSE, log.trans = FALSE,
  voom = FALSE, trend = FALSE, MAT = data.frame(),
  contrast.formula = "", Condtypes = c())

Arguments

| Argument | Description |
| --- | --- |
| mat1 | Numeric matrix, each row represents a gene, each column represents a sample with Cond1type |
| mat2 | Numeric matrix, each row represents a gene, each column represents a sample with Cond2type |
| metadata | Add metadata |
| Cond1type | A string containing the class label of the samples in mat1 (e.g., control group) |
| Cond2type | A string containing the class label of the samples in mat2 (e.g., case group) |
| pipeline | A string to specify which package to use ("limma" or "edgeR") |
| method | 'glmLRT' (1) or 'exactTest' (2), used for edgeR. (1) Fits a negative binomial generalized log-linear model to the read counts for each gene; (2) computes genewise exact tests for differences in the means between two groups of negative-binomially distributed counts |
| fdr.cut | Threshold to filter DEGs according to their corrected p-value |
| logFC.cut | Threshold to filter DEGs according to their logFC |
| elementsRatio | Number of elements processed per second, used to estimate the running time |
| batch.factors | A vector containing strings to specify options for batch correction. Options are "Plate", "TSS", "Year", "Portion", "Center", and "Patients" |
| ClinicalDF | A dataframe returned by GDCquery_clinic() to be used to extract year data |
| paired | Boolean to account for paired or non-paired samples. Set to TRUE for the paired case |
| log.trans | Boolean to perform log cpm transformation. Set to TRUE for log transformation |
| voom | Boolean to perform voom transformation for the limma-voom pipeline. Set to TRUE for voom transformation |
| trend | Boolean to perform the limma-trend pipeline. Set to TRUE to go through limma-trend |
| MAT | Matrix containing the expression set with all samples in columns and genes in rows. Do not provide if mat1 and mat2 are used |
| contrast.formula | String input to determine coefficients and to design contrasts in a customized way |
| Condtypes | Vector of grouping for samples in MAT |

Value

Table with DEGs containing, for each gene, logFC, logCPM, pValue, and FDR, also for each contrast

Examples

dataNorm <- TCGAbiolinks::TCGAanalyze_Normalization(dataBRCA, geneInfo)
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut =  0.25)
samplesNT <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
samplesTP <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,samplesNT],
mat2 = dataFilt[,samplesTP],
Cond1type = "Normal",
Cond2type = "Tumor")

TCGAanalyze_DEA_Affy()

Differential expression analysis (DEA) using limma package.

Description

Differential expression analysis (DEA) using limma package.

Usage

TCGAanalyze_DEA_Affy(AffySet, FC.cut = 0.01)

Arguments

| Argument | Description |
| --- | --- |
| AffySet | A matrix-like data object containing log-ratios or log-expression values for a series of arrays, with rows corresponding to genes and columns to samples |
| FC.cut | write |

Value

List of lists with tables, one per 2 by 2 comparison, of the top-ranked genes from a linear model fitted by limma

Examples

to add example

TCGAanalyze_DMR()

Differentially methylated regions Analysis

Description

This function will search for differentially methylated CpG sites, which are regarded as possible functional regions involved in gene transcriptional regulation.

In order to find these regions we use the beta-values (methylation values ranging from 0.0 to 1.0) to compare two groups.

First, it calculates the difference between the mean methylation of each group for each probe. Second, it calculates the p-value using the Wilcoxon test, adjusted with the Benjamini-Hochberg method. The default parameters require a minimum absolute beta-value difference of 0.2 and a false discovery rate (FDR)-adjusted Wilcoxon rank-sum p-value of < 0.01 for the difference.

After this analysis, a volcano plot is saved (x-axis: difference in mean methylation, y-axis: significance) to help the user identify the differentially methylated CpG sites, and the object is returned with the results in the rowRanges.

If the results already exist in the object, they will not be recalculated. Set the overwrite parameter to TRUE to force recalculation, or remove the columns with the results from the object.
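
The two steps described above (difference of group means, then a Wilcoxon test with Benjamini-Hochberg adjustment) can be sketched in base R on simulated beta-values. This is an illustration under assumptions, not the package's code: the data are simulated, the sign convention (group2 minus group1) and the status labels are guesses, and only the 0.2 and 0.01 thresholds come from the text.

```r
set.seed(7)

# Simulated beta-values: 100 probes (rows), 10 samples per group (columns).
n.probes <- 100
beta  <- matrix(runif(n.probes * 20), nrow = n.probes)
group <- rep(c("group1", "group2"), each = 10)

# Make the first 5 probes clearly less methylated in group2.
beta[1:5, group == "group1"] <- runif(5 * 10, 0.8, 1.0)
beta[1:5, group == "group2"] <- runif(5 * 10, 0.0, 0.2)

# Step 1: difference of mean methylation per probe (assumed: group2 - group1).
diffmean <- rowMeans(beta[, group == "group2"]) - rowMeans(beta[, group == "group1"])

# Step 2: Wilcoxon rank-sum p-value per probe, then BH adjustment.
p.value <- apply(beta, 1, function(b) {
  wilcox.test(b[group == "group1"], b[group == "group2"])$p.value
})
p.adj <- p.adjust(p.value, method = "BH")

# Apply the default thresholds: |diffmean| > 0.2 and adjusted p < 0.01.
status <- ifelse(p.adj < 0.01 & abs(diffmean) > 0.2,
                 ifelse(diffmean > 0, "Hypermethylated", "Hypomethylated"),
                 "Not Significant")
table(status)
```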

Usage

TCGAanalyze_DMR(data, groupCol = NULL, group1 = NULL, group2 = NULL,
  calculate.pvalues.probes = "all",
  plot.filename = "methylation_volcano.pdf",
  ylab = expression(paste(-Log[10], " (FDR corrected -P values)")),
  xlab = expression(paste("DNA Methylation difference (", beta,
  "-values)")), title = NULL, legend = "Legend", color = c("black",
  "red", "darkgreen"), label = NULL, xlim = NULL, ylim = NULL,
  p.cut = 0.01, probe.names = FALSE, diffmean.cut = 0.2,
  paired = FALSE, adj.method = "BH", overwrite = FALSE, cores = 1,
  save = TRUE, save.directory = ".", filename = NULL)

Arguments

| Argument | Description |
| --- | --- |
| data | SummarizedExperiment obtained from TCGAPrepare |
| groupCol | Column with the groups inside the SummarizedExperiment object. (This will be obtained by the function colData(data)) |
| group1 | If the object has more than 2 groups, set the name of group 1 |
| group2 | If the object has more than 2 groups, set the name of group 2 |
| calculate.pvalues.probes | To obtain the probes faster, the user can choose to calculate the p-values only for the probes with a difference in DNA methylation. The default is to calculate them for all probes. Possible values: "all", "differential". Default: "all" |
| plot.filename | Filename of the volcano plot, e.g. volcano.pdf, volcano.svg, volcano.png. Default: methylation_volcano.pdf. If set to FALSE, no plot is produced |
| ylab | y axis text |
| xlab | x axis text |
| title | Main title. If not specified it will be "Volcano plot (group1 vs group2)" |
| legend | Legend title |
| color | Vector of colors to be used in graph |
| label | Vector of labels to be used in the figure. Example: c("Not Significant","Hypermethylated in group1", "Hypomethylated in group1") |
| xlim | x limits to cut image |
| ylim | y limits to cut image |
| p.cut | p-values threshold. Default: 0.01 |
| probe.names | is probe.names |
| diffmean.cut | diffmean threshold. Default: 0.2 |
| paired | Wilcoxon paired parameter. Default: FALSE |
| adj.method | Adjustment method for the p-value calculation |
| overwrite | Overwrite the p-values and diffmean values if already in the object for both groups? Default: FALSE |
| cores | Number of cores to be used in the non-parametric test. Default: 1 |
| save | Save object with results? Default: TRUE |
| save.directory | Directory to save the files. Default: working directory |
| filename | Name of the file to save the object. Default: groupCol.group1.group2.rda |

Value

Saves the volcano plot and returns the given data with the results (diffmean.group1.group2, p.value.group1.group2, p.value.adj.group1.group2, status.group1.group2) in the rowRanges, where group1 and group2 are the names of the groups

Examples

nrows <- 200; ncols <- 20
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GenomicRanges::GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                                    IRanges::IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                                    strand=sample(c("+", "-"), 200, TRUE),
                                    feature_id=sprintf("ID%03d", 1:200))
# one Treatment value per sample (20 samples)
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input"), 10),
                                row.names=LETTERS[1:20],
                                group=rep(c("group1","group2"),c(10,10)))
data <- SummarizedExperiment::SummarizedExperiment(
    assays=S4Vectors::SimpleList(counts=counts),
    rowRanges=rowRanges,
    colData=colData)
SummarizedExperiment::colData(data)$group <- c(rep("group 1",ncol(data)/2),
                                               rep("group 2",ncol(data)/2))
hypo.hyper <- TCGAanalyze_DMR(data, p.cut = 0.85,"group","group 1","group 2")
SummarizedExperiment::colData(data)$group2 <- c(rep("group_1",ncol(data)/2),
                                                rep("group_2",ncol(data)/2))
hypo.hyper <- TCGAanalyze_DMR(data, p.cut = 0.85,"group2","group_1","group_2")

TCGAanalyze_EA()

Enrichment analysis of a gene-set with GO [BP,MF,CC] and pathways.

Description

The rationale behind an enrichment analysis (gene set, pathway, etc.) is to compute statistics on whether the overlap between the focus list (signature) and the gene set is significant, i.e., the confidence that the overlap between the lists is not due to chance. The Gene Ontology project describes genes (gene products) using terms from three structured vocabularies: biological process, cellular component and molecular function. The Gene Ontology Enrichment component, also referred to as the "GO Terms" component, allows the genes in any such "changed-gene" list to be characterized using the Gene Ontology terms annotated to them. It asks whether, for any particular GO term, the fraction of genes assigned to it in the "changed-gene" list is higher than expected by chance (is over-represented), relative to the fraction of genes assigned to that term in the reference set. In statistical terms, the analysis tests the null hypothesis that, for any particular ontology term, there is no difference in the proportion of genes annotated to it in the reference list and the proportion annotated to it in the test list. We adopted Fisher's exact test to perform the EA.
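
As a rough illustration of the test described above, the following base-R sketch builds the 2 x 2 contingency table for a single GO term and applies a one-sided Fisher's exact test. All counts are hypothetical toy numbers, and the sketch does not reproduce the internals of TCGAanalyze_EA.

```r
# Hypothetical toy numbers, not taken from TCGAbiolinks
universe_size <- 20000  # genes in the reference set
term_size     <- 400    # reference genes annotated to the GO term
list_size     <- 200    # genes in the signature ("changed-gene" list)
overlap       <- 15     # signature genes annotated to the term

# 2 x 2 contingency table: signature membership vs. term annotation
tab <- matrix(
  c(overlap,             list_size - overlap,
    term_size - overlap, universe_size - term_size - list_size + overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_list", "not_in_list"), c("in_term", "not_in_term"))
)

# One-sided test: is the term over-represented in the signature?
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

When many terms are tested, the resulting p-values would typically be corrected for multiple testing (for example with p.adjust(..., method = "BH")) before applying an FDR threshold such as FDRThresh.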

Usage

TCGAanalyze_EA(GeneName, RegulonList, TableEnrichment, EAGenes, GOtype,
  FDRThresh = 0.01)

Arguments

ArgumentDescription
GeneNameis the name of gene signatures list
RegulonListis a gene signature (list of genes) on which to perform EA.
TableEnrichmentis a table related to annotations of gene symbols such as GO[BP,MF,CC] and Pathways. It was created from DAVID gene ontology on-line.
EAGenesis a table with information about genes such as ID, Gene, Description, Location and Family.
GOtypeis the type of gene ontology: Biological Process (BP), Molecular Function (MF), Cellular Component (CC)
FDRThreshFDR-corrected p-value threshold used to select significant BP, MF, CC, or pathways (default: FDR < 0.01)

Value

Table with enriched GO or pathways by selected gene signature.

Examples

EAGenes <- get("EAGenes")
RegulonList <- rownames(dataDEGsFiltLevel)
ResBP <- TCGAanalyze_EA(GeneName="DEA genes Normal Vs Tumor",
RegulonList,DAVID_BP_matrix,
EAGenes,GOtype = "DavidBP")

TCGAanalyze_EAcomplete()

Enrichment analysis for Gene Ontology (GO) [BP,MF,CC] and Pathways

Description

Researchers, in order to better understand the underlying biological processes, often want to retrieve a functional profile of a set of genes that might have an important role. This can be done by performing an enrichment analysis.

We will perform an enrichment analysis on gene sets using the TCGAanalyze_EAcomplete function. Given a set of genes that are up-regulated under certain conditions, an enrichment analysis will identify classes of genes or proteins that are over-represented using annotations for that gene set.

Usage

TCGAanalyze_EAcomplete(TFname, RegulonList)

Arguments

ArgumentDescription
TFnameis the name of the list of genes or TF's regulon.
RegulonListList of genes such as TF's regulon or DEGs where to find enrichment.

Value

Enrichment analysis GO[BP,MF,CC] and Pathways complete table enriched by genelist.

Examples

Genelist <- c("FN1","COL1A1")
ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist)
Genelist <- rownames(dataDEGsFiltLevel)
system.time(ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist))

TCGAanalyze_Filtering()

Filtering mRNA transcripts and miRNA selecting a threshold.

Description

TCGAanalyze_Filtering allows the user to filter mRNA transcripts and miRNA by selecting a threshold. For instance, it returns all mRNA or miRNA whose mean across all samples is higher than the threshold, defined as a quantile of the mean across all samples.
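
The quantile rule described above can be sketched in base R as follows. The expression matrix is simulated and the code only illustrates the idea (keep genes whose mean exceeds the chosen quantile of all gene means); it is not the package's own implementation.

```r
set.seed(1)
# Hypothetical expression matrix: 100 genes (rows) x 6 samples (columns)
tabDF <- matrix(rexp(600, rate = 0.01), nrow = 100,
                dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))

qnt.cut <- 0.25

# Mean expression of each gene across all samples
gene_means <- rowMeans(tabDF)

# Threshold: the qnt.cut quantile of the gene means
threshold <- quantile(gene_means, probs = qnt.cut)

# Keep only the genes whose mean is above the threshold
filtered <- tabDF[gene_means > threshold, , drop = FALSE]
dim(filtered)
```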

Usage

TCGAanalyze_Filtering(tabDF, method, qnt.cut = 0.25, var.func = IQR,
  var.cutoff = 0.75, eta = 0.05, foldChange = 1)

Arguments

ArgumentDescription
tabDFis a dataframe or numeric matrix where each row represents a gene and each column represents a sample, as produced by TCGAprepare
methodis the method of filtering, such as 'quantile', 'varFilter', 'filter1', 'filter2'
qnt.cutis the quantile threshold applied to the mean for filtering
var.funcis the function used as the per-feature filtering statistic. See genefilter documentation
var.cutoffis a numeric value. See genefilter documentation
etais a parameter for filter1. Default: eta = 0.05.
foldChangeis a parameter for filter2. Default: foldChange = 1.

Value

A filtered dataframe or numeric matrix where each row represents a gene, each column represents a sample

Examples

dataNorm <- TCGAbiolinks::TCGAanalyze_Normalization(dataBRCA, geneInfo)
dataNorm <- TCGAanalyze_Normalization(tabDF = dataBRCA,
geneInfo = geneInfo,
method = "geneLength")
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut = 0.25)

TCGAanalyze_LevelTab()

Adding information related to DEGs genes from DEA as mean values in two conditions.

Description

TCGAanalyze_LevelTab allows the user to add information related to DEGs from differential expression analysis (DEA), such as mean values in the two conditions.

Usage

TCGAanalyze_LevelTab(FC_FDR_table_mRNA, typeCond1, typeCond2, TableCond1,
  TableCond2, typeOrder = TRUE)

Arguments

ArgumentDescription
FC_FDR_table_mRNAOutput of dataDEGs filter by abs(LogFC) >=1
typeCond1a string containing the class label of the samples in TableCond1 (e.g., control group)
typeCond2a string containing the class label of the samples in TableCond2 (e.g., case group)
TableCond1numeric matrix, each row represents a gene, each column represents a sample with Cond1type
TableCond2numeric matrix, each row represents a gene, each column represents a sample with Cond2type
typeOrdertypeOrder

Value

table with DEGs, log fold change (FC), false discovery rate (FDR), the gene expression level for samples in Cond1type and Cond2type, and the Delta value (the difference in gene expression between the two conditions multiplied by logFC)

Examples

dataNorm <- TCGAbiolinks::TCGAanalyze_Normalization(dataBRCA, geneInfo)
# filter the normalized matrix
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut =  0.25)
samplesNT <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
samplesTP <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
dataDEGs <- TCGAanalyze_DEA(dataFilt[,samplesNT],
                            dataFilt[,samplesTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor")
dataDEGsFilt <- dataDEGs[abs(dataDEGs$logFC) >= 1,]
dataTP <- dataFilt[,samplesTP]
dataTN <- dataFilt[,samplesNT]
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGsFilt,"Tumor","Normal",
                                          dataTP,dataTN)

TCGAanalyze_Normalization()

Normalization of mRNA transcripts and miRNA using the EDASeq package.

Description

TCGAanalyze_Normalization allows the user to normalize mRNA transcripts and miRNA using the EDASeq package.

Normalization for RNA-Seq: numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).

For instance, it returns all mRNA or miRNA whose mean across all samples is higher than the threshold defined by the quantile of the mean across all samples.

TCGAanalyze_Normalization performs normalization using the following functions from EDASeq:

  • EDASeq::newSeqExpressionSet

  • EDASeq::withinLaneNormalization

  • EDASeq::betweenLaneNormalization

  • EDASeq::counts
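
To give an intuition for the full-quantile normalization mentioned above, here is a generic base-R sketch that forces every column (lane or sample) to share the same distribution. It is a textbook version of the technique on a hypothetical count matrix, with a hypothetical helper full_quantile; it is not the EDASeq code and ignores details such as rounding or the treatment of ties.

```r
# Hypothetical count matrix: 6 genes (rows) x 3 lanes (columns)
counts <- matrix(c(5, 2, 3, 4, 1, 8,
                   10, 4, 6, 8, 2, 16,
                   7, 9, 1, 3, 5, 11),
                 nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("lane", 1:3)))

full_quantile <- function(x) {
  # Rank of each value within its column
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, position by position
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

normalized <- full_quantile(counts)
normalized
```

After this step each column contains the same set of values, while the ordering of genes within a column is unchanged.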

Usage

TCGAanalyze_Normalization(tabDF, geneInfo, method = "geneLength")

Arguments

ArgumentDescription
tabDFRNA-seq numeric matrix, each row represents a gene, each column represents a sample
geneInfoInformation matrix of 20531 genes about geneLength and gcContent. Two objects are provided: TCGAbiolinks::geneInfoHT, TCGAbiolinks::geneInfo
methodis the method of normalization, such as 'gcContent' or 'geneLength'

Value

Normalized RNA-seq matrix. The counts slot holds the count data as a matrix of non-negative integer count values, one row for each observational unit (gene or the like), and one column for each sample.

Examples

dataNorm <- TCGAbiolinks::TCGAanalyze_Normalization(dataBRCA, geneInfo)

TCGAanalyze_Pathview()

Generate pathview graph

Description

TCGAanalyze_Pathview performs pathway-based data integration and visualization.

Usage

TCGAanalyze_Pathview(dataDEGs, pathwayKEGG = "hsa05200")

Arguments

ArgumentDescription
dataDEGsdataDEGs
pathwayKEGGpathwayKEGG

Value

an adjacent matrix

Examples

dataDEGs <- data.frame(mRNA = c("TP53","TP63","TP73"), logFC = c(1,2,3))
TCGAanalyze_Pathview(dataDEGs)

TCGAanalyze_Preprocessing()

Array Array Intensity correlation (AAIC) and correlation boxplot to define outliers

Description

TCGAanalyze_Preprocessing performs Array Array Intensity correlation (AAIC). It defines a square symmetric matrix of Spearman correlations among samples. Using this matrix and a boxplot of the sample-by-sample correlations, it is possible to find samples with low correlation that can be identified as possible outliers.
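
The idea can be sketched in base R on simulated data. Flagging a sample by its mean correlation with the other samples, and the threshold value used here, are assumptions made for illustration; the function itself may apply cor.cut differently.

```r
set.seed(42)
n_genes <- 50
signal <- rnorm(n_genes, mean = 10, sd = 2)

# Hypothetical expression matrix: four samples share a signal, one does not
expr <- cbind(sapply(1:4, function(i) signal + rnorm(n_genes, sd = 0.3)),
              rnorm(n_genes, mean = 10, sd = 2))
colnames(expr) <- paste0("sample", 1:5)

# Square symmetric matrix of Spearman correlations among samples
cor_mat <- cor(expr, method = "spearman")

# Mean correlation of each sample with all the others (diagonal excluded)
off_diag <- cor_mat
diag(off_diag) <- NA
mean_cor <- colMeans(off_diag, na.rm = TRUE)

cor.cut <- 0.5  # illustrative threshold, not the function default
outliers <- names(mean_cor)[mean_cor < cor.cut]
outliers

# Boxplot of the sample-by-sample correlations
boxplot(off_diag, ylab = "Spearman correlation")
```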

Usage

TCGAanalyze_Preprocessing(object, cor.cut = 0, filename = NULL,
  width = 1000, height = 1000, datatype = names(assays(object))[1])

Arguments

ArgumentDescription
objectgene expression object of class RangedSummarizedExperiment from TCGAprepare
cor.cutis a threshold to filter samples according to their sample-by-sample Spearman correlation. Default: cor.cut = 0
filenameFilename of the image file
widthImage width
heightImage height
datatypeis a string from RangedSummarizedExperiment assay

Value

Plot with array array intensity correlation and boxplot of correlation samples by samples


TCGAanalyze_Stemness()

Generate Stemness Score based on RNASeq (mRNAsi stemness index) Malta et al., Cell, 2018

Description

TCGAanalyze_Stemness generates the mRNAsi score.
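
A minimal sketch of how such a score could be computed is shown below. It assumes that the score is the Spearman correlation between the signature weights and each sample's expression profile, rescaled to the [0, 1] range, which is my reading of the mRNAsi approach; the signature and expression values are simulated and the code is not the package implementation.

```r
set.seed(7)
genes <- paste0("gene", 1:30)

# Hypothetical signature weights and expression matrix (genes x samples)
stemSig <- setNames(rnorm(30), genes)
dataGE <- matrix(rexp(30 * 4), nrow = 30,
                 dimnames = list(genes, paste0("sample", 1:4)))

# Use only the genes present in both the signature and the expression data
common <- intersect(names(stemSig), rownames(dataGE))

# Spearman correlation between the weights and each sample's expression
score <- apply(dataGE[common, , drop = FALSE], 2,
               function(x) cor(x, stemSig[common], method = "spearman"))

# Rescale the scores to the [0, 1] interval
score <- (score - min(score)) / (max(score) - min(score))
data.frame(Sample = names(score), StemnessScore = score)
```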

Usage

TCGAanalyze_Stemness(stemSig, dataGE)

Arguments

ArgumentDescription
stemSigis a vector of the stemness Signature generated using gelnet package
dataGEis a matrix of Gene expression (genes in rows, samples in cols) from TCGAprepare

Value

table with samples and stemness score

Examples

# Selecting TCGA breast cancer (10 samples) for example stored in dataBRCA
dataNorm <- TCGAanalyze_Normalization(tabDF = dataBRCA, geneInfo =  geneInfo)
# quantile filter of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
method = "quantile",
qnt.cut =  0.25)
dataBRCA_stemness <- TCGAanalyze_Stemness(stemSig = PCBC_stemSig, dataGE = dataFilt)

TCGAanalyze_SurvivalKM()

Univariate survival analysis (SA) with the Kaplan-Meier (KM) method.

Description

TCGAanalyze_SurvivalKM performs a univariate Kaplan-Meier (KM) survival analysis (SA). It uses the complete follow-up, with all days, taking one gene at a time from Genelist, a list of gene symbols. For each gene, according to its level of mean expression in cancer samples, two thresholds are defined as quantiles of the expression of that gene in all samples (default ThreshTop = 0.67, ThreshDown = 0.33). These thresholds on the intensity of gene expression divide the samples into 3 groups (high, intermediate, low). TCGAanalyze_SurvivalKM performs SA between the high and low groups using the following functions from the survival package:

  • survival::Surv

  • survival::survdiff

  • survival::survfit
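
The grouping and testing steps described above can be sketched for a single gene with the three survival functions listed. Expression values, follow-up times and vital status are simulated, and the variable names are hypothetical; the sketch does not reproduce the internals of TCGAanalyze_SurvivalKM.

```r
library(survival)

set.seed(3)
n <- 60
expr   <- rexp(n)                               # hypothetical expression of one gene
time   <- round(rexp(n, rate = 1 / 1000)) + 1   # hypothetical follow-up in days
status <- rbinom(n, size = 1, prob = 0.6)       # 1 = dead, 0 = censored

ThreshTop  <- 0.67
ThreshDown <- 0.33
cuts <- quantile(expr, probs = c(ThreshDown, ThreshTop))

# Three groups according to the two quantile thresholds
group <- ifelse(expr > cuts[2], "High",
         ifelse(expr < cuts[1], "Low", "Intermediate"))

# Compare only the High and Low groups
km <- data.frame(time, status, group, stringsAsFactors = FALSE)
km <- km[km$group != "Intermediate", ]

# Log-rank test between the two groups
test <- survdiff(Surv(time, status) ~ group, data = km)
p.value <- pchisq(test$chisq, df = 1, lower.tail = FALSE)

# Kaplan-Meier curves for the two groups
fit <- survfit(Surv(time, status) ~ group, data = km)
p.value
```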

Usage

TCGAanalyze_SurvivalKM(clinical_patient, dataGE, Genelist,
  Survresult = FALSE, ThreshTop = 0.67, ThreshDown = 0.33,
  p.cut = 0.05, group1, group2)

Arguments

ArgumentDescription
clinical_patientis a data.frame using function 'clinic' with information related to barcode / samples such as bcr_patient_barcode, days_to_death , days_to_last_follow_up , vital_status, etc
dataGEis a matrix of Gene expression (genes in rows, samples in cols) from TCGAprepare
Genelistis a list of gene symbols on which to perform survival KM.
Survresultis a parameter (default = FALSE); if TRUE, the KM plot and results are shown.
ThreshTopis a quantile threshold to identify samples with high expression of a gene
ThreshDownis a quantile threshold to identify samples with low expression of a gene
p.cutp-value threshold. Default: 0.05
group1a string containing the barcode list of the samples in the control group
group2a string containing the barcode list of the samples in the disease group

Value

table with survival genes pvalues from KM.

Examples

# Selecting only 20 genes for example
dataBRCAcomplete <- log2(dataBRCA[1:20,] + 1)

# clinical_patient_Cancer <- GDCquery_clinic("TCGA-BRCA","clinical")
clinical_patient_Cancer <- data.frame(
bcr_patient_barcode = substr(colnames(dataBRCAcomplete),1,12),
vital_status = c(rep("alive",3),"dead",rep("alive",2),rep(c("dead","alive"),2)),
days_to_death = c(NA,NA,NA,172,NA,NA,3472,NA,786,NA),
days_to_last_follow_up = c(3011,965,718,NA,1914,423,NA,5,656,1417)
)

group1 <- TCGAquery_SampleTypes(colnames(dataBRCAcomplete), typesample = c("NT"))
group2 <- TCGAquery_SampleTypes(colnames(dataBRCAcomplete), typesample = c("TP"))

tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
dataBRCAcomplete,
Genelist = rownames(dataBRCAcomplete),
Survresult = FALSE,
p.cut = 0.4,
ThreshTop = 0.67,
ThreshDown = 0.33,
group1 = group1, # Control group
group2 = group2) # Disease group

# If the groups are not specified group1 == group2 and all samples are used
tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
dataBRCAcomplete,
Genelist = rownames(dataBRCAcomplete),
Survresult = TRUE,
p.cut = 0.2,
ThreshTop = 0.67,
ThreshDown = 0.33)

TCGAanalyze_analyseGRN()

Generate network

Description

TCGAanalyze_analyseGRN performs gene regulatory network (GRN) analysis.

Usage

TCGAanalyze_analyseGRN(TFs, normCounts, kNum)

Arguments

ArgumentDescription
TFsa vector of genes.
normCountsis a matrix of gene expression with genes in rows and samples in columns.
kNumthe number of nearest neighbors to consider to estimate the mutual information. Must be less than the number of columns of normCounts.

Value

an adjacency matrix


TCGAanalyze_networkInference()

infer gene regulatory networks

Description

TCGAanalyze_networkInference takes expression data as input and returns an adjacency matrix of interactions.

Usage

TCGAanalyze_networkInference(data, optionMethod = "clr")

Arguments

ArgumentDescription
dataexpression data, genes in columns, samples in rows
optionMethodinference method, chosen from aracne, c3net, clr and mrnet

Value

an adjacency matrix


TCGAanalyze_survival()

Creates survival analysis

Description

Creates a survival plot from TCGA patient clinical data using the survival library. It uses the fields days_to_death and vital_status, plus a column for groups.
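
As an illustration of how such clinical fields can be turned into a survival object, the sketch below derives an event indicator and a follow-up time and fits Kaplan-Meier curves per group with the survival package. The rule used for the follow-up time (day of death for deceased patients, last follow-up otherwise) and the added columns event and time are assumptions, not a description of the function's internals.

```r
library(survival)

clin <- data.frame(
  vital_status = c("alive", "alive", "alive", "dead", "alive",
                   "alive", "dead", "alive", "dead", "alive"),
  days_to_death = c(NA, NA, NA, 172, NA, NA, 3472, NA, 786, NA),
  days_to_last_follow_up = c(3011, 965, 718, NA, 1914, 423, NA, 5, 656, 1417),
  gender = c(rep("male", 5), rep("female", 5)),
  stringsAsFactors = FALSE
)

# Event indicator and follow-up time (assumed rule: day of death for deceased
# patients, last follow-up otherwise)
clin$event <- clin$vital_status == "dead"
clin$time  <- ifelse(clin$event, clin$days_to_death, clin$days_to_last_follow_up)

# Kaplan-Meier curves per group and log-rank test between groups
fit  <- survfit(Surv(time, event) ~ gender, data = clin)
test <- survdiff(Surv(time, event) ~ gender, data = clin)

plot(fit, col = c("pink", "blue"),
     xlab = "Time since diagnosis (days)", ylab = "Probability of survival")
```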

Usage

TCGAanalyze_survival(data, clusterCol = NULL, legend = "Legend",
  labels = NULL, risk.table = TRUE, xlim = NULL,
  main = "Kaplan-Meier Overall Survival Curves",
  ylab = "Probability of survival",
  xlab = "Time since diagnosis (days)", filename = "survival.pdf",
  color = NULL, height = 8, width = 12, dpi = 300, pvalue = TRUE,
  conf.int = TRUE, ...)

Arguments

ArgumentDescription
dataTCGA Clinical patient with the information days_to_death
clusterColColumn with groups to plot. This is a mandatory field, the caption will be based in this column
legendLegend title of the figure
labelslabels of the plot
risk.tableshow or not the risk table
xlimx axis limits, e.g. xlim = c(0, 1000). Presents a narrower x axis, but does not affect the survival estimates.
mainmain title of the plot
ylaby axis text of the plot
xlabx axis text of the plot
filenameThe name of the pdf file.
colorDefine the colors/palette for lines.
heightImage height
widthImage width
dpiFigure quality
pvalueshow p-value of log-rank test
conf.intshow confidence intervals for point estimates of survival curves.
...Further arguments passed to ggsurvplot .

Value

Survival plot

Examples

# clin <- GDCquery_clinic("TCGA-BRCA","clinical")
clin <- data.frame(
vital_status = c("alive","alive","alive","dead","alive",
"alive","dead","alive","dead","alive"),
days_to_death = c(NA,NA,NA,172,NA,NA,3472,NA,786,NA),
days_to_last_follow_up = c(3011,965,718,NA,1914,423,NA,5,656,1417),
gender = c(rep("male",5),rep("female",5))
)
TCGAanalyze_survival(clin, clusterCol="gender")
TCGAanalyze_survival(clin, clusterCol="gender", xlim = 1000)
TCGAanalyze_survival(clin,
clusterCol="gender",
risk.table = FALSE,
conf.int = FALSE,
color = c("pink","blue"))
TCGAanalyze_survival(clin,
clusterCol="gender",
risk.table = FALSE,
xlim = c(100,1000),
conf.int = FALSE,
color = c("Dark2"))

TCGAbatch_Correction()

Batch correction using ComBat and Voom transformation using limma package.

Description

TCGAbatch_Correction allows the user to perform a Voom correction on gene expression data and have it ready for DEA. One can also use ComBat for batch correction for exploratory analysis. If the batch.factor or adjustment argument is "Year", please provide clinical data. If no batch factor is provided, the data will be voom corrected only.

TCGAbatch_Correction uses the following functions from sva and limma:

  • limma::voom Transform RNA-Seq Data Ready for Linear Modelling.

  • sva::ComBat Adjust for batch effects using an empirical Bayes framework.

Usage

TCGAbatch_Correction(tabDF, batch.factor = NULL, adjustment = NULL,
  ClinicalDF = data.frame(), UnpublishedData = FALSE,
  AnnotationDF = data.frame())

Arguments

ArgumentDescription
tabDFnumeric matrix, each row represents a gene, each column represents a sample
batch.factora string containing the batch factor to use for correction. Options are "Plate", "TSS", "Year", "Portion", "Center"
adjustmentvector containing strings for factors to adjust for using ComBat. Options are "Plate", "TSS", "Year", "Portion", "Center"
ClinicalDFa dataframe returned by GDCquery_clinic() to be used to extract year data
UnpublishedDataif TRUE perform a batch correction after adding new data
AnnotationDFa dataframe with column Batch indicating different batches of the samples in the tabDF

Value

data frame with ComBat batch correction applied

The aim of TCGAbiolinks is : i) facilitate the TCGA open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) allow the user to download a specific version of the data and thus to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Description

The functions you're likely to need from TCGAbiolinks are GDCdownload and GDCquery. Otherwise, refer to the vignettes to see how to format the documentation.


TCGAprepare_Affy()

Prepare CEL files into an AffyBatch.

Description

Prepare CEL files into an AffyBatch.

Usage

TCGAprepare_Affy(ClinData, PathFolder, TabCel)

Arguments

ArgumentDescription
ClinDatawrite
PathFolderwrite
TabCelwrite

Value

Normalized Expression data from Affy eSets

Examples

to add example

TCGAquery_MatchedCoupledSampleTypes()

Retrieve multiple tissue types from the same patients.

Description

TCGAquery_MatchedCoupledSampleTypes

Usage

TCGAquery_MatchedCoupledSampleTypes(barcode, typesample)

Arguments

ArgumentDescription
barcodebarcode
typesampletypesample

Value

a list of samples / barcode filtered by type sample selected

Examples

TCGAquery_MatchedCoupledSampleTypes(c("TCGA-B0-4698-01Z-00-DX1",
"TCGA-B0-4698-02Z-00-DX1"),
c("TP","TR"))
barcode <- c("TARGET-20-PANSBH-02A-02D","TARGET-20-PANSBH-01A-02D",
"TCGA-B0-4698-01Z-00-DX1","TCGA-CZ-4863-02Z-00-DX1",
"TARGET-20-PANSZZ-02A-02D","TARGET-20-PANSZZ-11A-02D",
"TCGA-B0-4699-01Z-00-DX1","TCGA-B0-4699-02Z-00-DX1"
)
TCGAquery_MatchedCoupledSampleTypes(barcode,c("TR","TP"))

TCGAquery_SampleTypes()

Retrieve multiple tissue types not from the same patients.

Description

TCGAquery_SampleTypes: for a given list of samples and types, returns the union of samples that are of these types.
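
The selection can be pictured with a small base-R sketch that reads the two-digit sample type code from the fourth field of each barcode. The helpers sample_type_code and select_sample_types are hypothetical, and the lookup table covers only three types for illustration; this is not the package implementation.

```r
# Small, illustrative subset of the sample type codes
type_codes <- c(TP = "01", TR = "02", NT = "11")

# Hypothetical helper: the sample field is the fourth "-"-separated field of
# the barcode, and its first two characters are the sample type code
sample_type_code <- function(barcode) {
  fields <- strsplit(barcode, "-", fixed = TRUE)
  vapply(fields, function(f) substr(f[4], 1, 2), character(1))
}

# Hypothetical helper: keep the barcodes whose type is listed in typesample
select_sample_types <- function(barcode, typesample) {
  barcode[sample_type_code(barcode) %in% type_codes[typesample]]
}

barcode <- c("TCGA-B0-4698-01Z-00-DX1", "TCGA-CZ-4863-02Z-00-DX1")
select_sample_types(barcode, "TR")
select_sample_types(barcode, c("TR", "TP"))
```

Splitting on "-" instead of using fixed character positions lets the same helper handle both TCGA and TARGET barcodes, whose project prefixes differ in length.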

Usage

TCGAquery_SampleTypes(barcode, typesample)

Arguments

ArgumentDescription
barcodeis a list of samples as TCGA barcodes

typesamplea character vector indicating tissue type to query. Possible values: TP (PRIMARY SOLID TUMOR), TR (RECURRENT SOLID TUMOR), TB (Primary Blood Derived Cancer-Peripheral Blood), TRBM (Recurrent Blood Derived Cancer-Bone Marrow), TAP (Additional-New Primary), TM (Metastatic), TAM (Additional Metastatic), THOC (Human Tumor Original Cells), TBM (Primary Blood Derived Cancer-Bone Marrow), NB (Blood Derived Normal), NT (Solid Tissue Normal), NBC (Buccal Cell Normal), NEBV (EBV Immortalized Normal), NBM (Bone Marrow Normal)

Value

a list of samples / barcode filtered by type sample selected

Examples

# example barcodes: a primary solid tumor ("01") and a recurrent solid tumor ("02")
barcode <- c("TCGA-B0-4698-01Z-00-DX1","TCGA-CZ-4863-02Z-00-DX1")
# Returns the second barcode
TCGAquery_SampleTypes(barcode,"TR")
# Returns both barcodes
TCGAquery_SampleTypes(barcode,c("TR","TP"))
barcode <- c("TARGET-20-PANSBH-14A-02D","TARGET-20-PANSBH-01A-02D",
             "TCGA-B0-4698-01Z-00-DX1","TCGA-CZ-4863-02Z-00-DX1")
TCGAquery_SampleTypes(barcode,c("TR","TP"))

TCGAquery_recount2()

Query gene counts of TCGA and GTEx data from the Recount2 project

Description

TCGAquery_recount2 queries and downloads data produced by the Recount2 project. The user can specify which project and which tissue to query.

Usage

TCGAquery_recount2(project, tissue = c())

Arguments

ArgumentDescription
projectis a string denoting which project the user wants. Options are "tcga" and "gtex"
tissuea vector of tissue(s) to download. Options are "adipose tissue", "adrenal gland", "bladder", "blood", "blood vessel", "bone marrow", "brain", "breast", "cervix uteri", "colon", "esophagus", "fallopian tube", "heart", "kidney", "liver", "lung", "muscle", "nerve", "ovary", "pancreas", "pituitary", "prostate", "salivary gland", "skin", "small intestine", "spleen", "stomach", "testis", "thyroid", "uterus", "vagina"

Value

List with $subtypes attribute as a dataframe with barcodes, samples, subtypes, and colors. The $filtered attribute is returned as filtered samples with no subtype info

Examples

brain.rec<-TCGAquery_recount2(project = "gtex", tissue = "brain")

TCGAquery_subtype()

Retrieve molecular subtypes for a given tumor

Description

TCGAquery_subtype retrieves molecular subtypes for a given tumor.

Usage

TCGAquery_subtype(tumor)

Arguments

ArgumentDescription

tumoris a cancer. Examples: lgg, gbm, luad, stad, brca, coad, read

Value

a data.frame with barcode and molecular subtypes

Examples

dataSubt <- TCGAquery_subtype(tumor = "lgg")

TCGAtumor_purity()

Filters TCGA barcodes according to purity parameters

Description

TCGAtumor_purity filters TCGA samples using 5 estimates from 5 methods as thresholds.
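
A threshold filter of this kind can be sketched in base R as follows. The purity table, its column names and the rule that a sample must reach every threshold are assumptions for illustration, as is the handling of samples with a missing estimate; the function's actual behavior may differ.

```r
# Hypothetical purity table; barcodes and values are made up for illustration
purity <- data.frame(
  barcode  = c("sample_A", "sample_B", "sample_C"),
  ESTIMATE = c(0.90, 0.55, 0.80),
  ABSOLUTE = c(0.85, 0.70, NA),
  LUMP     = c(0.95, 0.90, 0.85),
  IHC      = c(0.90, 0.85, 0.80),
  CPE      = c(0.88, 0.75, 0.82),
  stringsAsFactors = FALSE
)
thresholds <- c(ESTIMATE = 0.6, ABSOLUTE = 0.6, LUMP = 0.8, IHC = 0.8, CPE = 0.7)

# TRUE when a sample reaches the threshold of a method, NA when no estimate exists
passes <- sapply(names(thresholds),
                 function(m) purity[[m]] >= thresholds[[m]])

# Samples with an estimate for every method
complete <- !apply(is.na(passes), 1, any)

# Assumed rule: a sample is "pure" when it reaches every threshold
pure_barcodes <- purity$barcode[complete & apply(passes, 1, all, na.rm = TRUE)]
no_info       <- purity$barcode[!complete]

list(pure_barcodes = pure_barcodes, filtered = no_info)
```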

Usage

TCGAtumor_purity(barcodes, estimate, absolute, lump, ihc, cpe)

Arguments

ArgumentDescription
barcodesis a vector of TCGA barcodes
estimateuses gene expression profiles of 141 immune genes and 141 stromal genes
absolutewhich uses somatic copy-number data (estimations were available for only 11 cancer types)
lump(leukocytes unmethylation for purity), which averages 44 non-methylated immune-specific CpG sites
ihcas estimated by image analysis of haematoxylin and eosin stain slides produced by the Nationwide Childrens Hospital Biospecimen Core Resource
cpeCPE is a derived consensus measurement as the median purity level after normalizing levels from all methods to give them equal means and standard deviations

Value

List with $pure_barcodes attribute as a vector of pure samples and $filtered attribute as filtered samples with no purity info

Examples

dataTableSubt <- TCGAtumor_purity("TCGA-60-2721-01A-01R-0851-07",
estimate = 0.6,
absolute = 0.6,
ihc = 0.8,
lump = 0.8,
cpe = 0.7)

TCGAvisualize_BarPlot()

Barplot of subtypes and clinical info in groups of gene expression clustered.

Description

Barplot of subtypes and clinical info in groups of gene expression clustered.

Usage

TCGAvisualize_BarPlot(DFfilt, DFclin, DFsubt, data_Hc2, Subtype, cbPalette,
  filename, width, height, dpi)

Arguments

ArgumentDescription
DFfiltwrite
DFclinwrite
DFsubtwrite
data_Hc2write
Subtypewrite
cbPaletteDefine the colors of the bar.
filenameThe name of the pdf file
widthImage width
heightImage height
dpiImage dpi

Value

barplot image in pdf or png file


TCGAvisualize_EAbarplot()

barPlot for a complete Enrichment Analysis

Description

The figure shows canonical pathways significantly overrepresented (enriched) by the DEGs (differentially expressed genes). The most statistically significant canonical pathways identified in DEGs list are listed according to their p value corrected FDR (-Log) (colored bars) and the ratio of list genes found in each pathway over the total number of genes in that pathway (Ratio, red line).

Usage

TCGAvisualize_EAbarplot(tf, GOMFTab, GOBPTab, GOCCTab, PathTab, nBar,
  nRGTab, filename = "TCGAvisualize_EAbarplot_Output.pdf",
  text.size = 1, mfrow = c(2, 2), xlim = NULL, color = c("orange",
  "cyan", "green", "yellow"))

Arguments

ArgumentDescription
tfis a list of gene symbols
GOMFTabis results from TCGAanalyze_EAcomplete related to Molecular Function (MF)
GOBPTabis results from TCGAanalyze_EAcomplete related to Biological Process (BP)
GOCCTabis results from TCGAanalyze_EAcomplete related to Cellular Component (CC)
PathTabis results from TCGAanalyze_EAcomplete related to Pathways EA
nBaris the number of bar histogram selected to show (default = 10)
nRGTabis the gene signature list with gene symbols.
filenameName for the pdf. If null it will return the plot.
text.sizeText size
mfrowVector with number of rows/columns of the plot. Default 2 rows/2 columns "c(2,2)"
xlimUpper limit of the x-axis.
colorA vector of colors for each barplot. Default: c("orange", "cyan","green","yellow")

Value

Complete barPlot from Enrichment Analysis showing significant (default FDR < 0.01) BP,CC,MF and pathways enriched by list of genes.

Examples

Genelist <- c("FN1","COL1A1")
ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist)
TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
GOBPTab = ansEA$ResBP,
GOCCTab = ansEA$ResCC,
GOMFTab = ansEA$ResMF,
PathTab = ansEA$ResPat,
nRGTab = Genelist,
nBar = 10,
filename="a.pdf")
while (!(is.null(dev.list()["RStudioGD"]))){dev.off()}
Genelist <- rownames(dataDEGsFiltLevel)
system.time(ansEA <- TCGAanalyze_EAcomplete(TFname="DEA genes Normal Vs Tumor",Genelist))
# Enrichment Analysis EA (TCGAVisualize)
# Gene Ontology (GO) and Pathway enrichment barPlot
TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
GOBPTab = ansEA$ResBP,
GOCCTab = ansEA$ResCC,
GOMFTab = ansEA$ResMF,
PathTab = ansEA$ResPat,
nRGTab = Genelist,
nBar = 10)

TCGAvisualize_Heatmap()

Heatmap with more sensible behavior using heatmap.plus

Description

Heatmap with more sensible behavior using heatmap.plus

Usage

TCGAvisualize_Heatmap(data, col.metadata, row.metadata,
  col.colors = NULL, row.colors = NULL, show_column_names = FALSE,
  show_row_names = FALSE, cluster_rows = FALSE,
  cluster_columns = FALSE, sortCol, extrems = NULL,
  rownames.size = 12, title = NULL, color.levels = NULL,
  values.label = NULL, filename = "heatmap.pdf", width = 10,
  height = 10, type = "expression", scale = "none",
  heatmap.legend.color.bar = "continuous")

Arguments

ArgumentDescription
dataThe object with the heatmap data (expression, methylation)
col.metadataMetadata for the columns (samples). It should have one of the following columns: barcode (28 characters) column to match with the samples. It will also work with "bcr_patient_barcode"(12 chars),"patient"(12 chars),"sample"(16 chars) columns, but as one patient might have more than one sample, this could lead to errors in the annotation. The code will throw a warning in case two samples are from the same patient.
row.metadataMetadata for the rows: genes (expression) or probes (methylation)
col.colorsA list of named colors
row.colorsA list of named colors
show_column_namesShow column names? Default: FALSE
show_row_namesShow row names? Default: FALSE
cluster_rowsCluster rows? Default: FALSE
cluster_columnsCluster columns? Default: FALSE
sortColName of the column to be used to sort the columns
extremsExtremes of colors (vector of 3 values)
rownames.sizeRownames size
titleTitle of the plot
color.levelsA vector with the colors (low level, middle level, high level)
values.labelText of the levels in the heatmap
filenameFilename to save the heatmap. Default: heatmap.pdf
widthfigure width
heightfigure height
typeSelect the colors of the heatmap values. Possible values are "expression" (default), "methylation"
scaleUse z-score to make the heatmap? If we want to show differences between genes, it is good to make Z-score by samples (force each sample to have zero mean and standard deviation=1). If we want to show differences between samples, it is good to make Z-score by genes (force each gene to have zero mean and standard deviation=1). Possibilities: "row", "col". Default "none"
heatmap.legend.color.barHeatmap legend values type. Options: "continuous", "discrete"
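
The two z-score transformations described for the scale argument can be illustrated in base R on a hypothetical matrix. The sketch only shows the arithmetic; it does not claim which argument value maps to which transformation inside the function.

```r
# Hypothetical expression matrix: 3 genes (rows) x 4 samples (columns)
expr <- matrix(c(1, 2, 3, 4,
                 10, 20, 30, 40,
                 5, 5, 6, 8),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# Z-score by gene: each row gets zero mean and unit standard deviation
z_by_gene <- t(scale(t(expr)))

# Z-score by sample: each column gets zero mean and unit standard deviation
z_by_sample <- scale(expr)

round(z_by_gene, 2)
round(z_by_sample, 2)
```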

Value

Heatmap plotted in the device

Examples

row.mdat <- matrix(c("FALSE","FALSE",
"TRUE","TRUE",
"FALSE","FALSE",
"TRUE","FALSE",
"FALSE","TRUE"
),
nrow = 5, ncol = 2, byrow = TRUE,
dimnames = list(
c("probe1", "probe2","probe3","probe4","probe5"),
c("duplicated", "Enhancer region")))
dat <- matrix(c(0.3,0.2,0.3,1,1,0.1,1,1,0, 0.8,1,0.7,0.7,0.3,1),
nrow = 5, ncol = 3, byrow = TRUE,
dimnames = list(
c("probe1", "probe2","probe3","probe4","probe5"),
c("TCGA-DU-6410",
"TCGA-DU-A5TS",
"TCGA-HT-7688")))

mdat <- data.frame(patient=c("TCGA-DU-6410","TCGA-DU-A5TS","TCGA-HT-7688"),
Sex=c("Male","Female","Male"),
COCCluster=c("coc1","coc1","coc1"),
IDHtype=c("IDHwt","IDHMut-cod","IDHMut-noncod"))

TCGAvisualize_Heatmap(dat,
col.metadata = mdat,
row.metadata = row.mdat,
row.colors = list(duplicated = c("FALSE" = "pink",
"TRUE"="green"),
"Enhancer region" = c("FALSE" = "purple",
"TRUE"="grey")),
col.colors = list(Sex = c("Male" = "blue", "Female"="red"),
COCCluster=c("coc1"="grey"),
IDHtype=c("IDHwt"="cyan",
"IDHMut-cod"="tomato"
,"IDHMut-noncod"="gold")),
type = "methylation",
show_row_names=TRUE)
if (!(is.null(dev.list()["RStudioGD"]))){dev.off()}

TCGAvisualize_PCA()

Principal components analysis (PCA) plot

Description

TCGAvisualize_PCA performs a principal components analysis (PCA) on the given data matrix, returns the results as an object of class prcomp, and plots the samples on the first two principal components.

Usage

TCGAvisualize_PCA(dataFilt, dataDEGsFiltLevel, ntopgenes, group1, group2)

Arguments

Argument | Description
dataFilt | A filtered data frame or numeric matrix from function TCGAanalyze_Filtering, where each row represents a gene and each column represents a sample
dataDEGsFiltLevel | Table with DEGs, log fold change (FC), false discovery rate (FDR), the gene expression level, etc., from function TCGAanalyze_LevelTab
ntopgenes | Number of DEGs to plot in the PCA
group1 | A character vector with the barcodes of the samples in the control group
group2 | A character vector with the barcodes of the samples in the disease group

Value

principal components analysis (PCA) plot of PC1 and PC2
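
As a rough illustration of what a PC1/PC2 plot involves, the sketch below runs stats::prcomp on a simulated expression matrix. The data, group labels and colors are made up for the example, and it does not reproduce the gene selection done by TCGAvisualize_PCA.

```r
set.seed(42)
# Toy expression matrix: 50 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(50 * 10), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:10)))
# Shift the first 10 genes in the last 5 samples to mimic a group difference
expr[1:10, 6:10] <- expr[1:10, 6:10] + 3
group <- rep(c("control", "disease"), each = 5)

# prcomp expects observations (samples) in rows, so transpose
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained[1:2], 3)

# Samples on PC1 vs PC2, colored by group
plot(pca$x[, 1], pca$x[, 2],
     col = ifelse(group == "control", "blue", "red"), pch = 19,
     xlab = "PC1", ylab = "PC2")
```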

Examples

# normalization of genes
dataNorm <- TCGAbiolinks::TCGAanalyze_Normalization(tabDF = dataBRCA, geneInfo = geneInfo,
method = "geneLength")
# quantile filter of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = dataBRCA, method = "quantile", qnt.cut =  0.25)
# Principal Component Analysis plot for ntop selected DEGs
# selection of normal samples "NT"
group1 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
# selection of tumor samples "TP"
group2 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
pca <- TCGAvisualize_PCA(dataFilt,dataDEGsFiltLevel, ntopgenes = 200, group1, group2)
if (!(is.null(dev.list()["RStudioGD"]))){dev.off()}

TCGAvisualize_SurvivalCoxNET()

Survival analysis with univariate Cox regression package (dnet)

Description

TCGAvisualize_SurvivalCoxNET helps the user identify a group of survival genes that are significant in both univariate Kaplan-Meier analysis and Cox regression. It produces a network built from communities of genes with a similar range of p-values from Cox regression (same color), whose interactions are already validated in the literature using the STRING database (version 9.1). TCGAvisualize_SurvivalCoxNET performs survival analysis with univariate Cox regression and the dnet package, wrapping the following functions from these packages:

  • survival::coxph

  • igraph::subgraph.edges

  • igraph::layout.fruchterman.reingold

  • igraph::spinglass.community

  • igraph::communities

  • dnet::dRDataLoader

  • dnet::dNetInduce

  • dnet::dNetPipeline

  • dnet::visNet

  • dnet::dCommSignif

Usage

TCGAvisualize_SurvivalCoxNET(clinical_patient, dataGE, Genelist,
  org.Hs.string, scoreConfidence = 700,
  titlePlot = "TCGAvisualize_SurvivalCoxNET Example")

Arguments

Argument | Description
clinical_patient | A data.frame from function 'clinic' with information related to barcodes/samples, such as bcr_patient_barcode, days_to_death, days_to_last_followup, vital_status, etc.
dataGE | A matrix of gene expression (genes in rows, samples in columns) from TCGAprepare
Genelist | A list of gene symbols on which to perform the survival KM analysis
org.Hs.string | An igraph object that contains a functional protein association network in human. The network is extracted from the STRING database (version 10).
scoreConfidence | Restrict to those edges with high confidence (e.g. score >= 700)
titlePlot | The title to show in the final plot

Details

TCGAvisualize_SurvivalCoxNET allows the user to perform the complete survival analysis workflow with coxph and the dnet package, identifying gene-active networks from high-throughput omics data using gene expression and clinical data.

  • Cox regression survival analysis to obtain hazard ratio (HR) and p-values

  • fit a Cox proportional hazards model and ANOVA (Chisq test)

  • Network communities

  • An igraph object that contains a functional protein association network in human. The network is extracted from the STRING database (version 9.1). Only those associations with medium confidence (score>=400) are retained.

  • restrict to those edges with high confidence (score>=700)

  • extract network that only contains genes in pvals

  • Identification of gene-active network

  • visualisation of the gene-active network itself

  • the layout of the network visualisation (fixed in different visuals)

  • color nodes according to communities (identified via a spin-glass model and simulated annealing)

  • node sizes according to degrees

  • highlight different communities

  • visualise the subnetwork
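
The first step of the workflow above, a univariate Cox regression per gene, can be sketched with survival::coxph. The clinical columns (time, status), the simulated expression matrix and the helper cox_one are assumptions made for this example; the network steps with dnet and igraph are not shown.

```r
library(survival)  # recommended package shipped with most R installations

set.seed(7)
n <- 60
# Simulated clinical data (hypothetical column names)
clin <- data.frame(
  time   = rexp(n, rate = 0.01),   # follow-up time in days
  status = rbinom(n, 1, 0.7)       # 1 = death observed, 0 = censored
)
# Simulated expression: genes in rows, samples in columns
expr <- matrix(rnorm(5 * n), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))

# Fit one univariate Cox model per gene; keep hazard ratio and p-value
cox_one <- function(g) {
  fit <- coxph(Surv(time, status) ~ g, data = clin)
  s <- summary(fit)$coefficients
  c(HR = unname(s[1, "exp(coef)"]), pvalue = unname(s[1, "Pr(>|z|)"]))
}
cox_tab <- t(apply(expr, 1, cox_one))
cox_tab
```

Genes could then be filtered by p-value before being mapped onto the interaction network.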

Value

net: an igraph network of Cox survival-related genes grouped in communities (same p-value range and color), with interactions from the STRING database.


TCGAvisualize_meanMethylation()

Mean methylation boxplot

Description

Creates a mean methylation boxplot for groups (groupCol). Subgroups will be highlighted as shapes if subgroupCol is set.

Note: data must be a SummarizedExperiment object.

Usage

TCGAvisualize_meanMethylation(data, groupCol = NULL,
  subgroupCol = NULL, shapes = NULL, print.pvalue = FALSE,
  plot.jitter = TRUE, jitter.size = 3, filename = "groupMeanMet.pdf",
  ylab = expression(paste("Mean DNA methylation (", beta, "-values)")),
  xlab = NULL, title = "Mean DNA methylation", labels = NULL,
  group.legend = NULL, subgroup.legend = NULL, color = NULL,
  y.limits = NULL, sort, order, legend.position = "top",
  legend.title.position = "top", legend.ncols = 3,
  add.axis.x.text = TRUE, width = 10, height = 10, dpi = 600,
  axis.text.x.angle = 90)

Arguments

Argument | Description
data | SummarizedExperiment object obtained from TCGAPrepare
groupCol | Column in colData(data) that defines the groups. If no column is defined, a column called "Patients" will be used
subgroupCol | Column in colData(data) that defines the subgroups
shapes | Shape vector of the subgroups. Its length must match the number of subgroup levels. Example: shapes = c(21,23) for two levels
print.pvalue | Print p-value for two groups
plot.jitter | Plot jitter? Default: TRUE
jitter.size | Plot jitter size. Default: 3
filename | The name of the pdf that will be saved
ylab | y axis text in the plot
xlab | x axis text in the plot
title | Main title in the plot
labels | Labels of the groups
group.legend | Name of the group legend. Default: groupCol
subgroup.legend | Name of the subgroup legend. Default: subgroupCol
color | Vector of colors to be used in the graph
y.limits | Change lower/upper y-axis limit
sort | Sort boxplot by mean or median. Possible values: mean.asc, mean.desc, median.asc, median.desc
order | Order of the boxplots
legend.position | Legend position ("top", "right", "left", "bottom")
legend.title.position | Legend title position ("top", "right", "left", "bottom")
legend.ncols | Number of columns of the legend
add.axis.x.text | Add text to x-axis? Default: TRUE
width | Plot width. Default: 10
height | Plot height. Default: 10
dpi | Pdf dpi. Default: 600
axis.text.x.angle | Angle of text in the x axis

Value

Saves the mean methylation boxplot as a pdf file
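
A minimal base-R sketch of the quantity being plotted, assuming each sample is summarized by its mean beta value across all probes and then grouped. The matrix and group labels are simulated, and the plot is a plain boxplot rather than the package's figure.

```r
set.seed(3)
# Toy beta-value matrix: 200 probes (rows) x 21 samples (columns)
beta <- matrix(runif(200 * 21), nrow = 200,
               dimnames = list(NULL, LETTERS[1:21]))
group <- rep(c("group1", "group2", "group3"), each = 7)

# One value per sample: the mean beta value across all probes
sample_mean <- colMeans(beta, na.rm = TRUE)

# Per-group summaries that a sort option such as "mean.desc" could use
group_mean   <- tapply(sample_mean, group, mean)
group_median <- tapply(sample_mean, group, median)
sort(group_mean, decreasing = TRUE)

boxplot(sample_mean ~ group, ylab = "Mean DNA methylation (beta-values)")
```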

Examples

nrows <- 200; ncols <- 21
counts <- matrix(runif(nrows * ncols, 0, 1), nrows)
rowRanges <- GenomicRanges::GRanges(rep(c("chr1", "chr2"), c(50, 150)),
IRanges::IRanges(floor(runif(200, 1e5, 1e6)), width=100),
strand=sample(c("+", "-"), 200, TRUE),
feature_id=sprintf("ID%03d", 1:200))
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input","Other"), 7),
row.names=LETTERS[1:21],
group=rep(c("group1","group2","group3"),c(7,7,7)),
subgroup=rep(c("subgroup1","subgroup2","subgroup3"),7))
data <- SummarizedExperiment::SummarizedExperiment(
assays=S4Vectors::SimpleList(counts=counts),
rowRanges=rowRanges,
colData=colData)
TCGAvisualize_meanMethylation(data,groupCol  = "group")
# change lower/upper y-axis limit
TCGAvisualize_meanMethylation(data,groupCol  = "group", y.limits = c(0,1))
# change lower y-axis limit
TCGAvisualize_meanMethylation(data,groupCol  = "group", y.limits = 0)
TCGAvisualize_meanMethylation(data,groupCol  = "group", subgroupCol="subgroup")
TCGAvisualize_meanMethylation(data,groupCol  = "group")
TCGAvisualize_meanMethylation(data,groupCol  = "group",sort="mean.desc",filename="meandesc.pdf")
TCGAvisualize_meanMethylation(data,groupCol  = "group",sort="mean.asc",filename="meanasc.pdf")
TCGAvisualize_meanMethylation(data,groupCol  = "group",sort="median.asc",filename="medianasc.pdf")
TCGAvisualize_meanMethylation(data,groupCol  = "group",sort="median.desc",filename="mediandesc.pdf")
if (!(is.null(dev.list()["RStudioGD"]))){dev.off()}

TCGAvisualize_oncoprint()

Creating an oncoprint

Description

Creating an oncoprint

Usage

TCGAvisualize_oncoprint(mut, genes, filename, color,
  annotation.position = "bottom", annotation, height, width = 10,
  rm.empty.columns = FALSE, show.column.names = FALSE,
  show.row.barplot = TRUE, label.title = "Mutation",
  column.names.size = 8, label.font.size = 16, rows.font.size = 16,
  dist.col = 0.5, dist.row = 0.5, information = "Variant_Type",
  row.order = TRUE, col.order = TRUE, heatmap.legend.side = "bottom",
  annotation.legend.side = "bottom")

Arguments

Argument | Description
mut | A data frame from the mutation annotation file (see TCGAquery_maf from TCGAbiolinks)
genes | Gene list
filename | Name of the pdf
color | Named vector for the plot
annotation.position | Position of the annotation: "bottom" or "top"
annotation | Matrix or data frame with the annotation. Should have a column bcr_patient_barcode with the same IDs as the mutation object
height | pdf height
width | pdf width
rm.empty.columns | Whether to remove samples without any alteration from the oncoprint
show.column.names | Show column names? Default: FALSE
show.row.barplot | Show barplot annotation on rows?
label.title | Title of the label
column.names.size | Font size of the column names
label.font.size | Font size of the labels
rows.font.size | Font size of the row names
dist.col | Distance between columns in the plot
dist.row | Distance between rows in the plot
information | Which MAF column to use as information. Options: 1) "Variant_Classification" (the information will be "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "RNA", "Silent", "Splice_Site", "Targeted_Region", "Translation_Start_Site") 2) "Variant_Type" (the information will be INS, DEL, SNP)
row.order | Order the genes (rows)? Default: TRUE. Genes with more mutations will be in the first rows
col.order | Order columns? Default: TRUE
heatmap.legend.side | Position of the heatmap legend
annotation.legend.side | Position of the annotation legend

Value

An oncoprint plot
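
An oncoprint is drawn from a gene-by-sample matrix of alterations. The base-R sketch below shows one way such a matrix could be built from a MAF-like table; the toy rows are made up, only the standard MAF column names Hugo_Symbol, Tumor_Sample_Barcode and Variant_Type are taken as given, and the package's own construction may differ.

```r
# Toy MAF-like table using standard MAF column names
maf <- data.frame(
  Hugo_Symbol          = c("TP53", "TP53", "KRAS", "EGFR", "KRAS"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S3", "S3"),
  Variant_Type         = c("SNP", "DEL", "SNP", "INS", "SNP"),
  stringsAsFactors = FALSE
)

genes   <- unique(maf$Hugo_Symbol)
samples <- unique(maf$Tumor_Sample_Barcode)

# Empty gene x sample matrix; "" means no alteration
onco <- matrix("", nrow = length(genes), ncol = length(samples),
               dimnames = list(genes, samples))
# Fill each cell with the value of the chosen information column
for (i in seq_len(nrow(maf))) {
  onco[maf$Hugo_Symbol[i], maf$Tumor_Sample_Barcode[i]] <- maf$Variant_Type[i]
}

# Order genes by number of altered samples (most mutated first)
onco <- onco[order(rowSums(onco != ""), decreasing = TRUE), , drop = FALSE]
onco
```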

Examples

mut <- GDCquery_Maf(tumor = "ACC", pipelines = "mutect")
TCGAvisualize_oncoprint(mut = mut, genes = mut$Hugo_Symbol[1:10], rm.empty.columns = TRUE)
TCGAvisualize_oncoprint(mut = mut, genes = mut$Hugo_Symbol[1:10],
filename = "onco.pdf",
color=c("background"="#CCCCCC","DEL"="purple","INS"="yellow","SNP"="brown"))
clin <- GDCquery_clinic("TCGA-ACC","clinical")
clin <- clin[,c("bcr_patient_barcode","disease","gender","tumor_stage","race","vital_status")]
TCGAvisualize_oncoprint(mut = mut, genes = mut$Hugo_Symbol[1:20],
filename = "onco.pdf",
annotation = clin,
color=c("background"="#CCCCCC","DEL"="purple","INS"="yellow","SNP"="brown"),
rows.font.size=10,
heatmap.legend.side = "right",
dist.col = 0,
label.font.size = 10)

TCGAvisualize_starburst()

Create starburst plot

Description

Create Starburst plot for comparison of DNA methylation and gene expression. The log10 (FDR-corrected P value) is plotted for beta value for DNA methylation (x axis) and gene expression (y axis) for each gene.

The black dashed line shows the FDR-adjusted P value of 0.01.

You can set names to TRUE to get the names of the significant genes.

Candidate biologically significant genes will be circled in the plot.

Candidate biologically significant genes are those that satisfy the expression (logFC.cut), DNA methylation (diffmean.cut) and significance (exp.p.cut, met.p.cut) thresholds.

Usage

TCGAvisualize_starburst(met, exp, group1 = NULL, group2 = NULL,
  exp.p.cut = 0.01, met.p.cut = 0.01, diffmean.cut = 0,
  logFC.cut = 0, met.platform, genome, names = FALSE,
  names.fill = TRUE, filename = "starburst.pdf", return.plot = FALSE,
  ylab = expression(atop("Gene Expression", paste(Log[10],
  " (FDR corrected P values)"))),
  xlab = expression(atop("DNA Methylation", paste(Log[10],
  " (FDR corrected P values)"))), title = "Starburst Plot",
  legend = "DNA Methylation/Expression Relation", color = NULL,
  label = c("Not Significant", "Up regulated & Hypo methylated",
  "Down regulated & Hypo methylated", "hypo methylated",
  "hyper methylated", "Up regulated", "Down regulated",
  "Up regulated & Hyper methylated", "Down regulated & Hyper methylated"),
  xlim = NULL, ylim = NULL, height = 10, width = 20, dpi = 600)

Arguments

Argument | Description
met | A SummarizedExperiment with methylation data obtained from TCGAPrepare, or a data frame from the DMR_results file. Expected colData columns: diffmean, p.value.adj and p.value. Execute the volcanoPlot function to obtain these values for the object.
exp | Object obtained by the DEArnaSEQ function
group1 | The name of group 1. Obs: column p.value.adj.group1.group2 should exist
group2 | The name of group 2. Obs: column p.value.adj.group1.group2 should exist
exp.p.cut | Expression p-value cut-off
met.p.cut | Methylation p-value cut-off
diffmean.cut | If set, the probes with diffmean higher than the methylation cut-off will be highlighted in the plot, and the returned data frame will be subsetted.
logFC.cut | If set, the probes with expression fold change higher than the expression cut-off will be highlighted in the plot, and the returned data frame will be subsetted.
met.platform | DNA methylation platform ("27K", "450K" or "EPIC")
genome | Genome of reference ("hg38" or "hg19") used to identify nearest probes TSS
names | Add the names of the significant genes? Default: FALSE
names.fill | Should names be filled in a color box? Default: TRUE
filename | The filename of the file (it can be pdf, svg, png, etc.)
return.plot | If TRUE, only the plot object will be returned (the pdf will not be created)
ylab | y axis text
xlab | x axis text
title | Main title
legend | Legend title
color | Vector of colors to be used in the graph
label | Vector of labels to be used in the graph
xlim | x limits to cut image
ylim | y limits to cut image
height | Figure height
width | Figure width
dpi | Figure dpi

Details

Input: data with gene expression/methylation expression

Output: starburst plot

Value

Save a starburst plot
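
The thresholds described above can be illustrated with a small base-R sketch. It assumes, for illustration only, that each axis is a signed log10 FDR (magnitude from the adjusted p-value, sign from the direction of change) and that a candidate gene must pass all four cut-offs. The column names and simulated values are hypothetical.

```r
set.seed(11)
n <- 8
# Hypothetical per-gene results (column names are illustrative)
res <- data.frame(
  gene     = paste0("gene", 1:n),
  diffmean = runif(n, -0.5, 0.5),  # methylation difference between groups
  met_fdr  = runif(n, 0, 0.1),     # FDR-adjusted p-value, methylation
  logFC    = runif(n, -4, 4),      # expression log fold change
  exp_fdr  = runif(n, 0, 0.1)      # FDR-adjusted p-value, expression
)

# Signed log10 FDR: magnitude reflects significance, sign reflects direction
res$x <- -log10(res$met_fdr) * sign(res$diffmean)
res$y <- -log10(res$exp_fdr) * sign(res$logFC)

met.p.cut <- 0.05; exp.p.cut <- 0.05; diffmean.cut <- 0.25; logFC.cut <- 2

met_sig <- res$met_fdr < met.p.cut & abs(res$diffmean) > diffmean.cut
exp_sig <- res$exp_fdr < exp.p.cut & abs(res$logFC) > logFC.cut

# Candidate genes pass both the methylation and the expression thresholds
res$candidate <- met_sig & exp_sig
res[, c("gene", "x", "y", "candidate")]
```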

Examples

library(SummarizedExperiment)
met <- TCGAbiolinks:::getMetPlatInfo(genome = "hg38",platform = "27K")
values(met) <- NULL
met$probeID <- names(met)
nrows <- length(met); ncols <- 20
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input"), 5),
row.names=LETTERS[1:20],
group=rep(c("group1","group2"),c(10,10)))
met <- SummarizedExperiment::SummarizedExperiment(
assays=S4Vectors::SimpleList(counts=counts),
rowRanges=met,
colData=colData)
rowRanges(met)$diffmean.g1.g2 <- c(runif(nrows, -0.1, 0.1))
rowRanges(met)$diffmean.g2.g1 <- -1*(rowRanges(met)$diffmean.g1.g2)
rowRanges(met)$p.value.g1.g2 <- c(runif(nrows, 0, 1))
rowRanges(met)$p.value.adj.g1.g2 <- c(runif(nrows, 0, 1))
exp <- TCGAbiolinks:::get.GRCh.bioMart("hg38")
exp$logFC <- runif(nrow(exp), -5, 5)
exp$FDR <- runif(nrow(exp), 0.01, 1)

result <- TCGAvisualize_starburst(met,
exp,
exp.p.cut = 0.05,
met.p.cut = 0.05,
logFC.cut = 2,
group1 = "g1",
group2 = "g2",
genome = "hg38",
met.platform = "27K",
diffmean.cut = 0.0,
names  = TRUE)
# It can also receive a data frame as input
result <- TCGAvisualize_starburst(SummarizedExperiment::values(met),
exp,
exp.p.cut = 0.05,
met.p.cut = 0.05,
logFC.cut = 2,
group1 = "g1",
group2 = "g2",
genome = "hg38",
met.platform = "27K",
diffmean.cut = 0.0,
names  = TRUE)

TabSubtypesCol_merged()

TCGA samples with their Pam50 subtypes

Description

A dataset containing the sample IDs from TCGA and the PAM50 subtyping attributes of 4768 tumor patients

Format

A data frame with 4768 rows and 3 variables:

  • samples: Sample ID from TCGA barcodes, character string

  • subtype: Pam50 classification, character string

  • color: color, character string

Usage

TabSubtypesCol_merged

TCGA samples with their Tumor Purity measures

Description

A dataset containing the sample IDs from TCGA and tumor purity measured according to 4 estimates for 9364 tumor patients

Format

A data frame with 9364 rows and 7 variables:

  • Sample.ID: Sample ID from TCGA barcodes, character string

  • Cancer.type: Cancer type, character string

  • ESTIMATE: uses gene expression profiles of 141 immune genes and 141 stromal genes, 0-1 value

  • ABSOLUTE: uses somatic copy-number data (estimations were available for only 11 cancer types), 0-1 value

  • LUMP: (leukocytes unmethylation for purity), which averages 44 non-methylated immune-specific CpG sites, 0-1 value

  • IHC: as estimated by image analysis of haematoxylin and eosin stain slides produced by the Nationwide Childrens Hospital Biospecimen Core Resource, 0-1 value

  • CPE: derived consensus measurement as the median purity level after normalizing levels from all methods to give them equal means and s.ds, 0-1 value

Usage

Tumor.purity

UseRaw_afterFilter()

Use raw counts from the DataPrep object, dropping the genes removed by the normalization and filtering steps.

Description

Function to keep raw counts after filtering and/or normalizing.

Usage

UseRaw_afterFilter(DataPrep, DataFilt)

Arguments

Argument | Description
DataPrep | DataPrep object returned by TCGAanalyze_Preprocessing()
DataFilt | Filtered data frame containing samples in columns and genes in rows after normalization and/or filtering steps

Value

An object similar to DataPrep, without the genes removed during the normalization and filtering process.
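
The idea can be sketched in base R as a row and column subset of the raw matrix. This assumes both objects are matrices with gene names as row names and sample names as column names; it is an illustration, not the function's actual implementation.

```r
set.seed(5)
# Raw counts before normalization/filtering (genes x samples)
dataPrep <- matrix(rpois(6 * 4, lambda = 50), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))
# Suppose normalization and filtering kept only these genes
dataFilt <- log2(dataPrep[c("gene2", "gene3", "gene6"), ] + 1)

# Keep the raw counts for the genes and samples that survived filtering
keep_genes   <- intersect(rownames(dataPrep), rownames(dataFilt))
keep_samples <- intersect(colnames(dataPrep), colnames(dataFilt))
dataPrep_raw <- dataPrep[keep_genes, keep_samples, drop = FALSE]
dataPrep_raw
```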

Examples

dataPrep_raw <- UseRaw_afterFilter(dataPrep, dataFilt)

TCGA batch information from Biospecimen Metadata Browser

Description

TCGA batch information from Biospecimen Metadata Browser

Format

A data frame with 11382 rows and 3 variables


bcgscca_CHOLIlluminaHiSeq_DNASeq1somaticmaf()

TCGA CHOL MAF

Description

TCGA CHOL MAF

Format

A tibble: 3,555 x 34


calculate.pvalues()

Calculate p-values

Description

Calculate p-values using the Wilcoxon test

Usage

calculate.pvalues(data, groupCol = NULL, group1 = NULL,
  group2 = NULL, paired = FALSE, method = "BH", exact = TRUE,
  cores = 1, save = FALSE)

Arguments

Argument | Description
data | SummarizedExperiment obtained from TCGAPrepare
groupCol | Column with the groups inside the SummarizedExperiment object (obtained with colData(data))
group1 | If the object has more than 2 groups, set the first group to compare
group2 | If the object has more than 2 groups, set the second group to compare
paired | Do a paired Wilcoxon test? Default: FALSE
method | P-value adjustment method. Default: "BH" (Benjamini-Hochberg)
exact | Do an exact Wilcoxon test? Default: TRUE
cores | Number of cores to be used
save | Save histogram of p-values

Details

Verify whether the data differs significantly between two groups. For methylation, we search for probes that have a difference in mean methylation and also a significant p-value.

Input: a SummarizedExperiment object that will be used to compare two groups with the Wilcoxon test, and a boolean value selecting a paired or unpaired test.

Output: p-values (unadjusted/adjusted) and their histograms.

Value

Data frame with two columns: p-values and adjusted p-values
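
The per-probe test and the multiple-testing correction can be sketched with stats::wilcox.test and stats::p.adjust. The simulated matrix and group labels are assumptions for the example, and the parallel execution and histogram output of calculate.pvalues are not shown.

```r
set.seed(9)
# Toy beta values: 100 probes (rows) x 20 samples (columns)
beta <- matrix(runif(100 * 20), nrow = 100)
group <- rep(c("group1", "group2"), each = 10)
# Make the first 5 probes clearly different between groups
beta[1:5, group == "group2"] <- beta[1:5, group == "group2"] + 2

# Unpaired Wilcoxon rank-sum test for each probe
p.value <- apply(beta, 1, function(x) {
  wilcox.test(x[group == "group1"], x[group == "group2"],
              paired = FALSE, exact = TRUE)$p.value
})
# Benjamini-Hochberg adjustment across probes
p.value.adj <- p.adjust(p.value, method = "BH")

head(data.frame(p.value, p.value.adj))
```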

Examples

nrows <- 200; ncols <- 20
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GenomicRanges::GRanges(rep(c("chr1", "chr2"), c(50, 150)),
IRanges::IRanges(floor(runif(200, 1e5, 1e6)), width=100),
strand=sample(c("+", "-"), 200, TRUE),
feature_id=sprintf("ID%03d", 1:200))
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input"), 10),
row.names=LETTERS[1:20],
group=rep(c("group1","group2"),c(10,10)))
data <- SummarizedExperiment::SummarizedExperiment(
assays=S4Vectors::SimpleList(counts=counts),
rowRanges=rowRanges,
colData=colData)
data <- calculate.pvalues(data,"group")

TCGA CHOL MAF transformed to maftools object

Description

TCGA CHOL MAF transformed to maftools object

Format

An object of class MAF

Clinical data TCGA BRCA

Description

Clinical data TCGA BRCA

Format

A data frame with 1061 rows and 109 variables


clinicalbiotab()

A list of data frames with clinical data parsed from XML (code in vignettes)

Description

A list of data frames with clinical data parsed from XML (code in vignettes)

Format

A list with 7 elements


colDataPrepare()

Create samples information matrix for GDC samples

Description

Create samples information matrix for GDC samples and add subtype information

Usage

colDataPrepare(barcode)

Arguments

Argument | Description
barcode | TCGA or TARGET barcode

Examples

query.met <- GDCquery(project = c("TCGA-GBM","TCGA-LGG"),
legacy = TRUE,
data.category = "DNA methylation",
platform = c("Illumina Human Methylation 450",
"Illumina Human Methylation 27"))
colDataPrepare(getResults(query.met)$cases)

TCGA data matrix BRCA

Description

TCGA data matrix BRCA

Format

A data frame with 20531 rows (genes) and 50 variables (samples)


dataDEGsFiltLevel()

TCGA data matrix BRCA DEGs

Description

TCGA data matrix BRCA DEGs

Format

A data frame with 3649 rows and 6 variables

TCGA data SummarizedExperiment READ

Description

TCGA data SummarizedExperiment READ

Format

A SummarizedExperiment of READ with 2 samples

TCGA data matrix READ

Description

TCGA data matrix READ

Format

A data frame with 20531 rows (genes) and 2 variables (samples)

Calculate diffmean methylation between two groups

Description

Calculate diffmean methylation of probes between two groups, removing lines that have NA values.

Usage

diffmean(data, groupCol = NULL, group1 = NULL, group2 = NULL,
  save = FALSE)

Arguments

Argument | Description
data | SummarizedExperiment object obtained from TCGAPrepare
groupCol | Column in colData(data) that defines the groups
group1 | Name of group1 to be used in the analysis
group2 | Name of group2 to be used in the analysis
save | Save histogram of diffmean

Value

Saves in rowRanges(data) the columns mean.group1, mean.group2 and diffmean.group1.group2, where group1 and group2 are the names of the groups.
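
A base-R sketch of the calculation on a plain matrix, for illustration. It assumes the sign convention "group2 minus group1" for the difference, which is not stated above, and uses simulated beta values.

```r
set.seed(2)
beta <- matrix(runif(6 * 8), nrow = 6,
               dimnames = list(paste0("probe", 1:6), NULL))
beta[2, 3] <- NA   # a probe with a missing value
group <- rep(c("group1", "group2"), each = 4)

# Drop probes (rows) that contain any NA
beta <- beta[stats::complete.cases(beta), , drop = FALSE]

mean.group1 <- rowMeans(beta[, group == "group1"])
mean.group2 <- rowMeans(beta[, group == "group2"])
# Assumed sign convention: group2 minus group1
diffmean.group1.group2 <- mean.group2 - mean.group1

data.frame(mean.group1, mean.group2, diffmean.group1.group2)
```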

Examples

nrows <- 200; ncols <- 20
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GenomicRanges::GRanges(rep(c("chr1", "chr2"), c(50, 150)),
IRanges::IRanges(floor(runif(200, 1e5, 1e6)), width=100),
strand=sample(c("+", "-"), 200, TRUE),
feature_id=sprintf("ID%03d", 1:200))
colData <- S4Vectors::DataFrame(Treatment=rep(c("ChIP", "Input"), 10),
row.names=LETTERS[1:20],
group=rep(c("group1","group2"),c(10,10)))
data <- SummarizedExperiment::SummarizedExperiment(
assays=S4Vectors::SimpleList(counts=counts),
rowRanges=rowRanges,
colData=colData)
diff.mean <- TCGAbiolinks:::diffmean(data,groupCol = "group")

Creates a plot for GAIA output (all significant aberrant regions)

Description

This function is an auxiliary function to visualize GAIA output (all significant aberrant regions).

Usage

gaiaCNVplot(calls, threshold = 0.01)

Arguments

Argument | Description
calls | A matrix with the following columns: Chromossome, Aberration Kind, Region Start, Region End, Region Size and score
threshold | Score threshold (orange horizontal line in the plot)

Value

A plot with all significant aberrant regions.

Examples

call <- data.frame("Chromossome" = rep(9,100),
"Aberration Kind" = rep(c(-2,-1,0,1,2),20),
"Region Start [bp]" = 18259823:18259922,
"Region End [bp]" = 18259823:18259922,
"score" = rep(c(1,2,3,4),25))
gaiaCNVplot(call,threshold = 0.01)
call <- data.frame("Chromossome" = rep(c(1,9),50),
"Aberration Kind" = rep(c(-2,-1,0,1,2),20),
"Region Start [bp]" = 18259823:18259922,
"Region End [bp]" = 18259823:18259922,
"score" = rep(c(1,2,3,4),25))
gaiaCNVplot(call,threshold = 0.01)

gbmexpharmonized()

A RangedSummarizedExperiment with two samples and gene expression data from the vignette, aligned against hg38

Description

A RangedSummarizedExperiment with two samples and gene expression data from the vignette, aligned against hg38

Format

A RangedSummarizedExperiment: 56963 genes, 2 samples

A RangedSummarizedExperiment with two samples and gene expression data from the vignette, aligned against hg19

Description

A RangedSummarizedExperiment with two samples and gene expression data from the vignette, aligned against hg19

Format

A RangedSummarizedExperiment: 21022 genes, 2 samples

geneInfo for normalization of RNAseq data

Description

geneInfo for normalization of RNAseq data

Format

A data frame with 20531 rows and 2 variables

geneInfoHT for normalization of HTseq data

Description

geneInfoHT for normalization of HTseq data

Format

A data frame with 23486 rows and 2 variables


getAdjacencyBiogrid()

Get a matrix of interactions of genes from biogrid

Description

Using the BioGRID database, it creates a matrix of gene interactions. If column A and row B has value 1, gene A and gene B interact.

Usage

getAdjacencyBiogrid(tmp.biogrid, names.genes = NULL)

Arguments

Argument | Description
tmp.biogrid | Biogrid table
names.genes | List of genes to filter from output. Default: consider all genes

Value

A matrix with 1 for genes that interact and 0 for no interaction.
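
To make the output shape concrete, here is a base-R sketch that builds a symmetric 0/1 adjacency matrix from a toy interaction table. It assumes that an interaction is kept only when both partners are in the gene list; the package's filtering rule may differ, and the toy rows are made up.

```r
genes <- c("PLCB1", "MCL1", "PRDX4", "TTF2")
# Toy interaction table with the two BioGRID symbol columns
biogrid <- data.frame(
  Official.Symbol.Interactor.A = c("PLCB1", "MCL1", "TTF2", "PLCB1"),
  Official.Symbol.Interactor.B = c("MCL1", "PRDX4", "TP53", "TTF2"),
  stringsAsFactors = FALSE
)

# Keep only interactions where both partners are in the gene list
keep <- biogrid$Official.Symbol.Interactor.A %in% genes &
        biogrid$Official.Symbol.Interactor.B %in% genes
edges <- biogrid[keep, ]

adj <- matrix(0, nrow = length(genes), ncol = length(genes),
              dimnames = list(genes, genes))
# Mark each interaction in both directions (symmetric matrix)
idx <- cbind(edges$Official.Symbol.Interactor.A,
             edges$Official.Symbol.Interactor.B)
adj[idx] <- 1
adj[idx[, 2:1, drop = FALSE]] <- 1
adj
```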

Examples

names.genes.de <- c("PLCB1","MCL1","PRDX4","TTF2","TACC3", "PARP4","LSM1")
tmp.biogrid <- data.frame("Official.Symbol.Interactor.A" = names.genes.de,
"Official.Symbol.Interactor.B" = rev(names.genes.de))
net.biogrid.de <- getAdjacencyBiogrid(tmp.biogrid, names.genes.de)
file <- paste0("http://thebiogrid.org/downloads/archives/",
"Release%20Archive/BIOGRID-3.4.133/BIOGRID-ALL-3.4.133.tab2.zip")
downloader::download(file,basename(file))
unzip(basename(file),junkpaths =TRUE)
tmp.biogrid <- read.csv(gsub("zip","txt",basename(file)),
header=TRUE, sep="\t", stringsAsFactors=FALSE)
names.genes.de <- c("PLCB1","MCL1","PRDX4","TTF2","TACC3", "PARP4","LSM1")
net.biogrid.de <- getAdjacencyBiogrid(tmp.biogrid, names.genes.de)

getDataCategorySummary()

Create a summary table for each sample in a project indicating whether it contains files for a given data category

Description

Create a summary table for each sample in a project indicating whether it contains files for a given data category

Usage

getDataCategorySummary(project, legacy = FALSE)

Arguments

Argument | Description
project | A GDC project
legacy | Access legacy (hg19) or harmonized database (hg38)

Value

A data frame

Examples

summary <- getDataCategorySummary("TCGA-ACC", legacy = TRUE)

Check GDC server status

Description

Check GDC server status using the api https://api.gdc.cancer.gov/status

Usage

getGDCInfo()

Value

Returns the status information reported by the GDC server

Examples

info <- getGDCInfo()

getGDCprojects()

Retrieve all GDC projects

Description

getGDCprojects uses the following api to get projects https://api.gdc.cancer.gov/projects

Usage

getGDCprojects()

Value

A data frame with the latest GDC projects

Examples

projects <- getGDCprojects()

get.GRCh.bioMart()

Get hg19 or hg38 information from biomaRt

Description

Get hg19 or hg38 information from biomaRt

Usage

get.GRCh.bioMart(genome = "hg19", as.granges = FALSE)

Arguments

Argument | Description
genome | hg38 or hg19
as.granges | Output as GRanges or data.frame

Download GISTIC data from firehose

Description

Download GISTIC data from firehose from http://gdac.broadinstitute.org/runs/analyses__latest/data/

Usage

getGistic(disease, type = "thresholded")

Arguments

Argument | Description
disease | TCGA disease. Options available at http://gdac.broadinstitute.org/runs/analyses__latest/data/
type | Results type: thresholded or data

Get a Manifest from GDCquery output that can be used with GDC-client

Description

Get a Manifest from GDCquery output that can be used with GDC-client

Usage

getManifest(query, save = F)

Arguments

Argument | Description
query | A query for GDCquery function
save | Write Manifest to a txt file (tab separated)

Examples

query <- GDCquery(project = "TARGET-AML",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts",
barcode = c("TARGET-20-PADZCG-04A-01R","TARGET-20-PARJCR-09A-01R"))
getManifest(query)

Get the results table from query

Description

Get the results table from a query. Columns can be selected with the cols argument and a subset of rows with the rows argument.

Usage

getResults(query, rows, cols)

Arguments

Argument | Description
query | An object from GDCquery
rows | Rows identifiers (row numbers)
cols | Columns identifiers (col names)

Value

Table with query results

Examples

query <- GDCquery(project = "TCGA-GBM",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts",
barcode = c("TCGA-14-0736-02A-01R-2005-01", "TCGA-06-0211-02A-02R-2005-01"))
results <- getResults(query)

getSampleFilesSummary()

Retrieve summary of files per sample in a project

Description

Retrieve the number of files under each data_category + data_type + experimental_strategy + platform. Similar to https://portal.gdc.cancer.gov/exploration

Usage

getSampleFilesSummary(project, legacy = FALSE, files.access = NA)

Arguments

Argument | Description
project | A GDC project
legacy | Access legacy database? Default: FALSE
files.access | Filter by file access ("open" or "controlled"). Default: no filter

Value

A data frame with the number of files per sample

Examples

summary <- getSampleFilesSummary("TCGA-LUAD")
summary <- getSampleFilesSummary(c("TCGA-OV","TCGA-ACC"))

getTSS fetches GENCODE gene annotation (transcript level) from the Bioconductor package biomaRt. If upstream and downstream are specified in the TSS list, promoter regions of GENCODE genes will be generated.

Description

getTSS fetches GENCODE gene annotation (transcript level) from the Bioconductor package biomaRt. If upstream and downstream are specified in the TSS list, promoter regions of GENCODE genes will be generated.

Usage

getTSS(genome = "hg38", TSS = list(upstream = NULL, downstream = NULL))

Arguments

Argument | Description
genome | Which genome build will be used: hg38 (default) or hg19
TSS | A list containing upstream and downstream, like TSS = list(upstream, downstream). When upstream and downstream are specified, coordinates of promoter regions with gene annotation will be generated.

Value

GENCODE gene annotation if TSS is not specified. Coordinates of GENCODE gene promoter regions if TSS is specified.
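
How promoter coordinates follow from a TSS and the upstream/downstream distances can be sketched in base R. This is a simplified, strand-aware convention chosen for illustration; the table, its column names and the exact end-point rule are assumptions, and tools such as GenomicRanges use slightly different inclusive-end conventions.

```r
# Toy transcript table (coordinates are illustrative, not real annotation)
tss <- data.frame(
  transcript = c("tx1", "tx2"),
  chrom      = c("chr1", "chr2"),
  strand     = c("+", "-"),
  tss        = c(10000, 50000)   # transcription start site position
)
upstream <- 2000; downstream <- 500

# On the minus strand, "upstream" lies at higher coordinates
plus <- tss$strand == "+"
tss$promoter_start <- ifelse(plus, tss$tss - upstream,   tss$tss - downstream)
tss$promoter_end   <- ifelse(plus, tss$tss + downstream, tss$tss + upstream)
tss
```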

Examples

# get GENCODE gene annotation (transcripts level)
tss <- getTSS()
tss <- getTSS(genome = "hg38", TSS = list(upstream = 1000, downstream = 1000))

Extract information from TCGA barcodes.

Description

get_IDs allows the user to extract metadata from barcodes. The data frame returned has columns for 'project', 'tss', 'participant', 'sample', "portion", "plate", and "center"

Usage

get_IDs(data)

Arguments

Argument | Description
data | Numeric matrix, each row represents a gene, each column represents a sample

Value

data frame with columns 'project', 'tss','participant', 'sample', "portion", "plate", "center", "condition"
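
The barcode fields listed above can be recovered by splitting on the hyphen. The sketch below is a base-R illustration, not the code of get_IDs. The second barcode is a made-up variant with a normal sample code, and the condition labels are an assumption based on the TCGA sample type codes (01-09 tumor, 10-19 normal, 20-29 control).

```r
barcodes <- c("TCGA-14-0736-02A-01R-2005-01",
              "TCGA-06-0211-11A-02R-2005-01")  # second barcode is illustrative

# Split each barcode into its seven hyphen-separated fields
parts <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))
ids <- data.frame(parts, stringsAsFactors = FALSE)
colnames(ids) <- c("project", "tss", "participant", "sample",
                   "portion", "plate", "center")
ids$barcode <- barcodes

# Sample type code: first two digits of the sample field
code <- as.integer(substr(ids$sample, 1, 2))
ids$condition <- ifelse(code < 10, "cancer",
                        ifelse(code < 20, "normal", "control"))
ids
```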

Biplot for Principal Components using ggplot2

Description

Biplot for Principal Components using ggplot2

Usage

ggbiplot(pcobj, choices = 1:2, scale = 1, pc.biplot = TRUE,
  obs.scale = 1 - scale, var.scale = scale, groups = NULL,
  ellipse = FALSE, ellipse.prob = 0.68, labels = NULL,
  labels.size = 3, alpha = 1, var.axes = TRUE, circle = FALSE,
  circle.prob = 0.69, varname.size = 3, varname.adjust = 1.5,
  varname.abbrev = FALSE)

Arguments

Argument        Description
pcobj           an object returned by prcomp() or princomp()
choices         which PCs to plot
scale           covariance biplot (scale = 1), form biplot (scale = 0). When scale = 1, the inner product between the variables approximates the covariance and the distance between the points approximates the Mahalanobis distance.
pc.biplot       for compatibility with biplot.princomp()
obs.scale       scale factor to apply to observations
var.scale       scale factor to apply to variables
groups          optional factor variable indicating the groups that the observations belong to. If provided, the points will be colored according to groups
ellipse         draw a normal data ellipse for each group?
ellipse.prob    size of the ellipse in Normal probability
labels          optional vector of labels for the observations
labels.size     size of the text used for the labels
alpha           alpha transparency value for the points (0 = transparent, 1 = opaque)
var.axes        draw arrows for the variables?
circle          draw a correlation circle? (only applies when prcomp was called with scale = TRUE and when var.scale = 1)
circle.prob     definition of circle.prob
varname.size    size of the text for variable names
varname.adjust  adjustment factor for the placement of the variable names, >= 1 means farther from the arrow
varname.abbrev  whether or not to abbreviate the variable names

Value

a ggplot2 plot

Author

Vincent Q. Vu.
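
The role of scale, obs.scale and var.scale in the ggbiplot arguments can be checked numerically in base R. The sketch below is not the ggbiplot code. It assumes the common biplot construction in which standardized scores are multiplied by sdev^obs.scale and loadings by sdev^var.scale, with obs.scale = 1 - scale and var.scale = scale as in the Usage defaults. The built-in iris data is used only as a convenient input, and the helper biplot_coords is hypothetical.

```r
X  <- as.matrix(iris[, 1:4])
pc <- prcomp(X)            # centered, unscaled PCA
n  <- nrow(X)
d  <- pc$sdev              # standard deviations of the principal components

# Standardized scores: each column of pc$x divided by sdev * sqrt(n - 1).
u <- sweep(pc$x, 2, d * sqrt(n - 1), "/")

# Hypothetical helper: observation and variable coordinates for a given scale.
biplot_coords <- function(scale) {
  obs.scale <- 1 - scale
  var.scale <- scale
  list(
    obs = sweep(u, 2, d^obs.scale, "*"),
    var = sweep(pc$rotation, 2, d^var.scale, "*")
  )
}

# scale = 1: inner products between variable vectors give the covariance matrix
# (exactly when all PCs are kept, approximately when only two are plotted).
vars <- biplot_coords(scale = 1)$var
cov_matches <- isTRUE(all.equal(vars %*% t(vars), cov(X),
                                check.attributes = FALSE))

# Any scale: observations times variables rebuild the centered data
# up to the factor sqrt(n - 1).
form  <- biplot_coords(scale = 0)
recon <- sqrt(n - 1) * form$obs %*% t(form$var)
recon_matches <- isTRUE(all.equal(recon, scale(X, scale = FALSE),
                                  check.attributes = FALSE))

print(c(cov_matches = cov_matches, recon_matches = recon_matches))
```

Because the two exponents always sum to 1, the choice of scale only moves the singular values between the observation and variable coordinates; their product is unchanged.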

Check GDC server status is OK

Description

Check GDC server status using the API https://api.gdc.cancer.gov/status

Usage

isServeOK()

Value

Returns TRUE if the status is OK

Examples

status <- isServeOK()

matchedMetExp()

Get GDC samples with both DNA methylation (HM450K) and gene expression data from the GDC database

Description

For a given TCGA project, gets the samples (barcodes) with both DNA methylation and gene expression data from the GDC database.

Usage

matchedMetExp(project, legacy = FALSE, n = NULL)

Arguments

Argument    Description
project     A GDC project
legacy      If TRUE, access the legacy database (hg19); if FALSE (default), access the harmonized database (hg38).
n           Number of samples to return. If NULL (default), return all.

Value

A vector of barcodes

Examples

# Get ACC samples with both DNA methylation (HM450K) and gene expression aligned to hg19
samples <- matchedMetExp("TCGA-ACC", legacy = TRUE)
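
The idea of keeping only cases that have both data types can be sketched offline in base R. This does not call the GDC and is not the matchedMetExp implementation: the barcodes are invented, the helper matched_participants is hypothetical, and matching on the first 12 characters of the barcode (project, TSS and participant) is an assumption made for illustration.

```r
# Invented barcodes for illustration only.
met_barcodes <- c("TCGA-AA-0001-01A-01D-0001-05",
                  "TCGA-AA-0002-01A-01D-0001-05",
                  "TCGA-AA-0003-01A-01D-0001-05")
exp_barcodes <- c("TCGA-AA-0002-01A-01R-0002-07",
                  "TCGA-AA-0003-01A-01R-0002-07",
                  "TCGA-AA-0004-01A-01R-0002-07")

# Hypothetical helper: participants present in both assays, optionally capped at n.
matched_participants <- function(met, exp, n = NULL) {
  # Assumption: the first 12 characters identify the participant.
  common <- intersect(substr(met, 1, 12), substr(exp, 1, 12))
  if (!is.null(n)) common <- head(common, n)
  common
}

matched <- matched_participants(met_barcodes, exp_barcodes)
print(matched)
```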

A DNA methylation RangedSummarizedExperiment for 8 samples (only first 20 probes) aligned against hg19

Description

A DNA methylation RangedSummarizedExperiment for 8 samples (only first 20 probes) aligned against hg19

Format

A RangedSummarizedExperiment: 20 probes, 8 samples

MSI data for two samples

Description

MSI data for two samples

Format

A data frame: 2 rows, 4 columns

A data frame with all TCGA molecular subtypes

Description

A data frame with all TCGA molecular subtypes

Format

A data frame with 7,734 rows and 10 columns

tabSurvKMcompleteDEGs()

tabSurvKMcompleteDEGs

Description

tabSurvKMcompleteDEGs

Format

A data frame with 200 rows and 7 variables