bioconductor v3.9.0 Fgsea

The package implements an algorithm for fast gene set enrichment analysis.

Summary

Functions

Calculates GSEA statistics for a given query gene set

Calculates GSEA statistic values for all gene sets in selectedStats list.

Collapse list of enriched pathways to independent ones.

Example list of mouse Reactome pathways.

Example vector of gene-level statistics obtained for Th1 polarization.

Runs preranked gene set enrichment analysis.

Runs label-permuting gene set enrichment analysis.

Runs preranked gene set enrichment analysis.

Runs preranked gene set enrichment analysis for preprocessed input data.

Returns a list of pathways from a GMT file.

Calculates the expected error for the standard deviation of the P-value logarithm.

Calculates P-values for preprocessed data.

Plots GSEA enrichment plot.

Plots table of enrichment graphs using ggplot and gridExtra.

Returns a list of Reactome pathways for given Entrez gene IDs

Functions

calcGseaStat()

Calculates GSEA statistics for a given query gene set

Description

Takes O(k log k) time, where k is the length of selectedStats.

Usage

calcGseaStat(stats, selectedStats, gseaParam = 1,
  returnAllExtremes = FALSE, returnLeadingEdge = FALSE)

Arguments

| Argument | Description |
| --- | --- |
| stats | Named numeric vector with gene-level statistics sorted in decreasing order (order is not checked). |
| selectedStats | Indexes of selected genes in the stats array. |
| gseaParam | GSEA weight parameter (0 is unweighted, suggested value is 1). |
| returnAllExtremes | If TRUE, return not only the most extreme point but all of them. Can be used for the enrichment plot. |
| returnLeadingEdge | If TRUE, also return the leading edge genes. |

Value

Value of the GSEA statistic if both returnAllExtremes and returnLeadingEdge are FALSE. Otherwise returns a list with the following elements:

Examples

data(exampleRanks)
data(examplePathways)
ranks <- sort(exampleRanks, decreasing=TRUE)
es <- calcGseaStat(ranks, na.omit(match(examplePathways[[1]], names(ranks))))
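
For intuition, the statistic can be sketched in base R. This is a minimal illustration of a weighted running-sum enrichment score under the conventions described above (stats sorted in decreasing order, selected genes given as indexes). The function name gseaStatSketch is hypothetical, and the code is an assumption about the general GSEA procedure, not the package's implementation.

```r
# Hypothetical sketch of a GSEA enrichment score; not the fgsea implementation.
# Assumes at least one selected gene has a non-zero statistic.
gseaStatSketch <- function(stats, selectedStats, gseaParam = 1) {
  selectedStats <- sort(selectedStats)          # O(k log k) for k selected genes
  n <- length(stats)
  k <- length(selectedStats)
  weights <- abs(stats[selectedStats])^gseaParam
  totalWeight <- sum(weights)
  # Running sum just after each hit: weighted hits so far minus
  # the fraction of non-selected genes passed so far.
  tops <- cumsum(weights) / totalWeight -
    (selectedStats - seq_len(k)) / (n - k)
  # Running sum just before each hit.
  bottoms <- tops - weights / totalWeight
  maxP <- max(tops)
  minP <- min(bottoms)
  # Report the deviation from zero with the largest magnitude.
  if (maxP > -minP) maxP else if (maxP < -minP) minP else 0
}

stats <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1)
esTop <- gseaStatSketch(stats, c(1, 2))     # gene set at the top of the ranking
esBottom <- gseaStatSketch(stats, c(4, 5))  # gene set at the bottom of the ranking
```

A set concentrated at the top of the ranking gets a positive score and a set concentrated at the bottom gets a negative one.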

calcGseaStatBatchCpp()

Calculates GSEA statistic values for all gene sets in selectedStats list.

Description

Takes O(n + m k log k) time, where n is the number of genes, m is the number of gene sets, and k is the mean gene set size.

Usage

calcGseaStatBatchCpp(stats, selectedGenes, geneRanks)

Arguments

| Argument | Description |
| --- | --- |
| stats | Numeric vector of gene-level statistics sorted in decreasing order. |
| selectedGenes | List of integer vectors with gene IDs (from 1 to n). |
| geneRanks | Integer vector of gene ranks. |

Value

Numeric vector of GSEA statistics of the same length as selectedGenes list


collapsePathways()

Collapse list of enriched pathways to independent ones.

Description

Collapse list of enriched pathways to independent ones.

Usage

collapsePathways(fgseaRes, pathways, stats, pval.threshold = 0.05,
  nperm = 10/pval.threshold, gseaParam = 1)

Arguments

| Argument | Description |
| --- | --- |
| fgseaRes | Table with results of running fgsea(). It should be filtered by p-value, for example by selecting rows with padj < 0.01. |
| pathways | List of pathways. It should contain all the pathways present in fgseaRes. |
| stats | Gene-level statistic values used for ranking, the same as in fgsea(). |
| pval.threshold | Two pathways are considered dependent when the p-value of enrichment of one pathway on the background of another is greater than pval.threshold. |
| nperm | Number of permutations to test for independence. It should be several times greater than 1/pval.threshold. Default value: 10/pval.threshold. |
| gseaParam | GSEA parameter, same as for fgsea(). |

Value

Named list with two elements: mainPathways, containing IDs of pathways not reducible to each other, and parentPathways, a vector describing for each pathway which pathway it can be reduced to. For pathways in the mainPathways vector, parentPathways contains NA values.

Examples

data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(examplePathways, exampleRanks, nperm=10000, maxSize=500)
collapsedPathways <- collapsePathways(fgseaRes[order(pval)][padj < 0.01],
examplePathways, exampleRanks)
mainPathways <- fgseaRes[pathway %in% collapsedPathways$mainPathways][
order(-NES), pathway]

examplePathways()

Example list of mouse Reactome pathways.

Description

The list was obtained by selecting all the pathways from the reactome.db package that contain mouse genes. The exact script is available as system.file("gen_reactome_pathways.R", package="fgsea").

exampleRanks()

Example vector of gene-level statistics obtained for Th1 polarization.

Description

The data were obtained by doing differential expression between Naive and Th1-activated states for GEO dataset GSE14308. The exact script is available as system.file("gen_gene_ranks.R", package="fgsea").

fgsea()

Runs preranked gene set enrichment analysis.

Description

The function takes about O(nk^{3/2}) time, where n is the number of permutations and k is the maximal pathway size. Because of this, setting the maxSize parameter to a value of about 500 is strongly recommended.

Usage

fgsea(pathways, stats, nperm, minSize = 1, maxSize = Inf, nproc = 0,
  gseaParam = 1, BPPARAM = NULL)

Arguments

| Argument | Description |
| --- | --- |
| pathways | List of gene sets to check. |
| stats | Named vector of gene-level stats. Names should be the same as in 'pathways'. |
| nperm | Number of permutations to do. The minimal possible nominal p-value is about 1/nperm. |
| minSize | Minimal size of a gene set to test. All pathways below the threshold are excluded. |
| maxSize | Maximal size of a gene set to test. All pathways above the threshold are excluded. |
| nproc | If not equal to zero, sets BPPARAM to use nproc workers (default = 0). |
| gseaParam | GSEA parameter value. All gene-level statistics are raised to the power of gseaParam before calculation of GSEA enrichment scores. |
| BPPARAM | Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting nproc, the default value bpparam() is used. |
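
The effect of minSize and maxSize can be sketched in base R. This is a minimal illustration, assuming that a pathway's size is counted after dropping genes absent from names(stats) (as the size column in the Value section suggests) and that both bounds are inclusive. The helper name filterPathwaysBySize is hypothetical and not part of the package.

```r
# Hypothetical helper: keep pathways whose size, counted over genes that
# are present in names(stats), lies within [minSize, maxSize].
filterPathwaysBySize <- function(pathways, stats, minSize = 1, maxSize = Inf) {
  sizes <- vapply(
    pathways,
    function(genes) length(intersect(genes, names(stats))),
    integer(1)
  )
  pathways[sizes >= minSize & sizes <= maxSize]
}

stats <- c(a = 2.1, b = 1.3, c = 0.2, d = -0.7, e = -1.9)
pathways <- list(
  small  = c("a", "z"),                 # only "a" is in names(stats): size 1
  medium = c("a", "b", "c"),            # size 3
  large  = c("a", "b", "c", "d", "e")   # size 5
)
kept <- filterPathwaysBySize(pathways, stats, minSize = 2, maxSize = 4)
```

Here only the pathway named medium survives the filter. The sketch returns the original gene vectors unchanged and only uses the intersection for counting.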

Value

A table with GSEA results. Each row corresponds to a tested pathway. The columns are the following:

  • pathway -- name of the pathway as in names(pathways);

  • pval -- an enrichment p-value;

  • padj -- a BH-adjusted p-value;

  • ES -- enrichment score, same as in Broad GSEA implementation;

  • NES -- enrichment score normalized to mean enrichment of random samples of the same size;

  • nMoreExtreme -- a number of times a random gene set had a more extreme enrichment score value;

  • size -- size of the pathway after removing genes not present in names(stats);

  • leadingEdge -- vector with indexes of leading edge genes that drive the enrichment, see http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_Running_a_Leading .

Examples

data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(examplePathways, exampleRanks, nperm=10000, maxSize=500)
# Testing only one pathway is implemented in a more efficient manner
fgseaRes1 <- fgsea(examplePathways[1], exampleRanks, nperm=10000)
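
The padj column is described as a BH-adjusted p-value. As a small illustration, the sketch below applies the Benjamini-Hochberg correction from base R's p.adjust to a made-up vector of nominal p-values. The pathway names and values are invented for the example, and it is an assumption that the package's adjustment matches p.adjust exactly.

```r
# Made-up nominal p-values for four hypothetical pathways.
pval <- c(pathwayA = 0.001, pathwayB = 0.01, pathwayC = 0.03, pathwayD = 0.5)

# Benjamini-Hochberg adjustment across all tested pathways.
padj <- p.adjust(pval, method = "BH")

# Pathways passing a 5% false discovery rate threshold.
significant <- names(padj)[padj < 0.05]
```

Because the adjustment depends on how many pathways were tested, filtering pathways with minSize and maxSize before testing changes the adjusted values.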

fgseaLabel()

Runs label-permuting gene set enrichment analysis.

Description

Runs label-permuting gene set enrichment analysis.

Usage

fgseaLabel(pathways, mat, labels, nperm, minSize = 1, maxSize = Inf,
  nproc = 0, gseaParam = 1, BPPARAM = NULL)

Arguments

| Argument | Description |
| --- | --- |
| pathways | List of gene sets to check. |
| mat | Gene expression matrix. Row names should be the same as in 'pathways'. |
| labels | Numeric vector of labels for the correlation score, of the same length as the number of columns in mat. |
| nperm | Number of permutations to do. The minimal possible nominal p-value is about 1/nperm. |
| minSize | Minimal size of a gene set to test. All pathways below the threshold are excluded. |
| maxSize | Maximal size of a gene set to test. All pathways above the threshold are excluded. |
| nproc | If not equal to zero, sets BPPARAM to use nproc workers (default = 0). |
| gseaParam | GSEA parameter value. All gene-level statistics are raised to the power of gseaParam before calculation of GSEA enrichment scores. |
| BPPARAM | Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting nproc, the default value bpparam() is used. |

Value

A table with GSEA results. Each row corresponds to a tested pathway. The columns are the following:

  • pathway -- name of the pathway as in names(pathways);

  • pval -- an enrichment p-value;

  • padj -- a BH-adjusted p-value;

  • ES -- enrichment score, same as in Broad GSEA implementation;

  • NES -- enrichment score normalized to mean enrichment of random samples of the same size;

  • nMoreExtreme -- a number of times a random gene set had a more extreme enrichment score value;

  • size -- size of the pathway after removing genes not present in names(stats);

  • leadingEdge -- vector with indexes of leading edge genes that drive the enrichment, see http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_Running_a_Leading .

Examples

library(limma)
library(GEOquery)
es <- getGEO("GSE19429", AnnotGPL = TRUE)[[1]]
exprs(es) <- normalizeBetweenArrays(log2(exprs(es)+1), method="quantile")
es <- es[!grepl("///", fData(es)$`Gene ID`), ]
es <- es[fData(es)$`Gene ID` != "", ]
es <- es[order(apply(exprs(es), 1, mean), decreasing=TRUE), ]
es <- es[!duplicated(fData(es)$`Gene ID`), ]
rownames(es) <- fData(es)$`Gene ID`

pathways <- reactomePathways(rownames(es))
mat <- exprs(es)
labels <- as.numeric(as.factor(gsub(" .*", "", es$title)))
fgseaRes <- fgseaLabel(pathways, mat, labels, nperm = 1000, minSize = 15, maxSize = 500)
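
To make the role of the labels argument more concrete, the sketch below computes a per-gene correlation score between expression rows and numeric labels, then repeats it with shuffled labels. The data are simulated, and using Pearson correlation as the gene-level score is an assumption made for illustration. This page only says that labels are used "for the correlation score".

```r
set.seed(1)

# Simulated expression matrix: 6 genes (rows) by 8 samples (columns).
mat <- matrix(
  rnorm(6 * 8), nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:8))
)

# One numeric label per column of mat, e.g. 1 = control, 2 = treated.
labels <- rep(c(1, 2), each = 4)

# Gene-level score: correlation of each gene's expression with the labels.
geneScores <- apply(mat, 1, function(row) cor(row, labels))

# A label permutation shuffles sample labels and recomputes the scores,
# giving one draw from the null distribution of gene-level statistics.
permutedLabels <- sample(labels)
permutedScores <- apply(mat, 1, function(row) cor(row, permutedLabels))

# Genes ranked by observed score, as input to a preranked-style analysis.
ranked <- sort(geneScores, decreasing = TRUE)
```

Permuting labels rather than genes keeps the gene-gene correlation structure of the expression matrix intact, which is the usual motivation for a label-permuting test.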


fgseaMultilevel()

Runs preranked gene set enrichment analysis.

Description

This function is based on an adaptive multilevel splitting Monte Carlo approach. This makes it possible to go beyond the resolution of simple sampling and calculate arbitrarily small P-values.

Usage

fgseaMultilevel(pathways, stats, sampleSize = 101, minSize = 1,
  maxSize = Inf, absEps = 0, nproc = 0, BPPARAM = NULL)

Arguments

| Argument | Description |
| --- | --- |
| pathways | List of gene sets to check. |
| stats | Named vector of gene-level stats. Names should be the same as in 'pathways'. |
| sampleSize | Size of the random sample of gene sets; each gene set in the sample has size pathwaySize. |
| minSize | Minimal size of a gene set to test. All pathways below the threshold are excluded. |
| maxSize | Maximal size of a gene set to test. All pathways above the threshold are excluded. |
| absEps | This parameter sets the boundary for calculating the p-value. |
| nproc | If not equal to zero, sets BPPARAM to use nproc workers (default = 0). |
| BPPARAM | Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting nproc, the default value bpparam() is used. |

Value

A table with GSEA results. Each row corresponds to a tested pathway. The columns are the following:

  • pathway -- name of the pathway as in names(pathways);

  • pval -- an enrichment p-value;

  • padj -- a BH-adjusted p-value;

  • log2err -- the expected error for the standard deviation of the P-value logarithm;

  • ES -- enrichment score, same as in Broad GSEA implementation;

  • NES -- enrichment score normalized to mean enrichment of random samples of the same size;

  • size -- size of the pathway after removing genes not present in names(stats).

  • leadingEdge -- vector with indexes of leading edge genes that drive the enrichment, see http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_Running_a_Leading .
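
The NES column is described as the enrichment score normalized to the mean enrichment of random samples of the same size. The base R sketch below illustrates one common reading of that sentence: divide the observed ES by the mean absolute ES of random gene sets of the same size and the same sign. The helper esSketch, the simulated statistics, and the same-sign convention are all assumptions made for illustration; this is not the package's code.

```r
# Hypothetical enrichment score helper (weighted running sum), assuming
# stats is sorted in decreasing order and selected holds gene indexes.
esSketch <- function(stats, selected, gseaParam = 1) {
  selected <- sort(selected)
  n <- length(stats)
  k <- length(selected)
  w <- abs(stats[selected])^gseaParam
  tops <- cumsum(w) / sum(w) - (selected - seq_len(k)) / (n - k)
  bottoms <- tops - w / sum(w)
  if (max(tops) > -min(bottoms)) max(tops)
  else if (max(tops) < -min(bottoms)) min(bottoms)
  else 0
}

set.seed(42)
stats <- sort(rnorm(200), decreasing = TRUE)   # simulated gene-level statistics
pathway <- 1:15                                # the 15 top-ranked genes

es <- esSketch(stats, pathway)

# Null distribution: ES of random gene sets of the same size.
randomEs <- replicate(
  500, esSketch(stats, sample(length(stats), length(pathway)))
)

# Normalize by the mean magnitude of same-sign random scores.
sameSign <- randomEs[sign(randomEs) == sign(es)]
nes <- es / mean(abs(sameSign))
```

Normalizing by random sets of the same size makes scores comparable across pathways of different sizes, since small sets tend to reach larger raw ES values by chance.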

Examples

data(examplePathways)
data(exampleRanks)
fgseaMultilevelRes <- fgseaMultilevel(examplePathways, exampleRanks, maxSize=500)

fgseaSimpleImpl()

Runs preranked gene set enrichment analysis for preprocessed input data.

Description

Runs preranked gene set enrichment analysis for preprocessed input data.

Usage

fgseaSimpleImpl(pathwayScores, pathwaysSizes, pathwaysFiltered,
  leadingEdges, permPerProc, seeds, toKeepLength, stats, BPPARAM)

Arguments

| Argument | Description |
| --- | --- |
| pathwayScores | Vector with enrichment scores for the pathways. |
| pathwaysSizes | Vector of pathway sizes. |
| pathwaysFiltered | Filtered pathways. |
| leadingEdges | Leading edge genes. |
| permPerProc | Parallelization parameter for permutations. |
| seeds | Seed vector. |
| toKeepLength | Number of pathways that meet the condition for minSize and maxSize. |
| stats | Named vector of gene-level stats. Names should be the same as in 'pathways'. |
| BPPARAM | Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting nproc, the default value bpparam() is used. |

Value

A table with GSEA results. Each row corresponds to a tested pathway. The columns are the following:

  • pathway -- name of the pathway as in names(pathways);

  • pval -- an enrichment p-value;

  • padj -- a BH-adjusted p-value;

  • ES -- enrichment score, same as in Broad GSEA implementation;

  • NES -- enrichment score normalized to mean enrichment of random samples of the same size;

  • nMoreExtreme -- a number of times a random gene set had a more extreme enrichment score value;

  • size -- size of the pathway after removing genes not present in names(stats);

  • leadingEdge -- vector with indexes of leading edge genes that drive the enrichment, see http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_Running_a_Leading .

gmtPathways()

Returns a list of pathways from a GMT file.

Description

Returns a list of pathways from a GMT file.

Usage

gmtPathways(gmt.file)

Arguments

| Argument | Description |
| --- | --- |
| gmt.file | Path to a GMT file. |

Value

A list of vectors with gene sets.
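
For readers unfamiliar with the format, the sketch below shows how a GMT file maps onto a named list of gene vectors using only base R. It assumes the common tab-separated GMT layout (set name, description, then gene identifiers on each line). The file contents are made up, and the code is an illustration rather than the package's implementation.

```r
# Write a small made-up GMT file: name <TAB> description <TAB> genes...
gmtFile <- tempfile(fileext = ".gmt")
writeLines(
  c("PATHWAY_A\tdescription A\t101\t102\t103",
    "PATHWAY_B\tdescription B\t201\t202"),
  gmtFile
)

# Split each line on tabs; the first field names the set, the second is a
# free-text description, and the remaining fields are gene identifiers.
fields <- strsplit(readLines(gmtFile), "\t", fixed = TRUE)
pathways <- lapply(fields, function(x) x[-(1:2)])
names(pathways) <- vapply(fields, function(x) x[1], character(1))

unlink(gmtFile)  # clean up the temporary file
```

Gene identifiers come back as character vectors, so they can be matched directly against names(stats).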

Examples

pathways <- gmtPathways(system.file(
"extdata", "mouse.reactome.gmt", package="fgsea"))

multilevelError()

Calculates the expected error for the standard deviation of the P-value logarithm.

Description

Calculates the expected error for the standard deviation of the P-value logarithm.

Usage

multilevelError(pval, sampleSize)

Arguments

| Argument | Description |
| --- | --- |
| pval | P-value. |
| sampleSize | Equivalent to sampleSize in fgseaMultilevel. |

Value

The value of the expected error

Examples

expectedError <- multilevelError(pval=1e-10, sampleSize=1001)

multilevelImpl()

Calculates P-values for preprocessed data.

Description

Calculates P-values for preprocessed data.

Usage

multilevelImpl(multilevelPathwaysList, stats, sampleSize, seed, absEps,
  sign = FALSE, BPPARAM = NULL)

Arguments

| Argument | Description |
| --- | --- |
| multilevelPathwaysList | List of pathways for which P-values will be calculated. |
| stats | Named vector of gene-level stats. Names should be the same as in 'pathways'. |
| sampleSize | Size of the random sample of gene sets; each gene set in the sample has size pathwaySize. |
| seed | Seed parameter from fgseaMultilevel. |
| absEps | This parameter sets the boundary for calculating the p-value. |
| sign | This option will be used in future implementations. |
| BPPARAM | Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting nproc, the default value bpparam() is used. |

Value

List of P-values.


plotEnrichment()

Plots GSEA enrichment plot.

Description

Plots GSEA enrichment plot.

Usage

plotEnrichment(pathway, stats, gseaParam = 1, ticksSize = 0.2)

Arguments

| Argument | Description |
| --- | --- |
| pathway | Gene set to plot. |
| stats | Gene-level statistics. |
| gseaParam | GSEA parameter. |
| ticksSize | Width of the vertical line corresponding to a gene (default: 0.2). |

Value

ggplot object with the enrichment plot.

Examples

data(examplePathways)
data(exampleRanks)
plotEnrichment(examplePathways[["5991130_Programmed_Cell_Death"]],
exampleRanks)

plotGseaTable()

Plots table of enrichment graphs using ggplot and gridExtra.

Description

Plots table of enrichment graphs using ggplot and gridExtra.

Usage

plotGseaTable(pathways, stats, fgseaRes, gseaParam = 1,
  colwidths = c(5, 3, 0.8, 1.2, 1.2), render = TRUE)

Arguments

| Argument | Description |
| --- | --- |
| pathways | Pathways to plot table, as in fgsea function. |
| stats | Gene-level stats, as in fgsea function. |
| fgseaRes | Table with fgsea results. |
| gseaParam | GSEA-like parameter. Adjusts displayed statistic values; values closer to 0 flatten plots. Default = 1, value of 0.5 is a good choice too. |
| colwidths | Vector of five elements corresponding to column width for grid.arrange. If a column width is set to zero, the column is not drawn. |
| render | If true, the plot is rendered to the current device. Otherwise, the grob is returned. Default is true. |

Value

TableGrob object returned by grid.arrange.

Examples

data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(examplePathways, exampleRanks, nperm=1000,
minSize=15, maxSize=100)
topPathways <- fgseaRes[head(order(pval), n=15)][order(NES), pathway]
plotGseaTable(examplePathways[topPathways], exampleRanks,
fgseaRes, gseaParam=0.5)

reactomePathways()

Returns a list of Reactome pathways for given Entrez gene IDs

Description

Returns a list of Reactome pathways for given Entrez gene IDs

Usage

reactomePathways(genes)

Arguments

| Argument | Description |
| --- | --- |
| genes | Entrez IDs of query genes. |

Value

A list of vectors with gene sets.

Examples

data(exampleRanks)
pathways <- reactomePathways(names(exampleRanks))