bioconductor v3.9.0 Category

A collection of tools for performing category (gene set enrichment) analysis.

Summary

Functions

Category_defunct()

Defunct Functions in Package Category

ChrBandTree_class()

Class "ChrBandTree"

ChrMapHyperGParams_class()

Class "ChrMapHyperGParams"

ChrMapHyperGResult_class()

Class "ChrMapHyperGResult"

ChrMapLinearMParams_class()

Class "ChrMapLinearMParams"

ChrMapLinearMResult_class()

Class "ChrMapLinearMResult"

DatPkg_class()

Class "DatPkg"

GOHyperGParams_class()

Class "GOHyperGParams"

GSEAGOHyperGParams()

Helper function for constructing GOHyperGParams or KEGGHyperGParams objects from a GeneSetCollection

HyperGParams_class()

Class "HyperGParams"

HyperGResultBase_class()

Class "HyperGResultBase"

HyperGResult_accessors()

Accessors for HyperGResult Objects

HyperGResult_class()

Class "HyperGResult"

KEGGHyperGParams_class()

Class "KEGGHyperGParams" and "PFAMHyperGParams"

LinearMParams_class()

Class "LinearMParams"

LinearMResultBase_class()

Class "LinearMResultBase"

LinearMResult_class()

Class "LinearMResult"

MAPAmat()

Mapping chromosome bands to genes

NewChrBandTree()

Create a new ChrBandTree object

OBOHyperGParams_class()

Class "OBOHyperGParams"

applyByCategory()

Apply a function to a vector of statistics, by category

cateGOryMatrix()

Construct a category membership matrix from a list of gene identifiers and their annotated GO categories.

categoryToEntrezBuilder()

Return a list mapping category ids to Entrez Gene ids

cb_contingency()

Create and Test Contingency Tables of Chromosome Band Annotations

cb_parse_band_Hs()

Parse Homo Sapiens Chromosome Band Annotations

cb_parse_band_Mm()

Parse Mus Musculus Chromosome Band Annotations

cb_test()

Chromosome Band Tree-Based Hypothesis Testing

effectSize()

Extract estimated effect sizes

exampleLevels()

Display a sample node from each level of a ChrBandTree object

findAMstats()

Compute per category summary statistics

getPathNames()

A function to print pathway names given their numeric ID.

gseattperm()

Permutation p-values for GSEA

hyperGTest()

Hypergeometric Test for association of categories and genes

hyperg()

Hypergeometric (gene set enrichment) tests on character vectors.

linearMTest()

A linear model-based test to detect enrichment of unusual genes in categories

local_test_factory()

Local and Global Test Function Factories

makeChrBandGraph()

Create a graph representing chromosome band annotation data

makeEBcontr()

A function to make the contrast vectors needed for EBarrays

makeValidParams()

Non-standard Generic for Checking Validity of Parameter Objects

probes2MAP()

Map probe IDs to MAP regions.

probes2Path()

A function to map probe identifiers to pathways.

tree_visitor()

Tree Visitor Function

ttperm()

A simple function to compute a permutation t-test.

universeBuilder()

Return a vector of gene identifiers with category annotations

Functions

Category_defunct()

Defunct Functions in Package Category

Description

The functions or variables listed here are no longer part of the Category package.

Usage

condGeneIdUniverse()
isConditional()
geneGoHyperGeoTest()
geneKeggHyperGeoTest()
cb_parse_band_hsa()
chrBandInciMat()

ChrBandTree_class()

Class "ChrBandTree"

Description

This class represents chromosome band annotation data for a given experiment. The class is responsible for storing the mapping from each band to the set of gene IDs located within that band, as well as for representing the tree-structured relationship among the bands.

Note

Not all known chromosome bands will be represented in a given instance. The set of bands that will be present is determined by the available annotation data and the specified gene universe. The annotation source maps genes to their most specific band. Such bands and all bands on the path to the root will be represented in the resulting tree.

Currently there is only support for human and mouse data.

Author

S. Falcon

Examples

library("hgu95av2.db")
set.seed(0xfeee)
univ = NULL ## use all Entrez Gene IDs on the chip (not recommended)
ct = NewChrBandTree("hgu95av2.db", univ)

length(allGeneIds(ct))

exampleLevels(ct)

geneIds(ct, "10p11")
lgeneIds(ct, "10p11")
lgeneIds(ct, c("10p11", "Yq11.22"))

pp = parentOf(ct, c("10p11", "Yq11.22"))
childrenOf(ct, unlist(pp))

treeLevels(ct)

level2nodes(ct, 0)
level2nodes(ct, 0L)
level2nodes(ct, "0")

level2nodes(ct, 1)

ChrMapHyperGParams_class()

Class "ChrMapHyperGParams"

Description

This class encapsulates parameters needed for Hypergeometric testing of over- or under-representation of chromosome bands among a selected gene list using hyperGTest.

Author

Seth Falcon

Examples

showClass("ChrMapHyperGParams")

ChrMapHyperGResult_class()

Class "ChrMapHyperGResult"

Description

This class represents the results of a Hypergeometric test for over-representation of genes in a selected gene list in the chromosome band annotation. The hyperGTest function returns an instance of ChrMapHyperGResult when given a parameter object of class ChrMapHyperGParams. For details on accessing the results, see HyperGResult-accessors.

Author

Seth Falcon

Examples

showClass("ChrMapHyperGResult")
## For details on accessing the results:
##     help("HyperGResult-accessors")

ChrMapLinearMParams_class()

Class "ChrMapLinearMParams"

Description

This class encapsulates parameters needed for testing systematic variations in some gene-level statistic by chromosome bands using linearMTest.

Author

Deepayan Sarkar

Examples

showClass("ChrMapLinearMParams")

ChrMapLinearMResult_class()

Class "ChrMapLinearMResult"

Description

This class represents the results of a linear model-based test for systematic changes in a per-gene statistic by chromosome band annotation. The linearMTest function returns an instance of ChrMapLinearMResult when given a parameter object of class ChrMapLinearMParams. Most slots can be queried using accessors.

Author

Deepayan Sarkar, Michael Lawrence

Examples

showClass("ChrMapLinearMResult")

DatPkg_class()

Class "DatPkg"

Description

DatPkg is a VIRTUAL class for representing annotation data packages.

AffyDatPkg is a subclass of DatPkg used to represent standard annotation data packages that follow the format of Affymetrix expression array annotation.

YeastDatPkg is a subclass of DatPkg used to represent the annotation data packages for yeast. The yeast chip packages are based on SGD and are internally different from the AffyDatPkg conforming packages.

ArabidopsisDatPkg is a subclass of DatPkg used to represent the annotation packages for Arabidopsis. These packages are internally slightly different from the AffyDatPkg conforming packages.

Org.XX.egDatPkg is a subclass of DatPkg used to represent the org.*.eg.db organism-level Entrez Gene based annotation data packages.

OBOCollectionDatPkg is a subclass of DatPkg used to represent the OBO based annotation data packages.

GeneSetCollectionDatPkg is a subclass of DatPkg used to represent annotations in the form of GeneSetCollection objects which are not based on any annotation packages but are instead derived from custom (user supplied) annotations.

These methods have been extended to accommodate uninstalled annotation objects, primarily those available from the AnnotationHub package. See below for an example.

Author

Seth Falcon

Examples

DatPkgFactory("hgu95av2")
DatPkgFactory("org.Sc.sgd")
DatPkgFactory("org.Hs.eg.db")
DatPkgFactory("ag")
library(AnnotationHub)
hub <- AnnotationHub()
## get an OrgDb for Atlantic salmon
query(hub, c("salmo salar","orgdb"))
salmodb <- hub[["AH58003"]]
DatPkgFactory(salmodb)

GOHyperGParams_class()

Class "GOHyperGParams"

Description

A parameter class for representing all parameters needed for running the hyperGTest method with one of the GO ontologies (BP, CC, MF) as the category.

Author

S. Falcon

GSEAGOHyperGParams()

Helper function for constructing GOHyperGParams or KEGGHyperGParams objects from a GeneSetCollection

Description

Helps create a parameter object representing all parameters needed for running the hyperGTest method. When a GOHyperGParams object is being made, one of the GO ontologies (BP, CC, MF) serves as the category. The function constructs the parameter object from a GeneSetCollection object and, if necessary, also checks that the object is based on a GO2ALL mapping.

Usage

GSEAGOHyperGParams(name, geneSetCollection, geneIds, universeGeneIds,
  ontology, pvalueCutoff, conditional, testDirection, ...)
GSEAKEGGHyperGParams(name, geneSetCollection, geneIds, universeGeneIds,
  pvalueCutoff, testDirection, ...)

Arguments

Argument	Description
`name`	String specifying name of the GeneSetCollection.
`geneSetCollection`	A GeneSetCollection Object. If a GOHyperGParams object is sought, then this GeneSetCollection should be based on a GO2ALLFrame object and so the idType of that GeneSetCollection should be GOAllFrameIdentifier. If a KEGGHyperGParams object is sought then a GeneSetCollection based on a KEGGFrame object should be used and the idType will be a KEGGFrameIdentifier.
`geneIds`	Object of class `"ANY"`: A vector of gene identifiers. Numeric and character vectors are probably the only things that make sense. These are the gene ids for the selected gene set.
`universeGeneIds`	Object of class `"ANY"`: A vector of gene ids in the same format as `geneIds` defining a subset of the gene ids on the chip that will be used as the universe for the hypergeometric calculation. If this is `NULL` or has length zero, then all gene ids on the chip will be used.
`ontology`	A string specifying the GO ontology to use. Must be one of "BP", "CC", or "MF". (Used with GO only.)
`pvalueCutoff`	A numeric value between zero and one used as a p-value cutoff for p-values generated by the Hypergeometric test. When the test being performed is non-conditional, this is only used as a default value for printing and summarizing the results. For a conditional analysis, the cutoff is used during the computation to perform the conditioning: child terms with a p-value less than pvalueCutoff are conditioned out of the test for their parent term.
`conditional`	A logical indicating whether the calculation should condition on the GO structure. (GO only.)
`testDirection`	A string which can be either "over" or "under". This determines whether the test performed detects over- or under-represented GO terms.
`...`	Optional arguments to configure the GOHyperGParams object.

Author

M. Carlson

HyperGParams_class()

Class "HyperGParams"

Description

An abstract (VIRTUAL) parameter class for representing all parameters needed by a method specializing the hyperGTest generic. You should only use subclasses of this class directly.

Author

S. Falcon

HyperGResultBase_class()

Class "HyperGResultBase"

Description

This VIRTUAL class represents common elements of the return values of generic functions like hyperGTest. All subclasses are intended to implement the accessor functions documented at HyperGResult-accessors.

Author

Seth Falcon

HyperGResult_accessors()

Accessors for HyperGResult Objects

Description

This manual page documents generic functions for extracting data from the result object returned from a call to hyperGTest. The result object will be a subclass of HyperGResultBase. Methods apply to all result object classes unless otherwise noted.

Usage

pvalues(r)
oddsRatios(r)
expectedCounts(r)
geneCounts(r)
universeCounts(r)
universeMappedCount(r)
geneMappedCount(r)
geneIds(object, ...)
geneIdUniverse(r, cond = TRUE)
geneIdsByCategory(r, catids = NULL)
sigCategories(r, p)
## R CMD check doesn't like these
## annotation(r)
## description(r)
testName(r)
pvalueCutoff(r)
testDirection(r)
chrGraph(r)

Arguments

Argument	Description
`r, object`	An instance of a subclass of `HyperGResultBase`.
`catids`	A character vector of category identifiers.
`p`	Numeric p-value used as a cutoff for selecting a subset of the result.
`cond`	A logical value indicating whether to return conditional results for a conditional test. The default is `TRUE`. For non-conditional results, this argument is ignored.
`...`	Additional arguments that may be used by specializing methods.

Author

Seth Falcon

Examples

## Note that more in-depth examples can be found in the GOstats
## vignette (Hypergeometric tests using GOstats).
library("hgu95av2.db")
library("annotate")

## Retrieve 300 probeids that have PFAM ids
probids <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")[1:300]

## get unique Entrez Gene IDs
geneids <- select(hgu95av2.db, probids, 'ENTREZID', 'PROBEID')
geneids <- unique(geneids[['ENTREZID']])

## Now do the same for the universe
univ <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")
univ <- select(hgu95av2.db, univ, 'ENTREZID', 'PROBEID')
univ <- unique(univ[['ENTREZID']])

p <- new("PFAMHyperGParams", geneIds=geneids, universeGeneIds=univ,
         annotation="hgu95av2")
## this takes a while...
if (interactive()) {
  hypt <- hyperGTest(p)
  summary(hypt)
  htmlReport(hypt, file="temp.html", summary.args=list("htmlLinks"=TRUE))
}

HyperGResult_class()

Class "HyperGResult"

Description

This class represents the results of a test for over-representation of categories among genes in a selected gene set based upon the Hypergeometric distribution. The hyperGTest generic function returns an instance of the HyperGResult class. For details on accessing the results, see HyperGResult-accessors.

Author

Seth Falcon

KEGGHyperGParams_class()

Class "KEGGHyperGParams" and "PFAMHyperGParams"

Description

Parameter classes for representing all parameters needed for running the hyperGTest method with KEGG or PFAM as the category.

Author

S. Falcon

LinearMParams_class()

Class "LinearMParams"

Description

A parameter class for representing all parameters needed by a method specializing the linearMTest generic.

Author

Deepayan Sarkar, Michael Lawrence

LinearMResultBase_class()

Class "LinearMResultBase"

Description

This VIRTUAL class represents common elements of the return values of generic functions like linearMTest. These elements are essentially those that are passed through from the input parameters. See LinearMResult for a concrete result class with the basic outputs.

Author

Deepayan Sarkar, Michael Lawrence

LinearMResult_class()

Class "LinearMResult"

Description

This class represents the results of a test for systematic change in some gene-level statistic by gene sets. The linearMTest generic function returns an instance of the LinearMResult class.

Author

Deepayan Sarkar, Michael Lawrence

Examples

showClass("LinearMResult")

MAPAmat()

Mapping chromosome bands to genes

Description

These functions return a mapping of chromosome bands to genes. makeChrBandGSC returns a GeneSetCollection object, with a GeneSet for each band. The other functions return a 0/1 incidence matrix with a row for each chromosome band and a column for each gene. Only those chromosome bands with at least one gene annotation will be included.

Usage

MAPAmat(chip, univ = NULL, minCount = 0)
makeChrBandInciMat(chrGraph)
makeChrBandGSC(chrGraph)

Arguments

Argument	Description
`chip`	A string giving the annotation source. For example, `"hgu133plus2"`
`univ`	A vector of gene IDs (these should be Entrez IDs for most annotation sources). The annotations will be limited to those in the set specified by `univ`. If `univ` is `NULL` (default), then the gene IDs are those found in the annotation data source.
`chrGraph`	A `graph` object as returned by `makeChrBandGraph`
`minCount`	Bands with fewer than `minCount` genes will be excluded from the returned matrix. If `minCount` is `0` (the default), no bands will be removed.

Value

For makeChrBandGSC, a GeneSetCollection object with a GeneSet for each band.

For the other functions, a 0/1 incidence matrix with chromosome bands as rows and gene IDs as columns. A 1 in m[i, j] indicates that the chromosome band rownames(m)[i] contains the gene ID colnames(m)[j].
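
The shape of such an incidence matrix can be illustrated in base R without any annotation package. This is only a sketch: the band-to-gene list and the gene identifiers are made up, and the code is not how MAPAmat itself is implemented.

```r
# Toy mapping from chromosome bands to gene IDs (made-up identifiers).
band2genes <- list(
  "10p11" = c("g1", "g2"),
  "10p12" = c("g2", "g3"),
  "Yq11"  = character(0)   # no annotated genes, so it is dropped below
)

# Keep only bands with at least one gene annotation.
band2genes <- band2genes[lengths(band2genes) > 0]
genes <- sort(unique(unlist(band2genes)))

# One row per band, one column per gene; 1L marks membership.
m <- t(vapply(band2genes, function(ids) as.integer(genes %in% ids),
              integer(length(genes))))
colnames(m) <- genes
m
```

Here m["10p11", "g1"] is 1 because the toy band "10p11" contains the toy gene "g1".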

Author

Seth Falcon, Michael Lawrence

Examples

have_hgu95av2.db <- suppressWarnings(require("hgu95av2.db"))
if (have_hgu95av2.db)
  mam <- MAPAmat("hgu95av2.db")

NewChrBandTree()

Create a new ChrBandTree object

Description

NewChrBandTree and ChrBandTreeFromGraph provide constructors for the ChrBandTree class.

Usage

NewChrBandTree(chip, univ)
ChrBandTreeFromGraph(g)

Arguments

Argument	Description
`chip`	The name of an annotation data package
`univ`	A vector of gene identifiers that defines the universe of genes. Usually, this will be a vector of Entrez Gene IDs. If `univ` is `NULL`, then all genes probed on the specified chip will be in the universe. We strongly recommend using the set of genes that remains after applying a non-specific filter as the universe.
`g`	A `graph` instance as returned by `makeChrBandGraph`

Value

A new ChrBandTree instance.

Author

S. Falcon

OBOHyperGParams_class()

Class "OBOHyperGParams"

Description

A parameter class for representing all parameters needed for running the hyperGTest method with an ontology adhering to the OBO Foundry (see http://www.obofoundry.org) as the category.

Author

R. Castelo

applyByCategory()

Apply a function to a vector of statistics, by category

Description

For each category, apply the function FUN to the set of values of stats belonging to that category.

Usage

applyByCategory(stats, Amat, FUN = mean, ...)

Arguments

Argument	Description
`stats`	Numeric vector with test statistics of interest.
`Amat`	A logical or numeric matrix: the adjacency matrix of the bipartite gene-category graph. Its rows correspond to the categories, columns to the genes, and `TRUE` or a numeric value different from `0` indicates membership. The columns are assumed to be aligned with the elements of `stats`.
`FUN`	A function to apply to the subsets of `stats` defined by the categories.
`...`	Extra parameters passed to `FUN`.

Details

For GO categories, the function cateGOry might be useful for the construction of Amat.

Value

The return value is a list or vector of length equal to the number of categories. Each element corresponds to the values obtained by applying FUN to the subset of values in stats according to the category defined for that row.
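
The row-wise logic described above can be sketched in base R. The helper name apply_by_category is hypothetical and only mirrors the documented behaviour; it is not the package's own code.

```r
# Hypothetical helper mirroring the documented behaviour, not the package code.
apply_by_category <- function(stats, Amat, FUN = mean, ...) {
  stopifnot(ncol(Amat) == length(stats))
  # For each row (category), pick the stats of its member genes and apply FUN.
  res <- lapply(seq_len(nrow(Amat)),
                function(i) FUN(stats[Amat[i, ] != 0], ...))
  names(res) <- rownames(Amat)
  res
}

stats <- c(g1 = 1, g2 = 2, g3 = 6)
Amat <- rbind(catA = c(1, 1, 0),
              catB = c(0, 1, 1))
by_cat <- apply_by_category(stats, Amat, mean)
by_cat
```

With this toy input, catA averages the statistics of g1 and g2, and catB those of g2 and g3.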

Author

R. Gentleman, contributions from W. Huber

Examples

set.seed(0xabcd)
st = rnorm(20)
names(st) = paste("gene", 1:20)

a = matrix(sample(c(FALSE, TRUE), 60, replace=TRUE), nrow=3,
           dimnames = list(paste("category", LETTERS[1:3]), names(st)))

applyByCategory(st, a, median)

cateGOryMatrix()

Construct a category membership matrix from a list of gene identifiers and their annotated GO categories.

Description

The function constructs a category membership matrix, such as used by applyByCategory, from a list of gene identifiers and their annotated GO categories. For each of the GO categories stated in categ, all less specific terms (ancestors) are also included, thus one need only obtain the most specific set of GO term mappings, which can be obtained from Bioconductor annotation packages or via biomaRt. The ancestor relationships are obtained from the GO.db package.

Usage

cateGOry(x, categ, sparse=FALSE)

Arguments

Argument	Description
`x`	Character vector with (arbitrary) gene identifiers. They will be used for the column names of the resulting matrix.
`categ`	A character vector of the same length as `x` with GO annotations for the genes in `x`. If a gene has multiple GO annotations, it is expected to occur multiple times in `x`, once for each different annotation.
`sparse`	Logical. If `TRUE`, the resulting matrix is constructed using `Matrix`, otherwise, R's base `matrix` is used.

Details

The function requires the GO.db package.

For subsequent analyses, it is often useful to remove categories that have only a small number of members. Use the normal matrix subsetting syntax for this; see the example.

If a GO category in categ is not found in the GO annotation package, a warning will be generated, and no ancestors for that GO category are added (but that category itself will be part of the returned adjacency matrix).

Value

The adjacency matrix of the bipartite category membership graph, rows are categories and columns genes.
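
A base R sketch of the membership-matrix layout follows. It deliberately omits the ancestor expansion that cateGOry performs via GO.db, and the category labels are placeholders rather than real GO identifiers.

```r
# Gene IDs and their category labels; a gene with several annotations
# appears once per annotation (placeholder labels, not real GO IDs).
x     <- c("geneA", "geneA", "geneB")
categ <- c("cat1",  "cat2",  "cat1")

# Cross-tabulate categories against genes and convert counts to TRUE/FALSE.
tab  <- table(categ, x)
memb <- matrix(as.vector(tab) > 0, nrow = nrow(tab),
               dimnames = list(rownames(tab), colnames(tab)))
memb

rowSums(memb)   # number of genes in each category
```

Rows are categories and columns are genes, matching the orientation expected by applyByCategory.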

Author

Wolfgang Huber

Examples

g = cateGOry(c("CG2671", "CG2671", "CG2950"),
             c("GO:0090079", "GO:0001738", "GO:0003676"), sparse=TRUE)
g

rowSums(g)   ## number of genes in each category

## Filter out categories with fewer than minMemb or more than maxMemb members.
## This is toy data; in real applications, a choice of minMemb higher
## than 2 will be more appropriate.
filter = function(x, minMemb = 2, maxMemb = 35) ((x>=minMemb) & (x<=maxMemb))
g[filter(rowSums(g)),,drop=FALSE ]

categoryToEntrezBuilder()

Return a list mapping category ids to Entrez Gene ids

Description

Return a list mapping category ids to the Entrez Gene ids annotated at the category id. Only those category ids that have at least one annotation in the set of Entrez Gene ids specified by the geneIds slot of p are included.

Usage

categoryToEntrezBuilder(p)

Arguments

Argument	Description
`p`	A subclass of `HyperGParams-class`

Details

End users should not call this directly. This method gets called from hyperGTest. To add support for a new category, a new method for this generic must be defined. Its signature should match a subclass of HyperGParams-class appropriate for the new category.

Value

A list mapping category ids to Entrez Gene identifiers.

Author

S. Falcon

cb_contingency()

Create and Test Contingency Tables of Chromosome Band Annotations

Description

For each chromosome band identifier in chrVect, cb_contingency builds and performs a test on a 2 x k contingency table for the genes from selids found in the child bands of the given chrVect element.

cb_sigBands extracts the chromosome band identifiers that were in a contingency table that tested significant given the specified p-value cutoff.

cb_children returns the child bands of a given band in the chromosome band graph. The argument must have length equal to one.

Usage

cb_contingency(selids, chrVect, chrGraph, testFun = chisq.test,
               min.expected = 5L, min.k = 1L)
cb_sigBands(b, p.value = 0.01)
cb_children(n, chrGraph)

Arguments

Argument	Description
`selids`	A vector of the selected gene identifiers (usually Entrez IDs).
`chrVect`	A character vector of chromosome band identifiers.
`chrGraph`	A `graph` object as returned by `makeChrBandGraph`. The nodes should be chromosome band IDs and the edges should represent the tree structure of the bands. Furthermore, the graph is expected to have a `"geneIds"` node attribute providing a vector of gene IDs annotated at each band.
`testFun`	The function to use for testing the 2 x k contingency tables. The default is `chisq.test`. It will be called with a single argument, a 2 x k matrix representing the contingency table.
`min.expected`	A numeric value specifying the minimum expected count for columns to be included in the contingency table. The expected count is `(rowSum * colSum) / n`. Chromosome bands with a select cell count less than `min.expected` are dropped from the table before testing occurs. If `NULL`, then no bands will be dropped.
`min.k`	An integer giving the minimum number of chromosome bands that must be present in a contingency table in order to proceed with testing.
`b`	A list as returned by `cb_contingency`.
`p.value`	A p-value cutoff to use in selecting significant contingency tables.
`n`	A length one character vector specifying a chromosome band annotation. Bands not found in `chrGraph` will return `character(0)` when passed to `cb_children`.
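
The expected-count filter described for min.expected can be sketched with base R on a toy 2 x k table. All counts are made up, and reading "select cell count" as the expected count in the selected row is an assumption of this sketch, not a statement about the package internals.

```r
# Toy 2 x k table: rows are selected / not selected genes, columns are
# sibling bands (all counts are made up).
tab <- rbind(selected    = c(bandA = 12, bandB = 3,  bandC = 20),
             notSelected = c(bandA = 88, bandB = 17, bandC = 60))

n <- sum(tab)
# Expected count for every cell: (rowSum * colSum) / n.
expected <- outer(rowSums(tab), colSums(tab)) / n

# Assumption: drop bands whose expected count in the "selected" row is
# below the cutoff, then test the remaining columns.
min.expected <- 5
keep <- expected["selected", ] >= min.expected
res  <- chisq.test(tab[, keep, drop = FALSE])
res$p.value
```

In this toy table bandB has an expected selected count of 3.5 and is therefore excluded before chisq.test is called.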

Details

cb_sigBands assumes that the p-value associated with a result of testFun can be accessed as testFun(t)$p.value. We should improve this to be a method call which can then be specialized based on the class of the object returned by testFun.

Value

cb_contingency returns a list with an element for each test performed. This will most often be shorter than length(chrVect) due to skipped tests based on min.expected and min.k. Each element of the returned list is itself a list with components:

cb_sigBands returns a character vector of chromosome band identifiers that are in one of the contingency tables that had a p-value less than the cutoff specified by p.value.

Author

Seth Falcon

cb_parse_band_Hs()

Parse Homo Sapiens Chromosome Band Annotations

Description

This function parses chromosome band annotations as found in the MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.

Usage

cb_parse_band_Hs(x)

Arguments

Argument	Description
`x`	A chromosome band annotation given as a string.

Details

The former function cb_parse_band_hsa is now defunct.

Value

A character vector giving the path to the relevant chromosome.

Author

Seth Falcon

Examples

cb_parse_band_Hs("12q32.12")

cb_parse_band_Mm()

Parse Mus Musculus Chromosome Band Annotations

Description

This function parses chromosome band annotations as found in the MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.

Usage

cb_parse_band_Mm(x)

Arguments

Argument	Description
`x`	A chromosome band annotation given as a string.

Value

A character vector giving the path to the relevant chromosome.

Author

Seth Falcon & Nolwenn Le Meur

Examples

cb_parse_band_Mm("10 B3")

cb_test()

Chromosome Band Tree-Based Hypothesis Testing

Description

cb_test is a flexible tool for discovering interesting chromosome bands relative to a selected gene list. The function supports local and global tests, which can be carried out in a top-down or bottom-up fashion on the tree of chromosome bands.

Usage

cb_test(selids, chrtree, level, dir = c("up", "down"),
       type = c("local", "global"), next.pval = 0.05,
       cond.pval = 0.05, conditional = FALSE)

Arguments

Argument	Description
`selids`	A vector of gene IDs. The IDs should match those used to annotate the `ChrBandTree` given by `chrtree`. In most cases, these will be Entrez Gene IDs.
`chrtree`	A `ChrBandTree` object representing the chromosome bands and the mapping to gene identifiers. The genes in the `ChrBandTree` are limited to the universe of gene IDs specified at object creation time.
`level`	An integer giving the level of the chromosome band tree at which testing should begin. The level is conceptualized as the set of nodes with a given path length to the root (organism) node of the chromosome band tree. So level 1 is the chromosome and level 2 is the chromosome arms. You can get a better sense by calling `exampleLevels(chrtree)`.
`dir`	A string giving the direction in which the chromosome band tree will be traversed when carrying out the tests. A bottom-up traversal, from leaves to root, is specified by `"up"`. A top-down traversal, from root to leaves, is specified by `"down"`.
`type`	A string giving the type of test to perform. The current choices are `"local"` and `"global"`. A local test carries out a chisq.test on each 2 x K contingency table induced by each set of siblings at a given level in the tree. A global test uses the Hypergeometric distribution to compute a p-value for the 2 x 2 tables induced by each band treated independently.
`next.pval`	The p-value cutoff used to determine whether the parents or children of a node should be tested. After testing a given level of the tree, the decision of whether or not to continue testing the children (or parents) of the already tested nodes is made by comparing the p-value result for a given node with this cutoff; relatives of nodes with values strictly greater than the cutoff are skipped.
`cond.pval`	The p-value cutoff used to determine whether a node is significant during a conditional test. See `conditional`.
`conditional`	A logical value. Can only be used when `dir="up"` and `type="global"`. In this case, a `TRUE` value causes a conditional Hypergeometric calculation to be performed. The genes annotated at significant children of a given band are removed before testing.
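
The difference between the two test types can be sketched with base R on made-up counts. This is an illustration of the statistics named in the `type` argument, under the assumption that the sibling bands below make up the whole universe; it is not the code that cb_test runs.

```r
# Local test: one chi-squared test on the 2 x K table formed by K sibling
# bands (toy counts).
siblings <- rbind(selected    = c(b1 = 30, b2 = 10, b3 = 15),
                  notSelected = c(b1 = 70, b2 = 90, b3 = 85))
local_p <- chisq.test(siblings)$p.value

# Global test: each band on its own, as a 2 x 2 table against the rest of
# the universe, using the Hypergeometric distribution.
n_universe <- sum(siblings)                # genes in the universe
n_selected <- sum(siblings["selected", ])  # selected genes
global_p <- vapply(colnames(siblings), function(b) {
  in_band <- sum(siblings[, b])            # genes annotated at this band
  hits    <- siblings["selected", b]       # selected genes at this band
  # P(X >= hits) when drawing n_selected genes from the universe
  phyper(hits - 1, in_band, n_universe - in_band, n_selected,
         lower.tail = FALSE)
}, numeric(1))

local_p
global_p
```

The local test yields a single p-value for the whole sibling set, whereas the global test yields one p-value per band.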

Value

A list with an element for each level of the tree that was tested. Note that the first element will correspond to the level given by level and that subsequent elements will be the next or previous depending on dir.

Each level element is itself a list consisting of a result list for each node or set of nodes tested. These inner-most lists will have, at least, the following components:

Author

Seth Falcon

effectSize()

Extract estimated effect sizes

Description

This function extracts estimated effect sizes from the results of a linear model-based gene-set / category enrichment test.

Usage

effectSize(r)

Arguments

Argument	Description
`r`	The results of the test

Value

A numeric vector.

Author

Deepayan Sarkar

exampleLevels()

Display a sample node from each level of a ChrBandTree object

Description

The "levels" of a chromosome band tree represented by a ChrBandTree object are the sets of nodes with a given path length to the root node. This function displays the available levels along with an example node from each level.
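
The notion of a level as path length to the root can be sketched in base R on a toy tree. The child-to-parent vector and the helper node_level are hypothetical and only illustrate the idea; a real ChrBandTree is built from annotation data.

```r
# Toy tree given as child -> parent; the root has parent NA (names made up
# in the style of human band IDs).
parent <- c(ORGANISM = NA, "10" = "ORGANISM", "10p" = "10", "10q" = "10",
            "10p1" = "10p")

# Level of a node = number of edges on its path to the root.
node_level <- function(node) {
  lev <- 0L
  while (!is.na(parent[[node]])) {
    node <- parent[[node]]
    lev <- lev + 1L
  }
  lev
}

levels_of <- vapply(names(parent), node_level, integer(1))
by_level  <- split(names(parent), levels_of)
# One example node per level, analogous in spirit to exampleLevels().
examples  <- lapply(by_level, `[`, 1)
examples
```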

Usage

exampleLevels(g)

Arguments

Argument	Description
`g`	A `ChrBandTree` object

Value

A list with an element for each level. The names of the list are the levels. Each element is an example of a node from the given level.

Author

S. Falcon

findAMstats()

Compute per category summary statistics

Description

For a given incidence matrix, Amat, compute some per category statistics.

Usage

findAMstats(Amat, tstats)

Arguments

Argument	Description
`Amat`	An incidence matrix, with categories as the rows and probes as the columns.
`tstats`	A vector of per probe test statistics (should be the same length as `ncol(Amat)`).

Details

Simple summary statistics are computed, such as the row sums and the vector of per category sums of the test statistics, tstats.
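
The two summaries named above can be computed directly in base R. This is a minimal sketch on toy data; the variable names are illustrative and do not reflect the names of the components returned by findAMstats.

```r
set.seed(1)
tstats <- rnorm(6)
# 2 categories x 6 probes incidence matrix (toy data).
Amat <- rbind(c(1, 1, 0, 0, 1, 0),
              c(0, 1, 1, 1, 0, 0))

n_per_cat   <- rowSums(Amat)               # number of probes in each category
sum_per_cat <- as.vector(Amat %*% tstats)  # per category sum of the statistics
```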

Value

A list with components,

Author

R. Gentleman

Examples

ts = rnorm(100)
Am = matrix(sample(c(0,1), 1000, replace=TRUE), ncol=100)
findAMstats(Am, ts)

getPathNames()

A function to print pathway names given their numeric ID.

Description

Given a KEGG pathway ID, this function returns the character name of the pathway.

Usage

getPathNames(iPW)

Arguments

Argument	Description
`iPW`	A vector of KEGG pathway IDs.

Details

This function simply does a lookup in KEGGPATHID2NAME and returns a list of the pathway names.

A possible extension would be to make it work with the cMAP library as well.

Value

A list of pathway names.

Author

R. Gentleman

Examples

nms = "00031"
getPathNames(nms)

gseattperm()

Permutation p-values for GSEA

Description

This function performs GSEA computations and returns p-values for each gene set based on repeated permutation of the phenotype labels.

Usage

gseattperm(eset, fac, mat, nperm)

Arguments

Argument	Description
`eset`	An `ExpressionSet` object
`fac`	A `factor` identifying the phenotypes in `eset`. Usually, this will be one of the columns in the phenotype data associated with `eset`.
`mat`	A 0/1 incidence matrix with each row representing a gene set and each column representing a gene. A 1 indicates membership of a gene in a gene set.
`nperm`	Number of permutations to test to build the reference distribution.

Details

The t-statistic is used (via rowttests) to test for a difference in means between the phenotypes determined by fac within each gene set (given as a row of mat).

A reference distribution for these statistics is established by permuting fac and repeating the test nperm times.

Value

A matrix with the same number of rows as mat and two columns, "Lower" and "Upper". The "Lower" ("Upper") column gives the probability of seeing a t-statistic smaller (larger) than the observed.
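
The permutation scheme can be sketched in base R on simulated data. Several pieces here are assumptions made for illustration: row_t is a hypothetical stand-in for rowttests, and the per-set summary (sum of member t statistics divided by the square root of the set size) is one plausible choice, not necessarily the one gseattperm uses.

```r
# Per-row two-sample t statistics (equal variance), a base R stand-in for
# rowttests(); 'x' is genes x samples, 'fac' a two-level factor.
row_t <- function(x, fac) {
  g1 <- x[, fac == levels(fac)[1], drop = FALSE]
  g2 <- x[, fac == levels(fac)[2], drop = FALSE]
  n1 <- ncol(g1); n2 <- ncol(g2)
  v1 <- apply(g1, 1, var); v2 <- apply(g2, 1, var)
  sp <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
  (rowMeans(g1) - rowMeans(g2)) / (sp * sqrt(1 / n1 + 1 / n2))
}

# Assumed per-set summary: sum of member t statistics scaled by sqrt(set size).
set_stat <- function(x, fac, mat) {
  as.vector(mat %*% row_t(x, fac) / sqrt(rowSums(mat)))
}

set.seed(123)
x   <- matrix(rnorm(50 * 10), nrow = 50)         # 50 genes, 10 samples
fac <- factor(rep(c("A", "B"), each = 5))
mat <- matrix(rbinom(4 * 50, 1, 0.3), nrow = 4)  # 4 random gene sets

obs   <- set_stat(x, fac, mat)
nperm <- 200
# Each column holds the set statistics for one permutation of the labels.
perm  <- replicate(nperm, set_stat(x, sample(fac), mat))
pvals <- cbind(Lower = rowMeans(perm < obs), Upper = rowMeans(perm > obs))
pvals
```

Each row of pvals gives, for one gene set, the fraction of permutations with a statistic below ("Lower") or above ("Upper") the observed value.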

Author

Seth Falcon

Examples

## This example uses a random sample of probesets and a randomly
## generated category matrix.  The results, therefore, are not
## meaningful, but the code demonstrates how to use gseattperm without
## requiring any expensive computations.

## Obtain an ExpressionSet with two types of samples (mol.biol)
haveALL <- require("ALL")
if (haveALL) {
    data(ALL)
    set.seed(0xabcd)
    rndIdx <- sample(1:nrow(ALL), 500)
    Bcell <- grep("^B", as.character(ALL$BT))
    typeNames <- c("NEG", "BCR/ABL")
    bcrAblOrNegIdx <- which(as.character(ALL$mol.biol) %in% typeNames)
    s <- ALL[rndIdx, intersect(Bcell, bcrAblOrNegIdx)]
    s$mol.biol <- factor(s$mol.biol)

    ## Generate a random category matrix
    nCats <- 100
    set.seed(0xdcba)
    ## Draw one 0/1 value per cell; without an explicit size, sample()
    ## returns only two values, which matrix() would then recycle.
    rndCatMat <- matrix(sample(c(0L, 1L), nCats * nrow(s), replace=TRUE),
                        nrow=nCats, ncol=nrow(s),
                        dimnames=list(
                          paste("c", 1:nCats, sep=""),
                          featureNames(s)))

    ## Demonstrate use of gseattperm
    N <- 10
    pvals <- gseattperm(s, s$mol.biol, rndCatMat, N)
    pvals[1:5, ]
}

hyperGTest()

Hypergeometric Test for association of categories and genes

Description

Given a subclass of HyperGParams , compute Hypergeometric p-values for over- or under-representation of each term in the specified category among the specified gene set.

Usage

hyperGTest(p)

Arguments

Argument	Description
`p`	An instance of a subclass of `HyperGParams` . This parameter object determines the category of interest (e.g., GO or KEGG) as well as the gene set.

Details

The gene identifiers in the geneIds slot of p define the selected set of genes. The universe of gene ids is determined by the chip annotation found in the annotation slot of p . Both the selected genes and the universe are reduced by removing identifiers that do not have any annotations in the specified category.

For each term in the specified category that has at least one annotation in the selected gene set, we determine how many of its annotations are in the universe set and how many are in the selected set. With these counts we perform a Hypergeometric test using phyper . This is equivalent to using Fisher's exact test.
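The equivalence between the phyper computation and Fisher's exact test can be checked directly in base R. The counts below are made up for illustration and the variable names are not taken from the package.

```r
## Hypothetical counts for a single term
nUniverse     <- 1000   # annotated genes in the universe
nTermUniverse <- 40     # universe genes annotated at the term
nSelected     <- 100    # selected genes
nTermSelected <- 10     # selected genes annotated at the term

## Over-representation: P(X >= nTermSelected) under the hypergeometric model
pHyper <- phyper(nTermSelected - 1,
                 nTermUniverse,                 # "white balls"
                 nUniverse - nTermUniverse,     # "black balls"
                 nSelected,                     # number drawn
                 lower.tail = FALSE)

## The same test expressed as a one-sided Fisher's exact test on a 2 x 2 table
tab <- matrix(c(nTermSelected,
                nSelected - nTermSelected,
                nTermUniverse - nTermSelected,
                nUniverse - nTermUniverse - (nSelected - nTermSelected)),
              nrow = 2,
              dimnames = list(c("inTerm", "notInTerm"),
                              c("selected", "notSelected")))
pFisher <- fisher.test(tab, alternative = "greater")$p.value
```

Subtracting 1 from the observed count matters: with lower.tail = FALSE , phyper returns P(X > q), so q must be one less than the observed value to include it in the tail.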

It is important that the correct chip annotation data package be identified as it determines the universe of gene identifiers and is often used to determine the mapping between the category term and the gene identifiers.

For S. cerevisiae, if the annotation slot of p is set to "org.Sc.sgd" , then comparisons and statistics are computed using common names and are with respect to all genes annotated in the S. cerevisiae genome, not with respect to any microarray chip. This will not be the right thing to do if you are working with a yeast microarray.

Value

A HyperGResult instance.

Author

S. Falcon

hyperg()

Hypergeometric (gene set enrichment) tests on character vectors.

Description

This function performs a hypergeometric test for over- or under-representation of significant genes amongst those assayed in a universe of genes. It provides an interface based on character vectors identifying the members of gene sets and the gene universe.

Usage

hyperg(assayed, significant, universe,
    representation = c("over", "under"), ...)

Arguments

Argument	Description
`assayed`	A vector of assayed genes (or other identifiers). `assayed` may be a character vector (defining a single gene set) or list of character vectors (defining a collection of gene sets).
`significant`	A vector of assayed genes that were differentially expressed. If `assayed` is a character vector, then `significant` must also be a character vector; likewise when `assayed` is a `list` .
`universe`	A character vector defining the universe of genes.
`representation`	Either `"over"` or `"under"` , to indicate testing for over- or under-representation, respectively, of differentially expressed genes.
`...`	Additional arguments, unused.

Value

When invoked with a character vector of assayed genes, a named numeric vector providing the input values, P-value, odds ratio, and expected number of significantly expressed genes.

When invoked with a list of character vectors of assayed genes, a data frame with columns of input values, P-value, odds ratio, and expected number of significantly expressed genes.

Author

Martin Morgan mtmorgan@fhcrc.org with contributions from Paul Shannon.

Examples

set.seed(123)

## artificial sets -- affy probes grouped by protein family
library(hgu95av2.db)
map <- select(hgu95av2.db, keys(hgu95av2.db), "PFAM")
sets <- Filter(function(x) length(x) >= 10, split(map$PROBEID, map$PFAM))

universe <- unlist(sets, use.names=FALSE)
siggenes <- sample(universe, length(universe) / 20)  ## simulate
sigsets <- Map(function(x, y) x[x %in% y], sets, MoreArgs=list(y=siggenes))

result <- hyperg(sets, sigsets, universe)
head(result)

linearMTest()

A linear model-based test to detect enrichment of unusual genes in categories

Description

Given a subclass of LinearMParams , compute p-values for detecting systematic up or downregulation of the specified gene set in the specified category.

Usage

linearMTest(p)

Arguments

Argument	Description
`p`	An instance of a subclass of `LinearMParams` . This parameter object determines the category of interest (currently, only chromosome bands) as well as the gene set.

Details

The per-gene statistics in the geneStats slot of p give a measure of up or down regulation of the individual genes in the universe.

Value

A LinearMResult instance.

Author

D. Sarkar

local_test_factory()

Local and Global Test Function Factories

Description

These functions return functions appropriate for use as the tfun argument to topdown_tree_visitor or bottomup_tree_visitor . In particular, it is these functions that are associated with the "local" and "global" options for the type argument to cb_test .

Usage

local_test_factory(selids, tableTest = chisq.test)
hg_test_factory(selids, PCUT = 0.05, COND = FALSE, OVER = TRUE)

Arguments

Argument	Description
`selids`	A vector of gene IDs. The IDs should match those used to annotate the `ChrBandTree` given by `chrtree` . In most cases, these will be Entrez Gene IDs.
`tableTest`	A contingency table testing function. The behavior of this function must be reasonably close to that of `chisq.test` .
`PCUT`	A p-value cutoff that will be used to determine if a given test is significant or not when using `hg_test_factory` with `COND=TRUE` .
`COND`	A logical value indicating whether a conditional test should be performed.
`OVER`	If `TRUE` , test for over representation, if `FALSE` , test for under representation.

Details

The returned functions have signature f(start, g, prev_ans) where start is a vector of start nodes, g is a chromosome band tree graph, and prev_ans can contain the previous result returned by a call to this function.
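The factory pattern behind this signature can be illustrated with a plain closure. Everything in this sketch is hypothetical: toy_test_factory is not part of the package, g is a named list standing in for a chromosome band tree graph, and the "test" merely counts selected genes per node instead of running a statistical test.

```r
## Hypothetical factory: the returned function closes over selids
toy_test_factory <- function(selids) {
  function(start, g, prev_ans) {
    ## g is a toy stand-in: a named list giving the gene IDs at each node.
    ## prev_ans is accepted to match the signature but is unused here.
    vapply(start, function(node) sum(g[[node]] %in% selids), numeric(1))
  }
}

g    <- list(`12p` = c("1", "2", "3"), `12q` = c("4", "5"))
tfun <- toy_test_factory(selids = c("2", "3", "5"))
res  <- tfun(c("12p", "12q"), g, NULL)   # selected-gene count per start node
```

Because selids is captured when the factory is called, the tree visitor only has to supply start , g , and prev_ans on each call.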

Value

A function that can be used as the tfun argument to the tree visitor functions.

Author

Seth Falcon

makeChrBandGraph()

Create a graph representing chromosome band annotation data

Description

This function returns a graph object representing the nested structure of chromosome bands (also known as cytogenetic bands). The nodes of the graph are band identifiers. Each node has a geneIds node attribute that lists the gene IDs that are annotated at the band (the gene IDs will be Entrez IDs in most cases).

Usage

makeChrBandGraph(chip, univ = NULL)

Arguments

Argument	Description
`chip`	A string giving the annotation source. For example, `"hgu133plus2"`
`univ`	A vector of gene IDs (these should be Entrez IDs for most annotation sources). The annotations attached to the graph will be limited to those specified by `univ` . If `univ` is `NULL` (default), then the gene IDs are those found in the annotation data source.

Details

This function parses the data stored in the <chip>MAP map from the appropriate annotation data package. Although cytogenetic bands are observed in all organisms, currently, only human and mouse band nomenclatures are supported.

Value

A graph-class instance. The graph will be a tree and the root node is labeled for the organism.

Author

Seth Falcon

Examples

chrGraph <- makeChrBandGraph("hgu95av2.db")
chrGraph

makeEBcontr()

A function to make the contrast vectors needed for EBarrays

Description

Using EBarrays to detect differential expression requires the construction of a set of contrasts. This little helper function computes these contrasts for a two level factor.

Usage

makeEBcontr(f1, hival)

Arguments

Argument	Description
`f1`	The factor that will define the contrasts.
`hival`	The `level` of the factor to treat as the high level.

Details

Not much more to add, see EBarrays for more details. This is used in the Category package to let users compute the posterior probability of differential expression, and hence to compute expected numbers of differentially expressed genes, per category.

Value

An object of class "ebarraysPatterns".

Seealso

ebPatterns

Author

R. Gentleman

Examples

if( require("EBarrays") ) {
    myfac = factor(rep(c("A", "B"), c(12, 24)))
    makeEBcontr(myfac, "B")
}

makeValidParams()

Non-standard Generic for Checking Validity of Parameter Objects

Description

This function is not intended for end-users, but may be useful for developers extending the Hypergeometric testing capabilities provided by the Category package.

makeValidParams is intended to validate a parameter object instance (e.g. HyperGParams or a subclass). Unlike validObject , methods for this generic attempt to fix invalid instances when possible, issuing a warning when they do so, and only signal an error if the object cannot be fixed.
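The "repair with a warning, fail only when repair is impossible" idea can be sketched with a small S4 generic. The class ToyParams , the generic toyMakeValid , and the specific checks are all hypothetical and chosen for illustration; they do not describe what the package's own methods check.

```r
library(methods)

## Hypothetical parameter class, for illustration only
setClass("ToyParams", representation(geneIds = "character"))

## Hypothetical generic mirroring the repair-then-warn idea
setGeneric("toyMakeValid", function(object) standardGeneric("toyMakeValid"))

setMethod("toyMakeValid", "ToyParams", function(object) {
  if (length(object@geneIds) == 0L)
    stop("geneIds must not be empty")        # cannot be repaired: error
  if (anyDuplicated(object@geneIds)) {
    warning("removing duplicate geneIds")    # repairable: fix and warn
    object@geneIds <- unique(object@geneIds)
  }
  object                                     # same class as the input
})

p <- suppressWarnings(
  toyMakeValid(new("ToyParams", geneIds = c("10", "10", "20"))))
```

Returning the (possibly repaired) object rather than TRUE or FALSE is what distinguishes this style from a validity method used with validObject .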

Usage

makeValidParams(object)

Arguments

Argument	Description
`object`	A parameter object. Consult `showMethods` to see signatures currently supported.

Value

The value must have the same class as the object argument.

Author

Seth Falcon

probes2MAP()

Map probe IDs to MAP regions.

Description

This function maps probe identifiers to MAP positions using the appropriate Bioconductor meta-data package.

Usage

probes2MAP(pids, data = "hgu133plus2")

Arguments

Argument	Description
`pids`	A vector of probe IDs for the chip in use.
`data`	The name of the chip, as a character string.

Details

Probes are mapped to regions; no checking for duplicate Entrez gene IDs is done.

Value

A vector, the same length as pids , with the MAP locations.

Author

R. Gentleman

Examples

set.seed(123)
library("hgu95av2.db")
v1 = sample(names(as.list(hgu95av2MAP)), 100)
pp = probes2MAP(v1, "hgu95av2.db")

probes2Path()

A function to map probe identifiers to pathways.

Description

Given a set of probe identifiers from a microarray this function looks up all KEGG pathways that the probe is documented to be involved in.

Usage

probes2Path(pids, data = "hgu133plus2")

Arguments

Argument	Description
`pids`	A vector of probe identifiers.
`data`	The character name of the chip.

Details

This is a simple lookup in the appropriate chip PATH data environment.

Value

A list of pathway vectors, with one element for each value of pids that is mapped to at least one pathway.

Author

R. Gentleman

Examples

library("hgu95av2.db")
x = c("1001_at", "1000_at")
probes2Path(x, "hgu95av2.db")

tree_visitor()

Tree Visitor Function

Description

This function visits each node in a tree-like object in an order determined by the relationOf function. The function given by tfun is called for each set of nodes and the function nfun determines which nodes to test next optionally making use of the result of the previous test.

Usage

tree_visitor(g, start, tfun, nfun, relationOf)
topdown_tree_visitor(g, start, tfun, nfun)
bottomup_tree_visitor(g, start, tfun, nfun)

Arguments

Argument	Description
`g`	A tree-like object that supports the method given by `relationOf` .
`start`	The set of nodes to start the computation (can be a list of siblings), but the nodes should all belong to the same level of the tree (same path length to root node).
`tfun`	The test function applied to each list of siblings at each level starting with `start` . The signature of `tfun` should be `(start, g, prev_ans)` .
`nfun`	A function with signature `(ans, g)` that processes the result of `tfun` and returns a character vector of node names corresponding to nodes that were involved in an "interesting" test. This is used to determine the next set of nodes to test (see details).
`relationOf`	The method used to traverse the tree. For example `childrenOf` or `parentOf` .

Details

The tree_visitor function is intended to allow developers to quickly prototype different statistical testing paradigms on trees. It may be possible to extend this to work for DAGs.

The visit begins by calling tfun with the nodes provided by start . The result of each call to tfun is stored in an environment. The concept is visitation by tree level and so each result is stored using a key representing the level (this isn't quite right since the nodes in start need not be first level, but they will be assigned key "1"). After storing the result, nfun is used to obtain a vector of accepted node labels. The idea is that the user should have a way of determining which nodes in the next level of the tree are worth testing. The next start set is determined by start <- relationOf(g, accepted) where accepted is unique(nfun(ans, g)) .
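The level-by-level loop described above can be sketched as follows. This is a toy re-implementation under stated assumptions, not the package's code: the tree is a named list of children rather than a graph object, childrenOfToy is a hypothetical stand-in for a relationOf method, and the stopping rule (an empty start set) is assumed.

```r
## Toy tree: each node maps to its children (stand-in for a graph object)
children <- list(root = c("a", "b"), a = c("a1", "a2"), b = "b1",
                 a1 = character(0), a2 = character(0), b1 = character(0))

## Hypothetical relationOf: children of the accepted nodes
childrenOfToy <- function(g, nodes) unlist(g[nodes], use.names = FALSE)

toy_tree_visitor <- function(g, start, tfun, nfun, relationOf) {
  res <- new.env()
  level <- 1L
  prev_ans <- NULL
  while (length(start) > 0L) {
    ans <- tfun(start, g, prev_ans)                # test the current level
    assign(as.character(level), ans, envir = res)  # store under the level key
    accepted <- unique(nfun(ans, g))               # nodes worth following
    start <- relationOf(g, accepted)               # next set of nodes
    prev_ans <- ans
    level <- level + 1L
  }
  mget(as.character(seq_len(level - 1L)), envir = res)
}

## Toy tfun counts each node's children; toy nfun accepts nodes with >= 2
tfun <- function(start, g, prev_ans)
  vapply(start, function(n) length(g[[n]]), numeric(1))
nfun <- function(ans, g) names(ans)[ans >= 2]

out <- toy_tree_visitor(children, "root", tfun, nfun, childrenOfToy)
```

In this toy run node b is tested at the second level but not accepted, so its child b1 is never visited. That pruning is the purpose of nfun .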

Value

A list. See the return value of cb_test to get an idea. Each element of the list represents a call to tfun at a given level of the tree.

Author

Seth Falcon

ttperm()

A simple function to compute a permutation t-test.

Description

The data matrix, x , with two-level factor, fac , is used to compute t-tests. The values of fac are permuted B times and the complete set of t-tests is performed for each permutation.

Usage

ttperm(x, fac, B = 100, tsO = TRUE)

Arguments

Argument	Description
`x`	A data matrix. The number of columns should be the same as the length of `fac` .
`fac`	A factor with two levels.
`B`	An integer specifying the number of permutations.
`tsO`	A logical indicating whether to compute only the t-test statistic for each permutation. If `FALSE` then p-values are also computed, but this can be very slow.

Details

Not much more to say. Probably there is a generic function somewhere, but I could not find it.

Value

A list. The first element is named obs and contains the true, observed values of the t-statistic. The second element is named ans and contains a list of length B holding the results of the different permutations.
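A result with this shape can be turned into permutation p-values with a few lines of base R. The sketch below builds a stand-in for the return value rather than calling ttperm . The helper rowT and the assumption that each element of ans is a plain numeric vector of t-statistics are illustrative and may not match the actual structure of each element.

```r
set.seed(7)
x   <- matrix(rnorm(100), ncol = 10)
fac <- factor(rep(c("A", "B"), c(5, 5)))
B   <- 10

## Row-wise equal-variance t-statistics via t.test (slow but dependency-free)
rowT <- function(x, fac) {
  apply(x, 1, function(r)
    unname(t.test(r[fac == levels(fac)[1]],
                  r[fac == levels(fac)[2]],
                  var.equal = TRUE)$statistic))
}

## Stand-in for the documented list: observed values plus B permutations
res <- list(obs = rowT(x, fac),
            ans = replicate(B, rowT(x, sample(fac)), simplify = FALSE))

## Two-sided permutation p-value for each row of x
permMat <- do.call(cbind, res$ans)               # rows of x by permutations
pPerm   <- rowMeans(abs(permMat) >= abs(res$obs))
```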

Author

R. Gentleman

Examples

x = matrix(rnorm(100), ncol=10)
y = factor(rep(c("A","B"), c(5,5)))
ttperm(x, y, 10)

universeBuilder()

Return a vector of gene identifiers with category annotations

Description

Return all gene ids that are annotated at one or more terms in the category. If the universeGeneIds slot of p has length greater than zero, then the intersection of the gene ids specified in that slot and the normal return value is given.
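The restriction rule amounts to a conditional intersection. In this sketch the two vectors are made-up inputs: annotated plays the role of the normal return value and universeGeneIds plays the role of the slot of the same name.

```r
## Hypothetical inputs
annotated       <- c("10", "20", "30", "40")   # ids annotated at one or more terms
universeGeneIds <- c("20", "40", "50")         # user-supplied restriction; may be empty

## Restrict only when a universe was supplied
universe <- if (length(universeGeneIds) > 0L) {
  intersect(annotated, universeGeneIds)
} else {
  annotated
}
```

The id "50" is dropped because it has no annotation in the category, even though it was listed in the supplied universe.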

Usage

universeBuilder(p)

Arguments

Argument	Description
`p`	A subclass of `HyperGParams-class`