bioconductor v3.9.0 Category

A collection of tools for performing category (gene set enrichment) analysis.

Summary

Functions

Defunct Functions in Package Category

Class "ChrBandTree"

Class "ChrMapHyperGParams"

Class "ChrMapHyperGResult"

Class "ChrMapLinearMParams"

Class "ChrMapLinearMResult"

Class "DatPkg"

Class "GOHyperGParams"

Helper function for constructing GOHyperGParams or KEGGHyperGParams objects from a GeneSetCollection

Class "HyperGParams"

Class "HyperGResultBase"

Accessors for HyperGResult Objects

Class "HyperGResult"

Class "KEGGHyperGParams" and "PFAMHyperGParams"

Class "LinearMParams"

Class "LinearMResultBase"

Class "LinearMResult"

Mapping chromosome bands to genes

Create a new ChrBandTree object

Class "OBOHyperGParams"

Apply a function to a vector of statistics, by category

Construct a category membership matrix from a list of gene identifiers and their annotated GO categories.

Return a list mapping category ids to Entrez Gene ids

Create and Test Contingency Tables of Chromosome Band Annotations

Parse Homo Sapiens Chromosome Band Annotations

Parse Mus Musculus Chromosome Band Annotations

Chromosome Band Tree-Based Hypothesis Testing

Extract estimated effect sizes

Display a sample node from each level of a ChrBandTree object

Compute per category summary statistics

A function to print pathway names given their numeric ID.

Permutation p-values for GSEA

Hypergeometric Test for association of categories and genes

Hypergeometric (gene set enrichment) tests on character vectors.

A linear model-based test to detect enrichment of unusual genes in categories

Local and Global Test Function Factories

Create a graph representing chromosome band annotation data

A function to make the contrast vectors needed for EBarrays

Non-standard Generic for Checking Validity of Parameter Objects

Map probe IDs to MAP regions.

A function to map probe identifiers to pathways.

Tree Visitor Function

A simple function to compute a permutation t-test.

Return a vector of gene identifiers with category annotations

Functions

Category_defunct()

Defunct Functions in Package Category

Description

The functions or variables listed here are no longer part of the Category package.

Usage

condGeneIdUniverse()
isConditional()
geneGoHyperGeoTest()
geneKeggHyperGeoTest()
cb_parse_band_hsa()
chrBandInciMat()

Seealso

Defunct

ChrBandTree_class()

Class "ChrBandTree"

Description

This class represents chromosome band annotation data for a given experiment. The class is responsible for storing the mapping of band to set of gene IDs located within that band as well as for representing the tree structured relationship among the bands.

Note

Not all known chromosome bands will be represented in a given instance. The set of bands that will be present is determined by the available annotation data and the specified gene universe. The annotation source maps genes to their most specific band. Such bands and all bands on the path to the root will be represented in the resulting tree.

Currently there is only support for human and mouse data.

Author

S. Falcon

Examples

library("hgu95av2.db")
set.seed(0xfeee)
univ = NULL ## use all Entrez Gene IDs on the chip (not recommended)
ct = NewChrBandTree("hgu95av2.db", univ)

length(allGeneIds(ct))

exampleLevels(ct)

geneIds(ct, "10p11")
lgeneIds(ct, "10p11")
lgeneIds(ct, c("10p11", "Yq11.22"))

pp = parentOf(ct, c("10p11", "Yq11.22"))
childrenOf(ct, unlist(pp))

treeLevels(ct)

level2nodes(ct, 0)
level2nodes(ct, 0L)
level2nodes(ct, "0")

level2nodes(ct, 1)

ChrMapHyperGParams_class()

Class "ChrMapHyperGParams"

Description

This class encapsulates the parameters needed for Hypergeometric testing of over- or under-representation of chromosome bands among a selected gene list using hyperGTest.

Author

Seth Falcon

Examples

showClass("ChrMapHyperGParams")

ChrMapHyperGResult_class()

Class "ChrMapHyperGResult"

Description

This class represents the results of a Hypergeometric test for over-representation of genes in a selected gene list in the chromosome band annotation. The hyperGTest function returns an instance of ChrMapHyperGResult when given a parameter object of class ChrMapHyperGParams. For details on accessing the results, see HyperGResult-accessors.

Author

Seth Falcon

Examples

showClass("ChrMapHyperGResult")
## For details on accessing the results:
##     help("HyperGResult-accessors")

ChrMapLinearMParams_class()

Class "ChrMapLinearMParams"

Description

This class encapsulates the parameters needed for testing systematic variations in some gene-level statistic by chromosome bands using linearMTest.

Seealso

linearMTest

Author

Deepayan Sarkar

Examples

showClass("ChrMapLinearMParams")

ChrMapLinearMResult_class()

Class "ChrMapLinearMResult"

Description

This class represents the results of a linear model-based test for systematic changes in a per-gene statistic by chromosome band annotation. The linearMTest function returns an instance of ChrMapLinearMResult when given a parameter object of class ChrMapLinearMParams. Most slots can be queried using accessors.

Seealso

linearMTest, ChrMapLinearMParams, LinearMResult, LinearMResultBase

Author

Deepayan Sarkar, Michael Lawrence

Examples

showClass("ChrMapLinearMResult")

Class "DatPkg"

Description

DatPkg is a VIRTUAL class for representing annotation data packages.

AffyDatPkg is a subclass of DatPkg used to represent standard annotation data packages that follow the format of Affymetrix expression array annotation.

YeastDatPkg is a subclass of DatPkg used to represent the annotation data packages for yeast. The yeast chip packages are based on sgd and are internally different from the AffyDatPkg conforming packages.

ArabidopsisDatPkg is a subclass of DatPkg used to represent the annotation packages for Arabidopsis. These packages are internally slightly different from the AffyDatPkg conforming packages.

Org.XX.egDatPkg is a subclass of DatPkg used to represent the org.*.eg.db organism-level Entrez Gene based annotation data packages.

OBOCollectionDatPkg is a subclass of DatPkg used to represent the OBO based annotation data packages.

GeneSetCollectionDatPkg is a subclass of DatPkg used to represent annotations in the form of GeneSetCollection objects which are not based on any annotation packages but are instead derived from custom (user supplied) annotations.

These methods have been extended to accommodate uninstalled annotation objects, primarily those available from the AnnotationHub package. See below for an example.

Author

Seth Falcon

Examples

DatPkgFactory("hgu95av2")
DatPkgFactory("org.Sc.sgd")
DatPkgFactory("org.Hs.eg.db")
DatPkgFactory("ag")
library(AnnotationHub)
hub <- AnnotationHub()
## get an OrgDb for Atlantic salmon
query(hub, c("salmo salar","orgdb"))
salmodb <- hub[["AH58003"]]
DatPkgFactory(salmodb)

GOHyperGParams_class()

Class "GOHyperGParams"

Description

A parameter class for representing all parameters needed for running the hyperGTest method with one of the GO ontologies (BP, CC, MF) as the category.

Seealso

HyperGResult-class GOHyperGParams-class hyperGTest

Author

S. Falcon

GSEAGOHyperGParams()

Helper function for constructing GOHyperGParams or KEGGHyperGParams objects from a GeneSetCollection

Description

Helps to create a parameter object holding all parameters needed for running the hyperGTest method. When a GOHyperGParams object is being made, one of the GO ontologies (BP, CC, MF) serves as the category. The function constructs the parameter object from a GeneSetCollection object and, if necessary, also checks that the object is based on a GO2ALL mapping.

Usage

GSEAGOHyperGParams(name, geneSetCollection, geneIds, universeGeneIds,
  ontology, pvalueCutoff, conditional, testDirection, ...)
GSEAKEGGHyperGParams(name, geneSetCollection, geneIds, universeGeneIds,
  pvalueCutoff, testDirection, ...)

Arguments

Argument: Description
name: String specifying the name of the GeneSetCollection.
geneSetCollection: A GeneSetCollection object. If a GOHyperGParams object is sought, then this GeneSetCollection should be based on a GO2ALLFrame object, so the idType of that GeneSetCollection should be GOAllFrameIdentifier. If a KEGGHyperGParams object is sought, then a GeneSetCollection based on a KEGGFrame object should be used and the idType will be a KEGGFrameIdentifier.
geneIds: Object of class "ANY": a vector of gene identifiers. Numeric and character vectors are probably the only things that make sense. These are the gene ids for the selected gene set.
universeGeneIds: Object of class "ANY": a vector of gene ids in the same format as geneIds defining a subset of the gene ids on the chip that will be used as the universe for the hypergeometric calculation. If this is NULL or has length zero, then all gene ids on the chip will be used.
ontology: A string specifying the GO ontology to use. Must be one of "BP", "CC", or "MF". (Used with GO only.)
pvalueCutoff: A numeric value between zero and one used as a p-value cutoff for p-values generated by the Hypergeometric test. When the test being performed is non-conditional, this is only used as a default value for printing and summarizing the results. For a conditional analysis, the cutoff is used during the computation to perform the conditioning: child terms with a p-value less than pvalueCutoff are conditioned out of the test for their parent term.
conditional: A logical indicating whether the calculation should condition on the GO structure. (GO only.)
testDirection: A string which can be either "over" or "under". This determines whether the test performed detects over- or under-represented GO terms.
...: Optional arguments to configure the GOHyperGParams object.

Seealso

HyperGResult-class GOHyperGParams-class hyperGTest

Author

M. Carlson

HyperGParams_class()

Class "HyperGParams"

Description

An abstract (VIRTUAL) parameter class for representing all parameters needed by a method specializing the hyperGTest generic. You should only use subclasses of this class directly.

Seealso

HyperGResult-class GOHyperGParams-class KEGGHyperGParams-class hyperGTest

Author

S. Falcon

HyperGResultBase_class()

Class "HyperGResultBase"

Description

This VIRTUAL class represents common elements of the return values of generic functions like hyperGTest. All subclasses are intended to implement the accessor functions documented at HyperGResult-accessors.

Seealso

HyperGResult-class GOHyperGResult-class HyperGResult-accessors

Author

Seth Falcon

HyperGResult_accessors()

Accessors for HyperGResult Objects

Description

This manual page documents generic functions for extracting data from the result object returned from a call to hyperGTest. The result object will be a subclass of HyperGResultBase. Methods apply to all result object classes unless otherwise noted.

Usage

pvalues(r)
oddsRatios(r)
expectedCounts(r)
geneCounts(r)
universeCounts(r)
universeMappedCount(r)
geneMappedCount(r)
geneIds(object, ...)
geneIdUniverse(r, cond = TRUE)
geneIdsByCategory(r, catids = NULL)
sigCategories(r, p)
## R CMD check doesn't like these
## annotation(r)
## description(r)
testName(r)
pvalueCutoff(r)
testDirection(r)
chrGraph(r)

Arguments

Argument: Description
r, object: An instance of a subclass of HyperGResultBase.
catids: A character vector of category identifiers.
p: Numeric p-value used as a cutoff for selecting a subset of the result.
cond: A logical value indicating whether to return conditional results for a conditional test. The default is TRUE. For non-conditional results, this argument is ignored.
...: Additional arguments that may be used by specializing methods.

Seealso

hyperGTest HyperGResult-class HyperGParams-class GOHyperGParams-class KEGGHyperGParams-class

Author

Seth Falcon

Examples

## Note that more in-depth examples can be found in the GOstats
## vignette (Hypergeometric tests using GOstats).
library("hgu95av2.db")
library("annotate")

## Retrieve 300 probeids that have PFAM ids
probids <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")[1:300]

## get unique Entrez Gene IDs
geneids <- select(hgu95av2.db, probids, 'ENTREZID', 'PROBEID')
geneids <- unique(geneids[['ENTREZID']])

## Now do the same for the universe
univ <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")
univ <- select(hgu95av2.db, univ, 'ENTREZID', 'PROBEID')
univ <- unique(univ[['ENTREZID']])

p <- new("PFAMHyperGParams", geneIds=geneids, universeGeneIds=univ,
annotation="hgu95av2")
## this takes a while...
if(interactive()){
hypt <- hyperGTest(p)
summary(hypt)
htmlReport(hypt, file="temp.html", summary.args=list("htmlLinks"=TRUE))
}

HyperGResult_class()

Class "HyperGResult"

Description

This class represents the results of a test for over-representation of categories among genes in a selected gene set based upon the Hypergeometric distribution. The hyperGTest generic function returns an instance of the HyperGResult class. For details on accessing the results, see HyperGResult-accessors.

Seealso

HyperGResultBase-class GOHyperGResult-class HyperGResult-accessors

Author

Seth Falcon

KEGGHyperGParams_class()

Class "KEGGHyperGParams" and "PFAMHyperGParams"

Description

Parameter classes for representing all parameters needed for running the hyperGTest method with KEGG or PFAM as the category.

Seealso

HyperGResult-class GOHyperGParams-class hyperGTest

Author

S. Falcon

LinearMParams_class()

Class "LinearMParams"

Description

A parameter class for representing all parameters needed by a method specializing the linearMTest generic.

Seealso

See linearMTest for examples. ChrMapLinearMParams is a specialization of this class for chromosome maps.

Author

Deepayan Sarkar, Michael Lawrence

LinearMResultBase_class()

Class "LinearMResultBase"

Description

This VIRTUAL class represents common elements of the return values of generic functions like linearMTest. These elements are essentially those that are passed through from the input parameters. See LinearMResult for a concrete result class with the basic outputs.

Seealso

LinearMResult , LinearMParams , linearMTest

Author

Deepayan Sarkar, Michael Lawrence

LinearMResult_class()

Class "LinearMResult"

Description

This class represents the results of a test for systematic change in some gene-level statistic by gene sets. The linearMTest generic function returns an instance of the LinearMResult class.

Seealso

linearMTest

Author

Deepayan Sarkar, Michael Lawrence

Examples

showClass("LinearMResult")

Mapping chromosome bands to genes

Description

These functions return a mapping of chromosome bands to genes. makeChrBandGSC returns a GeneSetCollection object, with a GeneSet for each band. The other functions return a 0/1 incidence matrix with a row for each chromosome band and a column for each gene. Only those chromosome bands with at least one gene annotation will be included.

Usage

MAPAmat(chip, univ = NULL, minCount = 0)
makeChrBandInciMat(chrGraph)
makeChrBandGSC(chrGraph)

Arguments

Argument: Description
chip: A string giving the annotation source. For example, "hgu133plus2".
univ: A vector of gene IDs (these should be Entrez IDs for most annotation sources). The annotations will be limited to those in the set specified by univ. If univ is NULL (default), then the gene IDs are those found in the annotation data source.
chrGraph: A graph object as returned by makeChrBandGraph.
minCount: Bands with fewer than minCount genes will be excluded from the returned matrix. If minCount is 0 (the default), no bands will be removed.

Value

For makeChrBandGSC, a GeneSetCollection object with a GeneSet for each band.

For the other functions, a (0/1) incidence matrix with chromosome bands as rows and gene IDs as columns. A 1 in m[i, j] indicates that the chromosome band rownames(m)[i] contains the gene ID colnames(m)[j].

Seealso

makeChrBandGraph , cateGOry , probes2MAP

Author

Seth Falcon, Michael Lawrence

Examples

have_hgu95av2.db <- suppressWarnings(require("hgu95av2.db"))
if (have_hgu95av2.db)
mam <- MAPAmat("hgu95av2.db")

NewChrBandTree()

Create a new ChrBandTree object

Description

NewChrBandTree and ChrBandTreeFromGraph provide constructors for the ChrBandTree class.

Usage

NewChrBandTree(chip, univ)
ChrBandTreeFromGraph(g)

Arguments

Argument: Description
chip: The name of an annotation data package.
univ: A vector of gene identifiers that defines the universe of genes. Usually, this will be a vector of Entrez Gene IDs. If univ is NULL, then all genes probed on the specified chip will be in the universe. We strongly recommend using the set of genes that remains after applying a non-specific filter as the universe.
g: A graph instance as returned by makeChrBandGraph.

Value

A new ChrBandTree instance.

Seealso

ChrBandTree-class

Author

S. Falcon

OBOHyperGParams_class()

Class "OBOHyperGParams"

Description

A parameter class for representing all parameters needed for running the hyperGTest method with an ontology adhering to the OBO Foundry (see http://www.obofoundry.org) as the category.

Seealso

HyperGResult-class hyperGTest

Author

R. Castelo

applyByCategory()

Apply a function to a vector of statistics, by category

Description

For each category, apply the function FUN to the set of values of stats belonging to that category.

Usage

applyByCategory(stats, Amat, FUN = mean, ...)

Arguments

Argument: Description
stats: Numeric vector with test statistics of interest.
Amat: A logical or numeric matrix: the adjacency matrix of the bipartite genes-category graph. Its rows correspond to the categories, its columns to the genes, and TRUE or a nonzero numeric value indicates membership. The columns are assumed to be aligned with the elements of stats.
FUN: A function to apply to the subsets of stats defined by the categories.
...: Extra parameters passed to FUN.

Details

For GO categories, the function cateGOry might be useful for the construction of Amat.

Value

The return value is a list or vector of length equal to the number of categories. Each element corresponds to the values obtained by applying FUN to the subset of values in stats according to the category defined for that row.
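The row-wise subsetting described above can be sketched in base R. This is an illustration of the documented semantics only, not the package's implementation; the helper name apply_by_category_sketch and the toy data are hypothetical.

```r
# Hypothetical helper for illustration; the real function is applyByCategory.
apply_by_category_sketch <- function(stats, Amat, FUN = mean, ...) {
  stopifnot(ncol(Amat) == length(stats))
  # For each category (row), apply FUN to the statistics of its member genes.
  res <- lapply(seq_len(nrow(Amat)),
                function(i) FUN(stats[Amat[i, ] != 0], ...))
  names(res) <- rownames(Amat)
  # Simplify to a vector when every result has length one.
  if (all(lengths(res) == 1L)) unlist(res) else res
}

# Toy data: four genes, two categories.
stats <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 10)
Amat <- rbind(catA = c(TRUE, TRUE, FALSE, FALSE),
              catB = c(FALSE, TRUE, TRUE, TRUE))
colnames(Amat) <- names(stats)

res <- apply_by_category_sketch(stats, Amat, median)
print(res)
```

Because membership is tested with `!= 0`, the same sketch accepts either a logical or a 0/1 numeric matrix, matching the description of Amat.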

Seealso

apply

Author

R. Gentleman, contributions from W. Huber

Examples

set.seed(0xabcd)
st = rnorm(20)
names(st) = paste("gene", 1:20)

a = matrix(sample(c(FALSE, TRUE), 60, replace=TRUE), nrow=3,
dimnames = list(paste("category", LETTERS[1:3]), names(st)))

applyByCategory(st, a, median)

cateGOryMatrix()

Construct a category membership matrix from a list of gene identifiers and their annotated GO categories.

Description

The function constructs a category membership matrix, such as used by applyByCategory, from a list of gene identifiers and their annotated GO categories. For each of the GO categories stated in categ, all less specific terms (ancestors) are also included, so one need only obtain the most specific set of GO term mappings, which can be obtained from Bioconductor annotation packages or via biomaRt. The ancestor relationships are obtained from the GO.db package.

Usage

cateGOry(x, categ, sparse=FALSE)

Arguments

Argument: Description
x: Character vector with (arbitrary) gene identifiers. They will be used for the column names of the resulting matrix.
categ: A character vector of the same length as x with GO annotations for the genes in x. If a gene has multiple GO annotations, it is expected to occur multiple times in x, once for each different annotation.
sparse: Logical. If TRUE, the resulting matrix is constructed using Matrix; otherwise, R's base matrix is used.

Details

The function requires the GO.db package.

For subsequent analyses, it is often useful to remove categories that have only a small number of members. Use the normal matrix subsetting syntax for this; see the example.

If a GO category in categ is not found in the GO annotation package, a warning will be generated, and no ancestors for that GO category are added (but that category itself will be part of the returned adjacency matrix).

Value

The adjacency matrix of the bipartite category membership graph, rows are categories and columns genes.
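As a rough illustration of the returned structure, the base-R sketch below builds a membership matrix directly from gene/category pairs. It deliberately omits the ancestor expansion that cateGOry performs via GO.db, so it is not equivalent to the real function; the inputs are the same toy identifiers used in the example.

```r
# Gene/category pairs; a gene with two annotations appears twice in x.
x     <- c("CG2671", "CG2671", "CG2950")
categ <- c("GO:0090079", "GO:0001738", "GO:0003676")

# Cross-tabulate the pairs, then turn counts into TRUE/FALSE membership.
# Unlike cateGOry, this sketch does NOT add ancestor GO terms.
m <- unclass(table(categ, x)) > 0
print(m)

# Number of genes in each category.
print(rowSums(m))
```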

Seealso

applyByCategory

Author

Wolfgang Huber

Examples

g = cateGOry(c("CG2671", "CG2671", "CG2950"),
c("GO:0090079", "GO:0001738", "GO:0003676"), sparse=TRUE)
g

rowSums(g)   ## number of genes in each category

## Filter out categories with fewer than minMemb or more than maxMemb members.
## This is toy data; in real applications, a choice of minMemb higher
## than 2 will be more appropriate.
filter = function(x, minMemb = 2, maxMemb = 35) ((x>=minMemb) & (x<=maxMemb))
g[filter(rowSums(g)),,drop=FALSE ]

categoryToEntrezBuilder()

Return a list mapping category ids to Entrez Gene ids

Description

Return a list mapping category ids to the Entrez Gene ids annotated at the category id. Only those category ids that have at least one annotation in the set of Entrez Gene ids specified by the geneIds slot of p are included.

Usage

categoryToEntrezBuilder(p)

Arguments

Argument: Description
p: A subclass of HyperGParams-class.

Details

End users should not call this directly. This method gets called from hyperGTest. To add support for a new category, a new method for this generic must be defined. Its signature should match a subclass of HyperGParams-class appropriate for the new category.

Value

A list mapping category ids to Entrez Gene identifiers.

Seealso

hyperGTest HyperGParams-class

Author

S. Falcon

cb_contingency()

Create and Test Contingency Tables of Chromosome Band Annotations

Description

For each chromosome band identifier in chrVect, cb_contingency builds and performs a test on a 2 x k contingency table for the genes from selids found in the child bands of the given chrVect element.

cb_sigBands extracts the chromosome band identifiers that were in a contingency table that tested significant given the specified p-value cutoff.

cb_children returns the child bands of a given band in the chromosome band graph. The argument must have length equal to one.

Usage

cb_contingency(selids, chrVect, chrGraph, testFun = chisq.test,
               min.expected = 5L, min.k = 1L)
cb_sigBands(b, p.value = 0.01)
cb_children(n, chrGraph)

Arguments

Argument: Description
selids: A vector of the selected gene identifiers (usually Entrez IDs).
chrVect: A character vector of chromosome band identifiers.
chrGraph: A graph object as returned by makeChrBandGraph. The nodes should be chromosome band IDs and the edges should represent the tree structure of the bands. Furthermore, the graph is expected to have a "geneIds" node attribute providing a vector of gene IDs annotated at each band.
testFun: The function to use for testing the 2 x k contingency tables. The default is chisq.test. It will be called with a single argument, a 2 x k matrix representing the contingency table.
min.expected: A numeric value specifying the minimum expected count for columns to be included in the contingency table. The expected count is (rowSum * colSum) / n. Chromosome bands with a select cell count less than min.expected are dropped from the table before testing occurs. If NULL, then no bands will be dropped.
min.k: An integer giving the minimum number of chromosome bands that must be present in a contingency table in order to proceed with testing.
b: A list as returned by cb_contingency.
p.value: A p-value cutoff to use in selecting significant contingency tables.
n: A length one character vector specifying a chromosome band annotation. Bands not found in chrGraph will return character(0) when passed to cb_children.
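The expected-count filter and the 2 x k test can be sketched in base R as follows. The counts and band names are made up, and the filter is applied to the expected count of the selected row, which is one reading of the min.expected description; the package's exact rule may differ.

```r
# Hypothetical counts: row 1 = selected genes, row 2 = other genes,
# one column per child band of some parent band.
tab <- rbind(selected = c(12, 30, 1, 9),
             other    = c(88, 70, 19, 91))
colnames(tab) <- c("bandA", "bandB", "bandC", "bandD")

# Expected counts under independence: (rowSum * colSum) / n.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Drop bands whose expected selected count falls below min.expected
# (an assumed interpretation of the argument description).
min.expected <- 5
keep <- expected["selected", ] >= min.expected
tab2 <- tab[, keep, drop = FALSE]

# Proceed only when at least min.k bands remain.
min.k <- 1L
if (ncol(tab2) >= min.k) {
  res <- chisq.test(tab2)
  print(res$p.value)
}
```

Here bandC has an expected selected count of 52 * 20 / 320 = 3.25, so it is removed before chisq.test is called on the remaining 2 x 3 table.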

Details

cb_sigBands assumes that the p-value associated with a result of testFun can be accessed as testFun(t)$p.value. This should be improved to use a method call that can be specialized based on the class of the object returned by testFun.

Value

cb_contingency returns a list with an element for each test performed. This will most often be shorter than length(chrVect) because tests are skipped based on min.expected and min.k. Each element of the returned list is itself a list.

cb_sigBands returns a character vector of chromosome band identifiers that are in one of the contingency tables that had a p-value less than the cutoff specified by p.value.

Author

Seth Falcon

cb_parse_band_Hs()

Parse Homo Sapiens Chromosome Band Annotations

Description

This function parses chromosome band annotations as found in the MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.

Usage

cb_parse_band_Hs(x)

Arguments

Argument: Description
x: A chromosome band annotation given as a string.

Details

The former function cb_parse_band_hsa is now defunct.

Value

A character vector giving the path to the relevant chromosome.
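To make "the path to the relevant chromosome" concrete, the base-R sketch below decomposes a human band string into chromosome, arm, and successively finer bands. The decomposition (one node per added digit) is an assumption about the output format, and band_path_sketch is a hypothetical name, not part of the package.

```r
# Hypothetical helper; the assumed path format is chromosome, arm, then one
# node per additional band digit.
band_path_sketch <- function(x) {
  # Split e.g. "12q32.12" into chromosome "12", arm "q", and digits "32.12".
  m <- regmatches(x, regexec("^([0-9]+|X|Y)([pq])([0-9.]*)$", x))[[1]]
  if (length(m) == 0L) stop("unrecognised band: ", x)
  chr <- m[2]; arm <- m[3]; rest <- m[4]
  path <- c(chr, paste0(chr, arm))
  if (nzchar(rest)) {
    # All prefixes of the digit string, skipping the one ending in ".".
    prefixes <- substring(rest, 1, seq_len(nchar(rest)))
    prefixes <- prefixes[!grepl("\\.$", prefixes)]
    path <- c(path, paste0(chr, arm, prefixes))
  }
  path
}

print(band_path_sketch("12q32.12"))
```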

Author

Seth Falcon

Examples

cb_parse_band_Hs("12q32.12")

cb_parse_band_Mm()

Parse Mus Musculus Chromosome Band Annotations

Description

This function parses chromosome band annotations as found in the MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.

Usage

cb_parse_band_Mm(x)

Arguments

Argument: Description
x: A chromosome band annotation given as a string.

Value

A character vector giving the path to the relevant chromosome.

Author

Seth Falcon & Nolwenn Le Meur

Examples

cb_parse_band_Mm("10 B3")

Chromosome Band Tree-Based Hypothesis Testing

Description

cb_test is a flexible tool for discovering interesting chromosome bands relative to a selected gene list. The function supports local and global tests which can be carried out in a top down or bottom up fashion on the tree of chromosome bands.

Usage

cb_test(selids, chrtree, level, dir = c("up", "down"),
       type = c("local", "global"), next.pval = 0.05,
       cond.pval = 0.05, conditional = FALSE)

Arguments

Argument: Description
selids: A vector of gene IDs. The IDs should match those used to annotate the ChrBandTree given by chrtree. In most cases, these will be Entrez Gene IDs.
chrtree: A ChrBandTree object representing the chromosome bands and the mapping to gene identifiers. The genes in the ChrBandTree are limited to the universe of gene IDs specified at object creation time.
level: An integer giving the level of the chromosome band tree at which testing should begin. The level is conceptualized as the set of nodes with a given path length to the root (organism) node of the chromosome band tree. So level 1 is the chromosomes and level 2 is the chromosome arms. You can get a better sense by calling exampleLevels(chrtree).
dir: A string giving the direction in which the chromosome band tree will be traversed when carrying out the tests. A bottom-up traversal, from leaves to root, is specified by "up". A top-down traversal, from root to leaves, is specified by "down".
type: A string giving the type of test to perform. The current choices are "local" and "global". A local test carries out a chisq.test on each 2 x K contingency table induced by each set of siblings at a given level in the tree. A global test uses the Hypergeometric distribution to compute a p-value for the 2 x 2 tables induced by each band treated independently.
next.pval: The p-value cutoff used to determine whether the parents or children of a node should be tested. After testing a given level of the tree, the decision of whether or not to continue testing the children (or parents) of the already tested nodes is made by comparing the p-value result for a given node with this cutoff; relatives of nodes with values strictly greater than the cutoff are skipped.
cond.pval: The p-value cutoff used to determine whether a node is significant during a conditional test. See conditional.
conditional: A logical value. Can only be used when dir="up" and type="global". In this case, a TRUE value causes a conditional Hypergeometric calculation to be performed. The genes annotated at significant children of a given band are removed before testing.

Value

A list with an element for each level of the tree that was tested. Note that the first element will correspond to the level given by level and that subsequent elements will be the next or previous depending on dir .

Each level element is itself a list consisting of a result list for each node or set of nodes tested.

Author

Seth Falcon

Extract estimated effect sizes

Description

This function extracts estimated effect sizes from the results of a linear model-based gene-set / category enrichment test.

Usage

effectSize(r)

Arguments

Argument: Description
r: The results of the test.

Value

A numeric vector.

Seealso

LinearMResult-class

Author

Deepayan Sarkar

exampleLevels()

Display a sample node from each level of a ChrBandTree object

Description

The "levels" of a chromosome band tree represented by a ChrBandTree object are the sets of nodes with a given path length to the root node. This function displays the available levels along with an example node from each level.
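The notion of a level as path length to the root can be sketched without the package. The tiny child-to-parent map below and the root label "ORGANISM" are hypothetical; a real ChrBandTree stores its structure differently.

```r
# Hypothetical child -> parent map for a tiny band tree; "ORGANISM" is the
# assumed label of the root node.
parent <- c("10" = "ORGANISM", "10p" = "10", "10q" = "10",
            "10p1" = "10p", "10p11" = "10p1")

# Path length to the root, found by walking the parent links.
depth <- function(node) {
  d <- 0L
  while (node != "ORGANISM") {
    node <- parent[[node]]
    d <- d + 1L
  }
  d
}

nodes <- c("ORGANISM", names(parent))
lev <- vapply(nodes, depth, integer(1))

# Group nodes by level and keep the first node of each as an example.
by_level <- split(nodes, lev)
examples <- lapply(by_level, `[`, 1)
print(examples)
```

In this toy tree, level 1 holds the chromosome and level 2 the chromosome arms, in line with the description of the level argument of cb_test.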

Usage

exampleLevels(g)

Arguments

Argument: Description
g: A ChrBandTree object.

Value

A list with an element for each level. The names of the list are the levels. Each element is an example of a node from the given level.

Author

S. Falcon

Compute per category summary statistics

Description

For a given incidence matrix, Amat, compute some per category statistics.

Usage

findAMstats(Amat, tstats)

Arguments

Argument: Description
Amat: An incidence matrix, with categories as the rows and probes as the columns.
tstats: A vector of per probe test statistics (should be the same length as ncol(Amat)).

Details

Simple summary statistics are computed, such as the row sums and the vector of per category sums of the test statistics, tstats.
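The two summaries named above reduce to a row sum and a matrix product. The base-R sketch below shows only those two quantities on made-up data; the names of the components that findAMstats actually returns are not reproduced here.

```r
set.seed(1)
tstats <- rnorm(6)                     # one statistic per probe
Amat <- rbind(c(1, 1, 0, 0, 0, 0),     # category 1 contains probes 1-2
              c(0, 1, 1, 1, 0, 0))     # category 2 contains probes 2-4

# Number of probes in each category.
sizes <- rowSums(Amat)

# Per-category sum of the test statistics.
sums <- as.vector(Amat %*% tstats)

print(sizes)
print(sums)
```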

Value

A list of per category summary statistics.

Seealso

applyByCategory

Author

R. Gentleman

Examples

ts = rnorm(100)
Am = matrix(sample(c(0,1), 1000, replace=TRUE), ncol=100)
findAMstats(Am, ts)

A function to print pathway names given their numeric ID.

Description

Given a KEGG pathway ID this function returns the character name of the pathway.

Usage

getPathNames(iPW)

Arguments

Argument: Description
iPW: A vector of KEGG pathway IDs.

Details

This function simply does a look up in KEGGPATHID2NAME and returns a list of the pathway names.

Possible extensions would be to extend it to work with the cMAP library as well.

Value

A list of pathway names.

Seealso

KEGGPATHID2NAME

Author

R. Gentleman

Examples

nms = "00031"
getPathNames(nms)

Permutation p-values for GSEA

Description

This function performs GSEA computations and returns p-values for each gene set based on repeated permutation of the phenotype labels.

Usage

gseattperm(eset, fac, mat, nperm)

Arguments

Argument: Description
eset: An ExpressionSet object.
fac: A factor identifying the phenotypes in eset. Usually, this will be one of the columns in the phenotype data associated with eset.
mat: A 0/1 incidence matrix with each row representing a gene set and each column representing a gene. A 1 indicates membership of a gene in a gene set.
nperm: Number of permutations to test to build the reference distribution.

Details

The t-statistic is used (via rowttests ) to test for a difference in means between the phenotypes determined by fac within each gene set (given as a row of mat ).

A reference distribution for these statistics is established by permuting fac and repeating the test nperm times.
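The permutation scheme can be sketched in base R without any Bioconductor packages. This is an illustration under stated assumptions, not the code gseattperm runs: the helper names rowT and setStat are hypothetical, the data are simulated, and aggregating per gene t-statistics as a sum scaled by the square root of the set size is an assumed choice of gene set statistic.

```r
set.seed(1)
nGenes <- 40; nSamples <- 12
expr <- matrix(rnorm(nGenes * nSamples), nrow = nGenes)
fac <- factor(rep(c("A", "B"), each = nSamples / 2))

# 0/1 incidence matrix: 5 gene sets (rows) by 40 genes (columns).
mat <- matrix(rbinom(5 * nGenes, 1, 0.3), nrow = 5)
mat[cbind(1:5, 1:5)] <- 1L   # make sure no gene set is empty

# Equal-variance two-sample t-statistic for every row of x.
rowT <- function(x, f) {
  a <- x[, f == levels(f)[1], drop = FALSE]
  b <- x[, f == levels(f)[2], drop = FALSE]
  na <- ncol(a); nb <- ncol(b)
  pooled <- (rowSums((a - rowMeans(a))^2) +
             rowSums((b - rowMeans(b))^2)) / (na + nb - 2)
  (rowMeans(a) - rowMeans(b)) / sqrt(pooled * (1 / na + 1 / nb))
}

# Assumed gene set statistic: sum of member t-statistics / sqrt(set size).
setStat <- function(x, f) as.vector(mat %*% rowT(x, f)) / sqrt(rowSums(mat))

obs <- setStat(expr, fac)
nperm <- 200
# Each column holds the gene set statistics for one permutation of fac.
perm <- replicate(nperm, setStat(expr, sample(fac)))

# Fraction of permutations below / above the observed statistic.
pvals <- cbind(Lower = rowMeans(perm < obs), Upper = rowMeans(perm > obs))
```

Because only the labels in fac are shuffled, the correlation structure among genes is preserved in every permutation.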

Value

A matrix with the same number of rows as mat and two columns, "Lower" and "Upper". The "Lower" ("Upper") column gives the probability of seeing a t-statistic smaller (larger) than the observed.

Author

Seth Falcon

Examples

## This example uses a random sample of probesets and a randomly
## generated category matrix.  The results, therefore, are not
## meaningful, but the code demonstrates how to use gseattperm without
## requiring any expensive computations.

## Obtain an ExpressionSet with two types of samples (mol.biol)
haveALL <- require("ALL")
if (haveALL) {
    data(ALL)
    set.seed(0xabcd)
    rndIdx <- sample(1:nrow(ALL), 500)
    Bcell <- grep("^B", as.character(ALL$BT))
    typeNames <- c("NEG", "BCR/ABL")
    bcrAblOrNegIdx <- which(as.character(ALL$mol.biol) %in% typeNames)
    s <- ALL[rndIdx, intersect(Bcell, bcrAblOrNegIdx)]
    s$mol.biol <- factor(s$mol.biol)

    ## Generate a random category matrix with one random 0/1 entry
    ## per (category, gene) cell
    nCats <- 100
    set.seed(0xdcba)
    rndCatMat <- matrix(sample(c(0L, 1L), nCats * nrow(s), replace=TRUE),
                        nrow=nCats, ncol=nrow(s),
                        dimnames=list(
                          paste("c", 1:nCats, sep=""),
                          featureNames(s)))

    ## Demonstrate use of gseattperm
    N <- 10
    pvals <- gseattperm(s, s$mol.biol, rndCatMat, N)
    pvals[1:5, ]
}

Hypergeometric Test for association of categories and genes

Description

Given a subclass of HyperGParams, compute Hypergeometric p-values for over- or under-representation of each term in the specified category among the specified gene set.

Usage

hyperGTest(p)

Arguments

| Argument | Description |
|---|---|
| p | An instance of a subclass of HyperGParams. This parameter object determines the category of interest (e.g., GO or KEGG) as well as the gene set. |

Details

The gene identifiers in the geneIds slot of p define the selected set of genes. The universe of gene ids is determined by the chip annotation found in the annotation slot of p. Both the selected genes and the universe are reduced by removing identifiers that do not have any annotations in the specified category.

For each term in the specified category that has at least one annotation in the selected gene set, we determine how many of its annotations are in the universe set and how many are in the selected set. With these counts we perform a Hypergeometric test using phyper. This is equivalent to using a one-sided Fisher's exact test.
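The equivalence between the phyper computation and Fisher's exact test can be checked in base R. The counts below are invented for illustration, and the variable names are not taken from the package; the sketch covers the over-representation case only.

```r
universeSize <- 1000   # annotated genes in the universe
termSize     <- 50     # universe genes annotated at the term
selectedSize <- 100    # selected genes
overlap      <- 12     # selected genes annotated at the term

# P(X >= overlap) under the hypergeometric distribution.
pHyper <- phyper(overlap - 1, termSize, universeSize - termSize,
                 selectedSize, lower.tail = FALSE)

# The same counts arranged as a 2 x 2 contingency table.
tab <- matrix(c(overlap,            selectedSize - overlap,
                termSize - overlap,
                universeSize - termSize - selectedSize + overlap),
              nrow = 2, byrow = TRUE,
              dimnames = list(selected = c("yes", "no"),
                              inTerm   = c("yes", "no")))

# One-sided Fisher's exact test for over-representation.
pFisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(pHyper, pFisher)
```

Note the overlap - 1 in the phyper call: with lower.tail = FALSE, phyper returns P(X > q), so subtracting one includes the observed count in the tail.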

It is important that the correct chip annotation data package be identified as it determines the universe of gene identifiers and is often used to determine the mapping between the category term and the gene identifiers.

For S. cerevisiae, if the annotation slot of p is set to "org.Sc.sgd", then comparisons and statistics are computed using common names and are with respect to all genes annotated in the S. cerevisiae genome, not with respect to any microarray chip. This will not be the right thing to do if you are working with a yeast microarray.

Value

A HyperGResult instance.

Seealso

HyperGResult-class, HyperGParams-class, GOHyperGParams-class, KEGGHyperGParams-class

Author

S. Falcon

Hypergeometric (gene set enrichment) tests on character vectors.

Description

This function performs a hypergeometric test for over- or under-representation of significant genes amongst those assayed in a universe of genes. It provides an interface based on character vectors identifying the members of gene sets and the gene universe.

Usage

hyperg(assayed, significant, universe,
    representation = c("over", "under"), ...)

Arguments

| Argument | Description |
|---|---|
| assayed | A vector of assayed genes (or other identifiers). assayed may be a character vector (defining a single gene set) or a list of character vectors (defining a collection of gene sets). |
| significant | A vector of assayed genes that were differentially expressed. If assayed is a character vector, then significant must also be a character vector; likewise, when assayed is a list, significant must also be a list. |
| universe | A character vector defining the universe of genes. |
| representation | Either over or under, to indicate testing for over- or under-representation, respectively, of differentially expressed genes. |
| ... | Additional arguments, unused. |

Value

When invoked with a character vector of assayed genes, a named numeric vector providing the input values, P-value, odds ratio, and expected number of significantly expressed genes.

When invoked with a list of character vectors of assayed genes, a data frame with columns of input values, P-value, odds ratio, and expected number of significantly expressed genes.

Seealso

hyperGTest for convenience functions using Bioconductor annotation resources such as GO.db.

Author

Martin Morgan mtmorgan@fhcrc.org with contributions from Paul Shannon.

Examples

set.seed(123)

## artificial sets -- affy probes grouped by protein family
library(hgu95av2.db)
map <- select(hgu95av2.db, keys(hgu95av2.db), "PFAM")
sets <- Filter(function(x) length(x) >= 10, split(map$PROBEID, map$PFAM))

universe <- unlist(sets, use.names=FALSE)
siggenes <- sample(universe, length(universe) / 20)  ## simulate
sigsets <- Map(function(x, y) x[x %in% y], sets, MoreArgs=list(y=siggenes))

result <- hyperg(sets, sigsets, universe)
head(result)

A linear model-based test to detect enrichment of unusual genes in categories

Description

Given a subclass of LinearMParams, compute p-values for detecting systematic up or downregulation of the specified gene set in the specified category.

Usage

linearMTest(p)

Arguments

| Argument | Description |
|---|---|
| p | An instance of a subclass of LinearMParams. This parameter object determines the category of interest (currently, only chromosome bands) as well as the gene set. |

Details

The per-gene statistics in the geneStats slot of p give a measure of up or down regulation of the individual genes in the universe.

Value

A LinearMResult instance.

Seealso

LinearMResult-class, LinearMParams-class

Author

D. Sarkar

local_test_factory()

Local and Global Test Function Factories

Description

These functions return functions appropriate for use as the tfun argument to topdown_tree_visitor or bottomup_tree_visitor. In particular, it is these functions that are associated with the "local" and "global" options for the type argument to cb_test.

Usage

local_test_factory(selids, tableTest = chisq.test)
hg_test_factory(selids, PCUT = 0.05, COND = FALSE, OVER = TRUE)

Arguments

| Argument | Description |
|---|---|
| selids | A vector of gene IDs. The IDs should match those used to annotate the ChrBandTree given by chrtree. In most cases, these will be Entrez Gene IDs. |
| tableTest | A contingency table testing function. The behavior of this function must be reasonably close to that of chisq.test. |
| PCUT | A p-value cutoff that will be used to determine if a given test is significant or not when using hg_test_factory with COND=TRUE. |
| COND | A logical value indicating whether a conditional test should be performed. |
| OVER | If TRUE, test for over representation; if FALSE, test for under representation. |

Details

The returned functions have signature f(start, g, prev_ans) where start is a vector of start nodes, g is a chromosome band tree graph, and prev_ans can contain the previous result returned by a call to this function.
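The factory pattern described here is an ordinary R closure: the factory captures selids, and the function it returns has the (start, g, prev_ans) signature. The sketch below is hypothetical throughout. toy_test_factory is not a Category function, and g is a plain named list standing in for a chromosome band tree graph.

```r
# Hypothetical factory: captures the selected IDs and returns a test
# function with the (start, g, prev_ans) signature.
toy_test_factory <- function(selids) {
  function(start, g, prev_ans) {
    # g is a toy stand-in: a named list mapping node -> gene IDs.
    # The "test" here just counts selected genes at each start node.
    vapply(start, function(node) sum(g[[node]] %in% selids), integer(1))
  }
}

toyBands <- list("1p" = c("g1", "g2", "g3"), "1q" = c("g4", "g5"))
tfun <- toy_test_factory(selids = c("g2", "g3", "g5"))
counts <- tfun(c("1p", "1q"), toyBands, prev_ans = NULL)
```

Keeping selids inside the closure means the tree visitor only has to supply the arguments that change from level to level.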

Value

A function that can be used as the tfun argument to the tree visitor functions.

Seealso

cb_test

Author

Seth Falcon

makeChrBandGraph()

Create a graph representing chromosome band annotation data

Description

This function returns a graph object representing the nested structure of chromosome bands (also known as cytogenetic bands). The nodes of the graph are band identifiers. Each node has a geneIds node attribute that lists the gene IDs that are annotated at the band (the gene IDs will be Entrez IDs in most cases).

Usage

makeChrBandGraph(chip, univ = NULL)

Arguments

| Argument | Description |
|---|---|
| chip | A string giving the annotation source. For example, "hgu133plus2". |
| univ | A vector of gene IDs (these should be Entrez IDs for most annotation sources). The annotations attached to the graph will be limited to those specified by univ. If univ is NULL (default), then the gene IDs are those found in the annotation data source. |

Details

This function parses the data stored in the <chip>MAP map from the appropriate annotation data package. Although cytogenetic bands are observed in all organisms, currently, only human and mouse band nomenclatures are supported.

Value

A graph-class instance. The graph will be a tree and the root node is labeled for the organism.

Author

Seth Falcon

Examples

chrGraph <- makeChrBandGraph("hgu95av2.db")
chrGraph

A function to make the contrast vectors needed for EBarrays

Description

Using EBarrays to detect differential expression requires the construction of a set of contrasts. This little helper function computes these contrasts for a two level factor.

Usage

makeEBcontr(f1, hival)

Arguments

| Argument | Description |
|---|---|
| f1 | The factor that will define the contrasts. |
| hival | The level of the factor to treat as the high level. |

Details

Not much more to add; see EBarrays for more details. This is used in the Category package to let users compute the posterior probability of differential expression, and hence to compute expected numbers of differentially expressed genes, per category.

Value

An object of class "ebarraysPatterns".

Seealso

ebPatterns

Author

R. Gentleman

Examples

if( require("EBarrays") ) {
    myfac = factor(rep(c("A", "B"), c(12, 24)))
    makeEBcontr(myfac, "B")
}

makeValidParams()

Non-standard Generic for Checking Validity of Parameter Objects

Description

This function is not intended for end-users, but may be useful for developers extending the Hypergeometric testing capabilities provided by the Category package.

makeValidParams is intended to validate a parameter object instance (e.g. HyperGParams or subclass). The idea is that, unlike validObject, methods for this generic attempt to fix invalid instances when possible, issuing a warning when they do so, and give an error only if the object cannot be fixed.
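The fix-and-warn behavior can be illustrated with a small S4 sketch. Everything here is hypothetical: the class ToyParams, the generic toyMakeValid, and the particular checks are invented for illustration and are not part of the Category package.

```r
library(methods)

# Hypothetical parameter class; not part of the Category package.
setClass("ToyParams",
         representation(geneIds = "character", pvalueCutoff = "numeric"))

setGeneric("toyMakeValid", function(object) standardGeneric("toyMakeValid"))

setMethod("toyMakeValid", "ToyParams", function(object) {
  # Unfixable problem: signal an error.
  if (length(object@pvalueCutoff) != 1 || is.na(object@pvalueCutoff))
    stop("pvalueCutoff must be a single non-missing number")
  # Fixable problem: repair the instance and warn.
  if (anyDuplicated(object@geneIds)) {
    warning("removing duplicated geneIds")
    object@geneIds <- unique(object@geneIds)
  }
  # Return an object of the same class as the input.
  object
})

p <- new("ToyParams", geneIds = c("10", "20", "10"), pvalueCutoff = 0.05)
fixed <- suppressWarnings(toyMakeValid(p))
fixed@geneIds
```

In contrast, a validity method checked by validObject can only report that something is wrong; it has no way to hand back a corrected object.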

Usage

makeValidParams(object)

Arguments

| Argument | Description |
|---|---|
| object | A parameter object. Consult showMethods to see signatures currently supported. |

Value

The value must have the same class as the object argument.

Author

Seth Falcon

Map probe IDs to MAP regions.

Description

This function maps probe identifiers to MAP positions using the appropriate Bioconductor meta-data package.

Usage

probes2MAP(pids, data = "hgu133plus2")

Arguments

| Argument | Description |
|---|---|
| pids | A vector of probe IDs for the chip in use. |
| data | The name of the chip, as a character string. |

Details

Probes are mapped to regions; no checking for duplicate Entrez gene IDs is done.

Value

A vector, the same length as pids, with the MAP locations.

Seealso

probes2Path

Author

R. Gentleman

Examples

set.seed(123)
library("hgu95av2.db")
v1 = sample(names(as.list(hgu95av2MAP)), 100)
pp = probes2MAP(v1, "hgu95av2.db")

A function to map probe identifiers to pathways.

Description

Given a set of probe identifiers from a microarray this function looks up all KEGG pathways that the probe is documented to be involved in.

Usage

probes2Path(pids, data = "hgu133plus2")

Arguments

| Argument | Description |
|---|---|
| pids | A vector of probe identifiers. |
| data | The character name of the chip. |

Details

This is a simple lookup in the appropriate chip PATH data environment.

Value

A list of pathway vectors, with one element for each value of pids that is mapped to at least one pathway.

Seealso

findAMstats

Author

R. Gentleman

Examples

library("hgu95av2.db")
x = c("1001_at", "1000_at")
probes2Path(x, "hgu95av2.db")

Tree Visitor Function

Description

This function visits each node in a tree-like object in an order determined by the relationOf function. The function given by tfun is called for each set of nodes, and the function nfun determines which nodes to test next, optionally making use of the result of the previous test.

Usage

tree_visitor(g, start, tfun, nfun, relationOf)
topdown_tree_visitor(g, start, tfun, nfun)
bottomup_tree_visitor(g, start, tfun, nfun)

Arguments

| Argument | Description |
|---|---|
| g | A tree-like object that supports the method given by relationOf. |
| start | The set of nodes at which to start the computation (can be a list of siblings). The nodes should all belong to the same level of the tree (same path length to the root node). |
| tfun | The test function applied to each list of siblings at each level, starting with start. The signature of tfun should be (start, g, prev_ans). |
| nfun | A function with signature (ans, g) that processes the result of tfun and returns a character vector of node names corresponding to nodes that were involved in an "interesting" test. This is used to determine the next set of nodes to test (see details). |
| relationOf | The method used to traverse the tree. For example, childrenOf or parentOf. |

Details

The tree_visitor function is intended to allow developers to quickly prototype different statistical testing paradigms on trees. It may be possible to extend this to work for DAGs.

The visit begins by calling tfun with the nodes provided by start. The result of each call to tfun is stored in an environment. The concept is visitation by tree level, and so each result is stored using a key representing the level (this isn't quite right since the nodes in start need not be first level, but they will be assigned key "1"). After storing the result, nfun is used to obtain a vector of accepted node labels. The idea is that the user should have a way of determining which nodes in the next level of the tree are worth testing. The next start set is determined by start <- relationOf(g, accepted) where accepted is unique(nfun(ans, g)).
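The loop described above can be sketched in base R on a toy tree. This is an illustrative reimplementation, not the package's code: toy_tree_visitor, childrenOfToy, and the toy tfun and nfun are hypothetical, the tree is a plain named list rather than a graph object, and passing the previous level's result as prev_ans is an assumption.

```r
# Toy tree stored as a named list: node -> character vector of children.
toyTree <- list(root = c("a", "b"), a = c("a1", "a2"), b = "b1",
                a1 = character(0), a2 = character(0), b1 = character(0))
childrenOfToy <- function(g, nodes) unlist(g[nodes], use.names = FALSE)

toy_tree_visitor <- function(g, start, tfun, nfun, relationOf) {
  results <- new.env()
  level <- 1L
  prev_ans <- NULL
  while (length(start) > 0) {
    ans <- tfun(start, g, prev_ans)
    # Store each result under a key representing the level.
    assign(as.character(level), ans, envir = results)
    accepted <- unique(nfun(ans, g))
    start <- relationOf(g, accepted)
    prev_ans <- ans
    level <- level + 1L
  }
  mget(as.character(seq_len(level - 1L)), envir = results)
}

# Toy "test": number of children of each node in the current set.
tfun <- function(start, g, prev_ans)
  vapply(start, function(n) length(g[[n]]), integer(1))
# Toy acceptance rule: only nodes with at least two children are interesting.
nfun <- function(ans, g) names(ans)[ans >= 2]

res <- toy_tree_visitor(toyTree, "root", tfun, nfun, childrenOfToy)
names(res[["3"]])
```

In this run node b is rejected at the second level, so its child b1 is never tested; the third level contains only a1 and a2.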

Value

A list. See the return value of cb_test to get an idea. Each element of the list represents a call to tfun at a given level of the tree.

Author

Seth Falcon

A simple function to compute a permutation t-test.

Description

The data matrix, x, with two-level factor, fac, is used to compute t-tests. The values of fac are permuted B times and the complete set of t-tests is performed for each permutation.

Usage

ttperm(x, fac, B = 100, tsO = TRUE)

Arguments

| Argument | Description |
|---|---|
| x | A data matrix. The number of columns should be the same as the length of fac. |
| fac | A factor with two levels. |
| B | An integer specifying the number of permutations. |
| tsO | A logical indicating whether to compute only the t-test statistic for each permutation. If FALSE then p-values are also computed, but this can be very slow. |

Details

Not much more to say. Probably there is a generic function somewhere, but I could not find it.

Value

A list. The first element is named obs and contains the true, observed values of the t-statistic. The second element is named ans and contains a list of length B containing the results of the different permutations.
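A base R sketch of the same idea, producing a list shaped like the return value described above. It is not the package's implementation: rowTstat is a hypothetical helper built on t.test, the equal-variance form is an assumption, and the final step that turns the permutations into p-values is an addition for illustration.

```r
set.seed(42)
x <- matrix(rnorm(60), ncol = 10)            # 6 rows, 10 samples
fac <- factor(rep(c("A", "B"), c(5, 5)))
B <- 20

# Equal-variance t-statistic for each row of x (hypothetical helper).
rowTstat <- function(x, f) {
  apply(x, 1, function(r) {
    unname(t.test(r[f == levels(f)[1]], r[f == levels(f)[2]],
                  var.equal = TRUE)$statistic)
  })
}

perm <- list(
  obs = rowTstat(x, fac),                                   # observed statistics
  ans = lapply(seq_len(B), function(i) rowTstat(x, sample(fac)))  # B permutations
)

# Two-sided permutation p-value per row: the fraction of permuted
# statistics at least as extreme as the observed one.
permMat <- do.call(cbind, perm$ans)          # rows x B
pvals <- rowMeans(abs(permMat) >= abs(perm$obs))
```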

Seealso

rowttests

Author

R. Gentleman

Examples

x = matrix(rnorm(100), ncol=10)
y = factor(rep(c("A","B"), c(5,5)))
ttperm(x, y, 10)

universeBuilder()

Return a vector of gene identifiers with category annotations

Description

Return all gene ids that are annotated at one or more terms in the category. If the universeGeneIds slot of p has length greater than zero, then the intersection of the gene ids specified in that slot and the normal return value is given.

Usage

universeBuilder(p)

Arguments

| Argument | Description |
|---|---|
| p | A subclass of HyperGParams-class. |

Details

End users should not call this directly. This method gets called from hyperGTest. To add support for a new category, a new method for this generic must be defined. Its signature should match a subclass of HyperGParams-class appropriate for the new category.

Value

A vector of gene identifiers.

Seealso

hyperGTest, HyperGParams-class

Author

S. Falcon