bioconductor v3.9.0 SummarizedExperiment

The SummarizedExperiment container contains one or more assays,

Summary

Functions

Assays objects

RangedSummarizedExperiment objects

SummarizedExperiment objects

Coverage of a RangedSummarizedExperiment object

Finding overlapping ranges in RangedSummarizedExperiment objects

Inter range transformations of a RangedSummarizedExperiment object

Intra range transformations of a RangedSummarizedExperiment object

Make a RangedSummarizedExperiment from a data.frame or DataFrame

Make a RangedSummarizedExperiment object from an ExpressionSet and vice-versa

Make a SummarizedExperiment from a '.loom' hdf5 file

Finding the nearest range neighbor in RangedSummarizedExperiment objects

Input kallisto or kallisto bootstrap results.

Functions

Assays objects

Description

The Assays virtual class and its methods provide a formal abstraction of the assays slot of SummarizedExperiment objects.

SimpleListAssays and ShallowSimpleListAssays are concrete subclasses of Assays with the latter being currently the default implementation of Assays objects. Other implementations (e.g. disk-based) could easily be added.

Note that these classes are not meant to be used directly by the end-user and the material in this man page is aimed at package developers.

Details

Assays objects have list-like semantics, with elements having matrix- or array-like semantics (e.g., dim, dimnames).

The Assays API consists of:

  • (a) The Assays() constructor function.

  • (b) Lossless back and forth coercion from/to SimpleList. The coercion method from SimpleList doesn't need to (and should not) validate the returned object.

  • (c) length, names, names<-, [[, [[<-, dim, [, [<-, rbind, cbind.

An Assays concrete subclass needs to implement (b) (required) plus, optionally, any of the methods in (c).

IMPORTANT: Methods that return a modified Assays object (a.k.a. endomorphisms), that is, [ as well as the replacement methods names<-, [[<-, and [<-, must respect the copy-on-change contract. With objects that don't make use of references internally, the developer doesn't need to take any special action because R itself takes care of it automatically. However, for objects that do make use of references internally (e.g. environments, external pointers, pointers to a file on disk, etc.), the developer needs to be careful to implement endomorphisms with copy-on-change semantics. This can easily be achieved (and is what the default methods for Assays objects do) by performing a full (deep) copy of the object before modifying it instead of trying to modify it in place. Note that the full (deep) copy is not always necessary in order to achieve copy-on-change semantics: it's enough (and often preferable for performance reasons) to copy only the parts of the object that need to be modified.
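The contract can be illustrated in base R alone. The following is a minimal sketch, assuming a hypothetical environment-backed container (new_store and both setters are invented for illustration and are not part of the package): one setter modifies the shared environment in place and so breaks the contract, the other copies the environment first.

```r
## Hypothetical container that keeps its data in an environment,
## i.e. it makes use of a reference internally.
new_store <- function(values) {
    env <- new.env()
    env$values <- values
    structure(list(env=env), class="EnvStore")
}

## Broken setter: modifies the shared environment in place.
set_names_in_place <- function(x, value) {
    names(x$env$values) <- value
    x
}

## Copy-on-change setter: copies the environment before modifying it.
set_names_with_copy <- function(x, value) {
    env <- new.env()
    env$values <- x$env$values
    names(env$values) <- value
    x$env <- env
    x
}

a <- new_store(list(1, 2))

b <- set_names_with_copy(a, c("A1", "A2"))
names(a$env$values)   # NULL: 'a' is untouched

c2 <- set_names_in_place(a, c("B1", "B2"))
names(a$env$values)   # "B1" "B2": 'a' changed as a side effect
```

Only the environment is copied in set_names_with_copy, which is the "copy only the parts that need to be modified" variant mentioned above.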

Assays currently has 3 implementations, which are formalized by the concrete subclasses SimpleListAssays, ShallowSimpleListAssays, and AssaysInEnv. ShallowSimpleListAssays is the default. AssaysInEnv is a broken alternative to ShallowSimpleListAssays that does NOT respect the copy-on-change contract. It is only provided for illustration purposes (see source file Assays-class.R for the details).

A little more detail about ShallowSimpleListAssays: a small reference class hierarchy (not exported from the GenomicRanges name space) defines a reference class ShallowData with a single field data of type ANY , and a derived class ShallowSimpleListAssays that specializes the type of data as SimpleList , and contains=c("ShallowData", "Assays") . The assays slot of a SummarizedExperiment object contains an instance of ShallowSimpleListAssays.

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

## ---------------------------------------------------------------------
## DIRECT MANIPULATION OF Assays OBJECTS
## ---------------------------------------------------------------------
m1 <- matrix(runif(24), ncol=3)
m2 <- matrix(runif(24), ncol=3)
a <- Assays(SimpleList(m1, m2))
a

as(a, "SimpleList")

length(a)
a[[2]]
dim(a)

b <- a[-4, 2]
b
length(b)
b[[2]]
dim(b)

names(a)
names(a) <- c("a1", "a2")
names(a)
a[["a2"]]

rbind(a, a)
cbind(a, a)

## ---------------------------------------------------------------------
## COPY-ON-CHANGE CONTRACT
## ---------------------------------------------------------------------

## ShallowSimpleListAssays objects have copy-on-change semantics but not
## AssaysInEnv objects. For example:
ssla <- as(SimpleList(m1, m2), "ShallowSimpleListAssays")
aie <- as(SimpleList(m1, m2), "AssaysInEnv")

## No names on 'ssla' and 'aie':
names(ssla)
names(aie)

ssla2 <- ssla
aie2 <- aie
names(ssla2) <- names(aie2) <- c("A1", "A2")

names(ssla)  # still NULL (as expected)

names(aie)   # changed! (because the names<-,AssaysInEnv method is not
             # implemented in a way that respects the copy-on-change
             # contract)

RangedSummarizedExperiment_class()

RangedSummarizedExperiment objects

Description

The RangedSummarizedExperiment class is a matrix-like container where rows represent ranges of interest (as a GRanges or GRangesList object) and columns represent samples (with sample data summarized as a DataFrame ). A RangedSummarizedExperiment contains one or more assays, each represented by a matrix-like object of numeric or other mode.

RangedSummarizedExperiment is a subclass of SummarizedExperiment and, as such, all the methods documented in ? also work on a RangedSummarizedExperiment object. The methods documented below are additional methods that are specific to RangedSummarizedExperiment objects.

Usage

## Constructor
SummarizedExperiment(assays, ...)
## S4 method for signature 'SimpleList'
SummarizedExperiment(assays, rowData=NULL, rowRanges=GRangesList(),
    colData=DataFrame(), metadata=list())
## S4 method for signature 'ANY'
SummarizedExperiment(assays, ...)
## S4 method for signature 'list'
SummarizedExperiment(assays, ...)
## S4 method for signature 'missing'
SummarizedExperiment(assays, ...)
## Accessors
rowRanges(x, ...)
rowRanges(x, ...) <- value
## Subsetting
## S4 method for signature 'RangedSummarizedExperiment'
subset(x, subset, select, ...)
## rowRanges access
## see 'GRanges compatibility', below

Arguments

Argument | Description
assays | A list or SimpleList of matrix-like elements, or a matrix-like object. All elements of the list must have the same dimensions, and dimension names (if present) must be consistent across elements and with the row names of rowRanges and colData.
rowData | A DataFrame object describing the rows. Row names, if present, become the row names of the SummarizedExperiment object. The number of rows of the DataFrame must equal the number of rows of the matrices in assays.
rowRanges | A GRanges or GRangesList object describing the ranges of interest. Names, if present, become the row names of the SummarizedExperiment object. The length of the GRanges or GRangesList must equal the number of rows of the matrices in assays. If rowRanges is missing, a SummarizedExperiment instance is returned.
colData | An optional DataFrame describing the samples. Row names, if present, become the column names of the RangedSummarizedExperiment.
metadata | An optional list of arbitrary content describing the overall experiment.
... | For SummarizedExperiment, S4 methods list and matrix, arguments identical to those of the SimpleList method. For rowRanges, ignored.
x | A RangedSummarizedExperiment object. The rowRanges setter will also accept a SummarizedExperiment object and will first coerce it to RangedSummarizedExperiment before it sets value on it.
value | A GRanges or GRangesList object.
subset | An expression which, when evaluated in the context of rowRanges(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.
select | An expression which, when evaluated in the context of colData(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.

Details

The rows of a RangedSummarizedExperiment object represent ranges (in genomic coordinates) of interest. The ranges of interest are described by a GRanges or a GRangesList object, accessible using the rowRanges function, described below. The GRanges and GRangesList classes contain sequence (e.g., chromosome) name, genomic coordinates, and strand information. Each range can be annotated with additional data; this data might be used to describe the range or to summarize results (e.g., statistics of differential abundance) relevant to the range. Rows may or may not have row names; they often will not.
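As a small sketch of the points above (the matrix, the ranges, and the score column are made-up illustration data), the constructor returns a RangedSummarizedExperiment only when rowRanges is supplied, and annotations attached to the ranges are visible through rowData:

```r
library(SummarizedExperiment)

m <- matrix(1:4, nrow=2)
gr <- GRanges("chr1", IRanges(start=c(1, 11), width=5))

## Without rowRanges the result is a plain SummarizedExperiment.
se <- SummarizedExperiment(assays=SimpleList(counts=m))
is(se, "RangedSummarizedExperiment")

## With rowRanges the result is a RangedSummarizedExperiment, which is
## also a SummarizedExperiment.
rse <- SummarizedExperiment(assays=SimpleList(counts=m), rowRanges=gr)
is(rse, "RangedSummarizedExperiment")
is(rse, "SummarizedExperiment")

## Annotate each range with an (illustrative) score.
mcols(rowRanges(rse))$score <- c(0.1, 0.9)
rowData(rse)$score
```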

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE),
                     feature_id=sprintf("ID%03d", 1:200))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)
rse
dim(rse)
dimnames(rse)
assayNames(rse)
head(assay(rse))
assays(rse) <- endoapply(assays(rse), asinh)
head(assay(rse))

rowRanges(rse)
rowData(rse)  # same as 'mcols(rowRanges(rse))'
colData(rse)

rse[ , rse$Treatment == "ChIP"]

## cbind() combines objects with the same ranges but different samples:
rse1 <- rse
rse2 <- rse1[ , 1:3]
colnames(rse2) <- letters[1:ncol(rse2)]
cmb1 <- cbind(rse1, rse2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different ranges:
rse1 <- rse
rse2 <- rse1[1:50, ]
rownames(rse2) <- letters[1:nrow(rse2)]
cmb2 <- rbind(rse1, rse2)
dim(cmb2)
dimnames(cmb2)

## Coercion to/from SummarizedExperiment:
se0 <- as(rse, "SummarizedExperiment")
se0

as(se0, "RangedSummarizedExperiment")

## Setting rowRanges on a SummarizedExperiment object turns it into a
## RangedSummarizedExperiment object:
se <- se0
rowRanges(se) <- rowRanges
se  # RangedSummarizedExperiment

## Sanity checks:
stopifnot(identical(assays(se0), assays(rse)))
stopifnot(identical(dim(se0), dim(rse)))
stopifnot(identical(dimnames(se0), dimnames(rse)))
stopifnot(identical(rowData(se0), rowData(rse)))
stopifnot(identical(colData(se0), colData(rse)))

SummarizedExperiment_class()

SummarizedExperiment objects

Description

The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons, etc...) and columns represent samples (with sample data summarized as a DataFrame ). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.

Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class which means that all the methods documented below also work on a RangedSummarizedExperiment object.

Usage

## Constructor
# See ?RangedSummarizedExperiment for the constructor function.
## Accessors
assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, ..., withDimnames=TRUE)
assays(x, ..., withDimnames=TRUE) <- value
assay(x, i, ...)
assay(x, i, ...) <- value
rowData(x, use.names=TRUE, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value
## Quick colData access
## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value
## Subsetting
## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value
## S4 method for signature 'SummarizedExperiment'
subset(x, subset, select, ...)
## Combining
## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)
## On-disk realization
## S4 method for signature 'SummarizedExperiment'
realize(x, BACKEND=getRealizationBackend())

Arguments

Argument | Description
x | A SummarizedExperiment object.
... | For assay, ... may contain withDimnames, which is forwarded to assays. For cbind, rbind, ... contains SummarizedExperiment objects to be combined. For other accessors, ignored.
value | An object of a class specified in the S4 method signature or as outlined in Details.
i, j | For assay, assay<-, i is an integer or numeric scalar; see Details for additional constraints. For [,SummarizedExperiment, [,SummarizedExperiment<-, i, j are subscripts that can act to subset the rows and columns of x, that is the matrix elements of assays. For [[,SummarizedExperiment, [[<-,SummarizedExperiment, i is a scalar index (e.g., character(1) or integer(1)) into a column of colData.
name | A symbol representing the name of a column of colData.
withDimnames | A logical(1), indicating whether dimnames should be applied to extracted assay elements. Setting withDimnames=FALSE increases the speed and memory efficiency with which assays are extracted. The withDimnames argument of the setter assays<- allows efficient complex assignments (e.g., when updating the names of assays, names(assays(x, withDimnames=FALSE)) = ... is more efficient than names(assays(x)) = ...); it does not influence actual assignment of dimnames to assays.
use.names | Like mcols, by default rowData(x) propagates the rownames of x to the returned DataFrame object (note that for a SummarizedExperiment object, the rownames are also the names, i.e. rownames(x) is always the same as names(x)). Setting use.names=FALSE suppresses this propagation, i.e. it returns a DataFrame object with no rownames. Use this when rowData(x) fails, which can happen when the rownames contain NAs (because the rownames of a SummarizedExperiment object can contain NAs, but the rownames of a DataFrame object cannot).
drop | A logical(1), ignored by these methods.
deparse.level | See ?base:: for a description of this argument.
subset | An expression which, when evaluated in the context of rowData(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.
select | An expression which, when evaluated in the context of colData(x), is a logical vector indicating elements or rows to keep: missing values are taken as false.
BACKEND | NULL (the default), or a single string specifying the name of the backend. When the backend is set to NULL, each element of assays(x) is realized in memory as an ordinary array by just calling as.array on it.
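The withDimnames and use.names arguments can be tried on a toy object. This is only a sketch: the matrix and the assay name "raw" are made up for illustration.

```r
library(SummarizedExperiment)

m <- matrix(1:6, nrow=3,
            dimnames=list(c("g1", "g2", "g3"), c("s1", "s2")))
se <- SummarizedExperiment(assays=SimpleList(counts=m))

## Rename the assay without attaching dimnames to the extracted assays.
names(assays(se, withDimnames=FALSE)) <- "raw"
assayNames(se)

## rowData() propagates the rownames of 'se' unless use.names=FALSE.
rownames(rowData(se))
rownames(rowData(se, use.names=FALSE))
```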

Details

The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix, but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously.

The rows of a SummarizedExperiment object represent features of interest. Information about these features is stored in a DataFrame object, accessible using the function rowData. The DataFrame must have as many rows as there are rows in the SummarizedExperiment object, with each row of the DataFrame providing information on the feature in the corresponding row of the SummarizedExperiment object. Columns of the DataFrame represent different attributes of the features of interest, e.g., gene or transcript IDs, etc.

Each column of a SummarizedExperiment object represents a sample. Information about the samples is stored in a DataFrame, accessible using the function colData, described below. The DataFrame must have as many rows as there are columns in the SummarizedExperiment object, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment object. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin, etc. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.

A SummarizedExperiment object can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a list object, accessible using the metadata function. The form of the data associated with the experiment is left to the discretion of the user.

The SummarizedExperiment container is appropriate for matrix-like data. The data are accessed using the assays function, described below. This returns a SimpleList object. Each element of the list must itself be a matrix (of any mode) and must have dimensions that are the same as the dimensions of the SummarizedExperiment in which they are stored. Row and column names of each matrix must either be NULL or match those of the SummarizedExperiment during construction. It is convenient for the elements of the SimpleList of assays to be named.
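The pieces described above (assays, rowData, colData, metadata) can be put together as follows. This is a minimal sketch with made-up gene symbols, tissues, and lab name; the dimnames of the assays are chosen to match the row names of rowData and colData.

```r
library(SummarizedExperiment)

genes <- paste0("gene", 1:4)
samples <- c("s1", "s2", "s3")
counts <- matrix(rpois(12, lambda=10), nrow=4,
                 dimnames=list(genes, samples))

se <- SummarizedExperiment(
    ## Two named assays with identical dimensions.
    assays=SimpleList(counts=counts, logcounts=log2(counts + 1)),
    ## One row of annotation per feature.
    rowData=DataFrame(symbol=c("A", "B", "C", "D"), row.names=genes),
    ## One row of annotation per sample.
    colData=DataFrame(tissue=c("liver", "liver", "brain"),
                      row.names=samples),
    ## Free-form information about the experiment.
    metadata=list(lab="hypothetical lab")
)

dim(se)
assayNames(se)
rowData(se)$symbol
se$tissue
metadata(se)$lab
```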

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))

rowData(se0)
colData(se0)

se0[, se0$Treatment == "ChIP"]
subset(se0, select = Treatment == "ChIP")

## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[seq_len(ncol(se2))]
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[seq_len(nrow(se2))]
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)

## ---------------------------------------------------------------------
## ON-DISK REALIZATION
## ---------------------------------------------------------------------
setRealizationBackend("HDF5Array")
cmb3 <- realize(cmb2)
assay(cmb3, withDimnames=FALSE)  # an HDF5Matrix object

coverage_methods()

Coverage of a RangedSummarizedExperiment object

Description

This man page documents the coverage method for RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
coverage(x, shift=0L, width=NULL, weight=1L,
            method=c("auto", "sort", "hash"))

Arguments

Argument | Description
x | A RangedSummarizedExperiment object.
shift, width, weight, method | See ? in the GenomicRanges package.

Details

This method operates on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, on RangedSummarizedExperiment object x , coverage(x, ...) is equivalent to coverage(rowRanges(x), ...) .

See ? in the GenomicRanges package for the details of how coverage operates on a GenomicRanges or GRangesList object.

Value

See ? in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)),
                     seqlengths=c(chr1=1800, chr2=1300))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)

cvg <- coverage(rse)
cvg
stopifnot(identical(cvg, coverage(rowRanges(rse))))

findOverlaps_methods()

Finding overlapping ranges in RangedSummarizedExperiment objects

Description

This man page documents the findOverlaps methods for RangedSummarizedExperiment objects.

RangedSummarizedExperiment objects also support countOverlaps , overlapsAny , and subsetByOverlaps thanks to the default methods defined in the IRanges package and to the findOverlaps methods defined in this package and documented below.

Usage

## S4 method for signature 'RangedSummarizedExperiment,Vector'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)
## S4 method for signature 'Vector,RangedSummarizedExperiment'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)

Arguments

Argument | Description
query, subject | One of these two arguments must be a RangedSummarizedExperiment object.
maxgap, minoverlap, type | See ? in the GenomicRanges package.
select, ignore.strand | See ? in the GenomicRanges package.

Details

These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the query and/or subject argument, then it behaves as if rowRanges(query) and/or rowRanges(subject) had been passed instead.

See ? in the GenomicRanges package for the details of how findOverlaps and family operate on GenomicRanges and GRangesList objects.
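The derived functions mentioned in the Description (countOverlaps, overlapsAny, subsetByOverlaps) can be sketched on a toy object. The three ranges and the region of interest roi are made-up illustration data.

```r
library(SummarizedExperiment)

## Three ranges on chr1: 1-50, 101-150, 201-250.
rr <- GRanges("chr1", IRanges(start=c(1, 101, 201), width=50))
rse <- SummarizedExperiment(assays=SimpleList(counts=matrix(1:6, nrow=3)),
                            rowRanges=rr)

## Region of interest overlapping the 2nd and 3rd ranges only.
roi <- GRanges("chr1", IRanges(start=90, end=220))

countOverlaps(rse, roi)
overlapsAny(rse, roi)

## Keep only the rows whose range overlaps 'roi'.
hit_rse <- subsetByOverlaps(rse, roi)
dim(hit_rse)
```

Because these functions act on rowRanges(rse), subsetByOverlaps returns a RangedSummarizedExperiment restricted to the overlapping rows, with all columns kept.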

Value

See ? in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

hits <- findOverlaps(rse0, rse1)
hits
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rse0, rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rse1)))

inter_range_methods()

Inter range transformations of a RangedSummarizedExperiment object

Description

This man page documents the inter range transformations that are supported on RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
isDisjoint(x, ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
disjointBins(x, ignore.strand=FALSE)

Arguments

Argument | Description
x | A RangedSummarizedExperiment object.
ignore.strand | See ? in the GenomicRanges package.

Details

These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x :

 f(rowRanges(x), ...)

where f is the name of the function and ... any additional arguments passed to it.

See ? in the GenomicRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.

Value

See ? in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 99*start(rse0))

isDisjoint(rse0)  # FALSE
isDisjoint(rse1)  # TRUE

bins0 <- disjointBins(rse0)
bins0
stopifnot(identical(bins0, disjointBins(rowRanges(rse0))))

bins1 <- disjointBins(rse1)
bins1
stopifnot(all(bins1 == bins1[1]))

intra_range_methods()

Intra range transformations of a RangedSummarizedExperiment object

Description

This man page documents the intra range transformations that are supported on RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
shift(x, shift=0L, use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
resize(x, width, fix="start", use.names=TRUE,
       ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
flank(x, width, start=TRUE, both=FALSE,
      use.names=TRUE, ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
promoters(x, upstream=2000, downstream=200)
## S4 method for signature 'RangedSummarizedExperiment'
restrict(x, start=NA, end=NA, keep.all.ranges=FALSE,
         use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
trim(x, use.names=TRUE)

Arguments

Argument | Description
x | A RangedSummarizedExperiment object.
shift, use.names | See ? in the IRanges package.
start, end, width, fix | See ? in the IRanges package.
ignore.strand, both | See ? in the IRanges package.
upstream, downstream | See ? in the IRanges package.
keep.all.ranges | See ? in the IRanges package.

Details

These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x :

 rowRanges(x) <- f(rowRanges(x), ...)

where f is the name of the function and ... any additional arguments passed to it.

See ? in the IRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

rse1 <- shift(rse0, 1)
stopifnot(identical(
    rowRanges(rse1),
    shift(rowRanges(rse0), 1)
))

se2 <- narrow(rse0, start=10, end=-15)
stopifnot(identical(
    rowRanges(se2),
    narrow(rowRanges(rse0), start=10, end=-15)
))

se3 <- resize(rse0, width=75)
stopifnot(identical(
    rowRanges(se3),
    resize(rowRanges(rse0), width=75)
))

se4 <- flank(rse0, width=20)
stopifnot(identical(
    rowRanges(se4),
    flank(rowRanges(rse0), width=20)
))

se5 <- restrict(rse0, start=200, end=700, keep.all.ranges=TRUE)
stopifnot(identical(
    rowRanges(se5),
    restrict(rowRanges(rse0), start=200, end=700, keep.all.ranges=TRUE)
))

makeSummarizedExperimentFromDataFrame()

Make a RangedSummarizedExperiment from a data.frame or DataFrame

Description

makeSummarizedExperimentFromDataFrame uses data.frame or DataFrame column names to create a GRanges object for the rowRanges of the resulting SummarizedExperiment object. It requires that non-range data columns be coercible into a numeric matrix for the SummarizedExperiment constructor. All columns that are not part of the row ranges attribute are assumed to be experiment data; thus, keeping metadata columns is not supported. Note that this function only returns SummarizedExperiment objects with a single assay.

If metadata columns are to be kept, one can first construct the row ranges attribute by using the makeGRangesFromDataFrame function and subsequently create the SummarizedExperiment.

Usage

makeSummarizedExperimentFromDataFrame(df,
                                    ...,
                                    seqinfo = NULL,
                                    starts.in.df.are.0based = FALSE)

Arguments

Argument | Description
df | A data.frame or DataFrame object. If not, then the function first tries to turn df into a data frame with as.data.frame(df).
... | Additional arguments passed on to makeGRangesFromDataFrame.
seqinfo | Either NULL, or a Seqinfo object, or a character vector of seqlevels, or a named numeric vector of sequence lengths. When not NULL, it must be compatible with the genomic ranges in df, i.e. it must include at least the sequence levels represented in df.
starts.in.df.are.0based | TRUE or FALSE (the default). If TRUE, then the start positions of the genomic ranges in df are considered to be 0-based and are converted to 1-based in the returned GRanges object. This feature is intended to make it more convenient to handle input that contains data obtained from resources using the "0-based start" convention. A notorious example of such a resource is the UCSC Table Browser ( http://genome.ucsc.edu/cgi-bin/hgTables ).
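The effect of starts.in.df.are.0based can be checked on a small input. This is a sketch with a made-up two-row data frame in the "0-based start" convention; the column name expr is arbitrary.

```r
library(SummarizedExperiment)

## Two ranges given with 0-based starts.
df <- data.frame(chr="chr1", start=c(0, 10), end=c(10, 20),
                 expr=c(5, 7))

rse <- makeSummarizedExperimentFromDataFrame(df,
                                             starts.in.df.are.0based=TRUE)

## Starts are shifted to the 1-based convention; ends are unchanged.
start(rowRanges(rse))
end(rowRanges(rse))

## The remaining column becomes the single assay.
assay(rse)
```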

Value

A RangedSummarizedExperiment object with rowRanges and a single assay.

Author

M. Ramos

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

# Note that rownames of the data.frame are also rownames of the result
df <- data.frame(chr="chr2", start = 11:15, end = 12:16,
                 strand = c("+", "-", "+", "*", "."), expr0 = 3:7,
                 expr1 = 8:12, expr2 = 12:16,
                 row.names = paste0("GENE", letters[5:1]))
df

exRSE <- makeSummarizedExperimentFromDataFrame(df)

exRSE

assay(exRSE)

rowRanges(exRSE)

makeSummarizedExperimentFromExpressionSet()

Make a RangedSummarizedExperiment object from an ExpressionSet and vice-versa

Description

Coercion between RangedSummarizedExperiment and ExpressionSet is supported in both directions.

For going from ExpressionSet to RangedSummarizedExperiment , the makeSummarizedExperimentFromExpressionSet function is also provided to let the user control how to map features to ranges.

Usage

makeSummarizedExperimentFromExpressionSet(from,
                                          mapFun=naiveRangeMapper,
                                          ...)
## range mapping functions
naiveRangeMapper(from)
probeRangeMapper(from)
geneRangeMapper(txDbPackage, key = "ENTREZID")

Arguments

Argument | Description
from | An ExpressionSet object.
mapFun | A function which takes an ExpressionSet object and returns a GRanges or GRangesList object which corresponds to the genomic ranges used in the ExpressionSet. The rownames of the returned GRanges are used to match the featureNames of the ExpressionSet. The naiveRangeMapper function is used by default.
... | Additional arguments passed to mapFun.
txDbPackage | A character string with the Transcript Database to use for the mapping.
key | A character string with the Gene key to use for the mapping.

Value

makeSummarizedExperimentFromExpressionSet takes an ExpressionSet object as input and a range mapping function that maps the features to ranges. It then returns a RangedSummarizedExperiment object that corresponds to the input.

The range mapping functions return a GRanges object, with the rownames corresponding to the featureNames of the ExpressionSet object.

Author

Jim Hester, james.f.hester@gmail.com

Examples

## ---------------------------------------------------------------------
## GOING FROM ExpressionSet TO SummarizedExperiment
## ---------------------------------------------------------------------

data(sample.ExpressionSet, package="Biobase")

# naive coercion
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet)
as(sample.ExpressionSet, "RangedSummarizedExperiment")
as(sample.ExpressionSet, "SummarizedExperiment")

# using probe range mapper
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet, probeRangeMapper)

# using the gene range mapper
se <- makeSummarizedExperimentFromExpressionSet(
    sample.ExpressionSet,
    geneRangeMapper("TxDb.Hsapiens.UCSC.hg19.knownGene")
)
se
rowData(se)  # duplicate row names

## ---------------------------------------------------------------------
## GOING FROM SummarizedExperiment TO ExpressionSet
## ---------------------------------------------------------------------

example(RangedSummarizedExperiment)  # to create 'rse'
rse
as(rse, "ExpressionSet")

makeSummarizedExperimentFromLoom()

Make a SummarizedExperiment from a '.loom' hdf5 file

Description

makeSummarizedExperimentFromLoom represents a '.loom' file as a SummarizedExperiment. The '/matrix' and '/layers' are represented as HDF5Array objects; row and column attributes are parsed to DataFrame. Optionally, row or column attributes can be specified as row and column names.

Usage

makeSummarizedExperimentFromLoom(file,
                                 rownames_attr = NULL,
                                 colnames_attr = NULL)

Arguments

Argument        Description
file            The path (as a single character string) to the HDF5 file where the dataset is located.
rownames_attr   The name of the row attribute to be used as row names.
colnames_attr   The name of the column attribute to be used as column names.

Value

A SummarizedExperiment object with row and column data and one or more assays.

Seealso

http://loompy.org/loompy-docs/format/index.html for a specification of the .loom format.

Author

Martin Morgan

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLE
## ---------------------------------------------------------------------

file <- system.file(
    package="SummarizedExperiment", "extdata", "example.loom"
)
se <- makeSummarizedExperimentFromLoom(file)
se
assay(se)
metadata(se)

nearest_methods()

Finding the nearest range neighbor in RangedSummarizedExperiment objects

Description

This man page documents the nearest methods and family (i.e. precede, follow, distance, and distanceToNearest methods) for RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment,ANY'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
follow(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
follow(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
nearest(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
nearest(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distance(x, y, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distance(x, y, ignore.strand=FALSE, ...)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)

Arguments

Argument                Description
x, subject              One of these two arguments must be a RangedSummarizedExperiment object.
select, ignore.strand   See ?nearest in the GenomicRanges package.
y                       For the distance methods, one of x or y must be a RangedSummarizedExperiment object.
...                     Additional arguments for methods.

Details

These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the x, subject, and/or y argument, then it behaves as if rowRanges(x), rowRanges(subject), and/or rowRanges(y) had been passed instead.

See ?nearest in the GenomicRanges package for the details of how nearest and family operate on GenomicRanges and GRangesList objects.
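The delegation to rowRanges described above can be checked for the other members of the family as well. The following is a minimal sketch on a toy object; the coordinates, the object names rse and query, and the counts are made up for illustration.

```r
library(SummarizedExperiment)

# Toy object: four 50 bp features on the + strand of chr1 (made-up data)
rr <- GRanges("chr1", IRanges(c(1, 200, 400, 600), width=50), strand="+")
rse <- SummarizedExperiment(
    assays=SimpleList(counts=matrix(1:8, nrow=4)),
    rowRanges=rr
)

# A plain GRanges query lying between the second and third feature
query <- GRanges("chr1", IRanges(300, width=10), strand="+")

# Each call on 'rse' should give the same result as the call on rowRanges(rse)
stopifnot(identical(precede(query, rse), precede(query, rowRanges(rse))))
stopifnot(identical(follow(query, rse), follow(query, rowRanges(rse))))
stopifnot(identical(
    distanceToNearest(query, rse),
    distanceToNearest(query, rowRanges(rse))
))

# distance() is computed element-wise between x and y
shifted <- shift(rowRanges(rse), 100)
d <- distance(rse, shifted)
stopifnot(identical(d, distance(rowRanges(rse), shifted)))
```

The plain GRanges argument is used on one side of each call so that exactly one of the two documented signatures applies.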

Value

See ?nearest in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

res <- nearest(rse0, rse1)
res
stopifnot(identical(res, nearest(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(res, nearest(rse0, rowRanges(rse1))))
stopifnot(identical(res, nearest(rowRanges(rse0), rse1)))

res <- nearest(rse0)  # missing subject
res
stopifnot(identical(res, nearest(rowRanges(rse0))))

hits <- nearest(rse0, rse1, select="all")
hits
stopifnot(identical(
    hits,
    nearest(rowRanges(rse0), rowRanges(rse1), select="all")
))
stopifnot(identical(
    hits,
    nearest(rse0, rowRanges(rse1), select="all")
))
stopifnot(identical(
    hits,
    nearest(rowRanges(rse0), rse1, select="all")
))

Input kallisto or kallisto bootstrap results.

Description

readKallisto inputs several kallisto output files into a single SummarizedExperiment instance, with rows corresponding to estimated transcript abundance and columns to samples. readKallistoBootstrap inputs kallisto bootstrap replicates of a single sample into a matrix of transcript x bootstrap abundance estimates.

Usage

readKallisto(files,
    json = file.path(dirname(files), "run_info.json"),
    h5 = any(grepl("\\.h5$", files)), what = KALLISTO_ASSAYS,
    as = c("SummarizedExperiment", "list", "matrix"))
readKallistoBootstrap(file, i, j)

Arguments

Argument   Description
files      character() paths to kallisto abundance.tsv output files. The assumption is that files are organized in the way implied by kallisto, with each sample in a distinct directory, and the directory containing files abundance.tsv, run_info.json, and perhaps abundance.h5.
json       character() vector of the same length as files specifying the location of JSON files produced by kallisto and containing information on the run. The default assumes that json files are in the same directory as the corresponding abundance file.
h5         character() vector of the same length as files specifying the location of HDF5 files produced by kallisto and containing bootstrap estimates. The default assumes that HDF5 files are in the same directory as the corresponding abundance file.
what       character() vector of kallisto per-sample outputs to be input. See KALLISTO_ASSAYS for available values.
as         character(1) specifying the output format. See Value for additional detail.
file       character(1) path to a single HDF5 output file.
i, j       integer() vector of row (i) and column (j) indexes to input.

Value

A SummarizedExperiment, list, or matrix, depending on the value of argument as; by default a SummarizedExperiment. With as="SummarizedExperiment", rowData(se) contains the length of each transcript; colData(se) includes summary information on each sample, including the number of targets and bootstraps, the kallisto and index version, the start time, and the operating system call used to create the file. assays() contains one or more transcript x sample matrices of parameters estimated by kallisto (see KALLISTO_ASSAYS).

With as="list", the return value contains information similar to the SummarizedExperiment, with row, column, and assay data as elements of the list, but without coordination of row and column annotations into an integrated data container. as="matrix" returns the specified assay as a simple R matrix.

Author

Martin Morgan martin.morgan@roswellpark.org

References

http://pachterlab.github.io/kallisto software for quantifying transcript abundance.

Examples

outputs <- system.file(package="SummarizedExperiment", "extdata",
                       "kallisto")
files <- dir(outputs, pattern="abundance.tsv", full.names=TRUE, recursive=TRUE)
stopifnot(all(file.exists(files)))

## default: input 'est_counts'
(se <- readKallisto(files, as="SummarizedExperiment"))
str(readKallisto(files, as="list"))
str(readKallisto(files, as="matrix"))

## available assays
KALLISTO_ASSAYS
## one or more assay
readKallisto(files, what=c("tpm", "eff_length"))

## alternatively: read hdf5 files
files <- sub(".tsv", ".h5", files, fixed=TRUE)
readKallisto(files)

## input all bootstraps
xx <- readKallistoBootstrap(files[1])
ridx <- head(which(rowSums(xx) != 0), 3)
cidx <- c(1:5, 96:100)
xx[ridx, cidx]

## selective input of rows (transcripts) and/or bootstraps
readKallistoBootstrap(files[1], i=c(ridx, rev(ridx)), j=cidx)