bioconductor v3.9.0 SummarizedExperiment

SimpleListAssays and ShallowSimpleListAssays are concrete subclasses of Assays with the latter being currently the default implementation of Assays objects. Other implementations (e.g. disk-based) could easily be added.

Note that these classes are not meant to be used directly by the end-user and the material in this man page is aimed at package developers.

Details

Assays objects have list-like semantics, with elements that have matrix- or array-like semantics (e.g., dim, dimnames).

The Assays API consists of:

(a) The Assays() constructor function.
(b) Lossless back and forth coercion from/to SimpleList . The coercion method from SimpleList doesn't need (and should not) validate the returned object.
(c) length , names , names<- , [[ , [[<- , dim , [ , [<- , rbind , cbind .
An Assays concrete subclass needs to implement (b) (required) plus, optionally any of the methods in (c).

IMPORTANT: Methods that return a modified Assays object (a.k.a. endomorphisms), that is, [ as well as the replacement methods names<-, [[<-, and [<-, must respect the copy-on-change contract. With objects that don't make use of references internally, the developer doesn't need to take any special action because R itself takes care of it automatically. However, for objects that do make use of references internally (e.g. environments, external pointers, pointer to a file on disk, etc.), the developer needs to be careful to implement endomorphisms with copy-on-change semantics. This can easily be achieved (and is what the default methods for Assays objects do) by performing a full (deep) copy of the object before modifying it instead of trying to modify it in-place. Note that the full (deep) copy is not always necessary in order to achieve copy-on-change semantics: it's enough (and often preferable for performance reasons) to copy only the parts of the object that need to be modified.
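The difference between an in-place modification and a copy-on-change endomorphism can be sketched in base R. The `env_store` class and the two setter functions below are hypothetical names used only for illustration; they are not part of the Assays API, and the sketch assumes a container that keeps its data in an environment.

```r
## Hypothetical environment-backed container (illustration only,
## not a class from SummarizedExperiment).
new_env_store <- function(data) {
    env <- new.env(parent = emptyenv())
    env$data <- data
    structure(list(env = env), class = "env_store")
}

## Broken setter: writes into the environment shared by all copies of 'x'.
set_names_inplace <- function(x, value) {
    names(x$env$data) <- value
    x
}

## Copy-on-change setter: copies the environment first, then modifies
## the copy, so the object passed in is left untouched.
set_names_cow <- function(x, value) {
    env <- new.env(parent = emptyenv())
    env$data <- x$env$data
    names(env$data) <- value
    x$env <- env
    x
}

x <- new_env_store(list(1:3, 4:6))

y <- set_names_cow(x, c("a", "b"))
names(x$env$data)  # 'x' is unchanged
names(y$env$data)

z <- set_names_inplace(x, c("A", "B"))
names(x$env$data)  # 'x' was modified as a side effect
```

Here the whole environment is copied, which corresponds to the "full copy" strategy described above; a real implementation might copy only the modified parts.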

Assays currently has three implementations, which are formalized by the concrete subclasses SimpleListAssays, ShallowSimpleListAssays, and AssaysInEnv. ShallowSimpleListAssays is the default. AssaysInEnv is a broken alternative to ShallowSimpleListAssays that does NOT respect the copy-on-change contract. It is only provided for illustration purposes (see source file Assays-class.R for the details).

A little more detail about ShallowSimpleListAssays: a small reference class hierarchy (not exported from the GenomicRanges name space) defines a reference class ShallowData with a single field data of type ANY , and a derived class ShallowSimpleListAssays that specializes the type of data as SimpleList , and contains=c("ShallowData", "Assays") . The assays slot of a SummarizedExperiment object contains an instance of ShallowSimpleListAssays.

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

## ---------------------------------------------------------------------
## DIRECT MANIPULATION OF Assays OBJECTS
## ---------------------------------------------------------------------
m1 <- matrix(runif(24), ncol=3)
m2 <- matrix(runif(24), ncol=3)
a <- Assays(SimpleList(m1, m2))
a

as(a, "SimpleList")

length(a)
a[[2]]
dim(a)

b <- a[-4, 2]
b
length(b)
b[[2]]
dim(b)

names(a)
names(a) <- c("a1", "a2")
names(a)
a[["a2"]]

rbind(a, a)
cbind(a, a)

## ---------------------------------------------------------------------
## COPY-ON-CHANGE CONTRACT
## ---------------------------------------------------------------------

## ShallowSimpleListAssays objects have copy-on-change semantics but not
## AssaysInEnv objects. For example:
ssla <- as(SimpleList(m1, m2), "ShallowSimpleListAssays")
aie <- as(SimpleList(m1, m2), "AssaysInEnv")

## No names on 'ssla' and 'aie':
names(ssla)
names(aie)

ssla2 <- ssla
aie2 <- aie
names(ssla2) <- names(aie2) <- c("A1", "A2")

names(ssla)  # still NULL (as expected)

names(aie)   # changed! (because the names<-,AssaysInEnv method is not
             # implemented in a way that respects the copy-on-change
             # contract)

RangedSummarizedExperiment_class()

RangedSummarizedExperiment objects

Description

The RangedSummarizedExperiment class is a matrix-like container where rows represent ranges of interest (as a GRanges or GRangesList object) and columns represent samples (with sample data summarized as a DataFrame ). A RangedSummarizedExperiment contains one or more assays, each represented by a matrix-like object of numeric or other mode.

RangedSummarizedExperiment is a subclass of SummarizedExperiment and, as such, all the methods documented for SummarizedExperiment objects also work on a RangedSummarizedExperiment object. The methods documented below are additional methods that are specific to RangedSummarizedExperiment objects.

Usage

## Constructor
SummarizedExperiment(assays, ...)
## S4 method for signature 'SimpleList'
SummarizedExperiment(assays, rowData=NULL, rowRanges=GRangesList(),
    colData=DataFrame(), metadata=list())
## S4 method for signature 'ANY'
SummarizedExperiment(assays, ...)
## S4 method for signature 'list'
SummarizedExperiment(assays, ...)
## S4 method for signature 'missing'
SummarizedExperiment(assays, ...)
## Accessors
rowRanges(x, ...)
rowRanges(x, ...) <- value
## Subsetting
## S4 method for signature 'RangedSummarizedExperiment'
subset(x, subset, select, ...)
## rowRanges access
## see 'GRanges compatibility', below

Arguments

Argument	Description
`assays`	A `list` or `SimpleList` of matrix-like elements, or a matrix-like object. All elements of the list must have the same dimensions, and dimension names (if present) must be consistent across elements and with the row names of `rowRanges` and `colData` .
`rowData`	A DataFrame object describing the rows. Row names, if present, become the row names of the SummarizedExperiment object. The number of rows of the DataFrame must equal the number of rows of the matrices in `assays` .
`rowRanges`	A GRanges or GRangesList object describing the ranges of interest. Names, if present, become the row names of the SummarizedExperiment object. The length of the GRanges or GRangesList must equal the number of rows of the matrices in `assays` . If `rowRanges` is missing, a SummarizedExperiment instance is returned.
`colData`	An optional DataFrame describing the samples. Row names, if present, become the column names of the RangedSummarizedExperiment.
`metadata`	An optional `list` of arbitrary content describing the overall experiment.
`...`	For `SummarizedExperiment` , S4 methods `list` and `matrix` , arguments identical to those of the `SimpleList` method. For `rowRanges` , ignored.
`x`	A RangedSummarizedExperiment object. The `rowRanges` setter will also accept a SummarizedExperiment object and will first coerce it to RangedSummarizedExperiment before it sets `value` on it.
`value`	A GRanges or GRangesList object.
`subset`	An expression which, when evaluated in the context of `rowRanges(x)` , is a logical vector indicating elements or rows to keep: missing values are taken as false.
`select`	An expression which, when evaluated in the context of `colData(x)` , is a logical vector indicating elements or rows to keep: missing values are taken as false.

Details

The rows of a RangedSummarizedExperiment object represent ranges (in genomic coordinates) of interest. The ranges of interest are described by a GRanges or a GRangesList object, accessible using the rowRanges function, described below. The GRanges and GRangesList classes contain sequence (e.g., chromosome) name, genomic coordinates, and strand information. Each range can be annotated with additional data; this data might be used to describe the range or to summarize results (e.g., statistics of differential abundance) relevant to the range. Rows may or may not have row names; they often will not.

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE),
                     feature_id=sprintf("ID%03d", 1:200))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)
rse
dim(rse)
dimnames(rse)
assayNames(rse)
head(assay(rse))
assays(rse) <- endoapply(assays(rse), asinh)
head(assay(rse))

rowRanges(rse)
rowData(rse)  # same as 'mcols(rowRanges(rse))'
colData(rse)

rse[ , rse$Treatment == "ChIP"]

## cbind() combines objects with the same ranges but different samples:
rse1 <- rse
rse2 <- rse1[ , 1:3]
colnames(rse2) <- letters[1:ncol(rse2)]
cmb1 <- cbind(rse1, rse2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different ranges:
rse1 <- rse
rse2 <- rse1[1:50, ]
rownames(rse2) <- letters[1:nrow(rse2)]
cmb2 <- rbind(rse1, rse2)
dim(cmb2)
dimnames(cmb2)

## Coercion to/from SummarizedExperiment:
se0 <- as(rse, "SummarizedExperiment")
se0

as(se0, "RangedSummarizedExperiment")

## Setting rowRanges on a SummarizedExperiment object turns it into a
## RangedSummarizedExperiment object:
se <- se0
rowRanges(se) <- rowRanges
se  # RangedSummarizedExperiment

## Sanity checks:
stopifnot(identical(assays(se0), assays(rse)))
stopifnot(identical(dim(se0), dim(rse)))
stopifnot(identical(dimnames(se0), dimnames(rse)))
stopifnot(identical(rowData(se0), rowData(rse)))
stopifnot(identical(colData(se0), colData(rse)))

SummarizedExperiment_class()

SummarizedExperiment objects

Description

The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons, etc...) and columns represent samples (with sample data summarized as a DataFrame ). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.

Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class which means that all the methods documented below also work on a RangedSummarizedExperiment object.

Usage

## Constructor
# See ?RangedSummarizedExperiment for the constructor function.
## Accessors
assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, ..., withDimnames=TRUE)
assays(x, ..., withDimnames=TRUE) <- value
assay(x, i, ...)
assay(x, i, ...) <- value
rowData(x, use.names=TRUE, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value
## Quick colData access
## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value
## Subsetting
## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature
## 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value
## S4 method for signature 'SummarizedExperiment'
subset(x, subset, select, ...)
## Combining
## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)
## On-disk realization
## S4 method for signature 'SummarizedExperiment'
realize(x, BACKEND=getRealizationBackend())

Arguments

Argument	Description
`x`	A SummarizedExperiment object.
`...`	For `assay` , `...` may contain `withDimnames` , which is forwarded to `assays` . For `cbind` , `rbind` , `...` contains SummarizedExperiment objects to be combined. For other accessors, ignored.
`value`	An object of a class specified in the S4 method signature or as outlined in Details .
`i, j`	For `assay` , `assay<-` , `i` is an integer or numeric scalar; see Details for additional constraints. For `[,SummarizedExperiment` , `[,SummarizedExperiment<-` , `i` , `j` are subscripts that can act to subset the rows and columns of `x` , that is the `matrix` elements of `assays` . For `[[,SummarizedExperiment` , `[[<-,SummarizedExperiment` , `i` is a scalar index (e.g., `character(1)` or `integer(1)` ) into a column of `colData` .
`name`	A symbol representing the name of a column of `colData` .
`withDimnames`	A `logical(1)` , indicating whether dimnames should be applied to extracted assay elements. Setting `withDimnames=FALSE` increases the speed and memory efficiency with which assays are extracted. Using `withDimnames=FALSE` in the `assays` getter inside a complex assignment also makes that assignment more efficient (e.g., when updating the names of the assays, `names(assays(x, withDimnames=FALSE)) = ...` is more efficient than `names(assays(x)) = ...` ); it does not influence actual assignment of dimnames to assays.
`use.names`	Like `mcols` , by default `rowData(x)` propagates the rownames of `x` to the returned DataFrame object (note that for a SummarizedExperiment object, the rownames are also the names i.e. `rownames(x)` is always the same as `names(x)` ). Setting `use.names=FALSE` suppresses this propagation i.e. it returns a DataFrame object with no rownames. Use this when `rowData(x)` fails, which can happen when the rownames contain NAs (because the rownames of a SummarizedExperiment object can contain NAs, but the rownames of a DataFrame object cannot).
`drop`	A `logical(1)` , ignored by these methods.
`deparse.level`	See the help page for `cbind` and `rbind` in the base package for a description of this argument.
`subset`	An expression which, when evaluated in the context of `rowData(x)` , is a logical vector indicating elements or rows to keep: missing values are taken as false.
`select`	An expression which, when evaluated in the context of `colData(x)` , is a logical vector indicating elements or rows to keep: missing values are taken as false.
`BACKEND`	`NULL` (the default), or a single string specifying the name of the backend. When the backend is set to `NULL` , each element of `assays(x)` is realized in memory as an ordinary array by just calling `as.array` on it.
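
As a small illustration of the `withDimnames` argument described above, the following sketch renames an assay through the getter with `withDimnames=FALSE`. The object and assay names are made up for the example.

```r
library(SummarizedExperiment)

## Toy data; the dimnames and the assay name "counts" are arbitrary.
m <- matrix(1:6, nrow=2,
            dimnames=list(c("g1", "g2"), c("s1", "s2", "s3")))
se <- SummarizedExperiment(assays=SimpleList(counts=m))

## Default getter: dimnames of 'se' are applied to the extracted assay.
rownames(assay(se))
colnames(assay(se))

## Renaming the assays without applying dimnames to each assay first.
names(assays(se, withDimnames=FALSE)) <- "raw"
assayNames(se)
```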

Details

The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix , but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously.

The rows of a SummarizedExperiment object represent features of interest. Information about these features is stored in a DataFrame object, accessible using the function rowData . The DataFrame must have as many rows as there are rows in the SummarizedExperiment object, with each row of the DataFrame providing information on the feature in the corresponding row of the SummarizedExperiment object. Columns of the DataFrame represent different attributes of the features of interest, e.g., gene or transcript IDs, etc.

Each column of a SummarizedExperiment object represents a sample. Information about the samples is stored in a DataFrame, accessible using the function colData, described below. The DataFrame must have as many rows as there are columns in the SummarizedExperiment object, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment object. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin, etc. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.

A SummarizedExperiment object can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a list object, accessible using the metadata function. The form of the data associated with the experiment is left to the discretion of the user.

The SummarizedExperiment container is appropriate for matrix-like data. The data are accessed using the assays function, described below. This returns a SimpleList object. Each element of the list must itself be a matrix (of any mode) and must have dimensions that are the same as the dimensions of the SummarizedExperiment in which they are stored. Row and column names of each matrix must either be NULL or match those of the SummarizedExperiment during construction. It is convenient for the elements of SimpleList of assays to be named.
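
The components described above (assays, row data, column data, and experiment-level metadata) can be put together as in the following sketch. All values, including the gene symbols, tissue labels, and the lab name, are invented for illustration.

```r
library(SummarizedExperiment)

## Two assays with identical dimensions and dimnames.
counts <- matrix(rpois(12, lambda=10), nrow=4,
                 dimnames=list(paste0("gene", 1:4), paste0("s", 1:3)))
logcounts <- log2(counts + 1)

se <- SummarizedExperiment(
    assays=SimpleList(counts=counts, logcounts=logcounts),
    rowData=DataFrame(symbol=c("A", "B", "C", "D")),    # one row per feature
    colData=DataFrame(tissue=c("liver", "liver", "brain"),
                      row.names=colnames(counts)),      # one row per sample
    metadata=list(lab="hypothetical lab name")          # free-form list
)

dim(se)
rowData(se)$symbol
colData(se)$tissue
metadata(se)$lab
assayNames(se)
dim(assay(se, "logcounts"))
```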

See Also

RangedSummarizedExperiment objects.
DataFrame, SimpleList, and Annotated objects in the S4Vectors package.
The metadata and mcols accessors in the S4Vectors package.
saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment in the HDF5Array package for saving/loading an HDF5-based SummarizedExperiment object to/from disk.
The realize generic function in the DelayedArray package for more information about on-disk realization of objects carrying delayed operations.

Author

Martin Morgan, mtmorgan@fhcrc.org

Examples

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))

rowData(se0)
colData(se0)

se0[, se0$Treatment == "ChIP"]
subset(se0, select = Treatment == "ChIP")

## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[seq_len(ncol(se2))]
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[seq_len(nrow(se2))]
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)

## ---------------------------------------------------------------------
## ON-DISK REALIZATION
## ---------------------------------------------------------------------
setRealizationBackend("HDF5Array")
cmb3 <- realize(cmb2)
assay(cmb3, withDimnames=FALSE)  # an HDF5Matrix object

coverage_methods()

Coverage of a RangedSummarizedExperiment object

Description

This man page documents the coverage method for RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
coverage(x, shift=0L, width=NULL, weight=1L,
            method=c("auto", "sort", "hash"))

Arguments

Argument	Description
`x`	A RangedSummarizedExperiment object.
`shift, width, weight, method`	See the `coverage` documentation in the GenomicRanges package.

Details

This method operates on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, on RangedSummarizedExperiment object x , coverage(x, ...) is equivalent to coverage(rowRanges(x), ...) .

See the coverage documentation in the GenomicRanges package for the details of how coverage operates on a GenomicRanges or GRangesList object.

Value

See the coverage documentation in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)),
                     seqlengths=c(chr1=1800, chr2=1300))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)

cvg <- coverage(rse)
cvg
stopifnot(identical(cvg, coverage(rowRanges(rse))))

findOverlaps_methods()

Finding overlapping ranges in RangedSummarizedExperiment objects

Description

This man page documents the findOverlaps methods for RangedSummarizedExperiment objects.

RangedSummarizedExperiment objects also support countOverlaps , overlapsAny , and subsetByOverlaps thanks to the default methods defined in the IRanges package and to the findOverlaps methods defined in this package and documented below.

Usage

## S4 method for signature 'RangedSummarizedExperiment,Vector'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)
## S4 method for signature 'Vector,RangedSummarizedExperiment'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)

Arguments

Argument	Description
`query, subject`	One of these two arguments must be a RangedSummarizedExperiment object.
`maxgap, minoverlap, type`	See the `findOverlaps` documentation in the GenomicRanges package.
`select, ignore.strand`	See the `findOverlaps` documentation in the GenomicRanges package.

Details

These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the query and/or subject argument, then it behaves as if rowRanges(query) and/or rowRanges(subject) had been passed instead.

See the findOverlaps documentation in the GenomicRanges package for the details of how findOverlaps and family operate on GenomicRanges and GRangesList objects.
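
The Description mentions that countOverlaps, overlapsAny, and subsetByOverlaps are also supported. A minimal sketch of how they might be used follows; the region of interest `roi` and the toy data are arbitrary choices made for this example.

```r
library(SummarizedExperiment)

counts <- matrix(runif(20 * 6, 1, 1e4), 20)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges)

## Arbitrary region of interest (strand "*" matches both strands).
roi <- GRanges("chr2", IRanges(1, 500))

n <- countOverlaps(rse, roi)       # number of hits per row of 'rse'
keep <- overlapsAny(rse, roi)      # TRUE for rows overlapping 'roi'
sub <- subsetByOverlaps(rse, roi)  # rows of 'rse' overlapping 'roi'

sum(keep)
dim(sub)
```

The returned `sub` is expected to contain one row for each TRUE in `keep`, with all columns retained.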

Value

See the findOverlaps documentation in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

hits <- findOverlaps(rse0, rse1)
hits
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rse0, rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rse1)))

inter_range_methods()

Inter range transformations of a RangedSummarizedExperiment object

Description

This man page documents the inter range transformations that are supported on RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
isDisjoint(x, ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
disjointBins(x, ignore.strand=FALSE)

Arguments

Argument	Description
`x`	A RangedSummarizedExperiment object.
`ignore.strand`	See the documentation of the corresponding methods in the GenomicRanges package.

Details

These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x :

 f(rowRanges(x), ...)

where f is the name of the function and ... any additional arguments passed to it.

See the documentation of the corresponding methods in the GenomicRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.

Value

See the documentation of the corresponding methods in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 99*start(rse0))

isDisjoint(rse0)  # FALSE
isDisjoint(rse1)  # TRUE

bins0 <- disjointBins(rse0)
bins0
stopifnot(identical(bins0, disjointBins(rowRanges(rse0))))

bins1 <- disjointBins(rse1)
bins1
stopifnot(all(bins1 == bins1[1]))

intra_range_methods()

Intra range transformations of a RangedSummarizedExperiment object

Description

This man page documents the intra range transformations that are supported on RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment'
shift(x, shift=0L, use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
resize(x, width, fix="start", use.names=TRUE,
       ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
flank(x, width, start=TRUE, both=FALSE,
      use.names=TRUE, ignore.strand=FALSE)
## S4 method for signature 'RangedSummarizedExperiment'
promoters(x, upstream=2000, downstream=200)
## S4 method for signature 'RangedSummarizedExperiment'
restrict(x, start=NA, end=NA, keep.all.ranges=FALSE,
         use.names=TRUE)
## S4 method for signature 'RangedSummarizedExperiment'
trim(x, use.names=TRUE)

Arguments

Argument	Description
`x`	A RangedSummarizedExperiment object.
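`shift, use.names`	See the corresponding documentation in the IRanges package.
`start, end, width, fix`	See the corresponding documentation in the IRanges package.
`ignore.strand, both`	See the corresponding documentation in the IRanges package.
`upstream, downstream`	See the corresponding documentation in the IRanges package.
`keep.all.ranges`	See the corresponding documentation in the IRanges package.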

Details

These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x :

 rowRanges(x) <- f(rowRanges(x), ...)

where f is the name of the function and ... any additional arguments passed to it.

See the corresponding documentation in the IRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.
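
The equivalence stated above can be checked directly. The sketch below uses `shift` with an arbitrary shift of 5 and compares the method call with the explicit `rowRanges<-` form; the toy data mirror the Examples section.

```r
library(SummarizedExperiment)

counts <- matrix(runif(20 * 6, 1, 1e4), 20)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges)

## Method call on the RangedSummarizedExperiment object itself ...
rse_a <- shift(rse0, 5)

## ... and the explicit form: rowRanges(x) <- f(rowRanges(x), ...)
rse_b <- rse0
rowRanges(rse_b) <- shift(rowRanges(rse_b), 5)

identical(rowRanges(rse_a), rowRanges(rse_b))

## Only the ranges change; the assay data are carried over as-is.
identical(assay(rse_a), assay(rse0))
```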

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

rse1 <- shift(rse0, 1)
stopifnot(identical(
rowRanges(rse1),
shift(rowRanges(rse0), 1)
))

se2 <- narrow(rse0, start=10, end=-15)
stopifnot(identical(
rowRanges(se2),
narrow(rowRanges(rse0), start=10, end=-15)
))

se3 <- resize(rse0, width=75)
stopifnot(identical(
rowRanges(se3),
resize(rowRanges(rse0), width=75)
))

se4 <- flank(rse0, width=20)
stopifnot(identical(
rowRanges(se4),
flank(rowRanges(rse0), width=20)
))

se5 <- restrict(rse0, start=200, end=700, keep.all.ranges=TRUE)
stopifnot(identical(
rowRanges(se5),
restrict(rowRanges(rse0), start=200, end=700, keep.all.ranges=TRUE)
))

makeSummarizedExperimentFromDataFrame()

Make a RangedSummarizedExperiment from a data.frame or DataFrame

Description

makeSummarizedExperimentFromDataFrame uses data.frame or DataFrame column names to create a GRanges object for the rowRanges of the resulting SummarizedExperiment object. It requires that non-range data columns be coercible into a numeric matrix for the SummarizedExperiment constructor. All columns that are not part of the row ranges attribute are assumed to be experiment data; thus, keeping metadata columns will not be supported. Note that this function only returns SummarizedExperiment objects with a single assay.

If metadata columns are to be kept, one can first construct the row ranges attribute by using the makeGRangesFromDataFrame function and subsequently creating the SummarizedExperiment .
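
A minimal sketch of that two-step route follows. The column names (`symbol`, `expr0`, `expr1`) and the values are invented for illustration; `keep.extra.columns=TRUE` asks makeGRangesFromDataFrame to retain the non-range columns it is given as metadata columns.

```r
library(SummarizedExperiment)

df <- data.frame(chr="chr2", start=11:15, end=12:16,
                 strand=c("+", "-", "+", "*", "."),
                 symbol=paste0("G", 1:5),   # metadata column to keep
                 expr0=3:7, expr1=8:12,     # experiment data
                 stringsAsFactors=FALSE)

## Step 1: build the row ranges, keeping 'symbol' as a metadata column.
range_cols <- c("chr", "start", "end", "strand", "symbol")
gr <- makeGRangesFromDataFrame(df[, range_cols], keep.extra.columns=TRUE)

## Step 2: build the assay matrix from the remaining columns and
## combine both in the constructor.
counts <- as.matrix(df[, c("expr0", "expr1")])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=gr)

rse
rowData(rse)
```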

Usage

makeSummarizedExperimentFromDataFrame(df,
                                    ...,
                                    seqinfo = NULL,
                                    starts.in.df.are.0based = FALSE)

Arguments

Argument	Description
`df`	A data.frame or DataFrame object. If not, then the function first tries to turn `df` into a data frame with `as.data.frame(df)` .
`...`	Additional arguments passed on to makeGRangesFromDataFrame
`seqinfo`	Either `NULL` , or a Seqinfo object, or a character vector of seqlevels, or a named numeric vector of sequence lengths. When not `NULL` , it must be compatible with the genomic ranges in `df` i.e. it must include at least the sequence levels represented in `df` .
`starts.in.df.are.0based`	`TRUE` or `FALSE` (the default). If `TRUE` , then the start positions of the genomic ranges in `df` are considered to be 0-based and are converted to 1-based in the returned GRanges object. This feature is intended to make it more convenient to handle input that contains data obtained from resources using the "0-based start" convention. A notorious example of such a resource is the UCSC Table Browser ( http://genome.ucsc.edu/cgi-bin/hgTables ).

Value

A RangedSummarizedExperiment object with rowRanges and a single assay

Author

M. Ramos

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

# Note that rownames of the data.frame are also rownames of the result
df <- data.frame(chr="chr2", start = 11:15, end = 12:16,
                 strand = c("+", "-", "+", "*", "."), expr0 = 3:7,
                 expr1 = 8:12, expr2 = 12:16,
                 row.names = paste0("GENE", letters[5:1]))
df

exRSE <- makeSummarizedExperimentFromDataFrame(df)

exRSE

assay(exRSE)

rowRanges(exRSE)

makeSummarizedExperimentFromExpressionSet()

Make a RangedSummarizedExperiment object from an ExpressionSet and vice-versa

Description

Coercion between RangedSummarizedExperiment and ExpressionSet is supported in both directions.

For going from ExpressionSet to RangedSummarizedExperiment , the makeSummarizedExperimentFromExpressionSet function is also provided to let the user control how to map features to ranges.

Usage

makeSummarizedExperimentFromExpressionSet(from,
                                          mapFun=naiveRangeMapper,
                                          ...)
## range mapping functions
naiveRangeMapper(from)
probeRangeMapper(from)
geneRangeMapper(txDbPackage, key = "ENTREZID")

Arguments

Argument	Description
`from`	An ExpressionSet object.
`mapFun`	A function which takes an ExpressionSet object and returns a GRanges , or GRangesList object which corresponds to the genomic ranges used in the ExpressionSet. The rownames of the returned GRanges are used to match the featureNames of the ExpressionSet . The `naiveRangeMapper` function is used by default.
`...`	Additional arguments passed to `mapFun` .
`txDbPackage`	A character string with the Transcript Database to use for the mapping.
`key`	A character string with the Gene key to use for the mapping.

Value

makeSummarizedExperimentFromExpressionSet takes an ExpressionSet object as input and a range mapping function that maps the features to ranges. It then returns a RangedSummarizedExperiment object that corresponds to the input.

The range mapping functions return a GRanges object, with the rownames corresponding to the featureNames of the ExpressionSet object.

Author

Jim Hester, james.f.hester@gmail.com

Examples

## ---------------------------------------------------------------------
## GOING FROM ExpressionSet TO SummarizedExperiment
## ---------------------------------------------------------------------

data(sample.ExpressionSet, package="Biobase")

# naive coercion
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet)
as(sample.ExpressionSet, "RangedSummarizedExperiment")
as(sample.ExpressionSet, "SummarizedExperiment")

# using probe range mapper
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet, probeRangeMapper)

# using the gene range mapper
se <- makeSummarizedExperimentFromExpressionSet(
    sample.ExpressionSet,
    geneRangeMapper("TxDb.Hsapiens.UCSC.hg19.knownGene")
)
se
rowData(se)  # duplicate row names

## ---------------------------------------------------------------------
## GOING FROM SummarizedExperiment TO ExpressionSet
## ---------------------------------------------------------------------

example(RangedSummarizedExperiment)  # to create 'rse'
rse
as(rse, "ExpressionSet")

makeSummarizedExperimentFromLoom()

Make a SummarizedExperiment from a '.loom' hdf5 file

Description

makeSummarizedExperimentFromLoom represents a '.loom' file as a SummarizedExperiment. The '/matrix' and '/layers' are represented as HDF5Array objects; row and column attributes are parsed to DataFrame. Optionally, row or column attributes can be specified as row and column names.

Usage

makeSummarizedExperimentFromLoom(file,
                                 rownames_attr = NULL,
                                 colnames_attr = NULL)

Arguments

Argument	Description
`file`	The path (as a single character string) to the HDF5 file where the dataset is located.
`rownames_attr`	The name of the row attribute to be used as row names.
`colnames_attr`	The name of the column attribute to be used as column names.

Value

A SummarizedExperiment object with row and column data and one or more assays.

Author

Martin Morgan

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLE
## ---------------------------------------------------------------------

file <- system.file(
    package="SummarizedExperiment", "extdata", "example.loom"
)
se <- makeSummarizedExperimentFromLoom(file)
se
assay(se)
metadata(se)

nearest_methods()

Finding the nearest range neighbor in RangedSummarizedExperiment objects

Description

This man page documents the nearest methods and family (i.e. precede, follow, distance, and distanceToNearest methods) for RangedSummarizedExperiment objects.

Usage

## S4 method for signature 'RangedSummarizedExperiment,ANY'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
follow(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
follow(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
nearest(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
nearest(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distance(x, y, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distance(x, y, ignore.strand=FALSE, ...)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)

Arguments

Argument	Description
`x, subject`	One of these two arguments must be a RangedSummarizedExperiment object.
`select, ignore.strand`	See ?`nearest-methods` in the GenomicRanges package.
`y`	For the `distance` methods, one of `x` or `y` must be a RangedSummarizedExperiment object.
`...`	Additional arguments for methods.

Details

These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.

More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the x, subject, and/or y argument, then it behaves as if rowRanges(x), rowRanges(subject), and/or rowRanges(y) had been passed instead.

See ?`nearest-methods` in the GenomicRanges package for the details of how nearest and family operate on GenomicRanges and GRangesList objects.

Value

See ?`nearest-methods` in the GenomicRanges package.

Examples

nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

res <- nearest(rse0, rse1)
res
stopifnot(identical(res, nearest(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(res, nearest(rse0, rowRanges(rse1))))
stopifnot(identical(res, nearest(rowRanges(rse0), rse1)))

res <- nearest(rse0)  # missing subject
res
stopifnot(identical(res, nearest(rowRanges(rse0))))

hits <- nearest(rse0, rse1, select="all")
hits
stopifnot(identical(
    hits,
    nearest(rowRanges(rse0), rowRanges(rse1), select="all")
))
stopifnot(identical(
    hits,
    nearest(rse0, rowRanges(rse1), select="all")
))
stopifnot(identical(
    hits,
    nearest(rowRanges(rse0), rse1, select="all")
))

readKallisto()

Input kallisto or kallisto bootstrap results.

Description

readKallisto inputs several kallisto output files into a single SummarizedExperiment instance, with rows corresponding to estimated transcript abundance and columns to samples. readKallistoBootstrap inputs kallisto bootstrap replicates of a single sample into a matrix of transcript x bootstrap abundance estimates.

Usage

readKallisto(files,
    json = file.path(dirname(files), "run_info.json"),
    h5 = any(grepl("\\.h5$", files)), what = KALLISTO_ASSAYS,
    as = c("SummarizedExperiment", "list", "matrix"))
readKallistoBootstrap(file, i, j)

Arguments

Argument	Description
`files`	character() paths to kallisto abundance.tsv output files. The assumption is that files are organized in the way implied by kallisto, with each sample in a distinct directory, and the directory containing files abundance.tsv, run_info.json, and perhaps abundance.h5.
`json`	character() vector of the same length as `files` specifying the location of JSON files produced by kallisto and containing information on the run. The default assumes that json files are in the same directory as the corresponding abundance file.
`h5`	character() vector of the same length as `files` specifying the location of HDF5 files produced by kallisto and containing bootstrap estimates. The default assumes that HDF5 files are in the same directory as the corresponding abundance file.
`what`	character() vector of kallisto per-sample outputs to be input. See KALLISTO_ASSAYS for available values.
`as`	character(1) specifying the output format. See `Value` for additional detail.
`file`	character(1) path to a single HDF5 output file.
`i, j`	integer() vector of row ( `i` ) and column ( `j` ) indexes to input.
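
As a minimal base-R sketch of how the default values of `json` and `h5` shown under Usage resolve, the following evaluates those two default expressions by hand. The sample paths are hypothetical and need not exist; no kallisto output is read. Note that the `h5` default in Usage evaluates to a single logical value.

```r
# Hypothetical kallisto output paths, one directory per sample
# (illustration only; the files do not need to exist).
files <- c("sampleA/abundance.tsv", "sampleB/abundance.tsv")

# Default 'json': a run_info.json file next to each abundance file.
json <- file.path(dirname(files), "run_info.json")
json

# Default 'h5': TRUE if any input path ends in ".h5".
h5 <- any(grepl("\\.h5$", files))
h5

# Switching the inputs to the HDF5 files flips the 'h5' default.
h5_files <- sub(".tsv", ".h5", files, fixed=TRUE)
h5_from_h5_files <- any(grepl("\\.h5$", h5_files))
h5_from_h5_files
```

The regular expression needs the doubled backslash so that R passes a literal `\.` to the regex engine and only a real ".h5" suffix matches.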

Value

A SummarizedExperiment, list, or matrix, depending on the value of argument as; by default a SummarizedExperiment. With as="SummarizedExperiment", rowData(se) contains the length of each transcript; colData(se) includes summary information on each sample, including the number of targets and bootstraps, the kallisto and index version, the start time, and the operating system call used to create the file. assays() contains one or more transcript x sample matrices of parameters estimated by kallisto (see KALLISTO_ASSAYS).

With as="list", the return value contains information similar to the SummarizedExperiment, with row, column, and assay data as elements of the list, but without coordination of row and column annotations into an integrated data container. as="matrix" returns the specified assay as a simple R matrix.

Author

Martin Morgan martin.morgan@roswellpark.org

References

http://pachterlab.github.io/kallisto software for quantifying transcript abundance.

Examples

outputs <- system.file(package="SummarizedExperiment", "extdata",
                       "kallisto")
files <- dir(outputs, pattern="abundance.tsv", full=TRUE, recursive=TRUE)
stopifnot(all(file.exists(files)))

## default: input 'est_counts'
(se <- readKallisto(files, as="SummarizedExperiment"))
str(readKallisto(files, as="list"))
str(readKallisto(files, as="matrix"))

## available assays
KALLISTO_ASSAYS
## one or more assays
readKallisto(files, what=c("tpm", "eff_length"))

## alternatively: read hdf5 files
files <- sub(".tsv", ".h5", files, fixed=TRUE)
readKallisto(files)

## input all bootstraps
xx <- readKallistoBootstrap(files[1])
ridx <- head(which(rowSums(xx) != 0), 3)
cidx <- c(1:5, 96:100)
xx[ridx, cidx]

## selective input of rows (transcripts) and/or bootstraps
readKallistoBootstrap(files[1], i=c(ridx, rev(ridx)), j=cidx)

