bioconductor v3.9.0 S4Vectors

The S4Vectors package defines the Vector and List virtual classes

Link to this section Summary

Functions

Annotated class

DataFrame objects

DataTable objects

Matrix for Filter Results

Collection of Filter Rules

List of Hits objects

Hits objects

Comparing and ordering hits

Set operations on Hits objects

LLint vectors

List objects

Common operations on List objects

Pairs objects

Rle objects

Fixed-width running window summaries

Common operations on Rle objects

S4Vectors internals

SimpleList objects

Vector objects

Compare, order, tabulate vector-like objects

Merge vector-like objects

Set operations on vector-like objects

Compute summary statistics of subsets of vector-like objects

Some utility functions to operate on strings

Unlist the list-like columns of a DataFrame object

Some utility functions to operate on integer vectors

Test if a vector-like object is sorted

Apply a function over subsequences of 2 vector-like objects

Display utilities

Divide a vector-like object into groups

Subsetting utilities

Convert between parallel vectors and lists

Link to this section Functions

Link to this function

Annotated_class()

Annotated class

Description

The virtual class Annotated is used to standardize the storage of metadata with a subclass.

Details

The Annotated class supports the storage of global metadata in a subclass. This is done through the metadata slot that stores a list object.

Seealso

The Vector class, which extends Annotated directly.

Author

P. Aboyoun

Examples

showClass("Annotated")  # shows (some of) the known subclasses

## If the IRanges package was not already loaded, this will show
## more subclasses:
library(IRanges)
showClass("Annotated")
Link to this function

DataFrame_class()

DataFrame objects

Description

The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns.

Details

On the whole, the DataFrame behaves very similarly to data.frame , in terms of construction, subsetting, splitting, combining, etc. The most notable exception is that the row names are optional. This means calling rownames(x) will return NULL if there are no row names. Of course, it could return seq_len(nrow(x)) , but returning NULL informs, for example, combination functions that no row names are desired (they are often a luxury when dealing with large data).

As DataFrame derives from Vector , it is possible to set an annotation string. Also, another DataFrame can hold metadata on the columns.

For a class to be supported as a column, it must have length and [ methods, where [ supports subsetting only by i and respects drop=FALSE . Optionally, a method may be defined for the showAsCell generic, which should return a vector of the same length as the subset of the column passed to it. This vector is then placed into a data.frame and converted to text with format . Thus, each element of the vector should be some simple, usually character, representation of the corresponding element in the column.

Seealso

Author

Michael Lawrence

Examples

score <- c(1L, 3L, NA)
counts <- c(10L, 2L, NA)
row.names <- c("one", "two", "three")

df <- DataFrame(score) # single column
df[["score"]]
df <- DataFrame(score, row.names = row.names) #with row names
rownames(df)

df <- DataFrame(vals = score) # explicit naming
df[["vals"]]

# arrays
ary <- array(1:4, c(2,1,2))
sw <- DataFrame(I(ary))

# a data.frame
sw <- DataFrame(swiss)
as.data.frame(sw) # swiss, without row names
# now with row names
sw <- DataFrame(swiss, row.names = rownames(swiss))
as.data.frame(sw) # swiss

# subsetting

sw[] # identity subset
sw[,] # same

sw[NULL] # no columns
sw[,NULL] # no columns
sw[NULL,] # no rows

## select columns
sw[1:3]
sw[,1:3] # same as above
sw[,"Fertility"]
sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)]

## select rows and columns
sw[4:5, 1:3]

sw[1] # one-column DataFrame
## the same
sw[, 1, drop = FALSE]
sw[, 1] # a (unnamed) vector
sw[[1]] # the same
sw[["Fertility"]]

sw[["Fert"]] # should return 'NULL'

sw[1,] # a one-row DataFrame
sw[1,, drop=TRUE] # a list

## duplicate row, unique row names are created
sw[c(1, 1:2),]

## indexing by row names
sw["Courtelary",]
subsw <- sw[1:5,1:4]
subsw["C",] # partially matches

## row and column names
cn <- paste("X", seq_len(ncol(swiss)), sep = ".")
colnames(sw) <- cn
colnames(sw)
rn <- seq(nrow(sw))
rownames(sw) <- rn
rownames(sw)

## column replacement

df[["counts"]] <- counts
df[["counts"]]
df[[3]] <- score
df[["X"]]
df[[3]] <- NULL # deletion
Link to this function

DataTable_class()

DataTable objects

Description

DataTable is an API only (i.e. virtual class with no slots) for accessing objects with a rectangular shape like DataFrame or DelayedMatrix objects.

Seealso

Examples

showClass("DataTable")  # shows (some of) the known subclasses

library(IRanges)
df <- DataFrame(as.data.frame(UCBAdmissions))
xtabs(Freq ~ Gender + Admit, df)
Link to this function

FilterMatrix_class()

Matrix for Filter Results

Description

A FilterMatrix object is a matrix meant for storing the logical output of a set of FilterRules , where each rule corresponds to a column. The FilterRules are stored within the FilterMatrix object, for the sake of provenance. In general, a FilterMatrix behaves like an ordinary matrix .

Seealso

Author

Michael Lawrence

Link to this function

FilterRules_class()

Collection of Filter Rules

Description

A FilterRules object is a collection of filter rules, which can be either expression or function objects. Rules can be disabled/enabled individually, facilitating experimenting with different combinations of filters.

Details

It is common to split a dataset into subsets during data analysis. When data is large, however, representing subsets (e.g. by logical vectors) and storing them as copies might become too costly in terms of space. The FilterRules class represents subsets as lightweight expression and/or function objects. Subsets can then be calculated when needed (on the fly). This avoids copying and storing a large number of subsets. Although it might take longer to frequently recalculate a subset, it often is a relatively fast operation and the space savings tend to be more than worth it when data is large.

Rules may be either expressions or functions. Evaluating an expression or invoking a function should result in a logical vector. Expressions are often more convenient, but functions (i.e. closures) are generally safer and more powerful, because the user can specify the enclosing environment. If a rule is an expression, it is evaluated inside the envir argument to the eval method (see below). If a function, it is invoked with envir as its only argument. See examples.

Seealso

FilterMatrix objects for storing the logical output of a set of FilterRules objects.

Author

Michael Lawrence

Examples

## constructing a FilterRules instance

## an empty set of filters
filters <- FilterRules()

## as a simple character vector
filts <- c("peaks", "promoters")
filters <- FilterRules(filts)
active(filters) # all TRUE

## with functions and expressions
filts <- list(peaks = expression(peaks), promoters = expression(promoters),
find_eboxes = function(rd) rep(FALSE, nrow(rd)))
filters <- FilterRules(filts, active = FALSE)
active(filters) # all FALSE

## direct, quoted args (character literal parsed)
filters <- FilterRules(under_peaks = peaks, in_promoters = "promoters")
filts <- list(under_peaks = expression(peaks),
in_promoters = expression(promoters))

## specify both exprs and additional args
filters <- FilterRules(filts, diffexp = de)

filts <- c("promoters", "peaks", "introns")
filters <- FilterRules(filts)

## evaluation
df <- DataFrame(peaks = c(TRUE, TRUE, FALSE, FALSE),
promoters = c(TRUE, FALSE, FALSE, TRUE),
introns = c(TRUE, FALSE, FALSE, FALSE))
eval(filters, df)
fm <- evalSeparately(filters, df)
identical(filterRules(fm), filters)
summary(fm)
summary(fm, percent = TRUE)
fm <- evalSeparately(filters, df, serial = TRUE)

## set the active state directly

active(filters) <- FALSE # all FALSE
active(filters) <- TRUE # all TRUE
active(filters) <- c(FALSE, FALSE, TRUE)
active(filters)["promoters"] <- TRUE # use a filter name

## toggle the active state by name or index

active(filters) <- c(NA, 2) # NA's are dropped
active(filters) <- c("peaks", NA)
Link to this function

HitsList_class()

List of Hits objects

Description

The HitsList class stores a set of Hits objects. It's typically used to represent the result of findOverlaps on two IntegerRangesList objects.

Details

Roughly the same set of utilities are provided for HitsList as for Hits :

The as.matrix method coerces a HitsList object in a similar way to Hits , except a column is prepended that indicates which space (or element in the query IntegerRangesList ) to which the row corresponds.

The as.table method flattens or unlists the list, counts the number of hits for each query range and outputs the counts as a table , which has the same shape as from a single Hits object.

To transpose a HitsList object x , so that the subject and query in each space are interchanged, call t(x) . This allows, for example, counting the number of hits for each subject element using as.table .

Seealso

Note

This class is highly experimental. It has not been well tested and may disappear at any time.

Author

Michael Lawrence

Hits objects

Description

The Hits class is a container for representing a set of hits between a set of left nodes and a set of right nodes . Note that only the hits are stored in the object. No information about the left or right nodes is stored, except their number.

For example, the findOverlaps function, defined and documented in the IRanges package, returns the hits between the query and subject arguments in a Hits object.

Usage

## Constructor functions
Hits(from=integer(0), to=integer(0), nLnode=0L, nRnode=0L, ...,
     sort.by.query=FALSE)
SelfHits(from=integer(0), to=integer(0), nnode=0L, ...,
     sort.by.query=FALSE)

Arguments

ArgumentDescription
from, to2 integer vectors of the same length. The values in from must be >= 1 and <= nLnode . The values in to must be >= 1 and <= nRnode .
nLnode, nRnodeNumber of left and right nodes.
...Metadata columns to set on the Hits object. All the metadata columns must be vector-like objects of the same length as from and to .
sort.by.queryShould the hits in the returned object be sorted by query? If yes, then a SortedByQueryHits object is returned (SortedByQueryHits is a subclass of Hits).
nnodeNumber of nodes.

Seealso

  • Hits-comparison for comparing and ordering hits.

  • The findOverlaps function in the IRanges package which returns SortedByQueryHits object.

  • Hits-examples in the IRanges package, for some examples of Hits object basic manipulation.

  • setops-methods in the IRanges package, for set operations on Hits objects.

Author

Michael Lawrence and Hervé Pagès

Examples

from <- c(5, 2, 3, 3, 3, 2)
to <- c(11, 15, 5, 4, 5, 11)
id <- letters[1:6]

Hits(from, to, 7, 15, id)
Hits(from, to, 7, 15, id, sort.by.query=TRUE)

## ---------------------------------------------------------------------
## selectHits()
## ---------------------------------------------------------------------

x <- c("a", "b", "a", "c", "d")
table <- c("a", "e", "d", "a", "a", "d")
hits <- findMatches(x, table)  # sorts the hits by query
hits

selectHits(hits, select="all")  # no-op

selectHits(hits, select="first")
selectHits(hits, select="first", nodup=TRUE)

selectHits(hits, select="last")
selectHits(hits, select="last", nodup=TRUE)

selectHits(hits, select="arbitrary")
selectHits(hits, select="count")

## ---------------------------------------------------------------------
## remapHits()
## ---------------------------------------------------------------------

Lnodes.remapping <- factor(c(a="A", b="B", c="C", d="D")[x],
levels=LETTERS[1:4])
remapHits(hits, Lnodes.remapping=Lnodes.remapping)

## See ?`Hits-examples` in the IRanges package for more examples of basic
## manipulation of Hits objects.

## ---------------------------------------------------------------------
## SelfHits objects
## ---------------------------------------------------------------------

hits2 <- SelfHits(c(2, 3, 3, 3, 3, 3, 4, 4, 4), c(4, 3, 2:4, 2, 2:3, 2), 4)
## Hits 2 and 4 are self hits (from 3rd node to itself):
which(isSelfHit(hits2))
## Hits 4, 6, 7, 8, and 9, are redundant hits:
which(isRedundantHit(hits2))

hits3 <- findMatches(x)
hits3[!isSelfHit(hits3)]
|hits3[!(isSelfHit(hits3) | isRedundantHit(hits3))]|
Link to this function

Hits_comparison()

Comparing and ordering hits

Description

== , != , <= , >= , < , > , match() , %in% , order() , sort() , and rank() can be used on Hits objects to compare and order hits.

Note that only the "pcompare" , "match" , and "order" methods are actually defined for Hits objects. This is all what is needed to make all the other comparing and ordering operations (i.e. == , != , <= , >= , < , > , %in% , sort() , and rank() ) work on these objects (see ?`` for more information about this). ## Usage ```r list(list("pcompare"), list("Hits,Hits"))(x, y) list(list("match"), list("Hits,Hits"))(x, table, nomatch=NA_integer_, incomparables=NULL, method=c("auto", "quick", "hash")) list(list("order"), list("Hits"))(..., na.last=TRUE, decreasing=FALSE, method=c("auto", "shell", "radix")) ``` ## Arguments |Argument |Description| |------------- |----------------| |x, y, table| Compatible [Hits](#hits) objects, that is, [Hits](#hits) objects with the same subject and query lengths. | |nomatch| The value to be returned in the case when no match is found. It is coerced to aninteger. | |incomparables| Not supported. | |method| Formatch: Use a Quicksort-based (method="quick") or a hash-based (method="hash") algorithm. The latter tends to give better performance, except maybe for some pathological input that we've not encountered so far. Whenmethod="auto"is specified, the most efficient algorithm will be used, that is, the hash-based algorithm iflength(x) <= 2^29, otherwise the Quicksort-based algorithm. Fororder: Themethodargument is ignored. | |...| One or more [Hits](#hits) objects. The additional [Hits](#hits) objects are used to break ties. | |na.last| Ignored. | |decreasing|TRUEorFALSE. | ## Details Only hits that belong to [Hits](#hits) objects with same subject and query lengths can be compared. Hits are ordered by query hit first, and then by subject hit. On a [Hits](#hits) object,order,sort, andrankare consistent with this order. list(" ", " ", list(list(), list(" ", " ", list("pcompare(x, y)"), ": ", " Performs element-wise (aka "parallel") comparison of 2 ", list("Hits"), " ", " objects ", list("x"), " and ", list("y"), ", that is, returns an integer vector where ", " the i-th element is less than, equal to, or greater than zero if ", " ", list("x[i]"), " is considered to be respectively less than, equal to, or ", " greater than ", list("y[i]"), ". See ", list("?", list("Vector-comparison"),

"`"), " for

", " how ", list("x"), " or ", list("y"), " is recycled when the 2 objects don't have the ", " same length. ", " ")), " ", " ", list(list(), list(" ", " ", list("match(x, table, nomatch=NAinteger, method=c("auto", "quick", "hash"))"), ": ", " Returns an integer vector of the length of ", list("x"), ", containing the ", " index of the first matching hit in ", list("table"), " (or ", list("nomatch"), " if ", " there is no matching hit) for each hit in ",

list("x"), ".

", " ")), " ", " ", list(list(), list(" ", " ", list("order(...)"), ": ", " Returns a permutation which rearranges its first argument (a ", list("Hits"), " ", " object) into ascending order, breaking ties by further arguments (also ", " ", list("Hits"), " objects). ", " ")), " ", " ")

Seealso

  • Hits objects.

  • Vector-comparison for general information about comparing, ordering, and tabulating vector-like objects.

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 Hits OBJECTS
## ---------------------------------------------------------------------
hits <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3)
hits

pcompare(hits, hits[3])
pcompare(hits[3], hits)

hits == hits[3]
hits != hits[3]
hits >= hits[3]
hits < hits[3]

## ---------------------------------------------------------------------
## B. match(), %in%
## ---------------------------------------------------------------------
table <- hits[-c(1, 3)]
match(hits, table)

hits %in% table

## ---------------------------------------------------------------------
## C. order(), sort(), rank()
## ---------------------------------------------------------------------
order(hits)
sort(hits)
rank(hits)

Set operations on Hits objects

Description

Perform set operations on Hits objects.

Details

union(x, y) , intersect(x, y) , setdiff(x, y) , and setequal(x, y) work on Hits objects x and y only if the objects are compatible Hits objects , that is, if they have the same subject and query lengths. These operations return respectively the union, intersection, (asymmetric!) difference, and equality of the sets of hits in x and y .

Value

union returns a Hits object obtained by appending to x the hits in y that are not already in x .

intersect returns a Hits object obtained by keeping only the hits in x that are also in y .

setdiff returns a Hits object obtained by dropping from x the hits that are in y .

setequal returns TRUE if x and y contain the same sets of hits and FALSE otherwise.

union , intersect , and setdiff propagate the names and metadata columns of their first argument ( x ).

Seealso

  • Hits objects.

  • Hits-comparison for comparing and ordering hits.

  • BiocGenerics:: , BiocGenerics:: , and BiocGenerics:: in the BiocGenerics package for general information about these generic functions.

Author

Hervé Pagès and Michael Lawrence

Examples

x <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3,
score=11:16)
x

y <- Hits(c(1, 3, 4, 4, 5, 5, 5), c(3, 3, 2, 1, 2, 1, 3), 6, 3,
score=21:27)
y

union(x, y)
union(y, x)  # same hits as in union(x, y), but in different order

intersect(x, y)
intersect(y, x)  # same hits as in intersect(x, y), but in
# different order

setdiff(x, y)
setdiff(y, x)

setequal(x, y)

LLint vectors

Description

The LLint class is a container for storing a vector of list("large ", " integers") (i.e. long long int values at the C level).

Usage

LLint(length=0L)
as.LLint(x)
is.LLint(x)

Arguments

ArgumentDescription
lengthA non-negative number (i.e. integer, double, or LLint value) specifying the desired length.
xObject to be coerced or tested.

Details

LLint vectors aim to provide the same functionality as integer vectors in base R but their values are stored as long long int values at the C level vs int values for integer vectors. Note that on Intel platforms long long int values are 64-bit and int values 32-bit only. Therefore LLint vectors can hold values in the +/-9.223e18 range (approximately) vs +/-2.147e9 only for integer vectors.

NAs are supported and the NA_LLint_ constant is predefined for convenience as as(NA, "LLint") .

Names are not supported for now.

Coercions from/to logical, integer, double, and character are supported.

Operations from the Arith , Compare and Summary groups are supported.

More operations coming soon...

Seealso

Author

Hervé Pagès

Examples

## A long long int uses 8 bytes (i.e. 64 bits) in C:
.Machine$sizeof.longlong

## ---------------------------------------------------------------------
## SIMPLE EXAMPLES
## ---------------------------------------------------------------------

LLint()
LLint(10)

as.LLint(3e9)
as.LLint("3000000000")

x <- as.LLint(1:10 * 111111111)
x * x
5 * x   # result as vector of doubles (i.e. 'x' coerced to double)
5L * x  # result as LLint vector (i.e. 5L coerced to LLint vector)
max(x)
min(x)
range(x)
sum(x)

x <- as.LLint(1:20)
prod(x)
x <- as.LLint(1:21)
prod(x)  # result is out of LLint range (+/-9.223e18)
prod(as.numeric(x))

x <- as.LLint(1:75000)
sum(x * x * x) == sum(x) * sum(x)

## Note that max(), min() and range() *always* return an LLint vector
## when called on an LLint vector, even when the vector is empty:
max(LLint())  # NA with no warning
min(LLint())  # NA with no warning

## This differs from how max(), min() and range() behave on an empty
## integer vector:
max(integer())  # -Inf with a warning
min(integer())  #  Inf with a warning

## ---------------------------------------------------------------------
## GOING FROM STRINGS TO INTEGERS
## ---------------------------------------------------------------------

## as.integer() behaves like as.integer(as.double()) on a character
## vector. With the following consequence:
s <- "-2.9999999999999999"
as.integer(s)   # -3

## as.LLint() converts the string *directly* to LLint, without
## coercing to double first:
as.LLint(s)  # decimal part ignored

## ---------------------------------------------------------------------
## GOING FROM DOUBLE-PRECISION VALUES TO INTEGERS AND VICE-VERSA
## ---------------------------------------------------------------------

## Be aware that a double-precision value is not guaranteed to represent
## exactly an integer > 2^53. This can cause some surprises:
2^53 == 2^53 + 1  # TRUE, yep!

## And therefore:
as.LLint(2^53) == as.LLint(2^53 + 1)  # also TRUE

## This can be even more disturbing when passing a big literal integer
## value because the R parser will turn it into a double-precision value
## before passing it to as.LLint():
x1 <- as.LLint(9007199254740992)  # same as as.LLint(2^53)
x1
x2 <- as.LLint(9007199254740993)  # same as as.LLint(2^53 + 1)
x2
x1 == x2  # still TRUE

## However, no precision is lost if a string literal is used instead:
x1 <- as.LLint("9007199254740992")
x1
x2 <- as.LLint("9007199254740993")
x2
x1 == x2  # FALSE
x2 - x1

d1 <- as.double(x1)
d2 <- as.double(x2)  # warning!
d1 == d2  # TRUE

## ---------------------------------------------------------------------
## LLint IS IMPLEMENTED AS AN S4 CLASS
## ---------------------------------------------------------------------

class(LLint(10))
typeof(LLint(10))        # S4
storage.mode(LLint(10))  # S4
is.vector(LLint(10))     # FALSE
is.atomic(LLint(10))     # FALSE

## This means that an LLint vector cannot go in an ordinary data
## frame:
data.frame(id=as.LLint(1:5))  # error!
## A DataFrame needs to be used instead:
DataFrame(id=as.LLint(1:5))

## ---------------------------------------------------------------------
## SANITY CHECKS
## ---------------------------------------------------------------------

x <- as.integer(c(0, 1, -1, -3, NA, -99))
y <- as.integer(c(-6, NA, -4:3, 0, 1999, 6:10, NA))
xx <- as.LLint(x)
yy <- as.LLint(y)

## Operations from "Arith" group:
stopifnot(identical(x + y, as.integer(xx + yy)))
stopifnot(identical(as.LLint(y + x), yy + xx))
stopifnot(identical(x - y, as.integer(xx - yy)))
stopifnot(identical(as.LLint(y - x), yy - xx))
stopifnot(identical(x * y, as.integer(xx * yy)))
stopifnot(identical(as.LLint(y * x), yy * xx))
stopifnot(identical(x / y, xx / yy))
stopifnot(identical(y / x, yy / xx))
stopifnot(identical(x %/% y, as.integer(xx %/% yy)))
stopifnot(identical(as.LLint(y %/% x), yy %/% xx))
stopifnot(identical(x %% y, as.integer(xx %% yy)))
stopifnot(identical(as.LLint(y %% x), yy %% xx))
stopifnot(identical(x ^ y, xx ^ yy))
stopifnot(identical(y ^ x, yy ^ xx))

## Operations from "Compare" group:
stopifnot(identical(x == y, xx == yy))
stopifnot(identical(y == x, yy == xx))
stopifnot(identical(x != y, xx != yy))
stopifnot(identical(y != x, yy != xx))
stopifnot(identical(x <= y, xx <= yy))
stopifnot(identical(y <= x, yy <= xx))
stopifnot(identical(x >= y, xx >= yy))
stopifnot(identical(y >= x, yy >= xx))
stopifnot(identical(x < y, xx < yy))
stopifnot(identical(y < x, yy < xx))
stopifnot(identical(x > y, xx > yy))
stopifnot(identical(y > x, yy > xx))

## Operations from "Summary" group:
stopifnot(identical(max(y), as.integer(max(yy))))
stopifnot(identical(max(y, na.rm=TRUE), as.integer(max(yy, na.rm=TRUE))))
stopifnot(identical(min(y), as.integer(min(yy))))
stopifnot(identical(min(y, na.rm=TRUE), as.integer(min(yy, na.rm=TRUE))))
stopifnot(identical(range(y), as.integer(range(yy))))
stopifnot(identical(range(y, na.rm=TRUE), as.integer(range(yy, na.rm=TRUE))))
stopifnot(identical(sum(y), as.integer(sum(yy))))
stopifnot(identical(sum(y, na.rm=TRUE), as.integer(sum(yy, na.rm=TRUE))))
stopifnot(identical(prod(y), as.double(prod(yy))))
stopifnot(identical(prod(y, na.rm=TRUE), as.double(prod(yy, na.rm=TRUE))))

List objects

Description

List objects are Vector objects with a "[[" , elementType and elementNROWS method. The List class serves a similar role as list in base R.

It adds one slot, the elementType slot, to the two slots shared by all Vector objects.

The elementType slot is the preferred location for List subclasses to store the type of data represented in the sequence. It is designed to take a character of length 1 representing the class of the sequence elements. While the List class performs no validity checking based on elementType , if a subclass expects elements to be of a given type, that subclass is expected to perform the necessary validity checking. For example, the subclass IntegerList (defined in the IRanges package) has elementType = "integer" and its validity method checks if this condition is TRUE.

To be functional, a class that inherits from List must define at least a "[[" method (in addition to the minimum set of Vector methods).

Seealso

  • List-utils for common operations on List objects.

  • Vector objects for the parent class.

  • The SimpleList class for a direct extension of the List class.

  • The CompressedList class defined in the IRanges package for another direct extension of the List class.

  • The IntegerList , RleList , and IRanges classes and constructors defined in the IRanges package for more examples of concrete List subclasses.

  • The extractList function defined in the IRanges package for grouping elements of a vector-like object into a list-like object.

Author

P. Aboyoun and H. Pagès

Examples

showClass("List")  # shows (some of) the known subclasses

Common operations on List objects

Description

Various functions and methods for looping on List objects, functional programming on List objects, and evaluation of an expression in a List object.

Usage

## Looping on List objects:
## ------------------------
list(list("lapply"), list("List"))(X, FUN, ...)
list(list("sapply"), list("List"))(X, FUN, ..., simplify=TRUE, USE.NAMES=TRUE)
endoapply(X, FUN, ...)
revElements(x, i)
mendoapply(FUN, ..., MoreArgs=NULL)
pc(...)
## Functional programming methods for List objects:
## ------------------------------------------------
list(list("Reduce"), list("List"))(f, x, init, right=FALSE, accumulate=FALSE)
list(list("Filter"), list("List"))(f, x)
list(list("Find"), list("List"))(f, x, right=FALSE, nomatch=NULL)
list(list("Map"), list("List"))(f, ...)
list(list("Position"), list("List"))(f, x, right=FALSE, nomatch=NA_integer_)
## Evaluation of an expression in a List object:
## ---------------------------------------------
list(list("within"), list("List"))(data, expr, ...)
## Constructing list matrices:
## ---------------------------------------------
list(list("rbind"), list("List"))(..., deparse.level=1L)
list(list("cbind"), list("List"))(..., deparse.level=1L)

Arguments

ArgumentDescription
X, xA list, data.frame or List object.
FUNThe function to be applied to each element of X (for endoapply ) or for the elements in ... (for mendoapply ).
...For lapply , sapply , and endoapply , optional arguments to FUN . For mendoapply , pc and Map , one or more list-like objects.
simplify, USE.NAMESSee ?base:: for a description of these arguments.
MoreArgsA list of other arguments to FUN .
iIndex specifying the elements to replace. Can be anything supported by [<- .
f, init, right, accumulate, nomatchSee ?base:: for a description of these arguments.
dataA List object.
exprExpression to evaluate.
deparse.levelSee ?base:: for a description of this argument.

Details

list(list("Looping on List objects"), list(" ", " Like the standard ", list(list("lapply")), " function defined in the ", " ", list("base"), " package, the ", list("lapply"), " method for ", list("List"), " objects ", " returns a list of the same length as ", list("X"), ", with each element being ", " the result of applying ", list("FUN"), " to the corresponding element of ", list("X"), ". ", " ", " Like the standard ", list(list("sapply")), " function defined in the ", " ",

list("base"), " package, the ", list("sapply"), " method for ", list("List"), " objects

", " is a user-friendly version of ", list("lapply"), " by default returning a vector ", " or matrix if appropriate. ", " ", " ", list("endoapply"), " and ", list("mendoapply"), " perform the endomorphic equivalents ", " of ", list(list("lapply")), " and ", list(list("mapply")), " by returning ", " objects of the same class as the inputs rather than an ordinary list. ", " ", " ",

list("revElements(x, i)"), " reverses the list elements in ", list("x"), " specified

", " by ", list("i"), ". It's equivalent to, but faster than, doing ", " ", list("x[i] <- endoapply(x[i], rev)"), ". ", " ", " ", list("pc(...)"), " combine list-like objects by concatenating them in an ", " element-wise fashion. It's similar to, but faster than, ", " ", list("mapply(c, ..., SIMPLIFY=FALSE)"), ". With the following differences: ", " ", list(" ", " ", list(), " ",

    list("pc()"), " ignores the supplied objects that are NULL.

", " ", list(), " ", list("pc()"), " does not recycle its arguments. All the supplied ", " objects must have the same length. ", " ", list(), " If one of the supplied objects is a ", list("List"), " object, then ", " ", list("pc()"), " returns a ", list("List"), " object. ", " ", list(), " ", list("pc()"), " always returns a homogenous list or ", list("List"), " object, ", " that is, an object where all the list elements have the same type. ",

    "    "), "

", " ")) list(list("Functional programming methods for List objects"), list(" ", " The R base package defines some higher-order functions that are commonly ", " found in Functional Programming Languages. ", " See ", list("?base::", list("Reduce")), " for the details, and, in particular, ", " for a description of their arguments. ", " The ", list("S4Vectors"), " package provides methods for ", list("List"), " objects, so, ", " in addition to be an ordinary vector or list, the ", list(

"x"), " argument can

", " also be a ", list("List"), " object. ", " ")) list(list("Evaluation of an expression in a List object"), list(" ", " ", list("within"), " evaluates ", list("expr"), " within ", list("as.env(data)"), " via ", " ", list("eval(data)"), ". Similar to ", list("with"), ", except assignments made ", " during evaluation are taken as assignments into ", list("data"), ", i.e., ", " new symbols have their value appended to ", list("data"), ", and assigning ", " new values to existing symbols results in replacement. ", " ")) list(list("Binding Lists into a matrix"), list(" ", " There are methods for ", list("cbind"), " and ", list("rbind"), " that will bind ", " multiple lists together into a basic list matrix. The usual ", " geometric constraints apply. In the future, this might return a List ", " (+ dimensions), but for now the return value is an ordinary list. ", " "))

Value

endoapply returns an object of the same class as X , each element of which is the result of applying FUN to the corresponding element of X .

mendoapply returns an object of the same class as the first object specified in list() , each element of which is the result of applying FUN to the corresponding elements of list() .

pc returns a list or List object of the same length as the input objects.

See ?base:: for the value returned by the functional programming methods.

See ?base:: for the value returned by within .

cbind and rbind return a list matrix.

Seealso

  • The List class.

  • base:: and base:: for the default lapply and mapply methods.

  • base:: for the default functional programming methods.

  • base:: for the default within method.

  • base:: and base:: for the default matrix binding methods.

Author

P. Aboyoun and H. Pagès

Examples

a <- data.frame(x = 1:10, y = rnorm(10))
b <- data.frame(x = 1:10, y = rnorm(10))

endoapply(a, function(x) (x - mean(x))/sd(x))
mendoapply(function(e1, e2) (e1 - mean(e1)) * (e2 - mean(e2)), a, b)

x <- list(a=11:13, b=26:21, c=letters)
y <- list(-(5:1), c("foo", "bar"), 0.25)
pc(x, y)

library(IRanges)
x <- IntegerList(a=11:13, b=26:21, c=31:36, d=4:2)
y <- NumericList(-(5:1), 1:2, numeric(0), 0.25)
pc(x, y)

Reduce("+", x)

Filter(is.unsorted, x)

pos1 <- Position(is.unsorted, x)
stopifnot(identical(Find(is.unsorted, x), x[[pos1]]))

pos2 <- Position(is.unsorted, x, right=TRUE)
stopifnot(identical(Find(is.unsorted, x, right=TRUE), x[[pos2]]))

y <- x * 1000L
Map("c", x, y)

rbind(x, y)
cbind(x, y)

Pairs objects

Description

Pairs is a Vector that stores two parallel vectors (any object that can be a column in a DataFrame ). It provides conveniences for performing binary operations on the vectors, as well as for converting between an equivalent list representation. Virtually all of the typical R vector operations should behave as expected.

A typical use case is representing the pairing from a findOverlaps call, for which findOverlapPairs is a shortcut.

Seealso

  • Hits-class , a typical way to define a pairing.

  • findOverlapPairs in the IRanges package, which generates an instance of this class based on overlaps.

  • setops-methods in the IRanges package, for set operations on Pairs objects.

Author

Michael Lawrence

Examples

p <- Pairs(1:10, Rle(1L, 10), score=rnorm(10), names=letters[1:10])
identical(first(p), 1:10)
mcols(p)$score
p%in% p[1:5]
as.data.frame(p)
z <- zipup(p)
first(p) <- Rle(1:10)
identical(zipdown(z), p)

Rle objects

Description

The Rle class is a general container for storing an atomic vector that is stored in a run-length encoding format. It is based on the rle function from the base package.

Seealso

Rle-utils , Rle-runstat , and aggregate for more operations on Rle objects.

rle

Vector-class

Author

P. Aboyoun

Examples

x <- Rle(10:1, 1:10)
x

runLength(x)
runValue(x)
nrun(x)

diff(x)
unique(x)
sort(x)
x[c(1,3,5,7,9)]
x > 4

x2 <- Rle(LETTERS[c(21:26, 25:26)], 8:1)
table(x2)

y <- Rle(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE))
y
as.vector(y)
rep(y, 10)
c(y, x > 5)

Fixed-width running window summaries

Description

The runsum , runmean , runmed , runwtsum , runq functions calculate the sum, mean, median, weighted sum, and order statistic for fixed width running windows.

Usage

runsum(x, k, endrule = c("drop", "constant"), ...)
runmean(x, k, endrule = c("drop", "constant"), ...)
list(list("smoothEnds"), list("Rle"))(y, k = 3)
list(list("runmed"), list("Rle"))(x, k, endrule = c("median", "keep", "drop", "constant"), 
       algorithm = NULL, print.level = 0)
runwtsum(x, k, wt, endrule = c("drop", "constant"), ...)
runq(x, k, i, endrule = c("drop", "constant"), ...)

Arguments

ArgumentDescription
x,yThe data object.
kAn integer indicating the fixed width of the running window. Must be odd when endrule != "drop" .

|endrule | A character string indicating how the values at the beginning and the end (of the data) should be treated. list(" ", " ", list(list(list(""median"")), list("see ", list(list("runmed")), ";")), " ", " ", list(list(list(""keep"")), list("see ", list(list("runmed")), ";")), " ", " ", list(list(list(""drop"")), list("do not extend the running statistics to ", " be the same length as the underlying vectors;")), " ", " ", list(list(list(""constant"")), list("copies running statistic to the ", " first values and analogously for the last ones making the ", " smoothed ends ", |

list("constant"), ".")), "

", " ")
|wt | A numeric vector of length k that provides the weights to use. | |i | An integer in [0, k] indicating which order statistic to calculate. | |algorithm,print.level | See ?stats:: for a description of these arguments. | |list() | Additional arguments passed to methods. Specifically, na.rm . When na.rm = TRUE , the NA and NaN values are removed. When na.rm = FALSE , NA is returned if either NA or NaN are in the specified window. |

Details

The runsum , runmean , runmed , runwtsum , and runq functions provide efficient methods for calculating the specified numeric summary by performing the looping in compiled code.

Value

An object of the same class as x .

Seealso

runmed , Rle-class , RleList-class

Author

P. Aboyoun and V. Obenchain

Examples

x <- Rle(1:10, 1:10)
runsum(x, k = 3)
runsum(x, k = 3, endrule = "constant")
runmean(x, k = 3)
runwtsum(x, k = 3, wt = c(0.25, 0.5, 0.25))
runq(x, k = 5, i = 3, endrule = "constant")

## Missing and non-finite values
x <- Rle(c(1, 2, NA, 0, 3, Inf, 4, NaN))
runsum(x, k = 2)
runsum(x, k = 2, na.rm = TRUE)
runmean(x, k = 2, na.rm = TRUE)
runwtsum(x, k = 2, wt =  c(0.25, 0.5), na.rm = TRUE)
runq(x, k = 2, i = 2, na.rm = TRUE) ## max value in window

## The .naive_runsum() function demonstrates the semantics of
## runsum(). This test ensures the behavior is consistent with
## base::sum().

.naive_runsum <- function(x, k, na.rm=FALSE)
sapply(0:(length(x)-k),
function(offset) sum(x[1:k + offset], na.rm=na.rm))

x0 <- c(1, Inf, 3, 4, 5, NA)
x <- Rle(x0)
target1 <- .naive_runsum(x0, 3, na.rm = TRUE)
target2 <- .naive_runsum(x, 3, na.rm = TRUE)
stopifnot(target1 == target2)
current <- as.vector(runsum(x, 3, na.rm = TRUE))
stopifnot(target1 == current)

## runmean() and runwtsum() :
x <- Rle(c(2, 1, NA, 0, 1, -Inf))
runmean(x, k = 3)
runmean(x, k = 3, na.rm = TRUE)
runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25))
runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25), na.rm = TRUE)

## runq() :
runq(x, k = 3, i = 1, na.rm = TRUE) ## smallest value in window
runq(x, k = 3, i = 3, na.rm = TRUE) ## largest value in window

## When na.rm = TRUE, it is possible the number of non-NA
## values in the window will be less than the 'i' specified.
## Here we request the 4th smallest value in the window,
## which tranlates to the value at the 4/5 (0.8) percentile.
x <- Rle(c(1, 2, 3, 4, 5))
runq(x, k=length(x), i=4, na.rm=TRUE)

## The same request on a Rle with two missing values
## finds the value at the 0.8 percentile of the vector
## at the new length of 3 after the NA's have been removed.
## This translates to round((0.8) * 3).
x <- Rle(c(1, 2, 3, NA, NA))
runq(x, k=length(x), i=4, na.rm=TRUE)

Common operations on Rle objects

Description

Common operations on Rle objects.

Seealso

Rle objects

S4groupGeneric

Author

P. Aboyoun

Examples

x <- Rle(10:1, 1:10)
x

sqrt(x)
x^2 + 2 * x + 1
range(x)
sum(x)
mean(x)

z <- c("the", "quick", "red", "fox", "jumps", "over", "the", "lazy", "brown", "dog")
z <- Rle(z, seq_len(length(z)))
chartr("a", "@", z)
toupper(z)
Link to this function

S4Vectors_internals()

S4Vectors internals

Description

Objects, classes and methods defined in the S4Vectors package that are not intended to be used directly.

Link to this function

SimpleList_class()

SimpleList objects

Description

The (non-virtual) SimpleList class extends the List virtual class.

Details

The SimpleList class is the simplest, most generic concrete implementation of the List abstraction. It provides an implementation that subclasses can easily extend.

In a SimpleList object the list elements are stored internally in an ordinary list.

Seealso

  • List objects for the parent class.

  • The CompressedList class defined in the IRanges package for a more efficient alternative to SimpleList.

  • The SimpleIntegerList class defined in the IRanges package for a SimpleList subclass example.

  • The DataFrame class for another SimpleList subclass example.

Examples

## Displaying a SimpleList object:
x1 <- SimpleList(a=letters, i=Rle(22:20, 4:2))
class(x1)

## The "Simple" prefix is removed from the real class name of the
## object:
x1

library(IRanges)
x2 <- IntegerList(11:12, integer(0), 3:-2, compress=FALSE)
class(x2)

## The "Simple" prefix is removed from the real class name of the
## object:
x2

## This is controlled by internal helper classNameForDisplay():
classNameForDisplay(x2)

Vector objects

Description

The Vector virtual class serves as the heart of the S4Vectors package and has over 90 subclasses. It serves a similar role as vector in base R.

The Vector class supports the storage of global and element-wise metadata:

  • The global metadata annotates the object as a whole: this metadata is accessed via the metadata accessor and is represented as an ordinary list;

  • The element-wise metadata annotates individual elements of the object: this metadata is accessed via the mcols accessor ( mcols stands for metadata columns ) and is represented as a DataTable object (i.e. as an instance of a concrete subclass of DataTable , e.g. a DataFrame object), with a row for each element and a column for each metadata variable. Note that the element-wise metadata can also be NULL .

To be functional, a class that inherits from Vector must define at least a length and a "[" method.

Seealso

  • Rle , Hits , IRanges and XRaw for example implementations.

  • Vector-comparison for comparing, ordering, and tabulating vector-like objects.

  • Vector-setops for set operations on vector-like objects.

  • Vector-merge for merging vector-like objects.

  • List for a direct Vector extension that serves a similar role as list in base R.

  • extractList for grouping elements of a vector-like object into a list-like object.

  • DataTable which is the type of objects returned by the mcols accessor.

  • The Annotated class, which Vector extends.

Examples

showClass("Vector")  # shows (some of) the known subclasses
Link to this function

Vector_comparison()

Compare, order, tabulate vector-like objects

Description

Generic functions and methods for comparing, ordering, and tabulating vector-like objects.

Usage

## Element-wise (aka "parallel") comparison of 2 Vector objects
## ------------------------------------------------------------
pcompare(x, y)
list(list("=="), list("Vector,Vector"))(e1, e2)
list(list("=="), list("Vector,ANY"))(e1, e2)
list(list("=="), list("ANY,Vector"))(e1, e2)
list(list("<="), list("Vector,Vector"))(e1, e2)
list(list("<="), list("Vector,ANY"))(e1, e2)
list(list("<="), list("ANY,Vector"))(e1, e2)
list(list("!="), list("Vector,Vector"))(e1, e2)
list(list("!="), list("Vector,ANY"))(e1, e2)
list(list("!="), list("ANY,Vector"))(e1, e2)
list(list(">="), list("Vector,Vector"))(e1, e2)
list(list(">="), list("Vector,ANY"))(e1, e2)
list(list(">="), list("ANY,Vector"))(e1, e2)
list(list("<"), list("Vector,Vector"))(e1, e2)
list(list("<"), list("Vector,ANY"))(e1, e2)
list(list("<"), list("ANY,Vector"))(e1, e2)
list(list(">"), list("Vector,Vector"))(e1, e2)
list(list(">"), list("Vector,ANY"))(e1, e2)
list(list(">"), list("ANY,Vector"))(e1, e2)
## selfmatch()
## -----------
selfmatch(x, ...)
## duplicated() & unique()
## -----------------------
list(list("duplicated"), list("Vector"))(x, incomparables=FALSE, ...)
list(list("unique"), list("Vector"))(x, incomparables=FALSE, ...)
## %in%
## ----
list(list("%in%"), list("Vector,Vector"))(x, table)
list(list("%in%"), list("Vector,ANY"))(x, table)
list(list("%in%"), list("ANY,Vector"))(x, table)
## findMatches() & countMatches()
## ------------------------------
findMatches(x, table, select=c("all", "first", "last"), ...)
countMatches(x, table, ...)
## rank()
## ------
list(list("rank"), list("Vector"))(x, na.last = TRUE, ties.method = c("average", 
        "first", "last", "random", "max", "min"), by)
## sort()
## ------
list(list("sort"), list("Vector"))(x, decreasing=FALSE, na.last=NA, by)
## table()
## -------
list(list("table"), list("Vector"))(...)

Arguments

ArgumentDescription
x, y, e1, e2, tableVector-like objects.
incomparablesThe duplicated method for Vector objects does NOT support this argument. The unique method for Vector objects, which is implemented on top of duplicated , propagates this argument to its call to duplicated . See ?base:: and ?base:: for more information about this argument.
selectOnly select="all" is supported at the moment. Note that you can use match if you want to do select="first" . Otherwise you're welcome to request this on the Bioconductor mailing list.
ties.methodSee ?base:: .
decreasing, na.lastSee ?base:: .
byA formula referencing the metadata columns by which to sort, e.g., ~ x + y sorts by column x , breaking ties with column y .
...A Vector object for table (the table method for Vector objects can only take one input object). Otherwise, extra arguments supported by specific methods. In particular:
  • The default selfmatch method, which is implemented on top of match , propagates the extra arguments to its call to match .

  • The duplicated method for Vector objects, which is implemented on top of selfmatch , accepts extra argument fromLast and propagates the other extra arguments to its call to selfmatch . See ?base:: for more information about this argument.

  • The unique method for Vector objects, which is implemented on top of duplicated , propagates the extra arguments to its call to duplicated .

  • The default findMatches and countMatches methods, which are implemented on top of match and selfmatch , propagate the extra arguments to their calls to match and selfmatch .

  • The sort method for Vector objects, which is implemented on top of order , only accepts extra argument na.last and propagates it to its call to order .

Details

Doing pcompare(x, y) on 2 vector-like objects x and y of length 1 must return an integer less than, equal to, or greater than zero if the single element in x is considered to be respectively less than, equal to, or greater than the single element in y . If x or y have a length != 1, then they are typically expected to have the same length so pcompare(x, y) can operate element-wise, that is, in that case it returns an integer vector of the same length as x and y where the i-th element is the result of compairing x[i] and y[i] . If x and y don't have the same length and are not zero-length vectors, then the shortest is first recycled to the length of the longest. If one of them is a zero-length vector then pcompare(x, y) returns a zero-length integer vector.

selfmatch(x, ...) is equivalent to match(x, x, ...) . This is actually how the default method is implemented. However note that selfmatch(x, ...) will typically be more efficient than match(x, x, ...) on vector-like objects for which a specific selfmatch method is implemented.

findMatches is an enhanced version of match which, by default (i.e. if select="all" ), returns all the matches in a Hits object.

countMatches returns an integer vector of the length of x containing the number of matches in table for each element in x .

Value

For pcompare : see Details section above.

For selfmatch : an integer vector of the same length as x .

For duplicated , unique , and %in% : see ?BiocGenerics:: , ?BiocGenerics:: , and ?`` . ForfindMatches: a [Hits](#hits) object by default (i.e. ifselect="all"). ForcountMatches: an integer vector of the length ofxcontaining the number of matches intablefor each element inx. Forsort: see?BiocGenerics::. Fortable: a 1D array of integer values promoted to the"table"class. See?BiocGeneric::for more information. ## Seealso * The [Vector](#vector) class. * [Hits-comparison](#hits-comparison) for comparing and ordering hits. * [Vector-setops](#vector-setops) for set operations on vector-like objects. * [Vector-merge](#vector-merge) for merging vector-like objects. * [IntegerRanges-comparison](#integerranges-comparison) in the IRanges package for comparing and ordering ranges. * [==](#==) and [%in%](#%in%) in the base package, andBiocGenerics::,BiocGenerics::,BiocGenerics::,BiocGenerics::,BiocGenerics::,BiocGenerics::in the BiocGenerics package for general information about the comparison/ordering operators and functions. * The [Hits](#hits) class. *BiocGeneric::in the BiocGenerics package. ## Note The following notes are for developers who want to implement comparing, ordering, and tabulating methods for their own [Vector](#vector) subclass: * The 6 traditional binary comparison operators are:==,!=,<=,>=,<, and>. The list("S4Vectors") package provides the following methods for these operators: list(" ", "setMethod("==", c("Vector", "Vector"), ", " function(e1, e2) { pcompare(e1, e2) == 0L } ", ") ", "setMethod("<=", c("Vector", "Vector"), ", " function(e1, e2) { pcompare(e1, e2) <= 0L } ", ") ", "setMethod("!=", c("Vector", "Vector"), ", " function(e1, e2) { !(e1 == e2) } ", ") ", "setMethod(">=", c("Vector", "Vector"), ", " function(e1, e2) { e2 <= e1 } ", ") ", "setMethod("<", c("Vector", "Vector"), ", " function(e1, e2) { !(e2 <= e1) } ", ") ", "setMethod(">", c("Vector", "Vector"), ", " function(e1, e2) { !(e1 <= e2) } ", ") ", " ") With these definitions, the 6 binary operators work out-of-the-box on [Vector](#vector) objects for whichpcompareworks the expected way. Ifpcompareis not implemented, then it's enough to implement==and<=methods to have the 4 remaining operators (!=,>=,<, and>) work out-of-the-box. * The list("S4Vectors") package provides nopcomparemethod for [Vector](#vector) objects. Specificpcomparemethods need to be implemented for specific [Vector](#vector) subclasses (e.g. for [Hits](#hits) and [IntegerRanges](#integerranges) objects). These specific methods must obey the rules described in the Details section above. * Theduplicated,unique, and%in%methods for [Vector](#vector) objects are implemented on top ofselfmatch,duplicated, andmatch, respectively, so they work out-of-the-box on [Vector](#vector) objects for whichselfmatch,duplicated, andmatchwork the expected way. * Also the defaultfindMatchesandcountMatchesmethods are implemented on top ofmatchandselfmatchso they work out-of-the-box on [Vector](#vector) objects for which those things work the expected way. * However, sinceselfmatchitself is also implemented on top ofmatch, then havingmatchwork the expected way is actually enough to getselfmatch,duplicated,unique,%in%,findMatches, andcountMatcheswork out-of-the-box on [Vector](#vector) objects. * Thesortmethod for [Vector](#vector) objects is implemented on top oforder, so it works out-of-the-box on [Vector](#vector) objects for whichorderworks the expected way. * Thetablemethod for [Vector](#vector) objects is implemented on top ofselfmatch,order, andas.character, so it works out-of-the-box on a [Vector](#vector) object for which those things work the expected way. * The list("S4Vectors") package provides nomatchorordermethods for [Vector](#vector) objects. Specific methods need to be implemented for specific [Vector](#vector) subclasses (e.g. for [Hits](#hits) and [IntegerRanges](#integerranges) objects). ## Author Hervé Pagès ## Examples ```r ## --------------------------------------------------------------------- ## A. SIMPLE EXAMPLES ## --------------------------------------------------------------------- y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L) selfmatch(y) x <- c(unique(y), 999L) findMatches(x, y) countMatches(x, y) ## See ?IntegerRanges-comparison` for more examples (on IntegerRanges ## objects). You might need to load the IRanges package first. ## --------------------------------------------------------------------- ## B. FOR DEVELOPERS: HOW TO IMPLEMENT THE BINARY COMPARISON OPERATORS ## FOR YOUR Vector SUBCLASS ## --------------------------------------------------------------------- ## The answer is: don't implement them. Just implement pcompare() and the ## binary comparison operators will work out-of-the-box. Here is an ## example: ## (1) Implement a simple Vector subclass. setClass("Raw", contains="Vector", representation(data="raw")) setMethod("length", "Raw", function(x) length(x@data)) setMethod("[", "Raw", function(x, i, j, ..., drop) { x@data <- x@data[i]; x } ) x <- new("Raw", data=charToRaw("AB.x0a-BAA+C")) stopifnot(identical(length(x), 12L)) stopifnot(identical(x[7:3], new("Raw", data=charToRaw("-a0x.")))) ## (2) Implement a "pcompare" method for Raw objects. setMethod("pcompare", c("Raw", "Raw"), function(x, y) {as.integer(x@data) - as.integer(y@data)} ) stopifnot(identical(which(x == x[1]), c(1L, 9L, 10L))) stopifnot(identical(x[x < x[5]], new("Raw", data=charToRaw(".-+")))) ```

Merge vector-like objects

Description

A merge method for vector-like objects.

Usage

list(list("merge"), list("Vector,Vector"))(x, y, ..., all=FALSE, all.x=NA, all.y=NA, sort=TRUE)

Arguments

ArgumentDescription
x, y, ...Vector-like objects, typically all of the same class and typically not list-like objects (even though some list-like objects like IntegerRanges and DNAStringSet are supported). Duplicated elements in each object are removed with a warning.
allTRUE or FALSE . Whether the vector elements in the result should be the union (when all=TRUE ) or intersection (when all=FALSE ) of the vector elements in x , y , ... .
all.x, all.yTo be used only when merging 2 objects (binary merge). Both all.x and all.y must be single logicals. If any of them is NA , then it's set to the value of all . Setting both of them to TRUE or both of them to FALSE is equivalent to setting all to TRUE or to FALSE , respectively (see above). If all.x is TRUE and all.y is FALSE then the vector elements in the result will be the unique elements in x . If all.x is FALSE and all.y is TRUE then the vector elements in the result will be the unique elements in y .
sortWhether to sort the merged result.

Details

This merge method acts much like merge.data.frame , except for 3 important differences:

  • The matching is based on the vector values, not arbitrary columns in a table.

  • Self merging is a no-op if sort=FALSE (or object already sorted) and if the object has no duplicates.

  • This merge method accepts an arbitrary number of vector-like objects (n-ary merge).

If some of the objects to merge are list-like objects not supported by the method described here, then the merging is simply done by calling base::merge() on the objects. This might succeed or not...

Value

A vector-like object of the same class as the input objects (if they all have the same class) containing the merged vector values and metadata columns.

Seealso

Examples

library(GenomicRanges)
x <- GRanges(c("chr1:1-1000", "chr2:2000-3000"),
score=c(0.45, 0.1), a1=c(5L, 7L), a2=c(6, 8))
y <- GRanges(c("chr2:150-151", "chr1:1-10", "chr2:2000-3000"),
score=c(0.7, 0.82, 0.1), b1=c(0L, 5L, 1L), b2=c(1, -2, 1))
merge(x, y)
merge(x, y, all=TRUE)
merge(x, y, all.x=TRUE)
merge(x, y, all.y=TRUE)

## Shared metadata columns must agree:
mcols(x)$score[2] <- 0.11
#merge(x, y)  # error!

## NAs agree with anything:
mcols(x)$score[2] <- NA
merge(x, y)
Link to this function

Vector_setops()

Set operations on vector-like objects

Description

Perform set operations on Vector objects.

Usage

list(list("union"), list("Vector,Vector"))(x, y)
list(list("intersect"), list("Vector,Vector"))(x, y)
list(list("setdiff"), list("Vector,Vector"))(x, y)
list(list("setequal"), list("Vector,Vector"))(x, y)

Arguments

ArgumentDescription
x, yVector-like objects.

Details

The union , intersect , and setdiff methods for Vector objects return a Vector object containing respectively the union, intersection, and (asymmetric!) difference of the 2 sets of vector elements in x and y . The setequal method for Vector objects checks for list("set ", " equality") between x and y .

They're defined as follow: list(" setMethod("union", c("Vector", "Vector"), ", " function(x, y) unique(c(x, y)) ", " ) ", " setMethod("intersect", c("Vector", "Vector"), ", " function(x, y) unique(x[x %in% y]) ", " ) ", " setMethod("setdiff", c("Vector", "Vector"), ", " function(x, y) unique(x[!(x %in% y)]) ", " ) ", " setMethod("setequal", c("Vector", "Vector"), ", " function(x, y) all(x %in% y) && all(y %in% x) ", " ) ", " ") so they work out-of-the-box on Vector objects for which c , unique , and %in% are defined.

Value

union returns a Vector object obtained by appending to x the elements in y that are not already in x .

intersect returns a Vector object obtained by keeping only the elements in x that are also in y .

setdiff returns a Vector object obtained by dropping from x the elements that are in y .

setequal returns TRUE if x and y contain the same sets of vector elements and FALSE otherwise.

union , intersect , and setdiff propagate the names and metadata columns of their first argument ( x ).

Seealso

  • Vector-comparison for comparing and ordering vector-like objects.

  • Vector-merge for merging vector-like objects.

  • Vector objects.

  • BiocGenerics:: , BiocGenerics:: , and BiocGenerics:: in the BiocGenerics package for general information about these generic functions.

Author

Hervé Pagès

Examples

## See ?`Hits-setops` for some examples.
Link to this function

aggregate_methods()

Compute summary statistics of subsets of vector-like objects

Description

The S4Vectors package defines aggregate methods for Vector , Rle , and List objects.

Usage

list(list("aggregate"), list("Vector"))(x, by, FUN, start=NULL, end=NULL, width=NULL,
          frequency=NULL, delta=NULL, ..., simplify=TRUE)
list(list("aggregate"), list("Rle"))(x, by, FUN, start=NULL, end=NULL, width=NULL,
          frequency=NULL, delta=NULL, ..., simplify=TRUE)
list(list("aggregate"), list("List"))(x, by, FUN, start=NULL, end=NULL, width=NULL,
          frequency=NULL, delta=NULL, ..., simplify=TRUE)

Arguments

ArgumentDescription
xA Vector , Rle , or List object.
byAn object with start , end , and width methods. If x is a List object, the by parameter can be a IntegerRangesList object to aggregate within the list elements rather than across them. When by is a IntegerRangesList object, the output is either a SimpleAtomicList object, if possible, or a SimpleList object, if not.
FUNThe function, found via match.fun , to be applied to each subset of x .
start, end, widthThe start, end, and width of the subsets. If by is missing, then two of the three must be supplied and have the same length.
frequency, deltaOptional arguments that specify the sampling frequency and increment within the subsets (in the same fashion as window from the stats package does).
...Optional arguments to FUN .
simplifyA logical value specifying whether the result should be simplified to a vector or matrix if possible.

Details

Subsets of x can be specified either via the by argument or via the start , end , width , frequency , and delta arguments.

For example, if start and end are specified, then: list(" ", " aggregate(x, FUN=FUN, start=start, end=end, ..., simplify=simplify) ") is equivalent to: list(" ", " sapply(seq_along(start), ", " function(i) FUN(x[start[i]:end[i]], ...), simplify=simplify) ") (replace x[start[i]:end[i]] with 2D-style subsetting x[start[i]:end[i], ] if x is a DataFrame object).

Seealso

Examples

x <- Rle(10:2, 1:9)
aggregate(x, x > 4, mean)
aggregate(x, FUN=mean, start=1:26, width=20)

## Note that aggregate() works on a DataFrame object the same way it
## works on an ordinary data frame:
aggregate(DataFrame(state.x77), list(Region=state.region), mean)
aggregate(weight ~ feed, data=DataFrame(chickwts), mean)

library(IRanges)
by <- IRanges(start=1:26, width=20, names=LETTERS)
aggregate(x, by, is.unsorted)
Link to this function

character_utils()

Some utility functions to operate on strings

Description

Some low-level string utilities to operate on ordinary character vectors. For more advanced string manipulations, see the Biostrings package.

Usage

unstrsplit(x, sep="")
## more to come...

Arguments

ArgumentDescription
xA list-like object where each list element is a character vector, or a character vector (identity).
sepA single string containing the separator.

Details

unstrsplit(x, sep) is equivalent to (but much faster than) sapply(x, paste0, collapse=sep) . It performs the reverse transformation of strsplit , that is, if x is a character vector with no NAs and sep a single string, then unstrsplit(strsplit(x, split=sep, fixed=TRUE), sep) is identical to x . A notable exception to this though is when strsplit finds a match at the end of a string, in which case the last element of the output (which should normally be an empty string) is not returned (see ?strsplit for the details).

Value

A character vector with one string per list element in x .

Seealso

  • The strsplit function in the base package.

Author

Hervé Pagès

Examples

x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4])
unstrsplit(x)
unstrsplit(x, sep=",")
unstrsplit(x, sep=" => ")

data(islands)
x <- names(islands)
y <- strsplit(x, split=" ", fixed=TRUE)
x2 <- unstrsplit(y, sep=" ")
stopifnot(identical(x, x2))

## But...
names(x) <- x
y <- strsplit(x, split="in", fixed=TRUE)
x2 <- unstrsplit(y, sep="in")
y[x != x2]
## In other words: strsplit() behavior sucks :-/
Link to this function

expand_methods()

Unlist the list-like columns of a DataFrame object

Description

expand transforms a DataFrame object into a new DataFrame object where the columns specified by the user are unlisted. The transformed DataFrame object has the same colnames as the original but typically more rows.

Usage

list(list("expand"), list("DataFrame"))(x, colnames, keepEmptyRows = FALSE, recursive = TRUE)

Arguments

ArgumentDescription
xA DataFrame object with list-like columns or a Vector object with list-like metadata columns (i.e. with list-like columns in mcols(x) ).
colnamesA character or numeric vector containing the names or indices of the list-like columns to unlist. The order in which columns are unlisted is controlled by the column order in this vector. This defaults to all of the recursive (list-like) columns in x .
keepEmptyRowsA logical indicating if rows containing empty list elements in the specified colnames should be retained or dropped. When TRUE , list elements are replaced with NA and all rows are kept. When FALSE , rows with empty list elements in the colnames columns are dropped.
recursiveIf TRUE , expand each column recursively, with the result representing their cartesian product. If FALSE , expand all of the columns in parallel, which requires that they all share the same skeleton.

Value

A DataFrame object that has been expanded row-wise to match the length of the unlisted columns.

Seealso

Examples

library(IRanges)
aa <- CharacterList("a", paste0("d", 1:2), paste0("b", 1:3), c(), "c")
bb <- CharacterList(paste0("sna", 1:2),"foo", paste0("bar",1:3),c(),"hica")
df <- DataFrame(aa=aa, bb=bb, cc=11:15)

## Expand by all list-like columns (aa, bb), dropping rows with empty
## list elements:
expand(df)

## Expand the aa column only:
expand(df, colnames="aa", keepEmptyRows=TRUE)
expand(df, colnames="aa", keepEmptyRows=FALSE)

## Expand the aa and then the bb column:
expand(df, colnames=c("aa","bb"), keepEmptyRows=TRUE)
expand(df, colnames=c("aa","bb"), keepEmptyRows=FALSE)

## Expand the aa and dd column in parallel:
df$dd <- relist(seq_along(unlist(aa)), aa)
expand(df, colnames=c("aa","dd"), recursive=FALSE)
Link to this function

integer_utils()

Some utility functions to operate on integer vectors

Description

Some low-level utility functions to operate on ordinary integer vectors.

Usage

isSequence(x, of.length=length(x))
toListOfIntegerVectors(x, sep=",")
## more to come...

Arguments

ArgumentDescription
xFor isSequence() : An integer vector. For toListOfIntegerVectors() : A character vector where each element is a string containing comma-separated integers in decimal representation. Alternatively x can be a list of raw vectors, in which case it's treated like if it was sapply(x, rawToChar) .
of.lengthThe expected length of the integer sequence.
sepThe separator represented as a single-letter string.

Details

isSequence() returns TRUE or FALSE depending on whether x is identical to seq_len(of.length) or not.

toListOfIntegerVectors() is a fast and memory-efficient implementation of

 lapply(strsplit(x, sep, fixed=TRUE), as.integer) 

but, unlike the above code, it will raise an error if the input contains NAs or strings that don't represent integer values.

Value

A list parallel to x where each list element is an integer vector.

Seealso

  • The seq_len function in the base package.

  • The strsplit function in the base package.

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## isSequence()
## ---------------------------------------------------------------------
isSequence(1:5)               # TRUE
isSequence(5:1)               # FALSE
isSequence(0:5)               # FALSE
isSequence(integer(0))        # TRUE
isSequence(1:5, of.length=5)  # TRUE (the expected length)
isSequence(1:5, of.length=6)  # FALSE (not the expected length)

## ---------------------------------------------------------------------
## toListOfIntegerVectors()
## ---------------------------------------------------------------------

x <- c("1116,0,-19",
" +55291 , 2476,",
"19184,4269,5659,6470,6721,7469,14601",
"7778889, 426900, -4833,5659,6470,6721,7096",
"19184 , -99999")

y <- toListOfIntegerVectors(x)
y

## When it doesn't choke on an NA or string that doesn't represent
## an integer value, toListOfIntegerVectors() is equivalent to
## the function below but is faster and more memory-efficient:
toListOfIntegerVectors2 <- function(x, sep=",")
{
lapply(strsplit(x, sep, fixed=TRUE), as.integer)
}
y2 <- toListOfIntegerVectors2(x)
stopifnot(identical(y, y2))

Test if a vector-like object is sorted

Description

isSorted and isStrictlySorted test if a vector-like object is sorted or strictly sorted, respectively.

isConstant tests if a vector-like or array-like object is constant. Currently only isConstant methods for vectors or arrays of type integer or double are implemented.

Usage

isSorted(x)
isStrictlySorted(x)
isConstant(x)

Arguments

ArgumentDescription
xA vector-like object. Can also be an array-like object for isConstant .

Details

Vector-like objects of length 0 or 1 are always considered to be sorted, strictly sorted, and constant.

Strictly sorted and constant objects are particular cases of sorted objects.

isStrictlySorted(x) is equivalent to isSorted(x) && !anyDuplicated(x)

Value

A single logical i.e. TRUE , FALSE or NA .

Seealso

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. isSorted() and isStrictlySorted()
## ---------------------------------------------------------------------

x <- 1:10

isSorted(x)           # TRUE
isSorted(-x)          # FALSE
isSorted(rev(x))      # FALSE
isSorted(-rev(x))     # TRUE

isStrictlySorted(x)   # TRUE

x2 <- rep(x, each=2)
isSorted(x2)          # TRUE
isStrictlySorted(x2)  # FALSE

## ---------------------------------------------------------------------
## B. "isConstant" METHOD FOR integer VECTORS
## ---------------------------------------------------------------------

## On a vector with no NAs:
stopifnot(isConstant(rep(-29L, 10000)))

## On a vector with NAs:
stopifnot(!isConstant(c(0L, NA, -29L)))
stopifnot(is.na(isConstant(c(-29L, -29L, NA))))

## On a vector of length <= 1:
stopifnot(isConstant(NA_integer_))


## ---------------------------------------------------------------------
## C. "isConstant" METHOD FOR numeric VECTORS
## ---------------------------------------------------------------------
## This method does its best to handle rounding errors and special
## values NA, NaN, Inf and -Inf in a way that "makes sense".
## Below we only illustrate handling of rounding errors.

## Here values in 'x' are "conceptually" the same:
x <- c(11/3,
2/3 + 4/3 + 5/3,
50 + 11/3 - 50,
7.00001 - 1000003/300000)
## However, due to machine rounding errors, they are not *strictly*
## equal:
duplicated(x)
unique(x)
## only *nearly* equal:
all.equal(x, rep(11/3, 4))  # TRUE

## 'isConstant(x)' uses 'all.equal()' internally to decide whether
## the values in 'x' are all the same or not:
stopifnot(isConstant(x))

## This is not perfect though:
isConstant((x - 11/3) * 1e8)  # FALSE on Intel Pentium paltforms
# (but this is highly machine dependent!)
Link to this function

shiftApply_methods()

Apply a function over subsequences of 2 vector-like objects

Description

shiftApply loops and applies a function overs subsequences of vector-like objects X and Y .

Usage

shiftApply(SHIFT, X, Y, FUN, ..., OFFSET=0L, simplify=TRUE, verbose=FALSE)

Arguments

ArgumentDescription
SHIFTA non-negative integer vector of shift values.
X, YThe vector-like objects to shift.
FUNThe function, found via match.fun , to be applied to each set of shifted vectors.
...Further arguments for FUN .
OFFSETA non-negative integer offset to maintain throughout the shift operations.
simplifyA logical value specifying whether or not the result should be simplified to a vector or matrix if possible.
verboseA logical value specifying whether or not to print the i indices to track the iterations.

Details

Let i be the indices in SHIFT , X_i = window(X, 1 + OFFSET, length(X) - SHIFT[i]) , and Y_i = window(Y, 1 + SHIFT[i], length(Y) - OFFSET) . shiftApply calculates the set of FUN(X_i, Y_i, ...) values and returns the results in a convenient form.

Seealso

Examples

set.seed(0)
lambda <- c(rep(0.001, 4500), seq(0.001, 10, length = 500),
seq(10, 0.001, length = 500))
xRle <- Rle(rpois(1e7, lambda))
yRle <- Rle(rpois(1e7, lambda[c(251:length(lambda), 1:250)]))

cor(xRle, yRle)
shifts <- seq(235, 265, by=3)
corrs <- shiftApply(shifts, yRle, xRle, FUN=cor)

cor(xRle, yRle)
shiftApply(249:251, yRle, xRle,
FUN=function(x, y) var(x, y) / (sd(x) * sd(y)))

Display utilities

Description

Low-level utility functions and classes defined in the S4Vectors package to support display of vector-like objects. They are not intended to be used directly.

Link to this function

split_methods()

Divide a vector-like object into groups

Description

split divides the data in a vector-like object x into the groups defined by f .

NOTE: This man page is for the split methods defined in the S4Vectors package. See ?base:: for the default method (defined in the base package).

Usage

list(list("split"), list("Vector,ANY"))(x, f, drop=FALSE)
list(list("split"), list("ANY,Vector"))(x, f, drop=FALSE)
list(list("split"), list("Vector,Vector"))(x, f, drop=FALSE)
list(list("split"), list("list,Vector"))(x, f, drop=FALSE, ...)

Arguments

ArgumentDescription
x, f2 vector-like objects of the same length. f will typically be a factor, but not necessarily.
dropLogical indicating if levels that do not occur should be dropped (if f is a factor).
...Arguments passed to base:: (see Details below).

Details

The first 3 methods just delegate to the IRanges:: function defined in the IRanges package.

The method for list does:

 split(x, as.vector(f), drop=drop, ...)

Value

All these methods behave like base:: except that the first 3 methods return a List object instead of an ordinary list.

Seealso

Examples

## On an Rle object:
x <- Rle(101:105, 6:2)
split(x, c("B", "B", "A", "B", "A"))

## On a DataFrame object:
groups <- c("group1", "group2")
DF <- DataFrame(
a=letters[1:10],
i=101:110,
group=rep(factor(groups, levels=groups), c(3, 7))
)
split(DF, DF$group)
Link to this function

subsetting_utils()

Subsetting utilities

Description

Low-level utility functions and classes defined in the S4Vectors package to support subsetting of vector-like objects. They are not intended to be used directly.

Convert between parallel vectors and lists

Description

The zipup and zipdown functions convert between two parallel vectors and a list of doublets (elements of length 2). The metaphor, borrowed from Python's zip , is that of a zipper. The zipup function interleaves the elements of the parallel vectors into a list of doublets. The inverse operation is zipdown , which returns a Pairs object.

Usage

zipup(x, y, ...)
zipdown(x, ...)

Arguments

ArgumentDescription
x,yFor zipup , any vector-like object. For zipdown , a doublet list.
list()Arguments passed to methods.

Value

For zipup , a list-like object, where every element is of length 2. For zipdown , a Pairs object.

Seealso

  • Pairs objects.

Examples

z <- zipup(1:10, Rle(1L, 10))
pairs <- zipdown(z)