bioconductor v3.9.0 IRanges

The lists of atomic vectors are LogicalList, IntegerList, NumericList, ComplexList, CharacterList, and RawList. There is also an RleList class for run-length encoded versions of these atomic vector types.

Each of the above-mentioned classes is virtual, with Compressed and Simple non-virtual representations.

Author

P. Aboyoun

Examples

int1 <- c(1L,2L,3L,5L,2L,8L)
int2 <- c(15L,45L,20L,1L,15L,100L,80L,5L)
collection <- IntegerList(int1, int2)

## names
names(collection) <- c("one", "two")
names(collection)
names(collection) <- NULL # clear names
names(collection)
names(collection) <- "one"
names(collection) # c("one", NA)

## extraction
collection[[1]] # int1
collection[["1"]] # NULL, does not exist
collection[["one"]] # int1
collection[[NA_integer_]] # NULL

## subsetting
collection[numeric()] # empty
collection[NULL] # empty
collection[] # identity
collection[c(TRUE, FALSE)] # first element
collection[2] # second element
collection[c(2,1)] # reversed
collection[-1] # drop first
collection$one

## replacement
collection$one <- int2
collection[[2]] <- int1

## concatenating
col1 <- IntegerList(one = int1, int2)
col2 <- IntegerList(two = int2, one = int1)
col3 <- IntegerList(int2)
append(col1, col2)
append(col1, col2, 0)
col123 <- c(col1, col2, col3)
col123

## revElements
revElements(col123)
revElements(col123, 4:5)

AtomicList_utils()

Common operations on AtomicList objects

Description

Common operations on AtomicList objects.

Author

P. Aboyoun

Examples

## group generics
int1 <- c(1L,2L,3L,5L,2L,8L)
int2 <- c(15L,45L,20L,1L,15L,100L,80L,5L)
col1 <- IntegerList(one = int1, int2)
2 * col1
col1 + col1
col1 > 2
sum(col1)  # equivalent to (but faster than) 'sapply(col1, sum)'
mean(col1)  # equivalent to 'sapply(col1, mean)'

A "range of integer values" is a finite set of consecutive integer values. Each range can be fully described with exactly 2 integer values, which can be chosen arbitrarily from the following 3 values: its "start" i.e. its smallest (or first, or leftmost) value; its "end" i.e. its greatest (or last, or rightmost) value; and its "width" i.e. the number of integer values in the range. For example the set of integer values that are greater than or equal to -20 and less than or equal to 400 is the range that starts at -20 and has a width of 421. In other words, a range is a closed, one-dimensional interval with integer endpoints on the domain of integers.

The starting point (or "start") of a range can be any integer (see start below) but its "width" must be a non-negative integer (see width below). The ending point (or "end") of a range is equal to its "start" plus its "width" minus one (see end below). An "empty" range is a range that contains no value i.e. a range that has a width of zero. Depending on the context, it can be interpreted either as just the empty set of integers or, more precisely, as the position between its "end" and its "start" (note that for an empty range, the "end" equals the "start" minus one).
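
The relation between start, end and width described above can be checked with plain integer arithmetic. The following is a minimal base R sketch that does not use the IRanges package; the helper names `range_end` and `range_width` are hypothetical and only restate the definitions.

```r
# Hypothetical helpers (not part of IRanges) restating the definitions above.
range_end   <- function(start, width) start + width - 1L
range_width <- function(start, end)   end - start + 1L

# The range of integers >= -20 and <= 400 starts at -20 and has width 421.
w <- range_width(-20L, 400L)
e <- range_end(-20L, 421L)

# An empty range has width 0, so its end equals its start minus one.
empty_end <- range_end(7L, 0L)

# The number of integer values covered by a range equals its width.
n_values <- length(seq.int(-20L, 400L))
```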

The length of an IPosRanges object is the number of ranges in it, not the number of integer values in its ranges.

An IPosRanges object is considered empty iff all its ranges are empty.

IPosRanges objects have a vector-like semantic i.e. they only support single subscript subsetting (unlike, for example, standard R data frames which can be subsetted by row and by column).

The IPosRanges class itself is a virtual class. The following classes derive directly from it: IRanges, IPos, NCList, and GroupingRanges.

IRanges objects (NormalIRanges objects are documented in the same man page).
The IPos class, a memory-efficient IPosRanges derivative for representing integer positions (i.e. integer ranges of width 1).
IPosRanges-comparison for comparing and ordering ranges.
findOverlaps-methods for finding/counting overlapping ranges.
intra-range-methods and inter-range-methods for intra range and inter range transformations of IntegerRanges derivatives.
coverage-methods for computing the coverage of a set of ranges.
setops-methods for set operations on ranges.
nearest-methods for finding the nearest range neighbor.

Author

H. Pagès and M. Lawrence

Examples

## ---------------------------------------------------------------------
## Basic manipulation
## ---------------------------------------------------------------------
x <- IRanges(start=c(2:-1, 13:15), width=c(0:3, 2:0))
x
length(x)
start(x)
width(x)
end(x)
isEmpty(x)
as.matrix(x)
as.data.frame(x)

## Subsetting:
x[4:2]                  # 3 ranges
x[-1]                   # 6 ranges
x[FALSE]                # 0 range
x0 <- x[width(x) == 0]  # 2 ranges
isEmpty(x0)

## Use the replacement methods to resize the ranges:
width(x) <- width(x) * 2 + 1
x
end(x) <- start(x)            # equivalent to width(x) <- 1
x
width(x) <- c(2, 0, 4)
x
start(x)[3] <- end(x)[3] - 2  # resize the 3rd range
x

## Name the elements:
names(x)
names(x) <- c("range1", "range2")
x
x[is.na(names(x))]  # 5 ranges
x[!is.na(names(x))]  # 2 ranges

ir <- IRanges(c(1,5), c(3,10))
ir*1 # no change
ir*c(1,2) # zoom second range by 2X
ir*-2 # zoom out 2X

IPosRanges_comparison()

Comparing and ordering ranges

Description

Methods for comparing and/or ordering the ranges in IPosRanges derivatives (e.g. IRanges, IPos, or NCList objects).

Usage

## match() & selfmatch()
## ---------------------
## S4 method for signature 'IPosRanges,IPosRanges'
match(x, table, nomatch=NA_integer_, incomparables=NULL,
      method=c("auto", "quick", "hash"))
## S4 method for signature 'IPosRanges'
selfmatch(x, method=c("auto", "quick", "hash"))
## order() and related methods
## ----------------------------
## S4 method for signature 'IPosRanges'
is.unsorted(x, na.rm=FALSE, strictly=FALSE)
## S4 method for signature 'IPosRanges'
order(..., na.last=TRUE, decreasing=FALSE,
      method=c("auto", "shell", "radix"))
## Generalized parallel comparison of 2 IPosRanges derivatives
## -----------------------------------------------------------
## S4 method for signature 'IPosRanges,IPosRanges'
pcompare(x, y)
rangeComparisonCodeToLetter(code)

Arguments

Argument	Description
`x, table, y`	IPosRanges derivatives e.g. IRanges, IPos, or NCList objects.
`nomatch`	The value to be returned in the case when no match is found. It is coerced to an `integer`.
`incomparables`	Not supported.
`method`	For `match` and `selfmatch`: Use a Quicksort-based (`method="quick"`) or a hash-based (`method="hash"`) algorithm. The latter tends to give better performance, except maybe for some pathological input that we've not encountered so far. When `method="auto"` is specified, the most efficient algorithm will be used, that is, the hash-based algorithm if `length(x) <= 2^29`, otherwise the Quicksort-based algorithm. For `order`: The `method` argument is ignored.
`na.rm`	Ignored.
`strictly`	Logical indicating if the check should be for strictly increasing values.
`...`	One or more IPosRanges derivatives. The 2nd and following objects are used to break ties.
`na.last`	Ignored.
`decreasing`	`TRUE` or `FALSE`.
`code`	A vector of codes as returned by `pcompare`.

Details

Two ranges of an IPosRanges derivative are considered equal iff they share the same start and width. duplicated() and unique() on an IPosRanges derivative conform to this definition.

Note that with this definition, 2 empty ranges are generally not equal (they need to share the same start to be considered equal). This means that, when it comes to comparing ranges, an empty range is interpreted as a position between its end and start. For example, a typical use case is the comparison of insertion points defined along a string (like a DNA sequence) and represented as empty ranges.

The "natural order" for the elements of an IPosRanges derivative is to order them (a) first by start and (b) then by width. This way, the space of integer ranges is totally ordered.
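
The equality rule and the "natural order" can be mimicked in base R with `order()` on the start and width vectors. This is only an illustrative sketch that does not call IRanges; the starts and widths are copied from the `x2` object used in the Examples of this page, and `ranges_equal` is a hypothetical helper.

```r
# Starts and widths copied from the x2 object used in the Examples section.
start <- c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L)
width <- c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)

# Natural order: sort by start first, then break ties by width.
natural_order <- order(start, width)

# Two ranges are equal iff they share the same start and width.
# Hypothetical helper, not an IRanges function.
ranges_equal <- function(i, j) start[i] == start[j] && width[i] == width[j]

eq_4_7 <- ranges_equal(4, 7)  # same start (22) and same width (5)
eq_2_5 <- ranges_equal(2, 5)  # both empty, but different starts (8 vs 25)
```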

pcompare(), ==, !=, <=, >=, < and > on IPosRanges derivatives behave according to this "natural order".

is.unsorted(), order(), sort(), rank() on IPosRanges derivatives also behave according to this "natural order".

Finally, note that some inter range transformations like reduce or disjoin also use this "natural order" implicitly when operating on IPosRanges derivatives.
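
The 13 predefined pcompare codes summarized in this Details section refine the natural order. For non-empty ranges they can be reproduced from start and end values alone. The sketch below is a base R illustration under that assumption, not the IRanges implementation; `compare_one` and `pcompare_sketch` are hypothetical names, and empty ranges (which pcompare treats specially, see the Examples) are deliberately not handled.

```r
# Hypothetical scalar helper; assumes both ranges are non-empty (end >= start).
compare_one <- function(xs, xe, ys, ye) {
  if (xe <  ys - 1L) return(-6L)   # x before y, separated by a gap
  if (xe == ys - 1L) return(-5L)   # x before y, adjacent
  if (xs >  ye + 1L) return(6L)    # x after y, separated by a gap
  if (xs == ye + 1L) return(5L)    # x after y, adjacent
  # From here on the two ranges overlap.
  if (xs < ys) {
    if (xe <  ye) return(-4L)
    if (xe == ye) return(-3L)
    return(-2L)
  }
  if (xs == ys) {
    if (xe <  ye) return(-1L)
    if (xe == ye) return(0L)
    return(1L)
  }
  # Remaining case: xs > ys
  if (xe <  ye) return(2L)
  if (xe == ye) return(3L)
  4L
}

# Element-wise comparison; shorter arguments are recycled by mapply().
pcompare_sketch <- function(x_start, x_end, y_start, y_end) {
  stopifnot(all(x_end >= x_start), all(y_end >= y_start))
  mapply(compare_one, x_start, x_end, y_start, y_end)
}

# Same ranges as x0 and y0 in the Examples: x0 has starts 1..11 and width 4,
# y0 is the single range 6..9.
x_start <- 1:11
codes <- pcompare_sketch(x_start, x_start + 3L, 6L, 9L)

# Letter equivalents of the predefined codes: -6 -> a, ..., 0 -> g, ..., 6 -> m.
code_letters <- letters[codes + 7L]
```

The sign of each code agrees with the natural order: a negative code means x[i] sorts before y[i], zero means the two ranges are equal, and a positive code means x[i] sorts after y[i].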

pcompare(x, y): Performs element-wise (aka "parallel") comparison of 2 IPosRanges objects x and y, that is, returns an integer vector where the i-th element is a code describing how x[i] is qualitatively positioned with respect to y[i].

Here is a summary of the 13 predefined codes (and their letter equivalents) and their meanings:

    -6 a: x[i]: .oooo.......         6 m: x[i]: .......oooo.
          y[i]: .......oooo.              y[i]: .oooo.......

    -5 b: x[i]: ..oooo......         5 l: x[i]: ......oooo..
          y[i]: ......oooo..              y[i]: ..oooo......

    -4 c: x[i]: ...oooo.....         4 k: x[i]: .....oooo...
          y[i]: .....oooo...              y[i]: ...oooo.....

    -3 d: x[i]: ...oooooo...         3 j: x[i]: .....oooo...
          y[i]: .....oooo...              y[i]: ...oooooo...

    -2 e: x[i]: ..oooooooo..         2 i: x[i]: ....oooo....
          y[i]: ....oooo....              y[i]: ..oooooooo..

    -1 f: x[i]: ...oooo.....         1 h: x[i]: ...oooooo...
          y[i]: ...oooooo...              y[i]: ...oooo.....

     0 g: x[i]: ...oooooo...
          y[i]: ...oooooo...

Note that this way of comparing ranges is a refinement over the standard ranges comparison defined by the ==, !=, <=, >=, < and > operators. In particular a code that is < 0, = 0, or > 0, corresponds to x[i] < y[i], x[i] == y[i], or x[i] > y[i], respectively.

The pcompare method for IPosRanges derivatives is guaranteed to return predefined codes only but methods for other objects (e.g. for GenomicRanges objects) can return non-predefined codes. Like for the predefined codes, the sign of any non-predefined code must tell whether x[i] is less than, or greater than y[i].

rangeComparisonCodeToLetter(x): Translate the codes returned by pcompare. The 13 predefined codes are translated as follows: -6 -> a; -5 -> b; -4 -> c; -3 -> d; -2 -> e; -1 -> f; 0 -> g; 1 -> h; 2 -> i; 3 -> j; 4 -> k; 5 -> l; 6 -> m. Any non-predefined code is translated to X. The translated codes are returned in a factor with 14 levels: a, b, ..., l, m, X.

match(x, table, nomatch=NA_integer_, method=c("auto", "quick", "hash")): Returns an integer vector of the length of x, containing the index of the first matching range in table (or nomatch if there is no matching range) for each range in x.

selfmatch(x, method=c("auto", "quick", "hash")): Equivalent to, but more efficient than, match(x, x, method=method).

duplicated(x, fromLast=FALSE, method=c("auto", "quick", "hash")): Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. duplicated(x) is equivalent to, but more efficient than, duplicated(as.data.frame(x)) on an IPosRanges derivative. See duplicated in the base package for more details.

unique(x, fromLast=FALSE, method=c("auto", "quick", "hash")): Removes duplicate ranges from x. unique(x) is equivalent to, but more efficient than, unique(as.data.frame(x)) on an IPosRanges derivative. See unique in the base package for more details.

x %in% table: A shortcut for finding the ranges in x that match any of the ranges in table. Returns a logical vector of length equal to the number of ranges in x.

findMatches(x, table, method=c("auto", "quick", "hash")): An enhanced version of match that returns all the matches in a Hits object.

countMatches(x, table, method=c("auto", "quick", "hash")): Returns an integer vector of the length of x containing the number of matches in table for each element in x.

order(...): Returns a permutation which rearranges its first argument (an IPosRanges derivative) into ascending order, breaking ties by further arguments (also IPosRanges derivatives).

sort(x): Sorts x. See sort in the base package for more details.

rank(x, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min")): Returns the sample ranks of the ranges in x. See rank in the base package for more details.

The IPosRanges class.
Vector-comparison in the S4Vectors package for general information about comparing, ordering, and tabulating vector-like objects.
GenomicRanges-comparison in the GenomicRanges package for comparing and ordering genomic ranges.
findOverlaps for finding overlapping ranges.
intra-range-methods and inter-range-methods for intra and inter range transformations.
setops-methods for set operations on IRanges objects.

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 IPosRanges
##    DERIVATIVES
## ---------------------------------------------------------------------
x0 <- IRanges(1:11, width=4)
x0
y0 <- IRanges(6, 9)
pcompare(x0, y0)
pcompare(IRanges(4:6, width=6), y0)
pcompare(IRanges(6:8, width=2), y0)
pcompare(x0, y0) < 0   # equivalent to 'x0 < y0'
pcompare(x0, y0) == 0  # equivalent to 'x0 == y0'
pcompare(x0, y0) > 0   # equivalent to 'x0 > y0'

rangeComparisonCodeToLetter(-10:10)
rangeComparisonCodeToLetter(pcompare(x0, y0))

## Handling of zero-width ranges (a.k.a. empty ranges):
x1 <- IRanges(11:17, width=0)
x1
pcompare(x1, x1[4])
pcompare(x1, IRanges(12, 15))

## Note that x1[2] and x1[6] are empty ranges on the edge of non-empty
## range IRanges(12, 15). Even though -1 and 3 could also be considered
## valid codes for describing these configurations, pcompare()
## considers x1[2] and x1[6] to be *adjacent* to IRanges(12, 15), and
## thus returns codes -5 and 5:
pcompare(x1[2], IRanges(12, 15))  # -5
pcompare(x1[6], IRanges(12, 15))  #  5

x2 <- IRanges(start=c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L),
width=c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L))
x2

which(width(x2) == 0)  # 3 empty ranges
x2[2] == x2[2]  # TRUE
x2[2] == x2[5]  # FALSE
x2 == x2[4]
x2 >= x2[3]

## ---------------------------------------------------------------------
## B. match(), selfmatch(), %in%, duplicated(), unique()
## ---------------------------------------------------------------------
table <- x2[c(2:4, 7:8)]
match(x2, table)

x2 %in% table

duplicated(x2)
unique(x2)

## ---------------------------------------------------------------------
## C. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(x2, table)
countMatches(x2, table)

x2_levels <- unique(x2)
countMatches(x2_levels, x2)

## ---------------------------------------------------------------------
## D. order() AND RELATED METHODS
## ---------------------------------------------------------------------
is.unsorted(x2)
order(x2)
sort(x2)
rank(x2, ties.method="first")

IPos_class()

Memory-efficient representation of integer positions

Description

The IPos class is a container for storing a set of integer positions where most of the positions are typically (but not necessarily) adjacent. Because integer positions can be seen as integer ranges of width 1, the IPos class extends the IntegerRanges virtual class. Note that even though an IRanges object can be used for storing integer positions, using an IPos object will be much more memory-efficient, especially when the object contains long runs of adjacent positions in ascending order.

Usage

IPos(pos_runs)  # constructor function

Arguments

Argument	Description
`pos_runs`	An IRanges object (or any other IntegerRanges derivative) where each range is interpreted as a run of adjacent ascending positions. If `pos_runs` is not an IntegerRanges derivative, `IPos()` first tries to coerce it to one with `as(pos_runs, "IntegerRanges", strict=FALSE)`.
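
To make the "run of adjacent ascending positions" interpretation concrete, the base R sketch below expands a few runs into individual positions and then collapses them back. It only illustrates why runs are a compact description of positions; it does not use IRanges and makes no claim about the internal representation of IPos objects. All variable names are hypothetical.

```r
# Runs taken from Example 1 on this page: "44-53", "5-10", "2-5".
run_start <- c(44L, 5L, 2L)
run_end   <- c(53L, 10L, 5L)

# Expanding the runs gives one integer per position, which is what a set of
# width-1 ranges would have to store explicitly.
positions <- unlist(Map(seq.int, run_start, run_end))

# Collapsing back: a new run begins wherever the next position is not
# the previous position plus one.
breaks <- which(diff(positions) != 1L)
rebuilt_start <- positions[c(1L, breaks + 1L)]
rebuilt_end   <- positions[c(breaks, length(positions))]
```

Here 20 positions are described by 3 runs; the saving grows with the length of the runs.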

Value

An IPos object.

The GPos class in the GenomicRanges package for a memory-efficient representation of genomic positions (i.e. genomic ranges of width 1).
IntegerRanges and IRanges objects.
IPosRanges-comparison for comparing and ordering integer ranges and/or positions.
findOverlaps-methods for finding overlapping integer ranges and/or positions.
nearest-methods for finding the nearest integer range and/or position.

Note

Like for any Vector derivative, the length of an IPos object cannot exceed .Machine$integer.max (i.e. 2^31 - 1 on most platforms). IPos() will return an error if pos_runs contains too many integer positions.

Author

Hervé Pagès; based on ideas borrowed from Georg Stricker georg.stricker@in.tum.de and Julien Gagneur gagneur@in.tum.de

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

## Example 1:
ipos1 <- IPos(c("44-53", "5-10", "2-5"))
ipos1

length(ipos1)
pos(ipos1)  # same as 'start(ipos1)' and 'end(ipos1)'
as.character(ipos1)
as.data.frame(ipos1)
as(ipos1, "IRanges")
as.data.frame(as(ipos1, "IRanges"))
ipos1[9:17]

## Example 2:
pos_runs <- IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20))
ipos2 <- IPos(pos_runs)
ipos2

## Example 3:
ipos3A <- ipos3B <- IPos(c("1-15000", "15400-88700"))
npos <- length(ipos3A)

mcols(ipos3A)$sample <- Rle("sA")
sA_counts <- sample(10, npos, replace=TRUE)
mcols(ipos3A)$counts <- sA_counts

mcols(ipos3B)$sample <- Rle("sB")
sB_counts <- sample(10, npos, replace=TRUE)
mcols(ipos3B)$counts <- sB_counts

ipos3 <- c(ipos3A, ipos3B)
ipos3

## ---------------------------------------------------------------------
## MEMORY USAGE
## ---------------------------------------------------------------------

## Coercion to IRanges works...
ipos4 <- IPos(c("1-125000", "135000-575000"))
ir4 <- as(ipos4, "IRanges")
ir4
## ... but is generally not a good idea:
object.size(ipos4)
object.size(ir4)  # 1739 times bigger than the IPos object!

## Shuffling the order of the positions impacts memory usage:
ipos4s <- sample(ipos4)
object.size(ipos4s)

## AN IMPORTANT NOTE: In the worst situations, IPos still performs as
## well as an IRanges object.
object.size(as(ipos4s, "IRanges"))  # same size as 'ipos4s'

## Best case scenario is when the object is strictly sorted (i.e.
## positions are in strict ascending order).
## This can be checked with:
is.unsorted(ipos4, strictly=TRUE)  # 'ipos4' is strictly sorted

## ---------------------------------------------------------------------
## USING MEMORY-EFFICIENT METADATA COLUMNS
## ---------------------------------------------------------------------
## In order to keep memory usage as low as possible, it is recommended
## to use a memory-efficient representation of the metadata columns that
## we want to set on the object. Rle's are particularly well suited for
## this, especially if the metadata columns contain long runs of
## identical values. This is the case for example if we want to use an
## IPos object to represent the coverage of sequencing reads along a
## chromosome.

## Example 5:
library(pasillaBamSubset)
library(Rsamtools)  # for the BamFile() constructor function
bamfile1 <- BamFile(untreated1_chr4())
bamfile2 <- BamFile(untreated3_chr4())
ipos5 <- IPos(IRanges(1, seqlengths(bamfile1)[["chr4"]]))
library(GenomicAlignments)  # for "coverage" method for BamFile objects
cov1 <- coverage(bamfile1)$chr4
cov2 <- coverage(bamfile2)$chr4
mcols(ipos5) <- DataFrame(cov1, cov2)
ipos5

object.size(ipos5)  # lightweight

## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
ipos5[mcols(ipos5)$cov1 >= 10 | mcols(ipos5)$cov2 >= 10]

IRangesList_class()

List of IRanges and NormalIRanges

Description

IRangesList and NormalIRangesList objects for storing IRanges and NormalIRanges objects respectively.

Author

Michael Lawrence

Examples

range1 <- IRanges(start=c(1,2,3), end=c(5,2,8))
range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
named <- IRangesList(one = range1, two = range2)
length(named) # 2
names(named) # "one" and "two"
named[[1]] # range1
unnamed <- IRangesList(range1, range2)
names(unnamed) # NULL

x <- IRangesList(start=list(c(1,2,3), c(15,45,20,1)),
end=list(c(5,2,8), c(15,100,80,5)))
as.list(x)

IRanges_class()

IRanges and NormalIRanges objects

Description

The IRanges class is a simple implementation of the IntegerRanges container where 2 integer vectors of the same length are used to store the start and width values. See the IntegerRanges virtual class for a formal definition of IntegerRanges objects and for their methods (all of them should work for IRanges objects).

Some subclasses of the IRanges class are: NormalIRanges, Views, etc.

A NormalIRanges object is just an IRanges object that is guaranteed to be "normal". See the Normality section in the man page for IntegerRanges objects for the definition and properties of "normal" IntegerRanges objects.

Author

Hervé Pagès

Examples

showClass("IRanges")  # shows (some of) the known subclasses

## ---------------------------------------------------------------------
## A. MANIPULATING IRanges OBJECTS
## ---------------------------------------------------------------------
## All the methods defined for IntegerRanges objects work on IRanges
## objects.
## See ?IntegerRanges for some examples.
## Also see ?`IRanges-utils` and ?`setops-methods` for additional
## operations on IRanges objects.

## Concatenating IRanges objects
ir1 <- IRanges(c(1, 10, 20), width=5)
mcols(ir1) <- DataFrame(score=runif(3))
ir2 <- IRanges(c(101, 110, 120), width=10)
mcols(ir2) <- DataFrame(score=runif(3))
ir3 <- IRanges(c(1001, 1010, 1020), width=20)
mcols(ir3) <- DataFrame(value=runif(3))
some.iranges <- c(ir1, ir2)
## all.iranges <- c(ir1, ir2, ir3) ## This will raise an error
all.iranges <- c(ir1, ir2, ir3, ignore.mcols=TRUE)
stopifnot(is.null(mcols(all.iranges)))

## ---------------------------------------------------------------------
## B. A NOTE ABOUT PERFORMANCE
## ---------------------------------------------------------------------
## Using an IRanges object for storing a big set of ranges is more
## efficient than using a standard R data frame:
N <- 2000000L  # nb of ranges
W <- 180L      # width of each range
start <- 1L
end <- 50000000L
set.seed(777)
range_starts <- sort(sample(end-W+1L, N))
range_widths <- rep.int(W, N)
## Instantiation is faster
system.time(x <- IRanges(start=range_starts, width=range_widths))
system.time(y <- data.frame(start=range_starts, width=range_widths))
## Subsetting is faster
system.time(x16 <- x[c(TRUE, rep.int(FALSE, 15))])
system.time(y16 <- y[c(TRUE, rep.int(FALSE, 15)), ])
## Internal representation is more compact
object.size(x16)
object.size(y16)

IRanges_constructor()

The IRanges constructor and supporting functions

Description

The IRanges function is a constructor that can be used to create IRanges instances.

solveUserSEW0 and solveUserSEW are utility functions that solve a set of user-supplied start/end/width values.

Usage

## IRanges constructor:
IRanges(start=NULL, end=NULL, width=NULL, names=NULL)
## Supporting functions (not for the end user):
solveUserSEW0(start=NULL, end=NULL, width=NULL)
solveUserSEW(refwidths, start=NA, end=NA, width=NA,
             rep.refwidths=FALSE,
             translate.negative.coord=TRUE,
             allow.nonnarrowing=FALSE)

Arguments

Argument	Description
`start, end, width`	For `IRanges` and `solveUserSEW0`: `NULL`, or vector of integers (possibly with NAs). For `solveUserSEW`: vector of integers (possibly with NAs).
`names`	A character vector or `NULL`.
`refwidths`	Vector of non-NA non-negative integers containing the reference widths.
`rep.refwidths`	`TRUE` or `FALSE`. Use of `rep.refwidths=TRUE` is supported only when `refwidths` is of length 1.
`translate.negative.coord, allow.nonnarrowing`	`TRUE` or `FALSE`.
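
"Solving" a set of start/end/width values amounts to filling in whichever value is NA from the other two, using end = start + width - 1. The base R sketch below illustrates that idea only; `solve_sew` is a hypothetical helper, not the real solveUserSEW0, and it assumes at most one of the three values is NA per range and does not check consistency when all three are supplied.

```r
# Hypothetical helper (not the real solveUserSEW0): fill in whichever of
# start/end/width is NA, using end = start + width - 1.
solve_sew <- function(start, end, width) {
  n <- max(length(start), length(end), length(width))
  start <- rep_len(as.integer(start), n)
  end   <- rep_len(as.integer(end), n)
  width <- rep_len(as.integer(width), n)

  # Missing start: derive it from end and width.
  no_start <- is.na(start)
  start[no_start] <- end[no_start] - width[no_start] + 1L

  # Missing width: derive it from start and end.
  no_width <- is.na(width)
  width[no_width] <- end[no_width] - start[no_width] + 1L

  # Recompute end from start and width (no consistency check in this sketch).
  end <- start + width - 1L

  stopifnot(!anyNA(start), !anyNA(width), all(width >= 0L))
  data.frame(start = start, end = end, width = width)
}

solved <- solve_sew(start = c(2L, 0L, NA),
                    end   = c(NA, 9L, 14L),
                    width = c(11L, NA, 9L))
```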

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. USING THE IRanges() CONSTRUCTOR
## ---------------------------------------------------------------------
IRanges(start=11, end=rep.int(20, 5))
IRanges(start=11, width=rep.int(20, 5))
IRanges(-2, 20)  # only one range
IRanges(start=c(2, 0, NA), end=c(NA, NA, 14), width=11:0)
IRanges()  # IRanges instance of length zero
IRanges(names=character())

## With logical input:
x <- IRanges(c(FALSE, TRUE, TRUE, FALSE, TRUE))  # logical vector input
isNormal(x)  # TRUE
x <- IRanges(Rle(1:30) %% 5 <= 2)  # logical Rle input
isNormal(x)  # TRUE

## ---------------------------------------------------------------------
## B. USING solveUserSEW()
## ---------------------------------------------------------------------
refwidths <- c(5:3, 6:7)
refwidths

solveUserSEW(refwidths)
solveUserSEW(refwidths, start=4)
solveUserSEW(refwidths, end=3, width=2)
solveUserSEW(refwidths, start=-3)
solveUserSEW(refwidths, start=-3, width=2)
solveUserSEW(refwidths, end=-4)

## The start/end/width arguments are recycled:
solveUserSEW(refwidths, start=c(3, -4, NA), end=c(-2, NA))

## Using 'rep.refwidths=TRUE':
solveUserSEW(10, start=-(1:6), rep.refwidths=TRUE)
solveUserSEW(10, end=-(1:6), width=3, rep.refwidths=TRUE)

IRanges_internals()

IRanges internals

Description

Objects, classes and methods defined in the IRanges package that are not intended to be used directly.

IRanges_utils()

IRanges utility functions

Description

Utility functions for creating or modifying IRanges objects.

Usage

## Create an IRanges instance:
successiveIRanges(width, gapwidth=0, from=1)
breakInChunks(totalsize, nchunk, chunksize)
## Turn a logical vector into a set of ranges:
whichAsIRanges(x)
## Coercion:
asNormalIRanges(x, force=TRUE)

Arguments

Argument	Description
`width`	A vector of non-negative integers (with no NAs) specifying the widths of the ranges to create.
`gapwidth`	A single integer or an integer vector with one less element than the `width` vector specifying the widths of the gaps separating one range from the next one.
`from`	A single integer specifying the starting position of the first range.
`totalsize`	A single non-negative integer. The total size of the object to break.
`nchunk`	A single positive integer. The number of chunks.
`chunksize`	A single positive integer. The size of the chunks (last chunk might be smaller).
`x`	A logical vector for `whichAsIRanges` . An IRanges object for `asNormalIRanges` .
`force`	`TRUE` or `FALSE` . Should `x` be turned into a NormalIRanges object even if `isNormal(x)` is `FALSE` ?

Details

successiveIRanges returns an IRanges instance containing the ranges that have the widths specified in the width vector and are separated by the gaps specified in gapwidth. The first range starts at position from. When gapwidth=0 and from=1 (the defaults), the returned IRanges can be seen as a partitioning of the 1:sum(width) interval. See ?Partitioning for more details on this.

breakInChunks returns a PartitioningByEnd object describing the "chunks" that result from breaking a vector-like object of length totalsize in the chunks described by nchunk or chunksize.
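
Both computations reduce to cumulative sums. The following base R sketch shows the arithmetic under simplifying assumptions: it returns plain data frames rather than IRanges or PartitioningByEnd objects, `successive_ranges` and `break_in_chunks` are hypothetical names, and `break_in_chunks` only handles the chunksize case with a positive totalsize.

```r
# Hypothetical base R sketches, not the IRanges implementations.
successive_ranges <- function(width, gapwidth = 0L, from = 1L) {
  n <- length(width)
  gaps <- rep_len(as.integer(gapwidth), max(n - 1L, 0L))
  # Each range starts after all previous ranges and gaps.
  offsets <- c(0L, cumsum(head(width, -1L) + gaps))[seq_len(n)]
  start <- as.integer(from) + offsets
  data.frame(start = start, end = start + width - 1L, width = width)
}

break_in_chunks <- function(totalsize, chunksize) {
  stopifnot(totalsize > 0, chunksize > 0)
  nchunk <- ceiling(totalsize / chunksize)
  # Every chunk ends at a multiple of chunksize, except possibly the last.
  end <- pmin(seq_len(nchunk) * chunksize, totalsize)
  data.frame(start = c(1, head(end, -1L) + 1), end = end)
}

# Same inputs as in the Examples of this page.
sr     <- successive_ranges(c(19L, 5L, 0L, 8L, 5L))
chunks <- break_in_chunks(600999, chunksize = 50000)
```

With the default gapwidth and from, the last end in `sr` equals sum(width), which is the partitioning property mentioned above.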

whichAsIRanges returns an IRanges instance containing all of the ranges where x is TRUE.

If force=TRUE (the default), then asNormalIRanges will turn x into a NormalIRanges instance by reordering and reducing the set of ranges if necessary (i.e. only if isNormal(x) is FALSE, otherwise the set of ranges will be untouched). If force=FALSE, then asNormalIRanges will turn x into a NormalIRanges instance only if isNormal(x) is TRUE, otherwise it will raise an error. Note that when force=FALSE, the returned object is guaranteed to contain exactly the same set of ranges as x. as(x, "NormalIRanges") is equivalent to asNormalIRanges(x, force=TRUE).
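
"Reordering and reducing" can be pictured as: drop empty ranges, sort by start, then merge ranges that overlap or are adjacent. The base R sketch below illustrates this on the same input as the asNormalIRanges example on this page. It is an assumption-laden illustration, not the IRanges implementation, and `normalize_ranges` is a hypothetical name.

```r
# Hypothetical base R sketch of "reordering and reducing" a set of ranges;
# not the IRanges implementation.
normalize_ranges <- function(start, width) {
  keep  <- width > 0L                       # drop empty ranges
  start <- start[keep]
  end   <- start + width[keep] - 1L
  if (length(start) == 0L)
    return(data.frame(start = integer(0), end = integer(0)))
  o <- order(start, end)                    # order from left to right
  start <- start[o]
  end   <- end[o]
  run_end <- cummax(end)                    # rightmost end seen so far
  # A new merged range begins when a start leaves a gap of width >= 1
  # after everything seen so far.
  new_group <- c(TRUE, start[-1L] > head(run_end, -1L) + 1L)
  last_in_group <- c(which(new_group)[-1L] - 1L, length(end))
  data.frame(start = start[new_group], end = run_end[last_in_group])
}

nr <- normalize_ranges(start = c(-2L, 6L, 9L, -4L, 1L, 0L, -6L, 10L),
                       width = c( 5L, 0L, 6L,  1L, 4L, 3L,  2L,  3L))
```

The result has 3 non-empty ranges ordered from left to right and separated by gaps of width >= 1, as described in the Examples.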

IRanges objects.
Partitioning objects.
equisplit for splitting a list-like object into a specified number of partitions.
intra-range-methods and inter-range-methods for intra range and inter range transformations.
setops-methods for performing set operations on IRanges objects.
solveUserSEW
successiveViews

Author

Hervé Pagès

Examples

vec <- as.integer(c(19, 5, 0, 8, 5))

successiveIRanges(vec)

breakInChunks(600999, chunksize=50000)  # chunks of size 50000 (last
# chunk is smaller)

whichAsIRanges(vec >= 5)

x <- IRanges(start=c(-2L, 6L, 9L, -4L, 1L, 0L, -6L, 10L),
width=c( 5L, 0L, 6L,  1L, 4L, 3L,  2L,  3L))
asNormalIRanges(x)  # 3 non-empty ranges ordered from left to right and
# separated by gaps of width >= 1.

## More on normality:
example(`IRanges-class`)
isNormal(x16)                        # FALSE
if (interactive())
    x16 <- asNormalIRanges(x16, force=FALSE)  # Error!
whichFirstNotNormal(x16)             # 57
isNormal(x16[1:56])                  # TRUE
xx <- asNormalIRanges(x16[1:56])
class(xx)
max(xx)
min(xx)

IntegerRangesList_class()

IntegerRangesList objects

Description

The IntegerRangesList virtual class is a general container for storing a list of IntegerRanges objects.

Most users are probably more interested in the IRangesList container, an IntegerRangesList derivative for storing a list of IRanges objects.

Details

The place of IntegerRangesList in the Vector class hierarchy:

                              Vector
                                ^
                                |
                               List
                                ^
                                |
                            RangesList
                             ^     ^
                            /       \
                           /         \
                          /           \
                         /             \
                        /               \
                       /                 \
            IntegerRangesList       GenomicRangesList
                   ^                       ^
                   |                       |
              IRangesList             GRangesList
                ^     ^                 ^     ^
               /       \               /       \
              /         \             /         \
             /           \           /           \
      SimpleIRangesList   \   SimpleGRangesList   \
               CompressedIRangesList     CompressedGRangesList

Note that the Vector class hierarchy has many more classes. In particular Vector, List, RangesList, and IntegerRangesList have other subclasses not shown here.

Author

M. Lawrence & H. Pagès

Examples

## ---------------------------------------------------------------------
## Basic manipulation
## ---------------------------------------------------------------------

range1 <- IRanges(start=c(1, 2, 3), end=c(5, 2, 8))
range2 <- IRanges(start=c(15, 45, 20, 1), end=c(15, 100, 80, 5))
named <- IRangesList(one = range1, two = range2)
length(named) # 2
start(named) # same as start(c(range1, range2))
names(named) # "one" and "two"
named[[1]] # range1
unnamed <- IRangesList(range1, range2)
names(unnamed) # NULL

# edit the width of the ranges in the list
edited <- named
width(edited) <- rep(c(3,2), elementNROWS(named))
edited

# same as list(range1, range2)
as.list(IRangesList(range1, range2))

# coerce to data.frame
as.data.frame(named)

IRangesList(range1, range2)

## zoom in 2X
collection <- IRangesList(one = range1, range2)
collection * 2

IntegerRanges_class()

IntegerRanges objects

Description

To preprocess an IntegerRanges or IntegerRangesList object, simply call the NCList or NCLists constructor function on it.

Usage

NCList(x, circle.length=NA_integer_)
NCLists(x, circle.length=NA_integer_)

Arguments

Argument	Description
`x`	The IntegerRanges or IntegerRangesList object to preprocess.
`circle.length`	Use only if the space (or spaces, if `x` is an IntegerRangesList object) on which the ranges in `x` are defined must be treated as circular. In that case, use `circle.length` to specify the length(s) of the circular space(s). For `NCList`, `circle.length` must be a single positive integer (or NA if the space is linear). For `NCLists`, it must be an integer vector parallel to `x` (i.e. same length) with positive or NA values (NAs indicate linear spaces).

Details

The GenomicRanges package also defines the GNCList constructor and class for preprocessing and representing a vector of genomic ranges as a data structure based on Nested Containment Lists.

Some important differences between the new findOverlaps/countOverlaps implementation based on Nested Containment Lists (BioC >= 3.1) and the old implementation based on Interval Trees (BioC < 3.1):

With the new implementation, the hits returned by findOverlaps are no longer fully ordered (i.e. ordered by queryHits and subjectHits), but only partially ordered (i.e. ordered by queryHits only). Other than that, and except for the 2 particular situations mentioned below, the 2 implementations produce the same output. However, the new implementation is faster and more memory efficient.
With the new implementation, either the query or the subject can be preprocessed with NCList for an IntegerRanges object (replacement for IntervalTree), NCLists for an IntegerRangesList object (replacement for IntervalForest), and GNCList for a GenomicRanges object (replacement for GIntervalTree). However, for a one-time use, it is NOT advised to explicitly preprocess the input. This is because findOverlaps or countOverlaps will take care of it and do a better job at it (by preprocessing only what's needed when it's needed, and releasing memory as they go).
With the new implementation, countOverlaps on IntegerRanges or GenomicRanges objects doesn't call findOverlaps in order to collect all the hits in a growing Hits object and count them only at the end. Instead, the counting happens at the C level and the hits are not kept. This reduces memory usage considerably when there are a lot of hits.
When minoverlap=0, zero-width ranges are now interpreted as insertion points and considered to overlap with ranges that contain them. With the old algorithm, zero-width ranges were always ignored. This is the 1st situation where the new and old implementations produce different outputs.
When using select="arbitrary", the new implementation will generally not select the same hits as the old implementation. This is the 2nd situation where the new and old implementations produce different outputs.
The new implementation supports preprocessing of a GenomicRanges object with ranges defined on circular sequences (e.g. on the mitochondrial chromosome). See GNCList in the GenomicRanges package for some examples.
Objects preprocessed with NCList, NCLists, and GNCList are serializable (with save) for later use. This is not a typical thing to do though, because preprocessing is very cheap (i.e. very fast and memory efficient).
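To make the ordering difference in the first point concrete, here is a small base-R sketch. It does not call findOverlaps: the `hits` data frame and its values are made up for illustration, with columns named after the queryHits and subjectHits fields mentioned above.

```r
# Hypothetical hits stored in a plain data frame (not a real Hits object).
# They are ordered by queryHits only, as described for the NCList-based
# implementation.
hits <- data.frame(queryHits   = c(1L, 1L, 2L, 3L, 3L),
                   subjectHits = c(4L, 2L, 1L, 3L, 1L))

# Partially ordered: queryHits is non-decreasing ...
partially_ordered <- !is.unsorted(hits$queryHits)

# ... but subjectHits is not sorted within each query, so the hits are
# not fully ordered.
o <- order(hits$queryHits, hits$subjectHits)
fully_ordered <- identical(o, seq_len(nrow(hits)))

# Reordering by both columns restores the full ordering.
hits_sorted <- hits[o, ]
rownames(hits_sorted) <- NULL
hits_sorted
```

Code that relied on the old full ordering would need a reordering step of this kind; on real Hits objects, the Examples section uses sort() for the same purpose.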

Value

An NCList object for the NCList constructor and an NCLists object for the NCLists constructor.

Author

Hervé Pagès

References

Alexander V. Alekseyenko and Christopher J. Lee -- Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics (2007) 23 (11): 1386-1393. doi: 10.1093/bioinformatics/btl647

Examples

## The example below is for illustration purpose only and does NOT
## reflect typical usage. This is because, for a one-time use, it is
## NOT advised to explicitly preprocess the input for findOverlaps()
## or countOverlaps(). These functions will take care of it and do a
## better job at it (by preprocessing only what's needed when it's
## needed, and releasing memory as they go).

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

## Either the query or the subject of findOverlaps() can be preprocessed:

ppsubject <- NCList(subject)
hits1 <- findOverlaps(query, ppsubject)
hits1

ppquery <- NCList(query)
hits2 <- findOverlaps(ppquery, subject)
hits2

## Note that 'hits1' and 'hits2' contain the same hits but not in the
## same order.
stopifnot(identical(sort(hits1), sort(hits2)))

RangedData_class()

Data on ranges

Description

IMPORTANT NOTE: RangedData objects are deprecated in BioC 3.9! The use of RangedData objects has been discouraged in favor of GRanges or GRangesList objects since BioC 2.12, that is, since 2014. The GRanges and GRangesList classes are defined in the GenomicRanges package. See ?GRanges and ?GenomicRanges (after loading the GenomicRanges package) for more information about these classes. PLEASE MIGRATE YOUR CODE TO USE GRanges OR GRangesList OBJECTS INSTEAD OF RangedData OBJECTS AS SOON AS POSSIBLE. Don't hesitate to ask on the bioc-devel mailing list ( https://bioconductor.org/help/support/#bioc-devel ) if you need help with this.

RangedData supports storing data, i.e. a set of variables, on a set of ranges spanning multiple spaces (e.g. chromosomes). Although the data is split across spaces, it can still be treated as one cohesive dataset when desired. RangedData extends DataTable.

Details

A RangedData object consists of two primary components: an IntegerRangesList holding the ranges over multiple spaces and a parallel SplitDataFrameList holding the split data. There is also a universe slot for denoting the source (e.g. the genome) of the ranges and/or data.

There are two different modes of interacting with a RangedData . The first mode treats the object as a contiguous "data frame" annotated with range information. The accessors start , end , and width get the corresponding fields in the ranges as atomic integer vectors, undoing the division over the spaces. The [[ and matrix-style [, extraction and subsetting functions unroll the data in the same way. [[<- does the inverse. The number of rows is defined as the total number of ranges and the number of columns is the number of variables in the data. It is often convenient and natural to treat the data this way, at least when the data is small and there is no need to distinguish the ranges by their space.

The other mode is to treat the RangedData as a list, with an element (a virtual IntegerRanges / DataFrame pair) for each space. The length of the object is defined as the number of spaces and the value returned by the names accessor gives the names of the spaces. The list-style [ subset function behaves analogously.
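The two modes can be pictured with an ordinary data frame. This is only a base-R analogy, not RangedData itself: the `rd_flat` and `rd_list` names and the values are made up, and the real class may order rows by space rather than keep the input order.

```r
# Mode 1: one contiguous "data frame" annotated with range information.
rd_flat <- data.frame(
  space = c("chr1", "chr2", "chr1"),
  start = c(1L, 2L, 3L),
  end   = c(4L, 5L, 6L),
  score = c(10L, 2L, NA)
)
n_rows     <- nrow(rd_flat)        # total number of ranges
all_scores <- rd_flat[["score"]]   # one variable, across all spaces

# Mode 2: a list with one element per space.
rd_list     <- split(rd_flat[c("start", "end", "score")], rd_flat$space)
n_spaces    <- length(rd_list)     # number of spaces
space_names <- names(rd_list)      # names of the spaces
chr1_scores <- rd_list[["chr1"]][["score"]]
```

In this analogy, nrow() and [[ correspond to the data-frame view, while length(), names() and list-style subsetting correspond to the per-space view.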

Author

Michael Lawrence

Examples

ranges <- IRanges(c(1,2,3),c(4,5,6))
filter <- c(1L, 0L, 1L)
score <- c(10L, 2L, NA)

## constructing RangedData instances

## no variables
rd <- RangedData()
rd <- RangedData(ranges)
ranges(rd)
## one variable
rd <- RangedData(ranges, score)
rd[["score"]]
## multiple variables
rd <- RangedData(ranges, filter, vals = score)
rd[["vals"]] # same as rd[["score"]] above
rd$vals
rd[["filter"]]
rd <- RangedData(ranges, score + score)
rd[["score...score"]] # names made valid

## split some data over chromosomes

range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
both <- c(ranges, range2)
score <- c(score, c(0L, 3L, NA, 22L))
filter <- c(filter, c(0L, 1L, NA, 0L))
chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")

rd <- RangedData(both, score, filter, space = chrom)
rd[["score"]] # identical to score
rd[1][["score"]] # identical to score[1:3]

## subsetting

## list style: [i]

rd[numeric()] # these three are all empty
rd[logical()]
rd[NULL]
rd[] # missing, full instance returned
rd[FALSE] # logical, supports recycling
rd[c(FALSE, FALSE)] # same as above
rd[TRUE] # like rd[]
rd[c(TRUE, FALSE)]
rd[1] # numeric index
rd[c(1,2)]
rd[-2]

## matrix style: [i,j]

rd[,NULL] # no columns
rd[NULL,] # no rows
rd[,1]
rd[,1:2]
rd[,"filter"]
rd[1,] # now by the rows
rd[c(1,3),]
rd[1:2, 1] # row and column
rd[c(1:2,1,3),1] ## repeating rows

## dimnames

colnames(rd)[2] <- "foo"
colnames(rd)
rownames(rd) <- head(letters, nrow(rd))
rownames(rd)

## space names

names(rd)
names(rd)[1] <- "chr1"

## variable replacement

count <- c(1L, 0L, 2L)
rd <- RangedData(ranges, count, space = c(1, 2, 1))
## adding a variable
score <- c(10L, 2L, NA)
rd[["score"]] <- score
rd[["score"]] # same as 'score'
## replacing a variable
count2 <- c(1L, 1L, 0L)
rd[["count"]] <- count2
## numeric index also supported
rd[[2]] <- score
rd[[2]] # gets 'score'
## removing a variable
rd[[2]] <- NULL
ncol(rd) # is only 1
rd$score2 <- score

## combining

rd <- RangedData(ranges, score, space = c(1, 2, 1))
c(rd[1], rd[2]) # equal to 'rd'
rd2 <- RangedData(ranges, score)

ViewsList_class()

List of Views

Description

An extension of List that holds only Views objects.

Details

ViewsList is a virtual class. Specialized subclasses such as RleViewsList are useful for storing coverage vectors over a set of spaces (e.g. chromosomes), each of which requires a separate RleViews object.

As a List subclass, ViewsList inherits all the methods available for List objects. It also presents an API that is very similar to that of Views , where operations are vectorized over the elements and generally return lists.

Author

P. Aboyoun and H. Pagès

Examples

showClass("ViewsList")

Views_class()

Views objects

Description

The Views virtual class is a general container for storing a set of views on an arbitrary Vector object, called the "subject".

Its primary purpose is to introduce concepts and provide some facilities that can be shared by the concrete classes that derive from it.

Some direct subclasses of the Views class are: RleViews , XIntegerViews (defined in the XVector package), XStringViews (defined in the Biostrings package), etc...

Author

Hervé Pagès

Examples

showClass("Views")  # shows (some of) the known subclasses

## Create a set of 4 views on an Rle subject of length 10:
subject <- Rle(3:-6)
v1 <- Views(subject, start=4:1, end=4:7)

## Extract the 2nd view:
v1[[2]]

## Some views can be "out of limits"
v2 <- Views(subject, start=4:-1, end=6)
trim(v2)
subviews(v2, end=-2)

## See ?`XIntegerViews-class` in the XVector package for more examples.

coverage_methods()

Coverage of a set of ranges

Description

For each position in the space underlying a set of ranges, counts the number of ranges that cover it.

Usage

coverage(x, shift=0L, width=NULL, weight=1L, ...)
## S4 method for signature 'IntegerRanges'
coverage(x, shift=0L, width=NULL, weight=1L,
            method=c("auto", "sort", "hash"))
## S4 method for signature 'IntegerRangesList'
coverage(x, shift=0L, width=NULL, weight=1L,
            method=c("auto", "sort", "hash"))

Arguments

Argument	Description
`x`	An IntegerRanges, Views, or IntegerRangesList object. See ?`coverage-methods` in the GenomicRanges package for `coverage` methods for other objects.
`shift, weight`	`shift` specifies how much each range in `x` should be shifted before the coverage is computed. A positive shift value will shift the corresponding range in `x` to the right, and a negative value to the left. NAs are not allowed. `weight` assigns a weight to each range in `x`.

If x is an IntegerRanges or Views object: each of these arguments must be an integer or numeric vector parallel to x (will get recycled if necessary). Alternatively, each of these arguments can also be specified as a single string naming a metadata column in x (i.e. a column in mcols(x)) to be used as the shift (or weight) vector. Note that when x is an IPos object, each of these arguments can only be a single number.
If x is an IntegerRangesList object: each of these arguments must be a numeric vector or list-like object of the same length as x (will get recycled if necessary). If it's a numeric vector, it's first turned into a list with as.list. After recycling, each list element shift[[i]] (or weight[[i]]) must be an integer or numeric vector parallel to x[[i]] (will get recycled if necessary).
If weight is an integer vector or list-like object of integer vectors, the coverage vector(s) will be returned as integer-Rle object(s). If it's a numeric vector or list-like object of numeric vectors, the coverage vector(s) will be returned as numeric-Rle object(s).
`width`	Specifies the length of the returned coverage vector(s).

If x is an IntegerRanges object: width must be NULL (the default), an NA, or a single non-negative integer. After being shifted, the ranges in x are always clipped on the left to keep only their positive portion, i.e. their intersection with the [1, +inf) interval. If width is a single non-negative integer, then they're also clipped on the right to keep only their intersection with the [1, width] interval. In that case coverage returns a vector of length width. Otherwise, it returns a vector that extends to the last position in the underlying space covered by the shifted ranges.
If x is a Views object: same as for an IntegerRanges object, except that, if width is NULL, then it's treated as if it was length(subject(x)).
If x is an IntegerRangesList object: width must be NULL or an integer vector parallel to x (i.e. with one element per list element in x). If not NULL, the vector must contain NAs or non-negative integers and it will get recycled to the length of x if necessary. If NULL, it is replaced with NA and recycled to the length of x. Finally width[i] is used to compute the coverage vector for x[[i]] and is therefore treated as explained above (when x is an IntegerRanges object).
`method`	If method is set to "sort", then x is sorted prior to the calculation of the coverage. If method is set to "hash", then x is hashed directly to a vector of length width without prior sorting. The "hash" method is faster than the "sort" method when x is large (i.e. contains a lot of ranges). When x is small and width is big (e.g. x represents a small set of reads aligned to a big chromosome), then method="sort" is faster and uses less memory than method="hash". Using method="auto" selects the best method based on length(x) and width.
`...`	Further arguments to be passed to or from other methods.

Value

If x is an IntegerRanges or Views object: An integer- or numeric-Rle object depending on whether weight is an integer or numeric vector.

If x is an IntegerRangesList object: An RleList object with one coverage vector per list element in x, and with the names of x propagated to it. The i-th coverage vector can be either an integer- or numeric-Rle object, depending on the type of weight[[i]] (after weight has gone through as.list and recycling, as described previously).
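The shift-and-clip rules described in the Arguments section can be sketched in base R. `naive_coverage` is a hypothetical helper, not part of IRanges: it works on plain start/width vectors, accepts only NULL or a single non-negative integer for `width`, returns an ordinary vector rather than an Rle, and ignores the "sort"/"hash" distinction.

```r
# Hypothetical helper (not part of IRanges): naive coverage computation.
naive_coverage <- function(starts, widths, shift = 0L, width = NULL,
                           weight = 1L) {
  n <- length(starts)
  shift  <- rep_len(shift, n)    # 'shift' and 'weight' are recycled
  weight <- rep_len(weight, n)
  s <- starts + shift            # shifted start of each range
  e <- s + widths - 1L           # shifted end of each range
  if (is.null(width))
    width <- max(c(0L, e))       # extend to the last covered position
  cvg <- vector(typeof(weight), width)
  for (k in seq_len(n)) {
    lo <- max(s[k], 1L)          # clip on the left at position 1
    hi <- min(e[k], width)       # clip on the right at 'width'
    if (lo <= hi)                # skips empty and fully clipped ranges
      cvg[lo:hi] <- cvg[lo:hi] + weight[k]
  }
  cvg
}

# Same starts and widths as in section A of the Examples:
starts <- c(-2L, 6L, 9L, -4L, 1L, 0L, -6L, 10L)
widths <- c( 5L, 0L, 6L,  1L, 4L, 3L,  2L,  3L)

cvg       <- naive_coverage(starts, widths)
cvg       # 3 3 1 1 0 0 0 0 1 2 2 2 1 1
cvg10     <- naive_coverage(starts, widths, width = 10L)
cvg_shift <- naive_coverage(starts, widths, shift = 7L)
```

With shift=7 no range is clipped, so the sum of the resulting vector equals the sum of the range widths; without the shift, the portions of the ranges at positions below 1 are dropped.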

Author

H. Pagès and P. Aboyoun

Examples

## ---------------------------------------------------------------------
## A. COVERAGE OF AN IRanges OBJECT
## ---------------------------------------------------------------------
x <- IRanges(start=c(-2L, 6L, 9L, -4L, 1L, 0L, -6L, 10L),
width=c( 5L, 0L, 6L,  1L, 4L, 3L,  2L,  3L))
coverage(x)
coverage(x, shift=7)
coverage(x, shift=7, width=27)
coverage(x, shift=c(-4, 2))  # 'shift' gets recycled
coverage(x, shift=c(-4, 2), width=12)
coverage(x, shift=-max(end(x)))

coverage(restrict(x, 1, 10))
coverage(reduce(x), shift=7)
coverage(gaps(shift(x, 7), start=1, end=27))

## With weights:
coverage(x, weight=as.integer(10^(0:7)))  # integer-Rle
coverage(x, weight=c(2.8, -10))  # numeric-Rle, 'weight' gets recycled

## ---------------------------------------------------------------------
## B. COVERAGE OF AN IPos OBJECT
## ---------------------------------------------------------------------
pos_runs <- IRanges(c(1, 5, 9), c(10, 8, 15))
ipos <- IPos(pos_runs)
coverage(ipos)

## ---------------------------------------------------------------------
## C. COVERAGE OF AN IRangesList OBJECT
## ---------------------------------------------------------------------
x <- IRangesList(A=IRanges(3*(4:-1), width=1:3), B=IRanges(2:10, width=5))
cvg <- coverage(x)
cvg

stopifnot(identical(cvg[[1]], coverage(x[[1]])))
stopifnot(identical(cvg[[2]], coverage(x[[2]])))

coverage(x, width=c(50, 9))
coverage(x, width=c(NA, 9))
coverage(x, width=9)  # 'width' gets recycled

## Each list element in 'shift' and 'weight' gets recycled to the length
## of the corresponding element in 'x'.
weight <- list(as.integer(10^(0:5)), -0.77)
cvg2 <- coverage(x, weight=weight)
cvg2  # 1st coverage vector is an integer-Rle, 2nd is a numeric-Rle

identical(mapply(coverage, x=x, weight=weight), as.list(cvg2))

## ---------------------------------------------------------------------
## D. SOME MATHEMATICAL PROPERTIES OF THE coverage() FUNCTION
## ---------------------------------------------------------------------

## PROPERTY 1: The coverage vector is not affected by reordering the
## input ranges:
set.seed(24)
x <- IRanges(sample(1000, 40, replace=TRUE), width=17:10)
cvg0 <- coverage(x)
stopifnot(identical(coverage(sample(x)), cvg0))

## Of course, if the ranges are shifted and/or assigned weights, then
## this doesn't hold anymore, unless the 'shift' and/or 'weight'
## arguments are reordered accordingly.

## PROPERTY 2: The coverage of the concatenation of 2 IntegerRanges
## objects 'x' and 'y' is the sum of the 2 individual coverage vectors:
y <- IRanges(sample(-20:280, 36, replace=TRUE), width=28)
stopifnot(identical(coverage(c(x, y), width=100),
coverage(x, width=100) + coverage(y, width=100)))

## Note that, because adding 2 vectors in R recycles the shortest to
## the length of the longest, the following is generally FALSE:
identical(coverage(c(x, y)), coverage(x) + coverage(y))  # FALSE

## It would only be TRUE if the 2 coverage vectors that we add had the
## same length, which would only happen by chance. By using the same
## 'width' value when we computed the 2 coverages previously, we made
## sure they had the same length.

## Because of properties 1 & 2, we have:
x1 <- x[c(TRUE, FALSE)]  # pick up 1st, 3rd, 5th, etc... ranges
x2 <- x[c(FALSE, TRUE)]  # pick up 2nd, 4th, 6th, etc... ranges
cvg1 <- coverage(x1, width=100)
cvg2 <- coverage(x2, width=100)
stopifnot(identical(coverage(x, width=100), cvg1 + cvg2))

## PROPERTY 3: Multiplying the weights by a scalar has the effect of
## multiplying the coverage vector by the same scalar:
weight <- runif(40)
cvg3 <- coverage(x, weight=weight)
stopifnot(all.equal(coverage(x, weight=-2.68 * weight), -2.68 * cvg3))

## Because of properties 1 & 2 & 3, we have:
stopifnot(identical(coverage(x, width=100, weight=c(5L, -11L)),
5L * cvg1 - 11L * cvg2))

## PROPERTY 4: Using the sum of 2 weight vectors produces the same
## result as using the 2 weight vectors separately and summing the
## 2 results:
weight2 <- 10 * runif(40) + 3.7
stopifnot(all.equal(coverage(x, weight=weight + weight2),
cvg3 + coverage(x, weight=weight2)))

## PROPERTY 5: Repeating any input range N number of times is
## equivalent to multiplying its assigned weight by N:
times <- sample(0:10L, length(x), replace=TRUE)
stopifnot(all.equal(coverage(rep(x, times), weight=rep(weight, times)),
coverage(x, weight=weight * times)))

## In particular, if 'weight' is not supplied:
stopifnot(identical(coverage(rep(x, times)), coverage(x, weight=times)))

## PROPERTY 6: If none of the input ranges actually gets clipped during
## the "shift and clip" process, then:
##
##     sum(cvg) = sum(width(x) * weight)
##
stopifnot(all.equal(sum(cvg3), sum(width(x) * weight)))

## In particular, if 'weight' is not supplied:
stopifnot(sum(cvg0) == sum(width(x)))

## Note that this property is sometimes used in the context of a
## ChIP-Seq analysis to estimate "the number of reads in a peak", that
## is, the number of short reads that belong to a peak in the coverage
## vector computed from the genomic locations (a.k.a. genomic ranges)
## of the aligned reads. Because of property 6, the number of reads in
## a peak is approximately the area under the peak divided by the short
## read length.

## PROPERTY 7: If 'weight' is not supplied, then disjoining or reducing
## the ranges before calling coverage() has the effect of "shaving" the
## coverage vector at elevation 1:
table(cvg0)
shaved_cvg0 <- cvg0
runValue(shaved_cvg0) <- pmin(runValue(cvg0), 1L)
table(shaved_cvg0)

stopifnot(identical(coverage(disjoin(x)), shaved_cvg0))
stopifnot(identical(coverage(reduce(x)), shaved_cvg0))

## ---------------------------------------------------------------------
## E. SOME SANITY CHECKS
## ---------------------------------------------------------------------
dummy_coverage <- function(x, shift=0L, width=NULL)
{
    y <- IRanges:::unlist_as_integer(shift(x, shift))
    if (is.null(width))
        width <- max(c(0L, y))
    Rle(tabulate(y, nbins=width))
}

check_real_vs_dummy <- function(x, shift=0L, width=NULL)
{
    res1 <- coverage(x, shift=shift, width=width)
    res2 <- dummy_coverage(x, shift=shift, width=width)
    stopifnot(identical(res1, res2))
}
check_real_vs_dummy(x)
check_real_vs_dummy(x, shift=7)
check_real_vs_dummy(x, shift=7, width=27)
check_real_vs_dummy(x, shift=c(-4, 2))
check_real_vs_dummy(x, shift=c(-4, 2), width=12)
check_real_vs_dummy(x, shift=-max(end(x)))

## With a set of distinct single positions:
x3 <- IRanges(sample(50000, 20000), width=1)
stopifnot(identical(sort(start(x3)), which(coverage(x3) != 0L)))

extractList()

Group elements of a vector-like object into a list-like object

Description

relist and split are 2 common ways of grouping the elements of a vector-like object into a list-like object. The IRanges and S4Vectors packages define relist and split methods that operate on a Vector object and return a List object. Note that the split methods defined in S4Vectors delegate to the splitAsList function defined in IRanges and documented below.

Because relist and split both impose restrictions on the kind of grouping that they support (e.g. every element in the input object needs to go in a group and can only go in one group), the IRanges package introduces the extractList generic function for performing arbitrary groupings.

Usage

## relist()
## --------
## S4 method for signature 'ANY,List'
relist(flesh, skeleton)
## S4 method for signature 'Vector,list'
relist(flesh, skeleton)
## splitAsList()
## -------------
splitAsList(x, f, drop=FALSE, ...)
## extractList()
## -------------
extractList(x, i)
## regroup()
## ---------
regroup(x, g)

Arguments

Argument	Description
`flesh, x`	A vector-like object.
`skeleton`	A list-like object. Only the "shape" (i.e. element lengths) of `skeleton` matters. Its exact content is ignored.
`f`	An atomic vector or a factor (possibly in Rle form).
`drop`	Logical indicating if levels that do not occur should be dropped (if `f` is a factor).
`i`	A list-like object. Unlike for `skeleton`, the content here matters (see Details section below). Note that `i` can be an IntegerRanges object (a particular type of list-like object), and, in that case, `extractList` is particularly fast (this is a common use case).
`g`	A Grouping or an object coercible to one. For `regroup` , `g` groups the elements of `x` .
`...`	Arguments to pass to methods.

Details

relist , split , and extractList have in common that they return a list-like object where each list element has the same class as the original vector-like object. Thus they need to be able to select the appropriate List concrete subclass to use for this returned value. This selection is performed by relistToClass and is based only on the class of the original object.

By default, extractList(x, i) is equivalent to:

    relist(x[unlist(i)], i)

An exception is made when x is a data-frame-like object. In that case x is subsetted along the rows, that is, extractList(x, i) is equivalent to:

    relist(x[unlist(i), ], i)

This is more or less how the default method is implemented, except for some optimizations when i is an IntegerRanges object.

relist and split (or splitAsList) can be seen as special cases of extractList:

    relist(flesh, skeleton) is equivalent to
    extractList(flesh, PartitioningByEnd(skeleton))

    split(x, f) is equivalent to
    extractList(x, split(seq_along(f), f))

It is good practice to use extractList only for cases not covered by relist or split. Whenever possible, using relist or split is preferred as they will always perform more efficiently. In addition, their names carry meaning and are familiar to most R users/developers, so they'll make your code easier to read and understand.

Note that the transformation performed by relist or split is always reversible (via unlist and unsplit , respectively), but not the transformation performed by extractList (in general).

The regroup function splits the elements of unlist(x) into a list according to the grouping g . Each element of unlist(x) inherits its group from its parent element of x . regroup is different from relist and split , because x is already grouped, and the goal is to combine groups.
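The default behavior described above can be tried on ordinary vectors and lists with base R alone. In this sketch `extract_list_sketch` is a hypothetical stand-in for extractList: it returns a plain list rather than a List object and has none of the IntegerRanges optimizations.

```r
# Hypothetical base-R stand-in for the default extractList() behavior:
# subset 'x' by the unlisted index, then restore the shape of 'i'.
extract_list_sketch <- function(x, i) {
  utils::relist(x[unlist(i, use.names = FALSE)], i)
}

x <- c(10, 20, 30, 40)
f <- c("b", "a", "b", "a")

# split(x, f) as a special case of the arbitrary grouping:
via_split   <- split(x, f)
via_extract <- extract_list_sketch(x, split(seq_along(f), f))

# A grouping that split() cannot express: element 1 is in one group,
# elements 2 and 3 are in two groups, and one group is empty.
i <- list(first = 1:3, overlap = 2:4, empty = integer(0))
overlapping <- extract_list_sketch(x, i)
overlapping
```

Because elements can be repeated or left out, unlisting `overlapping` does not give back `x`, which illustrates why the transformation is not reversible in general.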

Value

The relist methods behave like utils::relist except that they return a List object. If skeleton has names, then they are propagated to the returned value.

splitAsList behaves like base::split except that the former returns a List object instead of an ordinary list.

extractList returns a list-like object parallel to i and with the same "shape" as i (i.e. same element lengths). If i has names, then they are propagated to the returned value.

All these functions return a list-like object where the list elements have the same class as x . relistToClass gives the exact class of the returned object.

See Also

The unlist and relist functions in the base and utils packages, respectively.
The split and unsplit functions in the base package.
The split methods defined in the S4Vectors package.
Vector , List , Rle , and DataFrame objects in the S4Vectors package. relistToClass is documented in the man page for List objects.
IntegerRanges objects.

Author

Hervé Pagès

Examples

## On an Rle object:
x <- Rle(101:105, 6:2)
i <- IRanges(6:10, 16:12, names=letters[1:5])
extractList(x, i)

## On a DataFrame object:
df <- DataFrame(X=x, Y=LETTERS[1:20])
extractList(df, i)

extractListFragments()

Extract list fragments from a list-like object

Description

Utilities for extracting list fragments from a list-like object.

Usage

extractListFragments(x, aranges, use.mcols=FALSE,
                     msg.if.incompatible=INCOMPATIBLE_ARANGES_MSG)
equisplit(x, nchunk, chunksize, use.mcols=FALSE)

Arguments

Argument	Description
`x`	The list-like object from which to extract the list fragments. Can be any List derivative for `extractListFragments` . Can also be an ordinary list if `extractListFragments` is called with `use.mcols=TRUE` . Can be any List derivative that supports `relist()` for `equisplit` .
`aranges`	An IntegerRanges derivative containing the absolute ranges (i.e. the ranges along `unlist(x)`) of the list fragments to extract. The ranges in `aranges` must be compatible with the cumulated length of all the list elements in `x`, that is, `start(aranges)` and `end(aranges)` must be >= 1 and <= `sum(elementNROWS(x))`, respectively. Also please note that only IntegerRanges objects that are disjoint and sorted are supported at the moment.
`use.mcols`	Whether to propagate the metadata columns on `x` (if any) or not. Must be `TRUE` or `FALSE` (the default). If set to `FALSE` , instead of having the metadata columns propagated from `x` , the object returned by `extractListFragments` has metadata columns `revmap` and `revmap2` , and the object returned by `equisplit` has metadata column `revmap` . Note that this is the default.
`msg.if.incompatible`	The error message to use if `aranges` is not compatible with the cumulated length of all the list elements in `x` .
`nchunk`	The number of chunks. Must be a single positive integer.
`chunksize`	The size of the chunks (last chunk might be smaller). Must be a single positive integer.

Details

A list fragment of list-like object x is a window in one of its list elements.

extractListFragments is a low-level utility that extracts list fragments from list-like object x according to the absolute ranges in aranges .

equisplit fragments and splits list-like object x into a specified number of partitions with equal (total) width. This is useful for instance to ensure balanced loading of workers in parallel evaluation. For example, if x is a GRanges object, each partition is also a GRanges object and the set of all partitions is returned as a GRangesList object.
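The relationship between absolute ranges and list fragments can be sketched in base R. `list_fragments_sketch` is a hypothetical helper, not the IRanges implementation: it takes only the element lengths of `x`, reports each fragment as a window (start, end) inside one list element, and drops zero-width absolute ranges, which the real function may handle differently. The `element` and `arange` columns are illustrative names.

```r
# Hypothetical helper: map absolute ranges (positions along unlist(x))
# to windows inside the individual list elements of 'x'.
list_fragments_sketch <- function(lens, astart, aend) {
  ends   <- cumsum(lens)          # absolute end of each list element
  starts <- ends - lens + 1L      # absolute start of each list element
  rows <- list()
  for (k in seq_along(astart)) {
    for (j in seq_along(lens)) {
      lo <- max(astart[k], starts[j])
      hi <- min(aend[k], ends[j])
      if (lo <= hi) {             # absolute range k touches element j
        rows[[length(rows) + 1L]] <- data.frame(
          element = j,                      # which list element
          start   = lo - starts[j] + 1L,    # window start in that element
          end     = hi - starts[j] + 1L,    # window end in that element
          arange  = k                       # which absolute range
        )
      }
    }
  }
  do.call(rbind, rows)
}

# Element lengths and absolute ranges taken from section A of the
# Examples (x has 2 list elements, of lengths 9 and 11):
frags <- list_fragments_sketch(lens   = c(9L, 11L),
                               astart = c(2, 4, 8, 17, 17),
                               aend   = c(3, 6, 14, 16, 19))
frags
```

The third absolute range (8 to 14) straddles both list elements and therefore yields two fragments: positions 8 to 9 of the first element and positions 1 to 5 of the second.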

Value

An object of the same class as x for extractListFragments .

An object of class relistToClass for equisplit .

See Also

IRanges and IRangesList objects.
Partitioning objects.
IntegerList objects.
breakInChunks for breaking a vector-like object in chunks.
GRanges and GRangesList objects defined in the GenomicRanges package.
List objects defined in the S4Vectors package.
intra-range-methods and inter-range-methods for intra range and inter range transformations.

Author

Hervé Pagès

Examples

## ---------------------------------------------------------------------
## A. extractListFragments()
## ---------------------------------------------------------------------

x <- IntegerList(a=101:109, b=5:-5)
x

aranges <- IRanges(start=c(2, 4, 8, 17, 17), end=c(3, 6, 14, 16, 19))
aranges
extractListFragments(x, aranges)

x2 <- IRanges(c(1, 101, 1001, 10001), width=c(10, 5, 0, 12),
names=letters[1:4])
mcols(x2)$label <- LETTERS[1:4]
x2

aranges <- IRanges(start=13, end=20)
extractListFragments(x2, aranges)
extractListFragments(x2, aranges, use.mcols=TRUE)

aranges2 <- PartitioningByWidth(c(3, 9, 13, 0, 2))
extractListFragments(x2, aranges2)
extractListFragments(x2, aranges2, use.mcols=TRUE)

x2b <- as(x2, "IntegerList")
extractListFragments(x2b, aranges2)

x2c <- as.list(x2b)
extractListFragments(x2c, aranges2, use.mcols=TRUE)

## ---------------------------------------------------------------------
## B. equisplit()
## ---------------------------------------------------------------------

## equisplit() first calls breakInChunks() internally to create a
## PartitioningByWidth object that contains the absolute ranges of the
## chunks, then calls extractListFragments() on 'x' to extract the
## fragments of 'x' that correspond to these absolute ranges. Finally
## the IRanges object returned by extractListFragments() is split into
## an IRangesList object where each list element corresponds to a chunk.
equisplit(x2, nchunk=2)
equisplit(x2, nchunk=2, use.mcols=TRUE)

equisplit(x2, chunksize=5)

library(GenomicRanges)
gr <- GRanges(c("chr1", "chr2"), IRanges(1, c(100, 1e5)))
equisplit(gr, nchunk=2)
equisplit(gr, nchunk=1000)

## ---------------------------------------------------------------------
## C. ADVANCED extractListFragments() EXAMPLES
## ---------------------------------------------------------------------

## === C1. Fragment list-like object into length 1 fragments ===

## First we construct a Partitioning object where all the partitions
## have a width of 1:
x2_cumlen <- nobj(PartitioningByWidth(x2))  # Equivalent to
                                            # length(unlist(x2)) except
                                            # that it doesn't unlist 'x2'
                                            # so is much more efficient.
aranges1 <- PartitioningByEnd(seq_len(x2_cumlen))
aranges1

## Then we use it to fragment 'x2':
extractListFragments(x2, aranges1)
extractListFragments(x2b, aranges1)
extractListFragments(x2c, aranges1, use.mcols=TRUE)

## === C2. Fragment a Partitioning object ===

partitioning2 <- PartitioningByEnd(x2b)  # same as PartitioningByEnd(x2)
extractListFragments(partitioning2, aranges2)

## Note that when the 1st arg is a Partitioning derivative, then
## swapping the 1st and 2nd arguments in the call to extractListFragments()
## doesn't change the returned partitioning:
extractListFragments(aranges2, partitioning2)

## ---------------------------------------------------------------------
## D. SANITY CHECKS
## ---------------------------------------------------------------------

## If 'aranges' is 'PartitioningByEnd(x)' or 'PartitioningByWidth(x)'
## and 'x' has no zero-length list elements, then
## 'extractListFragments(x, aranges, use.mcols=TRUE)' is a no-op.
check_no_ops <- function(x) {
  aranges <- PartitioningByEnd(x)
  stopifnot(identical(
    extractListFragments(x, aranges, use.mcols=TRUE), x
  ))
  aranges <- PartitioningByWidth(x)
  stopifnot(identical(
    extractListFragments(x, aranges, use.mcols=TRUE), x
  ))
}

check_no_ops(x2[lengths(x2) != 0])
check_no_ops(x2b[lengths(x2b) != 0])
check_no_ops(x2c[lengths(x2c) != 0])
check_no_ops(gr)

multisplit(x, f)

Arguments

Argument	Description
`x`	The object to split, like a vector.
`f`	A list-like object of vectors, the same length as `x` , where each element indicates the groups to which each element of `x` belongs.

Value

A list-like object, with an element for each unique value in the unlisted f, containing the elements in x where the corresponding element in f contained that value.

Author

Michael Lawrence

Examples

multisplit(1:3, list(letters[1:2], letters[2:3], letters[2:4]))
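
The grouping rule described under Value can be sketched in base R on plain vectors and lists. The helper name multisplit_sketch is hypothetical, and this is only an illustration of the semantics, not the package's implementation: each element of x is repeated once per group listed in the matching element of f, and the repeated values are then split by the unlisted group labels.

```r
# Hypothetical base-R illustration of the multisplit() semantics;
# it returns an ordinary list rather than a List object.
multisplit_sketch <- function(x, f) {
  stopifnot(is.list(f), length(x) == length(f))
  # Repeat each element of 'x' once for every group it belongs to ...
  repeated <- rep(x, lengths(f))
  # ... then split on the flattened group labels.
  split(repeated, unlist(f, use.names = FALSE))
}

groups <- list(letters[1:2], letters[2:3], letters[2:4])
res <- multisplit_sketch(1:3, groups)
res$b  # elements of 'x' whose entry in 'f' contains "b": 1 2 3
```

Every element of x appears in group "b" here because "b" occurs in all three elements of the list passed as f.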

nearest_methods()

Finding the nearest range neighbor

Description

The nearest , precede , follow , distance and distanceToNearest methods for IntegerRanges objects and subclasses.

Usage

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
nearest(x, subject, select = c("arbitrary", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
precede(x, subject, select = c("first", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
follow(x, subject, select = c("last", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
distanceToNearest(x, subject, select = c("arbitrary", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges'
distance(x, y)
## S4 method for signature 'Pairs,missing'
distance(x, y)

Arguments

Argument	Description
`x`	The query IntegerRanges object, or (for `distance()` ) a Pairs containing both the query (first) and subject (second).
`subject`	The subject `IntegerRanges` object, within which the nearest neighbors are found. Can be missing, in which case `x` is also the subject.
`select`	Logic for handling ties. By default, all the methods select a single interval (arbitrary for `nearest`, the first by order in `subject` for `precede`, and the last for `follow`). To get all matchings, as a `Hits` object, use `"all"`.
`y`	For the `distance` method, an `IntegerRanges` object. Cannot be missing. If `x` and `y` are not the same length, the shorter will be recycled to match the length of the longer.
`hits`	The hits between `x` and `subject`
`...`	Additional arguments for methods

Details

nearest: The conventional nearest neighbor finder. Returns an integer vector containing the index of the nearest neighbor range in subject for each range in x. If there is no nearest neighbor (if subject is empty), NA's are returned.

Here is roughly how it proceeds, for a range xi in x:

* Find the ranges in subject that overlap xi. If a single range si in subject overlaps xi, si is returned as the nearest neighbor of xi. If there are multiple overlaps, one of the overlapping ranges is chosen arbitrarily.
* If no ranges in subject overlap with xi, then the range in subject with the shortest distance from its end to the start of xi or its start to the end of xi is returned.

precede: For each range in x, precede returns the index of the interval in subject that is directly preceded by the query range. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

follow: The opposite of precede, this function returns the index of the range in subject that a query range in x directly follows. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

distanceToNearest: Returns the distance for each range in x to its nearest neighbor in subject.

distance: Returns the distance for each range in x to the range in y.

The distance method differs from others documented on this page in that it is symmetric; y cannot be missing. If x and y are not the same length, the shorter will be recycled to match the length of the longer. The select argument is not available for distance because comparisons are made in a pair-wise fashion. The return value has the length of the longer of x and y.

The distance calculation changed in BioC 2.12 to accommodate zero-width ranges in a consistent and intuitive manner. The new distance can be explained by a block model where a range is represented by a series of blocks of size 1. Blocks are adjacent to each other and there is no gap between them. A visual representation of IRanges(4,7) would be

    +-----+-----+-----+-----+
       4     5     6     7

The distance between two consecutive blocks is 0L (prior to Bioconductor 2.12 it was 1L). The new distance calculation now returns the size of the gap between two ranges.

This change to distance affects the notion of overlaps in that we no longer say:

    x and y overlap <=> distance(x, y) == 0

Instead we say

    x and y overlap => distance(x, y) == 0

or

    x and y overlap or are adjacent <=> distance(x, y) == 0

selectNearest: Selects the hits that have the minimum distance within those for each query range. Ties are possible and can be broken with breakTies.
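
The block model for distance can be sketched in base R on plain integer start/end vectors. The helper range_distance is hypothetical and only mirrors the description above (the distance is the size of the gap, and overlapping or adjacent ranges are at distance 0); it is not presented as the package's code.

```r
# Sketch of the block-model distance on plain integer start/end vectors.
# 'range_distance' is a hypothetical helper, not part of IRanges.
range_distance <- function(start_x, end_x, start_y, end_y) {
  # Size of the gap between the two ranges; negative when they overlap.
  gap <- pmax(start_x, start_y) - pmin(end_x, end_y) - 1L
  # Overlapping or adjacent ranges are at distance 0.
  pmax(gap, 0L)
}

range_distance(1L, 5L, 6L, 10L)  # adjacent ranges: 0
range_distance(1L, 5L, 3L, 7L)   # overlapping ranges: 0
range_distance(1L, 5L, 8L, 10L)  # positions 6 and 7 lie in between: 2
```

Because pmax() and pmin() recycle their arguments, the shorter input is recycled against the longer one, in the same spirit as the recycling rule stated for distance.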

Value

For nearest , precede and follow , an integer vector of indices in subject , or a Hits if select="all" .

For distanceToNearest, a Hits object with an elementMetadata column of the distance between the pair. Access the distances with the mcols accessor.

For distance , an integer vector of distances between the ranges in x and y .

For selectNearest , a Hits object, sorted by query.

The IntegerRanges and Hits classes.
The GenomicRanges and GRanges classes in the GenomicRanges package.
findOverlaps for finding just the overlapping ranges.
GenomicRanges methods for precede, follow, nearest, distance, and distanceToNearest are documented at ?nearest-methods or ?precede,GenomicRanges,GenomicRanges-method

Author

M. Lawrence

Examples

## ------------------------------------------
## precede() and follow()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(3, 7, 10))
subject <- IRanges(c(3, 2, 10), c(3, 13, 12))

precede(query, subject)     # c(3L, 3L, NA)
precede(IRanges(), subject) # integer()
precede(query, IRanges())   # rep(NA_integer_, 3)
precede(query)              # c(3L, 3L, NA)

follow(query, subject)      # c(NA, NA, 1L)
follow(IRanges(), subject)  # integer()
follow(query, IRanges())    # rep(NA_integer_, 3)
follow(query)               # c(NA, NA, 2L)

## ------------------------------------------
## nearest()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

nearest(query, subject) # c(1L, 1L, 3L)
nearest(query)          # c(2L, 1L, 2L)

## ------------------------------------------
## distance()
## ------------------------------------------
## adjacent
distance(IRanges(1,5), IRanges(6,10)) # 0L
## overlap
distance(IRanges(1,5), IRanges(3,7))  # 0L
## zero-width
sapply(-3:3, function(i) distance(shift(IRanges(4,3), i), IRanges(4,3)))
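
The selection rules for precede() and follow() stated in the Details and Arguments (overlapping ranges excluded, first match for precede, last match for follow, NA when nothing qualifies) can be sketched in base R on plain start/end vectors. The helpers precede_sketch and follow_sketch are hypothetical and assume non-empty ranges; they illustrate the rules rather than the package's algorithm.

```r
# Hypothetical base-R sketches; ranges are given as start/end vectors
# (qs/qe for the query, ss/se for the subject).
precede_sketch <- function(qs, qe, ss, se) {
  vapply(seq_along(qs), function(i) {
    cand <- which(ss > qe[i])            # subject ranges starting after the query ends
    if (length(cand) == 0L) return(NA_integer_)
    cand[which.min(ss[cand])]            # closest one; which.min() keeps the first tie
  }, integer(1))
}

follow_sketch <- function(qs, qe, ss, se) {
  vapply(seq_along(qs), function(i) {
    cand <- which(se < qs[i])            # subject ranges ending before the query starts
    if (length(cand) == 0L) return(NA_integer_)
    best <- cand[se[cand] == max(se[cand])]
    best[length(best)]                   # closest one; keep the last tie
  }, integer(1))
}

# Same ranges as in the precede()/follow() example above.
precede_sketch(c(1, 3, 9), c(3, 7, 10), c(3, 2, 10), c(3, 13, 12))  # 3 3 NA
follow_sketch(c(1, 3, 9), c(3, 7, 10), c(3, 2, 10), c(3, 13, 12))   # NA NA 1
```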

range_squeezers()

Squeeze the ranges out of a range-based object

Description

S4 generic functions for squeezing the ranges out of a range-based object.

These are analogous to the range squeezers granges and grglist defined in the GenomicRanges package, except that ranges returns the ranges in an IRanges object (instead of a GRanges object for granges), and rglist returns them in an IRangesList object (instead of a GRangesList object for grglist).

Usage

ranges(x, use.names=TRUE, use.mcols=FALSE, ...)
rglist(x, use.names=TRUE, use.mcols=FALSE, ...)

Arguments

Argument	Description
`x`	An object containing ranges e.g. an IntegerRanges, GenomicRanges, RangedSummarizedExperiment, GAlignments, GAlignmentPairs, or GAlignmentsList object, or a Pairs object containing ranges.
`use.names`	`TRUE` (the default) or `FALSE` . Whether or not the names on `x` (accessible with `names(x)` ) should be propagated to the returned object.
`use.mcols`	`TRUE` or `FALSE` (the default). Whether or not the metadata columns on `x` (accessible with `mcols(x)` ) should be propagated to the returned object.
`...`	Additional arguments, for use in specific methods.

Details

Various packages (e.g. IRanges , GenomicRanges , SummarizedExperiment , GenomicAlignments , etc...) define and document various range squeezing methods for various types of objects.

Note that these functions can be seen as object getters or as functions performing coercion.

For some objects (e.g. GAlignments and GAlignmentPairs objects defined in the GenomicAlignments package), as(x, "IRanges") and as(x, "IRangesList") , are equivalent to ranges(x, use.names=TRUE, use.mcols=TRUE) and rglist(x, use.names=TRUE, use.mcols=TRUE) , respectively.

Value

An IRanges object for ranges .

An IRangesList object for rglist .

If x is a vector-like object (e.g. GAlignments ), the returned object is expected to be parallel to x , that is, the i-th element in the output corresponds to the i-th element in the input.

If use.names is TRUE, then the names on x (if any) are propagated to the returned object. If use.mcols is TRUE, then the metadata columns on x (if any) are propagated to the returned object.

Author

H. Pagès

Examples

## See ?GAlignments in the GenomicAlignments package for examples of
## "ranges" and "rglist" methods.

readMask()

Read a mask from a file

Description

read.agpMask and read.gapMask extract the AGAPS mask from an NCBI "agp" file or a UCSC "gap" file, respectively.

read.liftMask extracts the AGAPS mask from a UCSC "lift" file (i.e. a file containing offsets of contigs within sequences).

read.rmMask extracts the RM mask from a RepeatMasker .out file.

read.trfMask extracts the TRF mask from a Tandem Repeats Finder .bed file.

Usage

read.agpMask(file, seqname="?", mask.width=NA, gap.types=NULL, use.gap.types=FALSE)
read.gapMask(file, seqname="?", mask.width=NA, gap.types=NULL, use.gap.types=FALSE)
read.liftMask(file, seqname="?", mask.width=NA)
read.rmMask(file, seqname="?", mask.width=NA, use.IDs=FALSE)
read.trfMask(file, seqname="?", mask.width=NA)

Arguments

Argument	Description
`file`	Either a character string naming a file or a connection open for reading.
`seqname`	The name of the sequence for which the mask must be extracted. If no sequence is specified (i.e. `seqname="?"` ) then an error is raised and the sequence names found in the file are displayed. If the file doesn't contain any information for the specified sequence, then a warning is issued and an empty mask of width `mask.width` is returned.
`mask.width`	The width of the mask to return i.e. the length of the sequence this mask will be put on. See the MaskCollection documentation for more information about the width of a MaskCollection object.
`gap.types`	`NULL` or a character vector containing gap types. Use this argument to filter the assembly gaps that are to be extracted from the "agp" or "gap" file based on their type. Most common gap types are `"contig"` , `"clone"` , `"centromere"` , `"telomere"` , `"heterochromatin"` , `"short_arm"` and `"fragment"` . With `gap.types=NULL` , all the assembly gaps described in the file are extracted. With `gap.types="?"` , an error is raised and the gap types found in the file for the specified sequence are displayed.
`use.gap.types`	Whether or not the gap types provided in the "agp" or "gap" file should be used to name the ranges constituting the returned mask. See the IRanges documentation for more information about the names of an IRanges object.
`use.IDs`	Whether or not the repeat IDs provided in the RepeatMasker .out file should be used to name the ranges constituting the returned mask. See the IRanges documentation for more information about the names of an IRanges object.

Examples

## ---------------------------------------------------------------------
## A. Extract a mask of assembly gaps ("AGAPS" mask) with read.agpMask()
## ---------------------------------------------------------------------
## Note: The hs_b36v3_chrY.agp file was obtained by downloading,
## extracting and renaming the hs_ref_chrY.agp.gz file from
##
##   ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/
##     hs_ref_chrY.agp.gz      5 KB  24/03/08  04:33:00 PM
##
## on May 9, 2008.

chrY_length <- 57772954
file1 <- system.file("extdata", "hs_b36v3_chrY.agp", package="IRanges")
mask1 <- read.agpMask(file1, seqname="chrY", mask.width=chrY_length,
                      use.gap.types=TRUE)
mask1
mask1[[1]]

mask11 <- read.agpMask(file1, seqname="chrY", mask.width=chrY_length,
                       gap.types=c("centromere", "heterochromatin"))
mask11[[1]]

## ---------------------------------------------------------------------
## B. Extract a mask of assembly gaps ("AGAPS" mask) with read.liftMask()
## ---------------------------------------------------------------------
## Note: The hg18liftAll.lft file was obtained by downloading,
## extracting and renaming the liftAll.zip file from
##
##   http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
##     liftAll.zip             03-Feb-2006 11:35  5.5K
##
## on May 8, 2008.

file2 <- system.file("extdata", "hg18liftAll.lft", package="IRanges")
mask2 <- read.liftMask(file2, seqname="chr1")
mask2
if (interactive()) {
  ## contigs 7 and 8 for chrY are adjacent
  read.liftMask(file2, seqname="chrY")

  ## displays the sequence names found in the file
  read.liftMask(file2)

  ## specify an unknown sequence name
  read.liftMask(file2, seqname="chrZ", mask.width=300)
}

## ---------------------------------------------------------------------
## C. Extract a RepeatMasker ("RM") or Tandem Repeats Finder ("TRF")
##    mask with read.rmMask() or read.trfMask()
## ---------------------------------------------------------------------
## Note: The ce2chrM.fa.out and ce2chrM.bed files were obtained by
## downloading, extracting and renaming the chromOut.zip and
## chromTrf.zip files from
##
##   http://hgdownload.cse.ucsc.edu/goldenPath/ce2/bigZips/
##     chromOut.zip            21-Apr-2004 09:05  2.6M
##     chromTrf.zip            21-Apr-2004 09:07  182K
##
## on May 7, 2008.

## Before you can extract a mask with read.rmMask() or read.trfMask(), you
## need to know the length of the sequence that you're going to put the
## mask on:
if (interactive()) {
  library(BSgenome.Celegans.UCSC.ce2)
  chrM_length <- seqlengths(Celegans)[["chrM"]]

  ## Read the RepeatMasker .out file for chrM in ce2:
  file3 <- system.file("extdata", "ce2chrM.fa.out", package="IRanges")
  RMmask <- read.rmMask(file3, seqname="chrM", mask.width=chrM_length)
  RMmask

  ## Read the Tandem Repeats Finder .bed file for chrM in ce2:
  file4 <- system.file("extdata", "ce2chrM.bed", package="IRanges")
  TRFmask <- read.trfMask(file4, seqname="chrM", mask.width=chrM_length)
  TRFmask
  desc(TRFmask) <- paste(desc(TRFmask), "[period<=12]")
  TRFmask

  ## Put the 2 masks on chrM:
  chrM <- Celegans$chrM
  masks(chrM) <- RMmask  # this would drop all current masks, if any
  masks(chrM) <- append(masks(chrM), TRFmask)
  chrM
}

reverse_methods()

reverse

Description

A generic function for reversing vector-like or list-like objects. This man page describes methods for reversing a character vector, a Views object, or a MaskCollection object. Note that reverse is similar to but not the same as rev .

Usage

reverse(x, ...)

Arguments

Argument	Description
`x`	A vector-like or list-like object.
`...`	Additional arguments to be passed to or from methods.

Details

On a character vector or a Views object, reverse reverses each element individually, without modifying the top-level order of the elements. More precisely, each individual string of a character vector is reversed.
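
The difference between reversing each element and reversing the order of the elements can be illustrated in base R. The helper reverse_strings below is hypothetical; it only mimics the described behavior on a character vector and is not the method defined by the package.

```r
# Hypothetical helper illustrating what reverse() does to a character vector.
reverse_strings <- function(x) {
  vapply(strsplit(x, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

greetings <- c("Hi!", "How are you?")
reverse_strings(greetings)  # "!iH" "?uoy era woH"  (each string reversed)
rev(greetings)              # "How are you?" "Hi!"  (order reversed)
```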

Value

An object of the same class and length as the original object.

Examples

## On a character vector:
reverse(c("Hi!", "How are you?"))
rev(c("Hi!", "How are you?"))

## On a Views object:
v <- successiveViews(Rle(c(-0.5, 12.3, 4.88), 4:2), 1:4)
v
reverse(v)
rev(v)

## On a MaskCollection object:
mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mask3 <- Mask(mask.width=29, start=c(7, 12), width=c(2, 4))
mymasks <- append(append(mask1, mask2), mask3)
reverse(mymasks)

seqapply()

2 methods that should be documented somewhere else

Description

unsplit method for List object and split<- method for Vector object.

Usage

## S4 method for signature 'List'
unsplit(value, f, drop = FALSE)
## S4 replacement method for signature 'Vector'
split(x, f, drop = FALSE, ...) <- value

Arguments

Argument	Description
`value`	The List object to unsplit.
`f`	A `factor` or `list` of factors
`drop`	Whether to drop empty elements from the returned list
`x`	The Vector object to virtually split by `f` and update.
`...`	Extra arguments accepted by the method.

Details

unsplit unlists value , where the order of the returned vector is as if value were originally created by splitting that vector on the factor f .

split(x, f, drop = FALSE) <- value : Virtually splits x by the factor f , replaces the elements of the resulting list with the elements from the list value , and restores x to its original form. Note that this works for any Vector , even though split itself is not universally supported.
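
Base R provides functions of the same names for ordinary vectors, and they give a quick feel for the contract described above. The sketch below uses only base R on a plain numeric vector; that the List and Vector methods behave analogously is an assumption made for illustration.

```r
# Base R analogues on a plain numeric vector; the List/Vector methods
# documented here are assumed to follow the same contract.
x <- c(10, 20, 30, 40)
f <- factor(c("a", "b", "a", "b"))

parts <- split(x, f)           # list(a = c(10, 30), b = c(20, 40))
restored <- unsplit(parts, f)  # back in the original order: 10 20 30 40

# Replace the elements group by group, keeping the layout of 'x':
y <- x
split(y, f) <- list(a = c(1, 2), b = c(3, 4))
y  # 1 3 2 4
```

Group "a" occupies positions 1 and 3 and group "b" positions 2 and 4, which is why the replacement values end up interleaved.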

Author

Michael Lawrence

setops_methods()

Set operations on IntegerRanges and IntegerRangesList objects

Description

Performs set operations on IntegerRanges and IntegerRangesList objects.

Usage

## Vector-wise set operations
## --------------------------
## S4 method for signature 'IntegerRanges,IntegerRanges'
union(x, y)
## S4 method for signature 'Pairs,missing'
union(x, y, ...)
## S4 method for signature 'IntegerRanges,IntegerRanges'
intersect(x, y)
## S4 method for signature 'Pairs,missing'
intersect(x, y, ...)
## S4 method for signature 'IntegerRanges,IntegerRanges'
setdiff(x, y)
## S4 method for signature 'Pairs,missing'
setdiff(x, y, ...)
## Element-wise (aka "parallel") set operations
## --------------------------------------------
## S4 method for signature 'IntegerRanges,IntegerRanges'
punion(x, y, fill.gap=FALSE)
## S4 method for signature 'Pairs,missing'
punion(x, y, ...)
## S4 method for signature 'IntegerRanges,IntegerRanges'
pintersect(x, y, resolve.empty=c("none", "max.start", "start.x"))
## S4 method for signature 'Pairs,missing'
pintersect(x, y, ...)
## S4 method for signature 'IntegerRanges,IntegerRanges'
psetdiff(x, y)
## S4 method for signature 'Pairs,missing'
psetdiff(x, y, ...)
## S4 method for signature 'IntegerRanges,IntegerRanges'
pgap(x, y)

Arguments

Argument	Description
`x, y`	Objects representing ranges.
`fill.gap`	Logical indicating whether or not to force a union by using the rule `start = min(start(x), start(y)), end = max(end(x), end(y))` .
`resolve.empty`	One of `"none"` , `"max.start"` , or `"start.x"` denoting how to handle ambiguous empty ranges formed by intersections. `"none"` - throw an error if an ambiguous empty range is formed, `"max.start"` - associate the maximum start value with any ambiguous empty range, and `"start.x"` - associate the start value of `x` with any ambiguous empty range. (See Details section below for the definition of an ambiguous range.)
`...`	The methods for Pairs objects pass any extra argument to the internal call to `punion(first(x), last(x), ...)` , `pintersect(first(x), last(x), ...)` , etc...

Details

The union , intersect and setdiff methods for IntegerRanges objects return a "normal" IntegerRanges object representing the union, intersection and (asymmetric!) difference of the sets of integers represented by x and y .
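
The idea of operating on "the sets of integers represented by x and y" and returning a normal result can be sketched in base R. The helpers expand_ranges and collapse_to_normal are hypothetical, and expanding ranges into explicit integer sets is done here only for clarity; it is not how the package computes these operations.

```r
# Hypothetical helpers: ranges are plain integer start/end vectors.
expand_ranges <- function(start, end) {
  # All integers covered by the ranges (empty ranges contribute nothing).
  unlist(Map(function(s, e) s - 1L + seq_len(e - s + 1L), start, end))
}
collapse_to_normal <- function(values) {
  # Turn a set of integers into sorted, non-overlapping, non-adjacent ranges.
  v <- sort(unique(as.integer(values)))
  if (length(v) == 0L)
    return(data.frame(start = integer(0), end = integer(0)))
  breaks <- which(diff(v) != 1L)
  data.frame(start = v[c(1L, breaks + 1L)], end = v[c(breaks, length(v))])
}

x_set <- expand_ranges(c(1L, 8L), c(4L, 10L))  # x = [1,4], [8,10]
y_set <- expand_ranges(3L, 9L)                 # y = [3,9]

collapse_to_normal(union(x_set, y_set))      # [1,10]
collapse_to_normal(intersect(x_set, y_set))  # [3,4], [8,9]
collapse_to_normal(setdiff(x_set, y_set))    # [1,2], [10,10]
collapse_to_normal(setdiff(y_set, x_set))    # [5,7]
```

The last two calls give different results, which is what "asymmetric" means for the difference.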

punion, pintersect, psetdiff and pgap are generic functions that compute the element-wise (aka "parallel") union, intersection, (asymmetric!) difference and gap between each element in x and its corresponding element in y. Methods for IntegerRanges objects are defined. For these methods, x and y must have the same length (i.e. same number of ranges). They return an IntegerRanges object parallel to x and y (i.e. where the i-th range corresponds to the i-th range in x and in y) that represents the union/intersection/difference/gap of/between the corresponding x[i] and y[i].

If x is a Pairs object, then y should be missing, and the operation is performed between the members of each pair.

By default, pintersect will throw an error when an "ambiguous empty range" is formed. An ambiguous empty range can occur in three different ways: 1) when corresponding non-empty range elements of x and y have an empty intersection, 2) if the position of an empty range element does not fall within the corresponding limits of a non-empty range element, or 3) if two corresponding empty range elements do not have the same position. For example, if the empty range element [22,21] is intersected with the non-empty range element [1,10], an error will be produced; but if it is intersected with the range [22,28], it will produce [22,21]. As mentioned in the Arguments section above, this behavior can be changed using the resolve.empty argument.

Value

On IntegerRanges objects, union , intersect , and setdiff return an IRanges instance that is guaranteed to be normal (see isNormal ) but is NOT promoted to NormalIRanges .

On IntegerRanges objects, punion , pintersect , psetdiff , and pgap return an object of the same class and length as their first argument.

pintersect is similar to narrow , except the end points are absolute, not relative. pintersect is also similar to restrict , except ranges outside of the restriction become empty and are not discarded.
setops-methods in the GenomicRanges package for set operations on genomic ranges.
findOverlaps-methods for finding/counting overlapping ranges.
intra-range-methods and inter-range-methods for intra range and inter range transformations.
IntegerRanges and IntegerRangesList objects. In particular, normality of an IntegerRanges object is discussed in the man page for IntegerRanges objects.
mendoapply in the S4Vectors package.

Author

H. Pagès and M. Lawrence

Examples

x <- IRanges(c(1, 5, -2, 0, 14), c(10, 9, 3, 11, 17))
subject <- Rle(1:-3, 6:2)
y <- Views(subject, start=c(14, 0, -5, 6, 18), end=c(20, 2, 2, 8, 20))

## Vector-wise operations:
union(x, ranges(y))
union(ranges(y), x)

intersect(x, ranges(y))
intersect(ranges(y), x)

setdiff(x, ranges(y))
setdiff(ranges(y), x)

## Element-wise (aka "parallel") operations:
try(punion(x, ranges(y)))
punion(x[3:5], ranges(y)[3:5])
punion(x, ranges(y), fill.gap=TRUE)
try(pintersect(x, ranges(y)))
pintersect(x[3:4], ranges(y)[3:4])
pintersect(x, ranges(y), resolve.empty="max.start")
psetdiff(ranges(y), x)
try(psetdiff(x, ranges(y)))
start(x)[4] <- -99
end(y)[4] <- 99
psetdiff(x, ranges(y))
pgap(x, ranges(y))

## On IntegerRangesList objects:
irl1 <- IRangesList(a=IRanges(c(1,2),c(4,3)), b=IRanges(c(4,6),c(10,7)))
irl2 <- IRangesList(c=IRanges(c(0,2),c(4,5)), a=IRanges(c(4,5),c(6,7)))
union(irl1, irl2)
intersect(irl1, irl2)
setdiff(irl1, irl2)

slice_methods()

Slice a vector-like or list-like object

Description

slice is a generic function that creates views on a vector-like or list-like object that contain the elements that are within the specified bounds.

Usage

slice(x, lower=-Inf, upper=Inf, ...)
## S4 method for signature 'Rle'
slice(x, lower=-Inf, upper=Inf,
      includeLower=TRUE, includeUpper=TRUE, rangesOnly=FALSE)
## S4 method for signature 'RleList'
slice(x, lower=-Inf, upper=Inf,
      includeLower=TRUE, includeUpper=TRUE, rangesOnly=FALSE)

Arguments

Argument	Description
`x`	An Rle or RleList object, or any object coercible to an Rle object.
`lower, upper`	The lower and upper bounds for the slice.
`includeLower, includeUpper`	Logical indicating whether the corresponding bound is included in the slice (closed bound, the default) or excluded from it (open bound).
`rangesOnly`	A logical indicating whether or not to drop the original data from the output.
`...`	Additional arguments to be passed to specific methods.

Details

slice is useful for finding areas of absolute maxima (peaks), absolute minima (troughs), or fluctuations within specified limits. One or more view summarization methods can be used on the result of slice. See ?view-summarization-methods

Value

The method for Rle objects returns an RleViews object if rangesOnly=FALSE or an IRanges object if rangesOnly=TRUE.

The method for RleList objects returns an RleViewsList object if rangesOnly=FALSE or an IRangesList object if rangesOnly=TRUE.

view-summarization-methods for summarizing the views returned by slice.
slice-methods in the XVector package for more slice methods.
coverage for computing the coverage across a set of ranges.
The Rle, RleList, RleViews, and RleViewsList classes.

Author

P. Aboyoun

Examples

## Views derived from coverage
x <- IRanges(start=c(1L, 9L, 4L, 1L, 5L, 10L),
             width=c(5L, 6L, 3L, 4L, 3L, 3L))
cvg <- coverage(x)
slice(cvg, lower=2)
slice(cvg, lower=2, rangesOnly=TRUE)
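
What slicing does can be sketched in base R on an ordinary integer vector. The sketch below recomputes the coverage of the example ranges by counting covered positions and then finds the runs of values within closed bounds; slice_sketch is a hypothetical helper that returns a data frame of start/end positions rather than a Views object.

```r
# Coverage of the example ranges, computed by counting covered positions.
starts <- c(1L, 9L, 4L, 1L, 5L, 10L)
widths <- c(5L, 6L, 3L, 4L, 3L, 3L)
positions <- unlist(Map(function(s, w) s - 1L + seq_len(w), starts, widths))
cvg <- tabulate(positions)  # one count per position 1..14

# Hypothetical helper: start/end of each run of values within [lower, upper].
slice_sketch <- function(x, lower = -Inf, upper = Inf) {
  runs <- rle(x >= lower & x <= upper)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  data.frame(start = starts[runs$values], end = ends[runs$values])
}

slice_sketch(cvg, lower = 2)  # two runs: positions 1-6 and 10-12
```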

view_summarization_methods()

Summarize views on a vector-like object with numeric values

Description

viewApply applies a function on each view of a Views or ViewsList object.

viewMins , viewMaxs , viewSums , viewMeans calculate respectively the minima, maxima, sums, and means of the views in a Views or ViewsList object.

Usage

viewApply(X, FUN, ..., simplify = TRUE)
viewMins(x, na.rm=FALSE)
## S4 method for signature 'Views'
min(x, ..., na.rm = FALSE)
viewMaxs(x, na.rm=FALSE)
## S4 method for signature 'Views'
max(x, ..., na.rm = FALSE)
viewSums(x, na.rm=FALSE)
## S4 method for signature 'Views'
sum(x, ..., na.rm = FALSE)
viewMeans(x, na.rm=FALSE)
## S4 method for signature 'Views'
mean(x, ...)
viewWhichMins(x, na.rm=FALSE)
## S4 method for signature 'Views'
which.min(x)
viewWhichMaxs(x, na.rm=FALSE)
## S4 method for signature 'Views'
which.max(x)
viewRangeMins(x, na.rm=FALSE)
viewRangeMaxs(x, na.rm=FALSE)

Arguments

Argument	Description
`X`	A Views object.
`FUN`	The function to be applied to each view in `X` .
`...`	Additional arguments to be passed on.
`simplify`	A logical value specifying whether or not the result should be simplified to a vector or matrix if possible.
`x`	An RleViews or RleViewsList object.
`na.rm`	Logical indicating whether or not to include missing values in the results.

Details

The viewMins , viewMaxs , viewSums , and viewMeans functions provide efficient methods for calculating the specified numeric summary by performing the looping in compiled code.

The viewWhichMins , viewWhichMaxs , viewRangeMins , and viewRangeMaxs functions provide efficient methods for finding the locations of the minima and maxima.
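
What the numeric summaries compute can be sketched in base R by looping over the windows of a plain vector. The data below are hypothetical stand-ins (a numeric subject and two view boundaries), and the loop runs in R, whereas the documented functions perform it in compiled code.

```r
# Hypothetical stand-ins: a plain numeric subject and view boundaries.
subject <- c(2, 2, 2, 3, 3, 2, 1, 0, 1, 2, 2, 2, 1, 1)
view_start <- c(1L, 10L)
view_end <- c(6L, 12L)

# Apply a summary function to the window of 'subject' behind each view.
summarize_views <- function(FUN) {
  vapply(seq_along(view_start),
         function(i) FUN(subject[view_start[i]:view_end[i]]),
         numeric(1))
}

summarize_views(min)   # 2 2
summarize_views(max)   # 3 2
summarize_views(sum)   # 14 6
summarize_views(mean)  # 14/6 and 2
```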

Value

For all the functions in this man page (except viewRangeMins and viewRangeMaxs ): A numeric vector of the length of x if x is an RleViews object, or a List object of the length of x if it's an RleViewsList object.

For viewRangeMins and viewRangeMaxs : An IRanges object if x is an RleViews object, or an IRangesList object if it's an RleViewsList object.

Note

For convenience, methods for min , max , sum , mean , which.min and which.max are provided as wrappers around the corresponding view* functions (which might be deprecated at some point).

Author

P. Aboyoun

Examples

## Views derived from coverage
x <- IRanges(start=c(1L, 9L, 4L, 1L, 5L, 10L),
             width=c(5L, 6L, 3L, 4L, 3L,  3L))
cvg <- coverage(x)
cvg_views <- slice(cvg, lower=2)

viewApply(cvg_views, diff)

viewMins(cvg_views)
viewMaxs(cvg_views)

viewSums(cvg_views)
viewMeans(cvg_views)

viewWhichMins(cvg_views)
viewWhichMaxs(cvg_views)

viewRangeMins(cvg_views)
viewRangeMaxs(cvg_views)