bioconductor v3.9.0 GenomicRanges
The ability to efficiently represent and manipulate genomic
Link to this section Summary
Functions
DelegatingGenomicRanges objects
GNCList objects
Memory-efficient representation of genomic positions
GRangesList objects
GRanges objects
GenomicRangesList objects
Comparing and ordering genomic ranges
Transform genomic ranges into "absolute" ranges
Enforcing constraints thru Constraint objects
Coverage of a GRanges or GRangesList object
Finding overlapping genomic ranges
Squeeze the genomic ranges out of a range-based object
Manipulating genomic variables
Inter range transformations of a GRanges or GRangesList object
Intra range transformations of a GRanges or GRangesList object
Make a GRanges object from a data.frame or DataFrame
Make a GRangesList object from a data.frame or DataFrame
Finding the nearest genomic range neighbor
Calculate the "phi coefficient" between two binary variables
Set operations on genomic ranges
Strand utilities
Put (virtual) tiles on a given genome
Generate windows for a GenomicRanges
Link to this section Functions
DelegatingGenomicRanges_class()
DelegatingGenomicRanges objects
Description
The DelegatingGenomicRanges
class implements the virtual
GenomicRanges
class using a delegate GenomicRanges
. This
enables developers to create GenomicRanges
subclasses that add
specialized columns or other structure, while remaining agnostic to
how the data are actually stored. See the Extending GenomicRanges
vignette.
Author
M. Lawrence
GNCList_class()
GNCList objects
Description
The GNCList class is a container for storing the Nested Containment List
representation of a vector of genomic ranges (typically represented as
a GRanges object).
To preprocess a GRanges object, simply call the GNCList
constructor function on it. The resulting GNCList object can then be used
for efficient overlap-based operations on the genomic ranges.
Usage
GNCList(x)
Arguments
Argument | Description |
---|---|
x | The GRanges (or more generally GenomicRanges ) object to preprocess. |
Details
The IRanges package also defines the NCList
and NCLists
constructors and classes for
preprocessing and representing a IntegerRanges or
IntegerRangesList object as a data structure based on
Nested Containment Lists.
Note that GNCList objects (introduced in BioC 3.1) are replacements for GIntervalTree objects (BioC < 3.1).
See ?
in the IRanges package for
some important differences between the new algorithm based on Nested
Containment Lists and the old algorithm based on interval trees.
In particular, the new algorithm supports preprocessing of a
GenomicRanges object with ranges defined on circular sequences
(e.g. on the mitochnodrial chromosome). See below for some examples.
Value
A GNCList object.
Seealso
The
NCList
andNCLists
constructors and classs defined in the IRanges package.findOverlaps
for finding/counting interval overlaps between two range-based objects.GRanges objects.
Author
H. Pagès
References
Alexander V. Alekseyenko and Christopher J. Lee -- Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics (2007) 23 (11): 1386-1393. doi: 10.1093/bioinformatics/btl647
Examples
## The examples below are for illustration purpose only and do NOT
## reflect typical usage. This is because, for a one time use, it is
## NOT advised to explicitely preprocess the input for findOverlaps()
## or countOverlaps(). These functions will take care of it and do a
## better job at it (by preprocessing only what's needed when it's
## needed, and release memory as they go).
## ---------------------------------------------------------------------
## PREPROCESS QUERY OR SUBJECT
## ---------------------------------------------------------------------
query <- GRanges(Rle(c("chrM", "chr1", "chrM", "chr1"), 4:1),
IRanges(1:10, width=5), strand=rep(c("+", "-"), 5))
subject <- GRanges(Rle(c("chr1", "chr2", "chrM"), 3:1),
IRanges(6:1, width=5), strand="+")
## Either the query or the subject of findOverlaps() can be preprocessed:
ppsubject <- GNCList(subject)
hits1a <- findOverlaps(query, ppsubject)
hits1a
hits1b <- findOverlaps(query, ppsubject, ignore.strand=TRUE)
hits1b
ppquery <- GNCList(query)
hits2a <- findOverlaps(ppquery, subject)
hits2a
hits2b <- findOverlaps(ppquery, subject, ignore.strand=TRUE)
hits2b
## Note that 'hits1a' and 'hits2a' contain the same hits but not
## necessarily in the same order.
stopifnot(identical(sort(hits1a), sort(hits2a)))
## Same for 'hits1b' and 'hits2b'.
stopifnot(identical(sort(hits1b), sort(hits2b)))
## ---------------------------------------------------------------------
## WITH CIRCULAR SEQUENCES
## ---------------------------------------------------------------------
seqinfo <- Seqinfo(c("chr1", "chr2", "chrM"),
seqlengths=c(100, 50, 10),
isCircular=c(FALSE, FALSE, TRUE))
seqinfo(query) <- seqinfo[seqlevels(query)]
seqinfo(subject) <- seqinfo[seqlevels(subject)]
ppsubject <- GNCList(subject)
hits3 <- findOverlaps(query, ppsubject)
hits3
## Circularity introduces more hits:
stopifnot(all(hits1a %in% hits3))
new_hits <- setdiff(hits3, hits1a)
new_hits # 1 new hit
query[queryHits(new_hits)]
subject[subjectHits(new_hits)] # positions 11:13 on chrM are the same
# as positions 1:3
## Sanity checks:
stopifnot(identical(new_hits, Hits(9, 6, 10, 6, sort.by.query=TRUE)))
ppquery <- GNCList(query)
hits4 <- findOverlaps(ppquery, subject)
stopifnot(identical(sort(hits3), sort(hits4)))
GPos_class()
Memory-efficient representation of genomic positions
Description
The GPos class is a container for storing a set of genomic positions where most of the positions are typically (but not necessarily) adjacent. Because genomic positions can be seen as genomic ranges of width 1, the GPos class extends the GenomicRanges virtual class. Note that even though a GRanges instance can be used for storing genomic positions, using a GPos object will be much more memory-efficient, especially when the object contains long runs of adjacent positions in ascending order .
Usage
GPos(pos_runs) # constructor function
Arguments
Argument | Description |
---|---|
pos_runs | A GRanges object (or any other GenomicRanges derivative) where each range is interpreted as a run of adjacent ascending genomic positions on the same strand. If pos_runs is not a GenomicRanges derivative, GPos() first tries to coerce it to one with as(pos_runs, "GenomicRanges", strict=FALSE) . |
Value
A GPos object.
Seealso
The IPos class in the list("IRanges") package for a memory-efficient representation of list("integer ", " positions") (i.e. integer ranges of width 1).
GenomicRanges and GRanges objects.
The
seqinfo
accessor and family in the list("GenomeInfoDb") package for accessing/modifying the seqinfo component of an object.GenomicRanges-comparison for comparing and ordering genomic positions.
findOverlaps-methods for finding overlapping genomic ranges and/or positions.
nearest-methods for finding the nearest genomic range/position neighbor.
The
snpsBySeqname
,snpsByOverlaps
, andsnpsById
methods for SNPlocs objects defined in the list("BSgenome") package for extractors that return a GPos object.SummarizedExperiment objects in the list("SummarizedExperiment") package.
Note
Like for any Vector derivative, the length of a
GPos object cannot exceed .Machine$integer.max
(i.e. 2^31 on
most platforms). GPos()
will return an error if pos_runs
contains too many genomic positions.
Internal representation of GPos objects has changed in GenomicRanges
1.29.10 (Bioc 3.6). Update any old object x
with:
x <- updateObject(x, verbose=TRUE)
.
Author
Hervé Pagès; based on ideas borrowed from Georg Stricker georg.stricker@in.tum.de and Julien Gagneur gagneur@in.tum.de
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
## Example 1:
gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
gpos1
length(gpos1)
seqnames(gpos1)
pos(gpos1) # same as 'start(gpos1)' and 'end(gpos1)'
strand(gpos1)
as.character(gpos1)
as.data.frame(gpos1)
as(gpos1, "GRanges")
as.data.frame(as(gpos1, "GRanges"))
gpos1[9:17]
## Example 2:
pos_runs <- GRanges("chrI", IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)),
strand=c("+", "-", "-", "+"))
gpos2 <- GPos(pos_runs)
gpos2
strand(gpos2)
## Example 3:
gpos3A <- gpos3B <- GPos(c("chrI:1-1000", "chrI:1005-2000"))
npos <- length(gpos3A)
mcols(gpos3A)$sample <- Rle("sA")
sA_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3A)$counts <- sA_counts
mcols(gpos3B)$sample <- Rle("sB")
sB_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3B)$counts <- sB_counts
gpos3 <- c(gpos3A, gpos3B)
gpos3
## Example 4:
library(BSgenome.Scerevisiae.UCSC.sacCer2)
genome <- BSgenome.Scerevisiae.UCSC.sacCer2
gpos4 <- GPos(seqinfo(genome))
gpos4 # all the positions along the genome are represented
mcols(gpos4)$dna <- do.call("c", unname(as.list(genome)))
gpos4
## Note however that, like for any Vector derivative, the length of a
## GPos object cannot exceed '.Machine$integer.max' (i.e. 2^31 on most
## platforms) so the above only works with a "small" genome.
## For example it doesn't work with the Human genome:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
GPos(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)) # error!
## You can use isSmallGenome() to check upfront whether the genome is
## "small" or not.
isSmallGenome(genome)
isSmallGenome(TxDb.Hsapiens.UCSC.hg19.knownGene)
## ---------------------------------------------------------------------
## MEMORY USAGE
## ---------------------------------------------------------------------
## Coercion to GRanges works...
gr4 <- as(gpos4, "GRanges")
gr4
## ... but is generally not a good idea:
object.size(gpos4)
object.size(gr4) # 8 times bigger than the GPos object!
## Shuffling the order of the positions impacts memory usage:
gpos4r <- rev(gpos4)
object.size(gpos4r) # significantly
gpos4s <- sample(gpos4)
object.size(gpos4s) # even worse!
## AN IMPORTANT NOTE: In the worst situations, GPos still performs as
## good as a GRanges object.
object.size(as(gpos4r, "GRanges")) # same size as 'gpos4r'
object.size(as(gpos4s, "GRanges")) # same size as 'gpos4s'
## Best case scenario is when the object is strictly sorted (i.e.
## positions are in strict ascending order).
## This can be checked with:
is.unsorted(gpos4, strict=TRUE) # 'gpos4' is strictly sorted
## ---------------------------------------------------------------------
## USING MEMORY-EFFICIENT METADATA COLUMNS
## ---------------------------------------------------------------------
## In order to keep memory usage as low as possible, it is recommended
## to use a memory-efficient representation of the metadata columns that
## we want to set on the object. Rle's are particularly well suited for
## this, especially if the metadata columns contain long runs of
## identical values. This is the case for example if we want to use a
## GPos object to represent the coverage of sequencing reads along a
## genome.
## Example 5:
library(pasillaBamSubset)
library(Rsamtools) # for the BamFile() constructor function
bamfile1 <- BamFile(untreated1_chr4())
bamfile2 <- BamFile(untreated3_chr4())
gpos5 <- GPos(seqinfo(bamfile1))
library(GenomicAlignments) # for "coverage" method for BamFile objects
cvg1 <- unlist(coverage(bamfile1), use.names=FALSE)
cvg2 <- unlist(coverage(bamfile2), use.names=FALSE)
mcols(gpos5) <- DataFrame(cvg1, cvg2)
gpos5
object.size(gpos5) # lightweight
## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
|gpos5[mcols(gpos5)$cvg1 >= 10 | mcols(gpos5)$cvg2 >= 10]|
## ---------------------------------------------------------------------
## USING A GPos OBJECT IN A SummarizedExperiment OBJECT
## ---------------------------------------------------------------------
## Because the GPos class extends the GenomicRanges virtual class, a GPos
## object can be used as the rowRanges component of a SummarizedExperiment
## object.
## As a 1st example, we show how the counts for samples sA and sB in
## 'gpos3' can be stored in a SummarizedExperiment object where the rows
## correspond to unique genomic positions and the columns to samples:
library(SummarizedExperiment)
counts <- cbind(sA=sA_counts, sB=sB_counts)
mcols(gpos3A) <- NULL
rse3 <- SummarizedExperiment(list(counts=counts), rowRanges=gpos3A)
rse3
rowRanges(rse3)
head(assay(rse3))
## Finally we show how the coverage data from Example 5 can be easily
## stored in a lightweight SummarizedExperiment object:
cvg <- mcols(gpos5)
mcols(gpos5) <- NULL
rse5 <- SummarizedExperiment(list(cvg=cvg), rowRanges=gpos5)
rse5
rowRanges(rse5)
assay(rse5)
## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
|rse5[assay(rse5)$cvg1 >= 10 | assay(rse5)$cvg2 >= 10]|
GRangesList_class()
GRangesList objects
Description
The GRangesList class is a container for storing a collection of GRanges objects. It is a subclass of GenomicRangesList . It exists in 2 flavors: SimpleGRangesList and CompressedGRangesList. Each flavor uses a particular internal representation. The CompressedGRangesList flavor is the default. It is particularly efficient for storing a large number of list elements and operating on them.
Seealso
GRanges objects.
seqinfo
in the GenomeInfoDb package.IntegerRangesList objects in the IRanges package.
RleList objects in the IRanges package.
DataFrameList objects in the IRanges package.
intra-range-methods , inter-range-methods , coverage-methods , setops-methods , and findOverlaps-methods .
GenomicRangesList objects.
Author
P. Aboyoun & H. Pagès
Examples
## Construction with GRangesList():
gr1 <-
GRanges(seqnames = "chr2", ranges = IRanges(3, 6),
strand = "+", score = 5L, GC = 0.45)
gr2 <-
GRanges(seqnames = c("chr1", "chr1"),
ranges = IRanges(c(7,13), width = 3),
strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
gr3 <-
GRanges(seqnames = c("chr1", "chr2"),
ranges = IRanges(c(1, 4), c(3, 9)),
strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3)
grl
## Summarizing elements:
elementNROWS(grl)
table(seqnames(grl))
## Extracting subsets:
grl[seqnames(grl) == "chr1", ]
grl[seqnames(grl) == "chr1" & strand(grl) == "+", ]
## Renaming the underlying sequences:
seqlevels(grl)
seqlevels(grl) <- sub("chr", "Chrom", seqlevels(grl))
grl
## Coerce to IRangesList (seqnames and strand information is lost):
as(grl, "IRangesList")
## isDisjoint():
isDisjoint(grl)
## disjoin():
disjoin(grl) # metadata columns and order NOT preserved
## Construction with makeGRangesListFromFeatureFragments():
filepath <- system.file("extdata", "feature_frags.txt",
package="GenomicRanges")
featfrags <- read.table(filepath, header=TRUE, stringsAsFactors=FALSE)
grl2 <- with(featfrags,
makeGRangesListFromFeatureFragments(seqnames=targetName,
fragmentStarts=targetStart,
fragmentWidths=blockSizes,
strand=strand))
names(grl2) <- featfrags$RefSeqID
grl2
GRanges_class()
GRanges objects
Description
The GRanges class is a container for the genomic locations and their associated annotations.
Details
GRanges is a vector of genomic locations and associated annotations. Each element in the vector is comprised of a sequence name, an interval, a strand , and optional metadata columns (e.g. score, GC content, etc.). This information is stored in four components: list(" ", " ", list(list(list("seqnames")), list("a 'factor' ", list("Rle"), " object ", " containing the sequence names.")), " ", " ", list(list(list("ranges")), list("an ", list("IRanges"), " object containing ", " the ranges.")), " ", " ", list(list(list("strand")), list("a 'factor' ", list("Rle"), " object containing ", " the ", list("strand"), " information.")), " ", " ", list(list(list("mcols")), list("a ", list("DataFrame"), " object ", " containing the metadata columns. Columns cannot be named ",
" ", list(""seqnames""), ", ", list(""ranges""), ", ", list(""strand""), ",
", " ", list(""seqlevels""), ", ", list(""seqlengths""), ", ", list(""isCircular""), ", ", " ", list(""start""), ", ", list(""end""), ", ", list(""width""), ", or ", list(""element""), ".")), " ", " ", list(list(list("seqinfo")), list("a ", list("Seqinfo"), " object containing information ", " about the set of genomic sequences present in the GRanges object.")), " ", " ")
Seealso
makeGRangesFromDataFrame
for making a GRanges object from a data.frame or DataFrame object.seqinfo
for accessing/modifying information about the underlying sequences of a GRanges object.The GPos class, a memory-efficient GenomicRanges derivative for representing genomic positions (i.e. genomic ranges of width 1).
GenomicRanges-comparison for comparing and ordering genomic ranges.
findOverlaps-methods for finding/counting overlapping genomic ranges.
intra-range-methods and inter-range-methods for intra range and inter range transformations of a GRanges object.
coverage-methods for computing the coverage of a GRanges object.
setops-methods for set operations on GRanges objects.
nearest-methods for finding the nearest genomic range neighbor.
absoluteRanges
for transforming genomic ranges into absolute ranges (i.e. into ranges on the sequence obtained by virtually concatenating all the sequences in a genome).tileGenome
for putting tiles on a genome.genomicvars for manipulating genomic variables.
GRangesList objects.
IntegerRanges objects in the IRanges package.
Vector , Rle , and DataFrame objects in the S4Vectors package.
Author
P. Aboyoun and H. Pagès
Examples
## ---------------------------------------------------------------------
## CONSTRUCTION
## ---------------------------------------------------------------------
## Specifying the bare minimum i.e. seqnames and ranges only. The
## GRanges object will have no names, no strand information, and no
## metadata columns:
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1))
gr0
## Specifying names, strand, metadata columns. They can be set on an
## existing object:
names(gr0) <- head(letters, 10)
strand(gr0) <- Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2))
mcols(gr0)$score <- 1:10
mcols(gr0)$GC <- seq(1, 0, length=10)
gr0
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10))
stopifnot(identical(gr0, gr))
## Specifying the seqinfo. It can be set on an existing object:
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
seqinfo(gr0) <- merge(seqinfo(gr0), seqinfo)
seqlevels(gr0) <- seqlevels(seqinfo)
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10),
seqinfo=seqinfo)
stopifnot(identical(gr0, gr))
## ---------------------------------------------------------------------
## COERCION
## ---------------------------------------------------------------------
## From GRanges:
as.character(gr)
as.factor(gr)
as.data.frame(gr)
## From character to GRanges:
x1 <- "chr2:56-125"
as(x1, "GRanges")
as(rep(x1, 4), "GRanges")
x2 <- c(A=x1, B="chr1:25-30:-")
as(x2, "GRanges")
## From data.frame to GRanges:
df <- data.frame(chrom="chr2", start=11:15, end=20:24)
gr3 <- as(df, "GRanges")
## Alternatively, coercion to GRanges can be done by just calling the
## GRanges() constructor on the object to coerce:
gr1 <- GRanges(x1) # same as as(x1, "GRanges")
gr2 <- GRanges(x2) # same as as(x2, "GRanges")
gr3 <- GRanges(df) # same as as(df, "GRanges")
## Sanity checks:
stopifnot(identical(as(x1, "GRanges"), gr1))
stopifnot(identical(as(x2, "GRanges"), gr2))
stopifnot(identical(as(df, "GRanges"), gr3))
## ---------------------------------------------------------------------
## SUMMARIZING ELEMENTS
## ---------------------------------------------------------------------
table(seqnames(gr))
table(strand(gr))
sum(width(gr))
table(gr)
summary(mcols(gr)[,"score"])
## The number of lines displayed in the 'show' method are controlled
## with two global options:
longGR <- sample(gr, 25, replace=TRUE)
longGR
options(showHeadLines=7)
options(showTailLines=2)
longGR
## Revert to default values
options(showHeadLines=NULL)
options(showTailLines=NULL)
## ---------------------------------------------------------------------
## INVERTING THE STRAND
## ---------------------------------------------------------------------
invertStrand(gr)
## ---------------------------------------------------------------------
## RENAMING THE UNDERLYING SEQUENCES
## ---------------------------------------------------------------------
seqlevels(gr)
seqlevels(gr) <- sub("chr", "Chrom", seqlevels(gr))
gr
seqlevels(gr) <- sub("Chrom", "chr", seqlevels(gr)) # revert
## ---------------------------------------------------------------------
## COMBINING OBJECTS
## ---------------------------------------------------------------------
gr2 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 3, 4)),
IRanges(1:10, width=5),
strand='-',
score=101:110, GC=runif(10),
seqinfo=seqinfo)
gr3 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 4, 3)),
IRanges(101:110, width=10),
strand='-',
score=21:30,
seqinfo=seqinfo)
some.gr <- c(gr, gr2)
c(gr, gr2, gr3)
c(gr, gr2, gr3, ignore.mcols=TRUE)
## ---------------------------------------------------------------------
## USING A GRANGES OBJECT AS A SUBSCRIPT TO SUBSET ANOTHER OBJECT
## ---------------------------------------------------------------------
## Subsetting *by* a GRanges subscript is supported only if the object
## to subset is a named list-like object:
x <- RleList(chr1=101:120, chr2=2:-8, chr3=31:40)
x[gr]
GenomicRangesList_class()
GenomicRangesList objects
Description
The GenomicRangesList virtual class is a general container for storing a list of GenomicRanges objects.
Most users are probably more interested in the GRangesList container, a GenomicRangesList derivative for storing a list of GRanges objects.
Details
The place of GenomicRangesList in the list("Vector class hierarchy") :
| list("
", " Vector
", " ^
", " |
", " List
", " ^
", " |
", " RangesList
", " ^ ^
", " /
", " /
", " /
", " /
", |
| " /
", " /
", " IntegerRangesList GenomicRangesList
", " ^ ^
", " | |
", " IRangesList GRangesList
", " ^ ^ ^ ^
", " / \ /
", " / \ /
", |
" / \ / \
", " SimpleIRangesList \ SimpleGRangesList
", " CompressedIRangesList CompressedGRangesList
", " ")
Note that the list("Vector class hierarchy") has many more classes.
In particular Vector , List ,
RangesList , and IntegerRangesList
have other subclasses not shown here.
Seealso
GRangesList objects.
GenomicRanges and GRanges objects.
Author
H. Pagès and M. Lawrence
GenomicRanges_comparison()
Comparing and ordering genomic ranges
Description
Methods for comparing and/or ordering GenomicRanges objects.
Usage
## duplicated()
## ------------
list(list("duplicated"), list("GenomicRanges"))(x, incomparables=FALSE, fromLast=FALSE,
nmax=NA, method=c("auto", "quick", "hash"))
## match() & selfmatch()
## ---------------------
list(list("match"), list("GenomicRanges,GenomicRanges"))(x, table, nomatch=NA_integer_, incomparables=NULL,
method=c("auto", "quick", "hash"), ignore.strand=FALSE)
list(list("selfmatch"), list("GenomicRanges"))(x, method=c("auto", "quick", "hash"), ignore.strand=FALSE)
## order() and related methods
## ----------------------------
list(list("is.unsorted"), list("GenomicRanges"))(x, na.rm=FALSE, strictly=FALSE, ignore.strand=FALSE)
list(list("order"), list("GenomicRanges"))(..., na.last=TRUE, decreasing=FALSE,
method=c("auto", "shell", "radix"))
list(list("sort"), list("GenomicRanges"))(x, decreasing=FALSE, ignore.strand=FALSE, by)
list(list("rank"), list("GenomicRanges"))(x, na.last=TRUE,
ties.method=c("average", "first", "last", "random", "max", "min"),
ignore.strand=FALSE)
## Generalized parallel comparison of 2 GenomicRanges objects
## ----------------------------------------------------------
list(list("pcompare"), list("GenomicRanges,GenomicRanges"))(x, y)
Arguments
Argument | Description |
---|---|
x, table, y | GenomicRanges objects. |
incomparables | Not supported. |
fromLast, method, nomatch, nmax, na.rm, strictly, na.last, decreasing | See `?`` in the IRanges package for a description of these arguments. |
ignore.strand | Whether or not the strand should be ignored when comparing 2 genomic ranges. |
... | One or more GenomicRanges objects. The GenomicRanges objects after the first one are used to break ties. |
ties.method | A character string specifying how ties are treated. Only "first" is supported for now. |
by | An optional formula that is resolved against as.env(x) ; the resulting variables are passed to order to generate the ordering permutation. |
Details
Two elements of a GenomicRanges derivative (i.e. two genomic ranges)
are considered equal iff they are on the same underlying sequence and strand,
and share the same start and width. duplicated()
and unique()
on a GenomicRanges derivative are conforming to this.
The "natural order" for the elements of a GenomicRanges derivative
is to order them (a) first by sequence level, (b) then by strand, (c) then
by start, (d) and finally by width.
This way, the space of genomic ranges is totally ordered.
Note that, because we already do (c) and (d) for regular ranges (see
?`` ), genomic ranges that belong to the same underlying sequence and strand are ordered like regular ranges.
pcompare(),
==,
!=,
<=,
>=,
<and
>on [GenomicRanges](#genomicranges) derivatives behave accordingly to this "natural order".
is.unsorted(),
order(),
sort(),
rank()on [GenomicRanges](#genomicranges) derivatives also behave accordingly to this "natural order". Finally, note that some inter range transformations like [
reduce](#reduce) or [
disjoin](#disjoin) also use this "natural order" implicitly when operating on [GenomicRanges](#genomicranges) derivatives. ## Seealso * The [GenomicRanges](#genomicranges) class. * [IPosRanges-comparison](#iposranges-comparison) in the IRanges package for comparing and ordering genomic ranges. * [findOverlaps-methods](#findoverlaps-methods) for finding overlapping genomic ranges. * [intra-range-methods](#intra-range-methods) and [inter-range-methods](#inter-range-methods) for intra range and inter range transformations of a [GRanges](#granges) object. * [setops-methods](#setops-methods) for set operations on [GenomicRanges](#genomicranges) objects. ## Author H. Pagès,
is.unsorted` contributed by Pete Hickey
## Examples
r gr0 <- GRanges( Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(c(1:9,7L), end=10), strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), seqlengths=c(chr1=11, chr2=12, chr3=13) ) gr <- c(gr0, gr0[7:3]) names(gr) <- LETTERS[seq_along(gr)] ## --------------------------------------------------------------------- ## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 GenomicRanges OBJECTS ## --------------------------------------------------------------------- gr[2] == gr[2] # TRUE gr[2] == gr[5] # FALSE gr == gr[4] gr >= gr[3] ## --------------------------------------------------------------------- ## B. match(), selfmatch(), %in%, duplicated(), unique() ## --------------------------------------------------------------------- table <- gr[1:7] match(gr, table) match(gr, table, ignore.strand=TRUE) gr %in% table duplicated(gr) unique(gr) ## --------------------------------------------------------------------- ## C. findMatches(), countMatches() ## --------------------------------------------------------------------- findMatches(gr, table) countMatches(gr, table) findMatches(gr, table, ignore.strand=TRUE) countMatches(gr, table, ignore.strand=TRUE) gr_levels <- unique(gr) countMatches(gr_levels, gr) ## --------------------------------------------------------------------- ## D. order() AND RELATED METHODS ## --------------------------------------------------------------------- is.unsorted(gr) order(gr) sort(gr) is.unsorted(sort(gr)) is.unsorted(gr, ignore.strand=TRUE) gr2 <- sort(gr, ignore.strand=TRUE) is.unsorted(gr2) # TRUE is.unsorted(gr2, ignore.strand=TRUE) # FALSE ## TODO: Broken. Please fix! #sort(gr, by = ~ seqnames + start + end) # equivalent to (but slower than) above score(gr) <- rev(seq_len(length(gr))) ## TODO: Broken. Please fix! #sort(gr, by = ~ score) rank(gr, ties.method="first") rank(gr, ties.method="first", ignore.strand=TRUE) ## --------------------------------------------------------------------- ## E. GENERALIZED ELEMENT-WISE COMPARISON OF 2 GenomicRanges OBJECTS ## --------------------------------------------------------------------- gr3 <- GRanges(c(rep("chr1", 12), "chr2"), IRanges(c(1:11, 6:7), width=3)) strand(gr3)[12] <- "+" gr4 <- GRanges("chr1", IRanges(5, 9)) pcompare(gr3, gr4) rangeComparisonCodeToLetter(pcompare(gr3, gr4))
absoluteRanges()
Transform genomic ranges into "absolute" ranges
Description
absoluteRanges
transforms the genomic ranges in x
into
absolute ranges i.e. into ranges counted from the beginning of
the virtual sequence obtained by concatenating all the sequences in the
underlying genome (in the order reported by seqlevels(x)
).
relativeRanges
performs the reverse transformation.
NOTE: These functions only work on small genomes. See Details section below.
Usage
absoluteRanges(x)
relativeRanges(x, seqlengths)
## Related utility:
isSmallGenome(seqlengths)
Arguments
Argument | Description |
---|---|
x | For absoluteRanges : a GenomicRanges object with ranges defined on a small genome (see Details section below). For relativeRanges : an IntegerRanges object. |
seqlengths | An object holding sequence lengths. This can be a named integer (or numeric) vector with no duplicated names as returned by seqlengths , or any object from which sequence lengths can be extracted with seqlengths . For relativeRanges , seqlengths must describe a small genome (see Details section below). |
Details
Because absoluteRanges
returns the absolute ranges in an
IRanges object, and because an IRanges
object cannot hold ranges with an end > .Machine$integer.max
(i.e. >= 2^31 on most platforms), absoluteRanges
cannot be used
if the size of the underlying genome (i.e. the total length of the
sequences in it) is > .Machine$integer.max
. Utility function
isSmallGenome
is provided as a mean for the user to check
upfront whether the genome is small (i.e. its size is <=
.Machine$integer.max
) or not, and thus compatible with
absoluteRanges
or not.
relativeRanges
applies the same restriction by looking at the
seqlengths
argument.
Value
An IRanges object for absoluteRanges
.
A GRanges object for relativeRanges
.
absoluteRanges
and relativeRanges
both return an object that
is parallel to the input object (i.e. same length and names).
isSmallGenome
returns TRUE if the total length of the underlying
sequences is <= .Machine$integer.max
(e.g. Fly genome),
FALSE if not (e.g. Human genome), or NA if it cannot be computed (because
some sequence lengths are NA).
Seealso
GRanges objects.
IntegerRanges objects in the IRanges package.
Seqinfo objects and the
seqlengths
getter in the GenomeInfoDb package.genomicvars for manipulating genomic variables.
The
tileGenome
function for putting tiles on a genome.
Author
H. Pagès
Examples
## ---------------------------------------------------------------------
## TOY EXAMPLE
## ---------------------------------------------------------------------
gr <- GRanges(Rle(c("chr2", "chr1", "chr3", "chr1"), 4:1),
IRanges(1:10, width=5),
seqinfo=Seqinfo(c("chr1", "chr2", "chr3"), c(100, 50, 20)))
ar <- absoluteRanges(gr)
ar
gr2 <- relativeRanges(ar, seqlengths(gr))
gr2
## Sanity check:
stopifnot(all(gr == gr2))
## ---------------------------------------------------------------------
## ON REAL DATA
## ---------------------------------------------------------------------
## With a "small" genome
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
ex <- exons(txdb)
ex
isSmallGenome(ex)
## Note that because isSmallGenome() can return NA (see Value section
## above), its result should always be wrapped inside isTRUE() when
## used in an if statement:
if (isTRUE(isSmallGenome(ex))) {
ar <- absoluteRanges(ex)
ar
ex2 <- relativeRanges(ar, seqlengths(ex))
ex2 # original strand is not restored
## Sanity check:
strand(ex2) <- strand(ex) # restore the strand
stopifnot(all(ex == ex2))
}
## With a "big" genome (but we can reduce it)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex <- exons(txdb)
isSmallGenome(ex)
absoluteRanges(ex) # error!
## However, if we are only interested in some chromosomes, we might
## still be able to use absoluteRanges():
seqlevels(ex, pruning.mode="coarse") <- paste0("chr", 1:10)
isSmallGenome(ex) # TRUE!
ar <- absoluteRanges(ex)
ex2 <- relativeRanges(ar, seqlengths(ex))
## Sanity check:
strand(ex2) <- strand(ex)
stopifnot(all(ex == ex2))
constraint()
Enforcing constraints thru Constraint objects
Description
Attaching a Constraint object to an object of class A (the "constrained" object) is meant to be a convenient/reusable/extensible way to enforce a particular set of constraints on particular instances of A.
THIS IS AN EXPERIMENTAL FEATURE AND STILL VERY MUCH A WORK-IN-PROGRESS!
Details
For the developer, using constraints is an alternative to the more traditional approach that consists in creating subclasses of A and implementing specific validity methods for each of them. However, using constraints offers the following advantages over the traditional approach:
The traditional approach often tends to lead to a proliferation of subclasses of A.
Constraints can easily be re-used across different classes without the need to create any new class.
Constraints can easily be combined.
All constraints are implemented as concrete subclasses of the Constraint class, which is a virtual class with no slots. Like the Constraint virtual class itself, concrete Constraint subclasses cannot have slots.
Here are the 7 steps typically involved in the process of putting constraints on objects of class A:
Add a slot named
constraint
to the definition of class A. The type of this slot must be Constraint_OR_NULL. Note that any subclass of A will inherit this slot.Implements the
constraint()
accessors (getter and setter) for objects of class A. This is done by implementing the"constraint"
method (getter) and replacement method (setter) for objects of class A (see the examples below). As a convenience to the user, the setter should also accept the name of a constraint (i.e. the name of its class) in addition to an instance of that class. Note that those accessors will work on instances of any subclass of A.Modify the validity method for class A so it also returns the result of
checkConstraint(x, constraint(x))
(append this result to the result returned by the validity method).Testing: Create
x
, an instance of class A (or subclass of A). By default there is no constraint on it (constraint(x)
isNULL
).validObject(x)
should returnTRUE
.Create a new constraint (MyConstraint) by extending the Constraint class, typically with
setClass("MyConstraint", contains="Constraint")
. This constraint is not enforcing anything yet so you could put it onx
(withconstraint(x) <- "MyConstraint"
), but not much would happen. In order to actually enforce something, a"checkConstraint"
method for signaturec(x="A", constraint="MyConstraint")
needs to be implemented.Implement a
"checkConstraint"
method for signaturec(x="A", constraint="MyConstraint")
. Like validity methods,"checkConstraint"
methods must returnNULL
or a character vector describing the problems found. Like validity methods, they should never fail (i.e. they should never raise an error). Note that, alternatively, an existing constraint (e.g. SomeConstraint) can be adapted to work on objects of class A by just defining a new"checkConstraint"
method for signaturec(x="A", constraint="SomeConstraint")
. Also, stricter constraints can be built on top of existing constraints by extending one or more existing constraints (see the examples below).Testing: Try
constraint(x) <- "MyConstraint"
. It will or will not work depending on whetherx
satisfies the constraint or not. In the former case, trying to modifyx
in a way that breaks the constraint on it will also raise an error.
Seealso
setClass
,
is
,
setMethod
,
showMethods
,
validObject
,
GenomicRanges-class
Note
WARNING: This note is not true anymore as the constraint
slot has
been temporarily removed from GenomicRanges objects (starting with
package GenomicRanges >= 1.7.9).
Currently, only GenomicRanges objects can be constrained, that is:
they have a
constraint
slot;they have
constraint()
accessors (getter and setter) for this slot;their validity method has been modified so it also returns the result of
checkConstraint(x, constraint(x))
.
More classes in the GenomicRanges and IRanges packages will support constraints in the near future.
Author
H. Pagès
Examples
## The examples below show how to define and set constraints on
## GenomicRanges objects. Note that this is how the constraint()
## setter is defined for GenomicRanges objects:
#setReplaceMethod("constraint", "GenomicRanges",
# function(x, value)
# {
# if (isSingleString(value))
# value <- new(value)
# if (!is(value, "Constraint_OR_NULL"))
# stop("the supplied 'constraint' must be a ",
# "Constraint object, a single string, or NULL")
# x@constraint <- value
# validObject(x)
# x
# }
#)
#selectMethod("constraint", "GenomicRanges") # the getter
#selectMethod("constraint<-", "GenomicRanges") # the setter
## We'll use the GRanges instance 'gr' created in the GRanges examples
## to test our constraints:
example(GRanges, echo=FALSE)
gr
#constraint(gr)
## ---------------------------------------------------------------------
## EXAMPLE 1: The HasRangeTypeCol constraint.
## ---------------------------------------------------------------------
## The HasRangeTypeCol constraint checks that the constrained object
## has a unique "rangeType" metadata column and that this column
## is a 'factor' Rle with no NAs and with the following levels
## (in this order): gene, transcript, exon, cds, 5utr, 3utr.
setClass("HasRangeTypeCol", contains="Constraint")
## Like validity methods, "checkConstraint" methods must return NULL or
## a character vector describing the problems found. They should never
## fail i.e. they should never raise an error.
setMethod("checkConstraint", c("GenomicRanges", "HasRangeTypeCol"),
function(x, constraint, verbose=FALSE)
{
x_mcols <- mcols(x)
idx <- match("rangeType", colnames(x_mcols))
|if (length(idx) != 1L || is.na(idx)) {|
msg <- c("'mcols(x)' must have exactly 1 column ",
"named "rangeType"")
return(paste(msg, collapse=""))
}
rangeType <- x_mcols[[idx]]
.LEVELS <- c("gene", "transcript", "exon", "cds", "5utr", "3utr")
|if (!is(rangeType, "Rle") |||
|S4Vectors:::anyMissing(runValue(rangeType)) |||
!identical(levels(rangeType), .LEVELS))
{
msg <- c("'mcols(x)$rangeType' must be a ",
"'factor' Rle with no NAs and with levels: ",
paste(.LEVELS, collapse=", "))
return(paste(msg, collapse=""))
}
NULL
}
)
#ontrun{
#constraint(gr) <- "HasRangeTypeCol" # will fail
#}
checkConstraint(gr, new("HasRangeTypeCol")) # with GenomicRanges >= 1.7.9
levels <- c("gene", "transcript", "exon", "cds", "5utr", "3utr")
rangeType <- Rle(factor(c("cds", "gene"), levels=levels), c(8, 2))
mcols(gr)$rangeType <- rangeType
#constraint(gr) <- "HasRangeTypeCol" # OK
checkConstraint(gr, new("HasRangeTypeCol")) # with GenomicRanges >= 1.7.9
## Use is() to check whether the object has a given constraint or not:
#is(constraint(gr), "HasRangeTypeCol") # TRUE
#ontrun{
#mcols(gr)$rangeType[3] <- NA # will fail
#}
mcols(gr)$rangeType[3] <- NA
checkConstraint(gr, new("HasRangeTypeCol")) # with GenomicRanges >= 1.7.9
## ---------------------------------------------------------------------
## EXAMPLE 2: The GeneRanges constraint.
## ---------------------------------------------------------------------
## The GeneRanges constraint is defined on top of the HasRangeTypeCol
## constraint. It checks that all the ranges in the object are of type
## "gene".
setClass("GeneRanges", contains="HasRangeTypeCol")
## The checkConstraint() generic will check the HasRangeTypeCol constraint
## first, and, only if it's statisfied, it will then check the GeneRanges
## constraint.
setMethod("checkConstraint", c("GenomicRanges", "GeneRanges"),
function(x, constraint, verbose=FALSE)
{
rangeType <- mcols(x)$rangeType
if (!all(rangeType == "gene")) {
msg <- c("all elements in 'mcols(x)$rangeType' ",
"must be equal to "gene"")
return(paste(msg, collapse=""))
}
NULL
}
)
#ontrun{
#constraint(gr) <- "GeneRanges" # will fail
#}
checkConstraint(gr, new("GeneRanges")) # with GenomicRanges >= 1.7.9
mcols(gr)$rangeType[] <- "gene"
## This replace the previous constraint (HasRangeTypeCol):
#constraint(gr) <- "GeneRanges" # OK
checkConstraint(gr, new("GeneRanges")) # with GenomicRanges >= 1.7.9
#is(constraint(gr), "GeneRanges") # TRUE
## However, 'gr' still indirectly has the HasRangeTypeCol constraint
## (because the GeneRanges constraint extends the HasRangeTypeCol
## constraint):
#is(constraint(gr), "HasRangeTypeCol") # TRUE
#ontrun{
#mcols(gr)$rangeType[] <- "exon" # will fail
#}
mcols(gr)$rangeType[] <- "exon"
checkConstraint(gr, new("GeneRanges")) # with GenomicRanges >= 1.7.9
## ---------------------------------------------------------------------
## EXAMPLE 3: The HasGCCol constraint.
## ---------------------------------------------------------------------
## The HasGCCol constraint checks that the constrained object has a
## unique "GC" metadata column, that this column is of type numeric,
## with no NAs, and that all the values in that column are >= 0 and <= 1.
setClass("HasGCCol", contains="Constraint")
setMethod("checkConstraint", c("GenomicRanges", "HasGCCol"),
function(x, constraint, verbose=FALSE)
{
x_mcols <- mcols(x)
idx <- match("GC", colnames(x_mcols))
|if (length(idx) != 1L || is.na(idx)) {|
msg <- c("'mcols(x)' must have exactly ",
"one column named "GC"")
return(paste(msg, collapse=""))
}
GC <- x_mcols[[idx]]
|if (!is.numeric(GC) |||
|S4Vectors:::anyMissing(GC) |||
|any(GC < 0) || any(GC > 1))|
{
msg <- c("'mcols(x)$GC' must be a numeric vector ",
"with no NAs and with values between 0 and 1")
return(paste(msg, collapse=""))
}
NULL
}
)
## This replace the previous constraint (GeneRanges):
#constraint(gr) <- "HasGCCol" # OK
checkConstraint(gr, new("HasGCCol")) # with GenomicRanges >= 1.7.9
#is(constraint(gr), "HasGCCol") # TRUE
#is(constraint(gr), "GeneRanges") # FALSE
#is(constraint(gr), "HasRangeTypeCol") # FALSE
## ---------------------------------------------------------------------
## EXAMPLE 4: The HighGCRanges constraint.
## ---------------------------------------------------------------------
## The HighGCRanges constraint is defined on top of the HasGCCol
## constraint. It checks that all the ranges in the object have a GC
## content >= 0.5.
setClass("HighGCRanges", contains="HasGCCol")
## The checkConstraint() generic will check the HasGCCol constraint
## first, and, if it's statisfied, it will then check the HighGCRanges
## constraint.
setMethod("checkConstraint", c("GenomicRanges", "HighGCRanges"),
function(x, constraint, verbose=FALSE)
{
GC <- mcols(x)$GC
if (!all(GC >= 0.5)) {
msg <- c("all elements in 'mcols(x)$GC' ",
"must be >= 0.5")
return(paste(msg, collapse=""))
}
NULL
}
)
#ontrun{
#constraint(gr) <- "HighGCRanges" # will fail
#}
checkConstraint(gr, new("HighGCRanges")) # with GenomicRanges >= 1.7.9
mcols(gr)$GC[6:10] <- 0.5
#constraint(gr) <- "HighGCRanges" # OK
checkConstraint(gr, new("HighGCRanges")) # with GenomicRanges >= 1.7.9
#is(constraint(gr), "HighGCRanges") # TRUE
#is(constraint(gr), "HasGCCol") # TRUE
## ---------------------------------------------------------------------
## EXAMPLE 5: The HighGCGeneRanges constraint.
## ---------------------------------------------------------------------
## The HighGCGeneRanges constraint is the combination (AND) of the
## GeneRanges and HighGCRanges constraints.
setClass("HighGCGeneRanges", contains=c("GeneRanges", "HighGCRanges"))
## No need to define a method for this constraint: the checkConstraint()
## generic will automatically check the GeneRanges and HighGCRanges
## constraints.
#constraint(gr) <- "HighGCGeneRanges" # OK
checkConstraint(gr, new("HighGCGeneRanges")) # with GenomicRanges >= 1.7.9
#is(constraint(gr), "HighGCGeneRanges") # TRUE
#is(constraint(gr), "HighGCRanges") # TRUE
#is(constraint(gr), "HasGCCol") # TRUE
#is(constraint(gr), "GeneRanges") # TRUE
#is(constraint(gr), "HasRangeTypeCol") # TRUE
## See how all the individual constraints are checked (from less
## specific to more specific constraints):
#checkConstraint(gr, constraint(gr), verbose=TRUE)
checkConstraint(gr, new("HighGCGeneRanges"), verbose=TRUE) # with
# GenomicRanges
# >= 1.7.9
## See all the "checkConstraint" methods:
showMethods("checkConstraint")
coverage_methods()
Coverage of a GRanges or GRangesList object
Description
coverage
methods for GRanges and
GRangesList objects.
NOTE: The coverage
generic function and methods
for IntegerRanges and IntegerRangesList
objects are defined and documented in the IRanges package.
Methods for GAlignments and
GAlignmentPairs objects are defined and
documented in the GenomicAlignments package.
Usage
list(list("coverage"), list("GenomicRanges"))(x, shift=0L, width=NULL, weight=1L,
method=c("auto", "sort", "hash"))
list(list("coverage"), list("GRangesList"))(x, shift=0L, width=NULL, weight=1L,
method=c("auto", "sort", "hash"))
Arguments
Argument | Description |
---|---|
x | A GenomicRanges or GRangesList object. |
shift, weight | A numeric vector or a list-like object. If numeric, it must be parallel to x (recycled if necessary). If a list-like object, it must have 1 list element per seqlevel in x , and its names must be exactly seqlevels(x) . Alternatively, each of these arguments can also be specified as a single string naming a metadata column in x (i.e. a column in mcols(x) ) to be used as the shift (or weight ) vector. See ? in the IRanges package for more information about these arguments. Note that when x is a GPos object, each of these arguments can only be a single number or a named list-like object. |
width | Either NULL (the default), or an integer vector. If NULL , it is replaced with seqlengths(x) . Otherwise, the vector must have the length and names of seqlengths(x) and contain NAs or non-negative integers. See ? in the IRanges package for more information about this argument. |
method | See ? in the IRanges package for a description of this argument. |
Details
When x
is a GRangesList object, coverage(x, ...)
is equivalent to coverage(unlist(x), ...)
.
Value
A named RleList object with one coverage vector per
seqlevel in x
.
Seealso
coverage
in the IRanges package.coverage-methods in the GenomicAlignments package.
RleList objects in the IRanges package.
GRanges , GPos , and GRangesList objects.
Author
H. Pagès and P. Aboyoun
Examples
## Coverage of a GRanges object:
gr <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges=IRanges(1:10, end=10),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
seqlengths=c(chr1=11, chr2=12, chr3=13))
cvg <- coverage(gr)
pcvg <- coverage(gr[strand(gr) == "+"])
mcvg <- coverage(gr[strand(gr) == "-"])
scvg <- coverage(gr[strand(gr) == "*"])
stopifnot(identical(pcvg + mcvg + scvg, cvg))
## Coverage of a GPos object:
pos_runs <- GRanges(c("chr1", "chr1", "chr2"),
IRanges(c(1, 5, 9), c(10, 8, 15)))
gpos <- GPos(pos_runs)
coverage(gpos)
## Coverage of a GRangesList object:
gr1 <- GRanges(seqnames="chr2",
ranges=IRanges(3, 6),
strand = "+")
gr2 <- GRanges(seqnames=c("chr1", "chr1"),
ranges=IRanges(c(7,13), width=3),
strand=c("+", "-"))
gr3 <- GRanges(seqnames=c("chr1", "chr2"),
ranges=IRanges(c(1, 4), c(3, 9)),
strand=c("-", "-"))
grl <- GRangesList(gr1=gr1, gr2=gr2, gr3=gr3)
stopifnot(identical(coverage(grl), coverage(unlist(grl))))
findOverlaps_methods()
Finding overlapping genomic ranges
Description
Various methods for finding/counting overlaps between objects containing genomic ranges. This man page describes the methods that operate on GenomicRanges and GRangesList objects.
NOTE: The findOverlaps
generic function
and methods for IntegerRanges and
IntegerRangesList objects are defined and
documented in the IRanges package.
The methods for GAlignments ,
GAlignmentPairs , and
GAlignmentsList objects are defined and
documented in the GenomicAlignments package.
GenomicRanges and GRangesList objects also support
countOverlaps
, overlapsAny
, and subsetByOverlaps
thanks to the default methods defined in the IRanges package and
to the findOverlaps
and countOverlaps
methods defined in
this package and documented below.
Usage
list(list("findOverlaps"), list("GenomicRanges,GenomicRanges"))(query, subject,
maxgap=-1L, minoverlap=0L,
type=c("any", "start", "end", "within", "equal"),
select=c("all", "first", "last", "arbitrary"),
ignore.strand=FALSE)
list(list("countOverlaps"), list("GenomicRanges,GenomicRanges"))(query, subject,
maxgap=-1L, minoverlap=0L,
type=c("any", "start", "end", "within", "equal"),
ignore.strand=FALSE)
Arguments
Argument | Description |
---|---|
query, subject | A GRanges or GRangesList object. |
maxgap, minoverlap, type | See ? in the IRanges package for a description of these arguments. |
select | When select is "all" (the default), the results are returned as a Hits object. Otherwise the returned value is an integer vector parallel to query (i.e. same length) containing the first, last, or arbitrary overlapping interval in subject , with NA indicating intervals that did not overlap any intervals in subject . |
ignore.strand | When set to TRUE , the strand information is ignored in the overlap calculations. |
Details
When the query and the subject are GRanges or
GRangesList objects, findOverlaps
uses the triplet
(sequence name, range, strand) to determine which features (see
paragraph below for the definition of feature) from the query
overlap which features in the subject
, where a strand value
of "*"
is treated as occurring on both the "+"
and
"-"
strand.
An overlap is recorded when a feature in the query
and a feature
in the subject
have the same sequence name, have a compatible
pairing of strands (e.g. "+"
/ "+"
, "-"
/ "-"
,
"*"
/ "+"
, "*"
/ "-"
, etc.), and satisfy the
interval overlap requirements.
In the context of findOverlaps
, a feature is a collection of
ranges that are treated as a single entity. For GRanges objects,
a feature is a single range; while for GRangesList objects,
a feature is a list element containing a set of ranges. In the results,
the features are referred to by number, which run from 1 to
length(query)
/ length(subject)
.
For type="equal"
with GRangesList objects, query[[i]]
matches subject[[j]]
iff for each range in query[[i]]
there is an identical range in subject[[j]]
, and vice versa.
Value
For findOverlaps
either a Hits object when
select="all"
or an integer vector otherwise.
For countOverlaps
an integer vector containing the tabulated
query overlap hits.
Seealso
The Hits class for representing a set of hits between 2 vector-like objects.
The
findOverlaps
generic function defined in the IRanges package.The GNCList constructor and class for preprocessing and representing a GenomicRanges or object as a data structure based on Nested Containment Lists.
The GRanges and GRangesList classes.
Author
P. Aboyoun, S. Falcon, M. Lawrence, and H. Pagès
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
## GRanges object:
gr <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10,
GC=seq(1, 0, length=10)
)
gr
## GRangesList object:
gr1 <- GRanges(seqnames="chr2", ranges=IRanges(4:3, 6),
strand="+", score=5:4, GC=0.45)
gr2 <- GRanges(seqnames=c("chr1", "chr1"),
ranges=IRanges(c(7,13), width=3),
strand=c("+", "-"), score=3:4, GC=c(0.3, 0.5))
gr3 <- GRanges(seqnames=c("chr1", "chr2"),
ranges=IRanges(c(1, 4), c(3, 9)),
strand=c("-", "-"), score=c(6L, 2L), GC=c(0.4, 0.1))
grl <- GRangesList("gr1"=gr1, "gr2"=gr2, "gr3"=gr3)
## Overlapping two GRanges objects:
table(!is.na(findOverlaps(gr, gr1, select="arbitrary")))
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)
countOverlaps(gr, gr1, type="start")
findOverlaps(gr, gr1, type="start")
subsetByOverlaps(gr, gr1, type="start")
findOverlaps(gr, gr1, select="first")
findOverlaps(gr, gr1, select="last")
findOverlaps(gr1, gr)
findOverlaps(gr1, gr, type="start")
findOverlaps(gr1, gr, type="within")
findOverlaps(gr1, gr, type="equal")
## ---------------------------------------------------------------------
## MORE EXAMPLES
## ---------------------------------------------------------------------
table(!is.na(findOverlaps(gr, gr1, select="arbitrary")))
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)
## Overlaps between a GRanges and a GRangesList object:
table(!is.na(findOverlaps(grl, gr, select="first")))
countOverlaps(grl, gr)
findOverlaps(grl, gr)
subsetByOverlaps(grl, gr)
countOverlaps(grl, gr, type="start")
findOverlaps(grl, gr, type="start")
subsetByOverlaps(grl, gr, type="start")
findOverlaps(grl, gr, select="first")
table(!is.na(findOverlaps(grl, gr1, select="first")))
countOverlaps(grl, gr1)
findOverlaps(grl, gr1)
subsetByOverlaps(grl, gr1)
countOverlaps(grl, gr1, type="start")
findOverlaps(grl, gr1, type="start")
subsetByOverlaps(grl, gr1, type="start")
findOverlaps(grl, gr1, select="first")
## Overlaps between two GRangesList objects:
countOverlaps(grl, rev(grl))
findOverlaps(grl, rev(grl))
subsetByOverlaps(grl, rev(grl))
genomic_range_squeezers()
Squeeze the genomic ranges out of a range-based object
Description
S4 generic functions for squeezing the genomic ranges out of a range-based object.
These are analog to range squeezers ranges
and
rglist
defined in the IRanges package, except
that granges
returns the ranges in a GRanges object (instead
of an IRanges object for ranges
),
and grglist
returns them in a GRangesList object (instead of
an IRangesList object for rglist
).
Usage
granges(x, use.names=TRUE, use.mcols=FALSE, ...)
grglist(x, use.names=TRUE, use.mcols=FALSE, ...)
Arguments
Argument | Description |
---|---|
x | An object containing genomic ranges e.g. a GenomicRanges , RangedSummarizedExperiment , GAlignments , GAlignmentPairs , or GAlignmentsList object, or a Pairs object containing genomic ranges. |
use.names, use.mcols, ... | See ranges in the IRanges package for a description of these arguments. |
Details
See ranges
in the IRanges package for
some details.
For some objects (e.g. GAlignments and
GAlignmentPairs objects defined in the
GenomicAlignments package), as(x, "GRanges")
and
as(x, "GRangesList")
, are equivalent to
granges(x, use.names=TRUE, use.mcols=TRUE)
and
grglist(x, use.names=TRUE, use.mcols=TRUE)
, respectively.
Value
A GRanges object for granges
.
A GRangesList object for grglist
.
If x
is a vector-like object (e.g.
GAlignments ), the returned object is expected
to be parallel to x
, that is, the i-th element in the output
corresponds to the i-th element in the input.
If use.names
is TRUE, then the names on x
(if any) are propagated to the returned object.
If use.mcols
is TRUE, then the metadata columns on x
(if any) are propagated to the returned object.
Seealso
GRanges and GRangesList objects.
RangedSummarizedExperiment objects in the SummarizedExperiment packages.
GAlignments , GAlignmentPairs , and GAlignmentsList objects in the GenomicAlignments package.
Author
H. Pagès
Examples
## See ?GAlignments in the GenomicAlignments package for examples of
## "ranges" and "rglist" methods.
genomicvars()
Manipulating genomic variables
Description
A genomic variable is a variable defined along a genome. Here are 2 ways a genomic variable is generally represented in Bioconductor:
as a named RleList object with one list element per chromosome;
as a metadata column on a disjoint GRanges object.
This man page documents tools for switching from one form to the other.
Usage
bindAsGRanges(...)
mcolAsRleList(x, varname)
binnedAverage(bins, numvar, varname, na.rm=FALSE)
Arguments
Argument | Description |
---|---|
... | One or more genomic variables in the form of named RleList objects. |
x | A disjoint GRanges object with metadata columns on it. A GRanges object is said to be disjoint if it contains ranges that do not overlap with each other. This can be tested with isDisjoint . See ?`` for more information about the isDisjoint` method for GRanges objects. |
varname | The name of the genomic variable. For mcolAsRleList this must be the name of the metadata column on x to be turned into an RleList object. For binnedAverage this will be the name of the metadata column that contains the binned average in the returned object. |
bins | A GRanges object representing the genomic bins. Typically obtained by calling tileGenome with cut.last.tile.in.chrom=TRUE . |
numvar | A named RleList object representing a numerical variable defined along the genome covered by bins (which is the genome described by seqinfo(bins) ). |
na.rm | A logical value indicating whether NA values should be stripped before the average is computed. |
Details
bindAsGRanges
allows to switch the representation of one or
more genomic variables from the named RleList form to the
metadata column on a disjoint GRanges object form by binding
the supplied named RleList objects together and putting them
on the same GRanges object. This transformation is lossless.
mcolAsRleList
performs the opposite transformation and is also
lossless (however the circularity flags and genome information in
seqinfo(x)
won't propagate). It works for any metadata column on
x
that can be put in Rle form i.e. that is an
atomic vector or a factor.
binnedAverage
computes the binned average of a numerical variable
defined along a genome.
Value
For bindAsGRanges
: a GRanges object with 1 metadata column
per supplied genomic variable.
For mcolAsRleList
: a named RleList object with
1 list element per seqlevel in x
.
For binnedAverage
: input GRanges object bins
with
an additional metadata column named varname
containing the binned
average.
Seealso
RleList objects in the IRanges package.
coverage,GenomicRanges-method for computing the coverage of a GRanges object.
The
tileGenome
function for putting tiles on a genome.GRanges objects and isDisjoint,GenomicRanges-method for the
isDisjoint
method for GenomicRanges objects.
Author
H. Pagès
Examples
## ---------------------------------------------------------------------
## A. TWO WAYS TO REPRESENT A GENOMIC VARIABLE
## -----------------------------------------------------------------
## 1) As a named RleList object
## ----------------------------
## Let's create a genomic variable in the "named RleList" form:
library(BSgenome.Scerevisiae.UCSC.sacCer2)
set.seed(55)
my_var <- RleList(
lapply(seqlengths(Scerevisiae),
function(seqlen) {
tmp <- sample(50L, seqlen, replace=TRUE)%/% 50L
Rle(cumsum(tmp - rev(tmp)))
}
),
compress=FALSE)
my_var
## 2) As a metadata column on a disjoint GRanges object
## ----------------------------------------------------
gr1 <- bindAsGRanges(my_var=my_var)
gr1
gr2 <- GRanges(c("chrI:1-150",
"chrI:211-285",
"chrI:291-377",
"chrV:51-60"),
score=c(0.4, 8, -10, 2.2),
id=letters[1:4],
seqinfo=seqinfo(Scerevisiae))
gr2
## Going back to the "named RleList" form:
mcolAsRleList(gr1, "my_var")
score <- mcolAsRleList(gr2, "score")
score
id <- mcolAsRleList(gr2, "id")
id
bindAsGRanges(score=score, id=id)
## Bind 'my_var', 'score', and 'id' together:
gr3 <- bindAsGRanges(my_var=my_var, score=score, id=id)
## Sanity checks:
stopifnot(identical(my_var, mcolAsRleList(gr3, "my_var")))
stopifnot(identical(score, mcolAsRleList(gr3, "score")))
stopifnot(identical(id, mcolAsRleList(gr3, "id")))
gr2b <- bindAsGRanges(score=score, id=id)
seqinfo(gr2b) <- seqinfo(gr2)
stopifnot(identical(gr2, gr2b))
## ---------------------------------------------------------------------
## B. BIND TOGETHER THE COVERAGES OF SEVERAL BAM FILES
## ---------------------------------------------------------------------
library(pasillaBamSubset)
library(GenomicAlignments)
untreated1_cvg <- coverage(BamFile(untreated1_chr4()))
untreated3_cvg <- coverage(BamFile(untreated3_chr4()))
all_cvg <- bindAsGRanges(untreated1=untreated1_cvg,
untreated3=untreated3_cvg)
## Keep regions with coverage:
all_cvg[with(mcols(all_cvg), untreated1 + untreated3 >= 1)]
## Plot the coverage profiles with the Gviz package:
library(Gviz)
plotNumvars <- function(numvars, region, name="numvars", ...)
{
stopifnot(is(numvars, "GRanges"))
stopifnot(is(region, "GRanges"), length(region) == 1L)
gtrack <- GenomeAxisTrack()
dtrack <- DataTrack(numvars,
chromosome=as.character(seqnames(region)),
name=name,
groups=colnames(mcols(numvars)), type="l", ...)
plotTracks(list(gtrack, dtrack), from=start(region), to=end(region))
}
plotNumvars(all_cvg, GRanges("chr4:1-25000"),
"coverage", col=c("red", "blue"))
plotNumvars(all_cvg, GRanges("chr4:1.03e6-1.08e6"),
"coverage", col=c("red", "blue"))
## Sanity checks:
stopifnot(identical(untreated1_cvg, mcolAsRleList(all_cvg, "untreated1")))
stopifnot(identical(untreated3_cvg, mcolAsRleList(all_cvg, "untreated3")))
## ---------------------------------------------------------------------
## C. COMPUTE THE BINNED AVERAGE OF A NUMERICAL VARIABLE DEFINED ALONG A
## GENOME
## ---------------------------------------------------------------------
## In some applications (e.g. visualization), there is the need to compute
## the average of a genomic variable for a set of predefined fixed-width
## regions (sometimes called "bins").
## Let's use tileGenome() to create such a set of bins:
bins1 <- tileGenome(seqinfo(Scerevisiae), tilewidth=100,
cut.last.tile.in.chrom=TRUE)
## Compute the binned average for 'my_var' and 'score':
bins1 <- binnedAverage(bins1, my_var, "binned_var")
bins1
bins1 <- binnedAverage(bins1, score, "binned_score")
bins1
## Binned average in "named RleList" form:
binned_var1 <- mcolAsRleList(bins1, "binned_var")
binned_var1
stopifnot(all.equal(mean(my_var), mean(binned_var1))) # sanity check
mcolAsRleList(bins1, "binned_score")
## With bigger bins:
bins2 <- tileGenome(seqinfo(Scerevisiae), tilewidth=50000,
cut.last.tile.in.chrom=TRUE)
bins2 <- binnedAverage(bins2, my_var, "binned_var")
bins2 <- binnedAverage(bins2, score, "binned_score")
bins2
binned_var2 <- mcolAsRleList(bins2, "binned_var")
binned_var2
stopifnot(all.equal(mean(my_var), mean(binned_var2))) # sanity check
mcolAsRleList(bins2, "binned_score")
## Not surprisingly, the "binned" variables are much more compact in
## memory than the original variables (they contain much less runs):
object.size(my_var)
object.size(binned_var1)
object.size(binned_var2)
## ---------------------------------------------------------------------
## D. SANITY CHECKS
## ---------------------------------------------------------------------
bins3 <- tileGenome(c(chr1=10, chr2=8), tilewidth=5,
cut.last.tile.in.chrom=TRUE)
my_var3 <- RleList(chr1=Rle(c(1:3, NA, 5:7)), chr2=Rle(c(-3, NA, -3, NaN)))
bins3 <- binnedAverage(bins3, my_var3, "binned_var3", na.rm=TRUE)
binned_var3 <- mcols(bins3)$binned_var3
stopifnot(
identical(mean(my_var3$chr1[1:5], na.rm=TRUE),
binned_var3[1]),
identical(mean(c(my_var3$chr1, 0, 0, 0)[6:10], na.rm=TRUE),
binned_var3[2]),
#identical(mean(c(my_var3$chr2, 0), na.rm=TRUE),
# binned_var3[3]),
identical(0, binned_var3[4])
)
inter_range_methods()
Inter range transformations of a GRanges or GRangesList object
Description
This man page documents list("inter range transformations") of a GenomicRanges object (i.e. of an object that belongs to the GenomicRanges class or one of its subclasses, this includes for example GRanges objects), or a GRangesList object.
See ?`` and
?in the list("IRanges") package for a quick introduction to list("intra range") and list("inter ", " range transformations") . See `?
for
list("intra range transformations") of a GenomicRanges object or
GRangesList object.
Usage
list(list("range"), list("GenomicRanges"))(x, ..., with.revmap=FALSE, ignore.strand=FALSE, na.rm=FALSE)
list(list("range"), list("GRangesList"))(x, ..., with.revmap=FALSE, ignore.strand=FALSE, na.rm=FALSE)
list(list("reduce"), list("GenomicRanges"))(x, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
with.inframe.attrib=FALSE, ignore.strand=FALSE)
list(list("reduce"), list("GRangesList"))(x, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
with.inframe.attrib=FALSE, ignore.strand=FALSE)
list(list("gaps"), list("GenomicRanges"))(x, start=1L, end=seqlengths(x))
list(list("disjoin"), list("GenomicRanges"))(x, with.revmap=FALSE, ignore.strand=FALSE)
list(list("disjoin"), list("GRangesList"))(x, with.revmap=FALSE, ignore.strand=FALSE)
list(list("isDisjoint"), list("GenomicRanges"))(x, ignore.strand=FALSE)
list(list("isDisjoint"), list("GRangesList"))(x, ignore.strand=FALSE)
list(list("disjointBins"), list("GenomicRanges"))(x, ignore.strand=FALSE)
Arguments
Argument | Description |
---|---|
x | A GenomicRanges or GenomicRangesList object. |
drop.empty.ranges, min.gapwidth, with.revmap, with.inframe.attrib, start, end | See `?`` in the IRanges package. |
ignore.strand | TRUE or FALSE . Whether the strand of the input ranges should be ignored or not. See details below. |
... | For range , additional GenomicRanges objects to consider. Ignored otherwise. |
na.rm | Ignored. |
Details
list(list("On a GRanges object"), list(" ", " ", list("range"), " returns an object of the same type as ", list("x"), " ", " containing range bounds for each distinct (seqname, strand) pairing. ", " The names (", list("names(x)"), ") and the metadata columns in ", list("x"), " are ", " dropped. ", " ", " ", list("reduce"), " returns an object of the same type as ", list("x"), " ", " containing reduced ranges for each distinct (seqname, strand) pairing. ", " The names (",
list("names(x)"), ") and the metadata columns in ", list("x"), " are
", " dropped. ", " See ", list("?", list("reduce")), " for more information about range ", " reduction and for a description of the optional arguments. ", " ", " ", list("gaps"), " returns an object of the same type as ", list("x"), " ", " containing complemented ranges for each distinct (seqname, strand) pairing. ", " The names (", list("names(x)"), ") and the columns in ", list("x"), " are dropped. ",
" For the start and end arguments of this gaps method, it is expected that
", " the user will supply a named integer vector (where the names correspond to ", " the appropriate seqlevels). See ", list("?", list("gaps")), " for more ", " information about range complements and for a description of the optional ", " arguments. ", " ", " ", list("disjoin"), " returns an object of the same type as ", list("x"), " ", " containing disjoint ranges for each distinct (seqname, strand) pairing. ",
" The names (", list("names(x)"), ") and the metadata columns in ", list("x"), " are
", " dropped. If ", list("with.revmap=TRUE"), ", a metadata column that maps the ", " ouput ranges to the input ranges is added to the returned object. ", " See ", list("?", list("disjoin")), " for more information. ", " ", " ", list("isDisjoint"), " returns a logical value indicating whether the ranges ", " in ", list("x"), " are disjoint (i.e. non-overlapping). ", " ", " ", list(
"disjointBins"), " returns bin indexes for the ranges in ", list("x"), ", such
", " that ranges in the same bin do not overlap. If ", list("ignore.strand=FALSE"), ", ", " the two features cannot overlap if they are on different strands. ", " ")) list(list("On a GRangesList/GenomicRangesList object"), list(" ", " When they are supported on GRangesList object ", list("x"), ", the above inter ", " range transformations will apply the transformation to each of the list ", " elements in ", list("x"), " and return a list-like object ", list("parallel"), " to ", " ", list("x"), " (i.e. with 1 list element per list element in ", list("x"), "). ", " If ", list("x"), " has names on it, they're propagated to the returned object. ",
" "))
Seealso
The GenomicRanges and GRanges classes.
The IntegerRanges class in the IRanges package.
The inter-range-methods man page in the IRanges package.
GenomicRanges-comparison for comparing and ordering genomic ranges.
endoapply
in the S4Vectors package.
Author
H. Pagès and P. Aboyoun
Examples
gr <- GRanges(
seqnames=Rle(paste("chr", c(1, 2, 1, 3), sep=""), c(1, 3, 2, 4)),
ranges=IRanges(1:10, width=10:1, names=letters[1:10]),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10,
GC=seq(1, 0, length=10)
)
gr
gr1 <- GRanges(seqnames="chr2", ranges=IRanges(3, 6),
strand="+", score=5L, GC=0.45)
gr2 <- GRanges(seqnames="chr1",
ranges=IRanges(c(10, 7, 19), width=5),
strand=c("+", "-", "+"), score=3:5, GC=c(0.3, 0.5, 0.66))
gr3 <- GRanges(seqnames=c("chr1", "chr2"),
ranges=IRanges(c(1, 4), c(3, 9)),
strand=c("-", "-"), score=c(6L, 2L), GC=c(0.4, 0.1))
grl <- GRangesList(gr1=gr1, gr2=gr2, gr3=gr3)
grl
## ---------------------------------------------------------------------
## range()
## ---------------------------------------------------------------------
## On a GRanges object:
range(gr)
range(gr, with.revmap=TRUE)
## On a GRangesList object:
range(grl)
range(grl, ignore.strand=TRUE)
range(grl, with.revmap=TRUE, ignore.strand=TRUE)
# ---------------------------------------------------------------------
## reduce()
## ---------------------------------------------------------------------
reduce(gr)
gr2 <- reduce(gr, with.revmap=TRUE)
revmap <- mcols(gr2)$revmap # an IntegerList
## Use the mapping from reduced to original ranges to group the original
## ranges by reduced range:
relist(gr[unlist(revmap)], revmap)
## Or use it to split the DataFrame of original metadata columns by
## reduced range:
relist(mcols(gr)[unlist(revmap), ], revmap) # a SplitDataFrameList
## [For advanced users] Use this reverse mapping to compare the reduced
## ranges with the ranges they originate from:
expanded_gr2 <- rep(gr2, elementNROWS(revmap))
reordered_gr <- gr[unlist(revmap)]
codes <- pcompare(expanded_gr2, reordered_gr)
## All the codes should translate to "d", "e", "g", or "h" (the 4 letters
## indicating that the range on the left contains the range on the right):
alphacodes <- rangeComparisonCodeToLetter(pcompare(expanded_gr2, reordered_gr))
stopifnot(all(alphacodes %in% c("d", "e", "g", "h")))
## On a big GRanges object with a lot of seqlevels:
mcols(gr) <- NULL
biggr <- c(gr, GRanges("chr1", IRanges(c(4, 1), c(5, 2)), strand="+"))
seqlevels(biggr) <- paste0("chr", 1:2000)
biggr <- rep(biggr, 25000)
set.seed(33)
seqnames(biggr) <- sample(factor(seqlevels(biggr), levels=seqlevels(biggr)),
length(biggr), replace=TRUE)
biggr2 <- reduce(biggr, with.revmap=TRUE)
revmap <- mcols(biggr2)$revmap
expanded_biggr2 <- rep(biggr2, elementNROWS(revmap))
reordered_biggr <- biggr[unlist(revmap)]
codes <- pcompare(expanded_biggr2, reordered_biggr)
alphacodes <- rangeComparisonCodeToLetter(pcompare(expanded_biggr2,
reordered_biggr))
stopifnot(all(alphacodes %in% c("d", "e", "g", "h")))
table(alphacodes)
## On a GRangesList object:
reduce(grl) # Doesn't really reduce anything but note the reordering
# of the inner elements in the 2nd and 3rd list elements:
# the ranges are reordered by sequence name first (which
# should appear in the same order as in 'seqlevels(grl)'),
# and then by strand.
reduce(grl, ignore.strand=TRUE) # 2nd list element got reduced
## ---------------------------------------------------------------------
## gaps()
## ---------------------------------------------------------------------
gaps(gr, start=1, end=10)
## ---------------------------------------------------------------------
## disjoin(), isDisjoint(), disjointBins()
## ---------------------------------------------------------------------
disjoin(gr)
disjoin(gr, with.revmap=TRUE)
disjoin(gr, with.revmap=TRUE, ignore.strand=TRUE)
isDisjoint(gr)
stopifnot(isDisjoint(disjoin(gr)))
disjointBins(gr)
stopifnot(all(sapply(split(gr, disjointBins(gr)), isDisjoint)))
## On a GRangesList object:
disjoin(grl) # doesn't really disjoin anything but note the reordering
disjoin(grl, with.revmap=TRUE)
intra_range_methods()
Intra range transformations of a GRanges or GRangesList object
Description
This man page documents list("intra range transformations") of a GenomicRanges object (i.e. of an object that belongs to the GenomicRanges class or one of its subclasses, this includes for example GRanges objects), or a GRangesList object.
See ?`` and
?in the list("IRanges") package for a quick introduction to list("intra range") and list("inter ", " range transformations") . list("Intra range") methods for [GAlignments](#galignments) and [GAlignmentsList](#galignmentslist) objects are defined and documented in the list("GenomicAlignments") package. See `?
for
list("inter range transformations") of a GenomicRanges or
GRangesList object.
Usage
list(list("shift"), list("GenomicRanges"))(x, shift=0L, use.names=TRUE)
list(list("narrow"), list("GenomicRanges"))(x, start=NA, end=NA, width=NA, use.names=TRUE)
list(list("resize"), list("GenomicRanges"))(x, width, fix="start", use.names=TRUE, ignore.strand=FALSE)
list(list("flank"), list("GenomicRanges"))(x, width, start=TRUE, both=FALSE, use.names=TRUE,
ignore.strand=FALSE)
list(list("promoters"), list("GenomicRanges"))(x, upstream=2000, downstream=200, use.names=TRUE)
list(list("restrict"), list("GenomicRanges"))(x, start=NA, end=NA, keep.all.ranges=FALSE, use.names=TRUE)
list(list("trim"), list("GenomicRanges"))(x, use.names=TRUE)
Arguments
Argument | Description |
---|---|
x | A GenomicRanges object. |
shift, use.names, start, end, width, both, fix, keep.all.ranges, |
| See ?`` . | |
ignore.strand|
TRUEor
FALSE. Whether the strand of the input ranges should be ignored or not. See details below. | |
list()| Additional arguments to methods. | ## Details * list() list(list("shift"), " behaves like the ", list("shift"), " method for ", " ", list("IntegerRanges"), " objects. See ", " ", list("?
", list("intra-range-methods"), ""), " for the details. ", " ") * list() list(list("narrow"), " on a ", list("GenomicRanges"), " object behaves ", " like on an ", list("IntegerRanges"), " object. See ", " ", list("?
", list("intra-range-methods"), ""), " for the details. ", " ", " A major difference though is that it returns a ", list("GenomicRanges"), " ", " object instead of an ", list("IntegerRanges"), " object. ", " The returned object is ", list("parallel"), " (i.e. same length and names) ", " to the original object ", list("x"), ". ", " ") * list() list(list("resize"), " returns an object of the same type and length as ", " ", list("x"), " containing intervals that have been resized to width ", " ", list("width"), " based on the ", list("strand(x)"), " values. Elements where ", " ", list("strand(x) == "+""), " or ", list("strand(x) == "*""), " are anchored at ", " ", list("start(x)"), " and elements where ", list("strand(x) == "-""), " are anchored ", " at the ", list("end(x)"), ". The ", list("use.names"), " argument determines whether ", " or not to keep the names on the ranges. ", " ") * list() list(list("flank"), " returns an object of the same type and length ", " as ", list("x"), " containing intervals of width ", list("width"), " that flank ", " the intervals in ", list("x"), ". The ", list("start"), " argument takes a ", " logical indicating whether ", list("x"), " should be flanked at the ", " "start" (", list("TRUE"), ") or the "end" (", list("FALSE"), "), which for ", " ", list("strand(x) != "-""), " is ", list("start(x)"), " and ", list("end(x)"), " ", " respectively and for ", list("strand(x) == "-""), " is ", list("end(x)"), " and ", " ", list("start(x)"), " respectively. The ", list("both"), " argument takes a ", " single logical value indicating whether the flanking region ", " ", list("width"), " positions extends ", list("into"), " the range. If ", " ", list("both=TRUE"), ", the resulting range thus straddles the end ", " point, with ", list("width"), " positions on either side. ", " ") * list() list(" ", " ", list("promoters"), " returns an object of the same type and length ", " as ", list("x"), " containing promoter ranges. Promoter ranges extend ", " around the transcription start site (TSS) which is defined as ", " ", list("start(x)"), " for ranges on the ", list("+"), " or ", list("*"), " strand ", " and as ", list("end(x)"), " for ranges on the ", list("-"), " strand. ", " The ", list("upstream"), " and ", list("downstream"), " arguments define the ", " number of nucleotides in the 5' and 3' direction, respectively. ", " More precisely, the output range is defined as ", " ", list(" (start(x) - upstream) to (start(x) + downstream - 1) ", " "), " ", " for ranges on the ", list("+"), " or ", list("*"), " strand, and as ", " ", list(" (end(x) - downstream + 1) to (end(x) + upstream) ", " "), " ", " for ranges on the ", list("-"), " strand. ", " ", " Note that the returned object might contain ", list("out-of-bound"), " ranges ", " i.e. ranges that start before the first nucleotide position and/or end ", " after the last nucleotide position of the underlying sequence. ", " ") * list() list(list("restrict"), " returns an object of the same type and length as ", " ", list("x"), " containing restricted ranges for distinct seqnames. The ", " ", list("start"), " and ", list("end"), " arguments can be a named numeric vector of ", " seqnames for the ranges to be resticted or a numeric vector or length 1 ", " if the restriction operation is to be applied to all the sequences in ", " ", list("x"), ". See ", list("?
", list("intra-range-methods"), "`"), " for more
",
" information about range restriction and for a description of the optional
", " arguments.
", " ")
list() list(list("trim"), " trims out-of-bound ranges located on non-circular
", " sequences whose length is not NA.
", " ")
## Seealso
GenomicRanges , GRanges , and GRangesList objects.
The intra-range-methods man page in the IRanges package.
The IntegerRanges class in the IRanges package.
## Author
P. Aboyoun and V. Obenchain vobencha@fhcrc.org
## Examples
r ## --------------------------------------------------------------------- ## A. ON A GRanges OBJECT ## --------------------------------------------------------------------- gr <- GRanges( seqnames=Rle(paste("chr", c(1, 2, 1, 3), sep=""), c(1, 3, 2, 4)), ranges=IRanges(1:10, width=10:1, names=letters[1:10]), strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)), score=1:10, GC=seq(1, 0, length=10) ) gr shift(gr, 1) narrow(gr[-10], start=2, end=-2) resize(gr, width=10) flank(gr, width=10) restrict(gr, start=3, end=7) gr <- GRanges("chr1", IRanges(rep(10, 3), width=6), c("+", "-", "*")) promoters(gr, 2, 2) ## --------------------------------------------------------------------- ## B. ON A GRangesList OBJECT ## --------------------------------------------------------------------- gr1 <- GRanges("chr2", IRanges(3, 6)) gr2 <- GRanges(c("chr1", "chr1"), IRanges(c(7,13), width=3), strand=c("+", "-")) gr3 <- GRanges(c("chr1", "chr2"), IRanges(c(1, 4), c(3, 9)), strand="-") grl <- GRangesList(gr1= gr1, gr2=gr2, gr3=gr3) grl resize(grl, width=20) flank(grl, width=20) restrict(grl, start=3)
makeGRangesFromDataFrame()
Make a GRanges object from a data.frame or DataFrame
Description
makeGRangesFromDataFrame
takes a data-frame-like object as
input and tries to automatically find the columns that describe
genomic ranges. It returns them as a GRanges object.
makeGRangesFromDataFrame
is also the workhorse behind the
coercion method from data.frame (or DataFrame ) to
GRanges .
Usage
makeGRangesFromDataFrame(df,
keep.extra.columns=FALSE,
ignore.strand=FALSE,
seqinfo=NULL,
seqnames.field=c("seqnames", "seqname",
"chromosome", "chrom",
"chr", "chromosome_name",
"seqid"),
start.field="start",
end.field=c("end", "stop"),
strand.field="strand",
starts.in.df.are.0based=FALSE)
Arguments
Argument | Description |
---|---|
df | A data.frame or DataFrame object. If not, then the function first tries to turn df into a data frame with as.data.frame(df) . |
keep.extra.columns | TRUE or FALSE (the default). If TRUE , the columns in df that are not used to form the genomic ranges of the returned GRanges object are then returned as metadata columns on the object. Otherwise, they are ignored. If df has a width column, then it's always ignored. |
ignore.strand | TRUE or FALSE (the default). If TRUE , then the strand of the returned GRanges object is set to "*" . |
seqinfo | Either NULL , or a Seqinfo object, or a character vector of seqlevels, or a named numeric vector of sequence lengths. When not NULL , it must be compatible with the genomic ranges in df i.e. it must include at least the sequence levels represented in df . |
seqnames.field | A character vector of recognized names for the column in df that contains the chromosome name (a.k.a. sequence name) associated with each genomic range. Only the first name in seqnames.field that is found in colnames(df) is used. If no one is found, then an error is raised. |
start.field | A character vector of recognized names for the column in df that contains the start positions of the genomic ranges. Only the first name in start.field that is found in colnames(df) is used. If no one is found, then an error is raised. |
end.field | A character vector of recognized names for the column in df that contains the end positions of the genomic ranges. Only the first name in start.field that is found in colnames(df) is used. If no one is found, then an error is raised. |
strand.field | A character vector of recognized names for the column in df that contains the strand associated with each genomic range. Only the first name in strand.field that is found in colnames(df) is used. If no one is found or if ignore.strand is TRUE , then the strand of the returned GRanges object is set to "*" . |
starts.in.df.are.0based | TRUE or FALSE (the default). If TRUE , then the start positions of the genomic ranges in df are considered to be 0-based and are converted to 1-based in the returned GRanges object. This feature is intended to make it more convenient to handle input that contains data obtained from resources using the "0-based start" convention. A notorious example of such resource is the UCSC Table Browser ( http://genome.ucsc.edu/cgi-bin/hgTables ). |
Value
A GRanges object with one element per row in the input.
If the seqinfo
argument was supplied, the returned object will
have exactly the seqlevels specified in seqinfo
and in the same
order. Otherwise, the seqlevels are ordered according to the output of
the rankSeqlevels
function (except if
df
contains the seqnames in the form of a factor-Rle, in which
case the levels of the factor-Rle become the seqlevels of the returned
object and with no re-ordering).
If df
has non-automatic row names (i.e. rownames(df)
is
not NULL
and is not seq_len(nrow(df))
), then they will be
used to set names on the returned GRanges object.
Seealso
GRanges objects.
Seqinfo objects and the
rankSeqlevels
function in the GenomeInfoDb package.The
makeGRangesListFromFeatureFragments
function for making a GRangesList object from a list of fragmented features.The
getTable
function in the rtracklayer package for an R interface to the UCSC Table Browser.DataFrame objects in the S4Vectors package.
Note
Coercing data.frame or DataFrame df
into
a GRanges object (with as(df, "GRanges")
), or
calling GRanges(df)
, are both equivalent to calling
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
.
Author
H. Pagès, based on a proposal by Kasper Daniel Hansen
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
df <- data.frame(chr="chr1", start=11:15, end=12:16,
strand=c("+","-","+","*","."), score=1:5)
df
makeGRangesFromDataFrame(df) # strand value "." is replaced with "*"
## The strand column is optional:
df <- data.frame(chr="chr1", start=11:15, end=12:16, score=1:5)
makeGRangesFromDataFrame(df)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
gr2 <- as(df, "GRanges") # equivalent to the above
stopifnot(identical(gr, gr2))
gr2 <- GRanges(df) # equivalent to the above
stopifnot(identical(gr, gr2))
makeGRangesFromDataFrame(df, ignore.strand=TRUE)
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
ignore.strand=TRUE)
makeGRangesFromDataFrame(df, seqinfo=paste0("chr", 4:1))
makeGRangesFromDataFrame(df, seqinfo=c(chrM=NA, chr1=500, chrX=100))
makeGRangesFromDataFrame(df, seqinfo=Seqinfo(paste0("chr", 4:1)))
## ---------------------------------------------------------------------
## ABOUT AUTOMATIC DETECTION OF THE seqnames/start/end/strand COLUMNS
## ---------------------------------------------------------------------
## Automatic detection of the seqnames/start/end/strand columns is
## case insensitive:
df <- data.frame(ChRoM="chr1", StarT=11:15, stoP=12:16,
STRAND=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## It also ignores a common prefix between the start and end columns:
df <- data.frame(seqnames="chr1", tx_start=11:15, tx_end=12:16,
strand=c("+","-","+","*","."), score=1:5)
makeGRangesFromDataFrame(df)
## The common prefix between the start and end columns is used to
## disambiguate between more than one seqnames column:
df <- data.frame(chrom="chr1", tx_start=11:15, tx_end=12:16,
tx_chr="chr2", score=1:5)
makeGRangesFromDataFrame(df)
## ---------------------------------------------------------------------
## 0-BASED VS 1-BASED START POSITIONS
## ---------------------------------------------------------------------
if (require(rtracklayer)) {
session <- browserSession()
genome(session) <- "sacCer2"
query <- ucscTableQuery(session, "Assembly")
df <- getTable(query)
head(df)
## A common pitfall is to forget that the UCSC Table Browser uses the
## "0-based start" convention:
gr0 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
start.field="chromStart",
end.field="chromEnd")
head(gr0)
## The start positions need to be converted into 1-based positions,
## to adhere to the convention used in Bioconductor:
gr1 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
start.field="chromStart",
end.field="chromEnd",
starts.in.df.are.0based=TRUE)
head(gr1)
}
makeGRangesListFromDataFrame()
Make a GRangesList object from a data.frame or DataFrame
Description
makeGRangesListFromDataFrame
extends the
makeGRangesFromDataFrame functionality from
GenomicRanges
. It can take a data-frame-like object as input
and tries to automatically find the columns that describe the genomic
ranges. It returns a GRangesList object. This is different from the
makeGRangesFromDataFrame
function by requiring a
split.field
. The split.field
acts like the
"f" argument in the split
function. This factor
must be of the same length as the number of rows in the DataFrame
argument. The split.field
may also be a character vector.
Usage
makeGRangesListFromDataFrame(df,
split.field = NULL,
names.field = NULL,
...)
Arguments
Argument | Description |
---|---|
df | A DataFrame or data.frame class object |
split.field | A character string of a recognized column name in df that contains the grouping. This column defines how the rows of df are split and is typically a factor or character vector. When split.field is not provided the df will be split by the number of rows. |
names.field | An optional single character string indicating the name of the column in df that designates the names for the ranges in the elements of the GRangesList . |
... | Additional arguments passed on to makeGRangesFromDataFrame |
Value
A GRangesList of the same length as the number of levels or
unique character strings in the df
column indicated by
split.field
. When split.field
is not provided the df
is split by row and the resulting GRangesList has the
same length as nrow(df).
Names on the individual ranges are taken from the names.field
argument. Names on the outer list elements of the GRangesList
are propagated from split.field
.
Seealso
Author
M. Ramos
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
df <- data.frame(chr="chr1", start=11:15, end=12:16,
strand=c("+","-","+","*","."), score=1:5,
specimen = c("a", "a", "b", "b", "c"),
gene_symbols = paste0("GENE", letters[1:5]))
df
grl <- makeGRangesListFromDataFrame(df, split.field = "specimen",
names.field = "gene_symbols")
grl
names(grl)
## Keep metadata columns
makeGRangesListFromDataFrame(df, split.field = "specimen",
keep.extra.columns = TRUE)
nearest_methods()
Finding the nearest genomic range neighbor
Description
The nearest
, precede
, follow
, distance
and distanceToNearest
methods for GenomicRanges
objects and subclasses.
Usage
list(list("precede"), list("GenomicRanges,GenomicRanges"))(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
list(list("precede"), list("GenomicRanges,missing"))(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
list(list("follow"), list("GenomicRanges,GenomicRanges"))(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
list(list("follow"), list("GenomicRanges,missing"))(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
list(list("nearest"), list("GenomicRanges,GenomicRanges"))(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
list(list("nearest"), list("GenomicRanges,missing"))(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
list(list("distanceToNearest"), list("GenomicRanges,GenomicRanges"))(x, subject,
ignore.strand=FALSE, ...)
list(list("distanceToNearest"), list("GenomicRanges,missing"))(x, subject,
ignore.strand=FALSE, ...)
list(list("distance"), list("GenomicRanges,GenomicRanges"))(x, y,
ignore.strand=FALSE, ...)
Arguments
Argument | Description |
---|---|
x | The query GenomicRanges instance. |
subject | The subject GenomicRanges instance within which the nearest neighbors are found. Can be missing, in which case x is also the subject. |
y | For the distance method, a GRanges instance. Cannot be missing. If x and y are not the same length, the shortest will be recycled to match the length of the longest. |
select | Logic for handling ties. By default, all methods select a single interval (arbitrary for nearest , the first by order in subject for precede , and the last for follow ). When select="all" a Hits object is returned with all matches for x . |
ignore.strand | A logical indicating if the strand of the input ranges should be ignored. When TRUE , strand is set to '+' . |
... | Additional arguments for methods. |
Details
list("nearest: ") list(" ", " Performs conventional nearest neighbor finding. ", " Returns an integer vector containing the index of the nearest neighbor ", " range in ", list("subject"), " for each range in ", list("x"), ". If there is no ", " nearest neighbor ", list("NA"), " is returned. For details of the algorithm ", " see the man page in the ", list("IRanges"), " package (", list("?nearest"), "). ", " ")
list("precede: ") list(" ", " For each range in ", list("x"), ", ", list("precede"), " returns ", " the index of the range in ", list("subject"), " that is directly ", " preceded by the range in ", list("x"), ". Overlapping ranges are excluded. ", " ", list("NA"), " is returned when there are no qualifying ranges in ", " ", list("subject"), ". ", " ")
list("follow: ") list(" ", " The opposite of ", list("precede"), ", ", list("follow"), " returns ", " the index of the range in ", list("subject"), " that is directly followed by the ", " range in ", list("x"), ". Overlapping ranges are excluded. ", list("NA"), " is returned ", " when there are no qualifying ranges in ", list("subject"), ". ", " ")
list("Orientation and strand for ", list("precede"), " and ", list("follow"), ": ") list(" ", " Orientation is 5' to 3', consistent with the direction of translation. ", " Because positional numbering along a chromosome is from left to ", " right and transcription takes place from 5' to 3', ", list("precede"), " and ", " ", list("follow"), " can appear to have
opposite' behavior on the ", list("+"), " ", " and ", list("-"), " strand. Using positions 5 and 6 as an example, 5 precedes ", " 6 on the ", list("+"), " strand but follows 6 on the ", | list("-"), " strand. ", " ", " The table below outlines the orientation when ranges on different ", " strands are compared. In general, a feature on ", list("*"), " is considered ", " to belong to both strands. The single exception is when both ", list("x"), " ", " and ", list("subject"), " are ", list("*"), " in which case both are treated as ", list("+"), ". ", list(" ", " x | subject | orientation ", " -----+-----------+---------------- ", "a) + | + | ---> ", | | "b) + | - | NA ", "c) + | * | ---> ", "d) - | + | NA ", "e) - | - | <--- ", "f) - | * | <--- ", "g) * | + | ---> ", "h) * | - | <--- ", "i) * | * | ---> (the only situation where * arbitrarily means +) "), " ", " ", " ") | * list("distanceToNearest: ") list("Returns the distance for each range in ", list("x"), " ", " to its nearest neighbor in the ", list("subject"), ". ", " ") * list("distance: ") list(" ", " Returns the distance for each range in ", list("x"), " to the range in ", list("y"), ". ", " The behavior of ", list("distance"), " has changed in Bioconductor 2.12. ", " See the man page ", list("?distance"), " in the ", list("IRanges"), " package for ", " details. ", " ") ## Value For
nearest,
precedeand
follow, an integer vector of indices in
subject, or a [Hits](#hits) if
select="all". For
distanceToNearest, a [Hits](#hits) object with a column for the
queryindex (queryHits),
subjectindex (subjectHits) and the
distancebetween the pair. For
distance, an integer vector of distances between the ranges in
xand
y` . ## Seealso The GenomicRanges and GRanges classes.
The IntegerRanges class in the IRanges package.
The Hits class in the S4Vectors package.
The nearest-methods man page in the IRanges package.
findOverlaps-methods for finding just the overlapping ranges.
The nearest-methods man page in the GenomicFeatures package. ## Author P. Aboyoun and V. Obenchain ## Examplesr ## ----------------------------------------------------------- ## precede() and follow() ## ----------------------------------------------------------- query <- GRanges("A", IRanges(c(5, 20), width=1), strand="+") subject <- GRanges("A", IRanges(rep(c(10, 15), 2), width=1), strand=c("+", "+", "-", "-")) precede(query, subject) follow(query, subject) strand(query) <- "-" precede(query, subject) follow(query, subject) ## ties choose first in order query <- GRanges("A", IRanges(10, width=1), c("+", "-", "*")) subject <- GRanges("A", IRanges(c(5, 5, 5, 15, 15, 15), width=1), rep(c("+", "-", "*"), 2)) precede(query, subject) precede(query, rev(subject)) ## ignore.strand=TRUE treats all ranges as '+' precede(query[1], subject[4:6], select="all", ignore.strand=FALSE) precede(query[1], subject[4:6], select="all", ignore.strand=TRUE) ## ----------------------------------------------------------- ## nearest() ## ----------------------------------------------------------- ## When multiple ranges overlap an "arbitrary" range is chosen query <- GRanges("A", IRanges(5, 15)) subject <- GRanges("A", IRanges(c(1, 15), c(5, 19))) nearest(query, subject) ## select="all" returns all hits nearest(query, subject, select="all") ## Ranges in 'x' will self-select when 'subject' is present query <- GRanges("A", IRanges(c(1, 10), width=5)) nearest(query, query) ## Ranges in 'x' will not self-select when 'subject' is missing nearest(query) ## ----------------------------------------------------------- ## distance(), distanceToNearest() ## ----------------------------------------------------------- ## Adjacent, overlap, separated by 1 query <- GRanges("A", IRanges(c(1, 2, 10), c(5, 8, 11))) subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15))) distance(query, subject) ## recycling distance(query[1], subject) ## zero-width ranges zw <- GRanges("A", IRanges(4,3)) stopifnot(distance(zw, GRanges("A", IRanges(3,4))) == 0L) sapply(-3:3, function(i) distance(shift(zw, i), GRanges("A", IRanges(4,3)))) query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1)) distanceToNearest(query, subject) ## distance() with GRanges and TxDb see the ## ?'distance,GenomicRanges,TxDb-method' man ## page in the GenomicFeatures package.
phicoef()
Calculate the "phi coefficient" between two binary variables
Description
The phicoef
function calculates the "phi coefficient" between
two binary variables.
Usage
phicoef(x, y=NULL)
Arguments
Argument | Description |
---|---|
x, y | Two logical vectors of the same length. If y is not supplied, x must be a 2x2 integer matrix (or an integer vector of length 4) representing the contingency table of two binary variables. |
Value
The "phi coefficient" between the two binary variables. This is a single numeric value ranging from -1 to +1.
Author
H. Pagès
References
http://en.wikipedia.org/wiki/Phi_coefficient
Examples
set.seed(33)
x <- sample(c(TRUE, FALSE), 100, replace=TRUE)
y <- sample(c(TRUE, FALSE), 100, replace=TRUE)
phicoef(x, y)
phicoef(rep(x, 10), c(rep(x, 9), y))
stopifnot(phicoef(table(x, y)) == phicoef(x, y))
stopifnot(phicoef(y, x) == phicoef(x, y))
stopifnot(phicoef(x, !y) == - phicoef(x, y))
stopifnot(phicoef(x, x) == 1)
setops_methods()
Set operations on genomic ranges
Description
Performs set operations on GRanges and GRangesList objects.
NOTE: The punion
, pintersect
,
psetdiff
, and pgap
generic
functions and methods for IntegerRanges objects are defined
and documented in the IRanges package.
Usage
## Vector-wise set operations
## --------------------------
list(list("union"), list("GenomicRanges,GenomicRanges"))(x, y, ignore.strand=FALSE)
list(list("intersect"), list("GenomicRanges,GenomicRanges"))(x, y, ignore.strand=FALSE)
list(list("setdiff"), list("GenomicRanges,GenomicRanges"))(x, y, ignore.strand=FALSE)
## Element-wise (aka "parallel") set operations
## --------------------------------------------
list(list("punion"), list("GRanges,GRanges"))(x, y, fill.gap=FALSE, ignore.strand=FALSE)
list(list("pintersect"), list("GRanges,GRanges"))(x, y, drop.nohit.ranges=FALSE,
ignore.strand=FALSE, strict.strand=FALSE)
list(list("psetdiff"), list("GRanges,GRanges"))(x, y, ignore.strand=FALSE)
Arguments
Argument | Description |
---|---|
x, y | For union , intersect , and setdiff : 2 GenomicRanges objects or 2 GRangesList objects. For punion and pintersect : 2 GRanges objects, or 1 GRanges object and 1 GRangesList object. For psetdiff : x must be a GRanges object and y can be a GRanges or GRangesList object. For pgap : 2 GRanges objects. In addition, for the parallel operations, x and y must be of equal length (i.e. length(x) == length(y) ). |
fill.gap | Logical indicating whether or not to force a union by using the rule start = min(start(x), start(y)), end = max(end(x), end(y)) . |
ignore.strand | For set operations: If set to TRUE, then the strand of x and y is set to "*" prior to any computation. For parallel set operations: If set to TRUE, the strand information is ignored in the computation and the result has the strand information of x . |
drop.nohit.ranges | If TRUE then elements in x that don't intersect with their corresponding element in y are removed from the result (so the returned object is no more parallel to the input). If FALSE (the default) then nothing is removed and a hit metadata column is added to the returned object to indicate elements in x that intersect with the corresponding element in y . For those that don't, the reported intersection is a zero-width range that has the same start as x . |
strict.strand | If set to FALSE (the default), features on the "*" strand are treated as occurring on both the "+" and "-" strand. If set to TRUE, the strand of intersecting elements must be strictly the same. |
Details
The pintersect
methods involving GRanges and/or
GRangesList objects use the triplet (sequence name, range, strand)
to determine the element by element intersection of features, where a
strand value of "*"
is treated as occurring on both the "+"
and "-"
strand (unless strict.strand
is set to TRUE, in
which case the strand of intersecting elements must be strictly the same).
The psetdiff
methods involving GRanges and/or
GRangesList objects use the triplet (sequence name, range,
strand) to determine the element by element set difference of features,
where a strand value of "*"
is treated as occurring on both the
"+"
and "-"
strand.
Value
For union
, intersect
, and setdiff
: a GRanges
object if x
and y
are GenomicRanges objects,
and a GRangesList object if they are GRangesList objects.
For punion
and pintersect
: when x
or y
is
not a GRanges object, an object of the same class as this
non- GRanges object. Otherwise, a GRanges object.
For psetdiff
: either a GRanges object when both x
and y
are GRanges objects, or a GRangesList object
when y
is a GRangesList object.
For pgap
: a GRanges object.
Seealso
setops-methods in the IRanges package for set operations on IntegerRanges and IntegerRangesList objects.
findOverlaps-methods for finding/counting overlapping genomic ranges.
intra-range-methods and inter-range-methods for intra range and inter range transformations of a GRanges object.
GRanges and GRangesList objects.
mendoapply
in the S4Vectors package.
Author
P. Aboyoun and H. Pagès
Examples
## ---------------------------------------------------------------------
## A. SET OPERATIONS
## ---------------------------------------------------------------------
x <- GRanges("chr1", IRanges(c(2, 9) , c(7, 19)), strand=c("+", "-"))
y <- GRanges("chr1", IRanges(5, 10), strand="-")
union(x, y)
union(x, y, ignore.strand=TRUE)
intersect(x, y)
intersect(x, y, ignore.strand=TRUE)
setdiff(x, y)
setdiff(x, y, ignore.strand=TRUE)
## With 2 GRangesList objects:
gr1 <- GRanges(seqnames="chr2",
ranges=IRanges(3, 6))
gr2 <- GRanges(seqnames=c("chr1", "chr1"),
ranges=IRanges(c(7,13), width = 3),
strand=c("+", "-"))
gr3 <- GRanges(seqnames=c("chr1", "chr2"),
ranges=IRanges(c(1, 4), c(3, 9)),
strand=c("-", "-"))
grlist <- GRangesList(gr1=gr1, gr2=gr2, gr3=gr3)
union(grlist, shift(grlist, 3))
intersect(grlist, shift(grlist, 3))
setdiff(grlist, shift(grlist, 3))
## Sanity checks:
grlist2 <- shift(grlist, 3)
stopifnot(identical(
union(grlist, grlist2),
mendoapply(union, grlist, grlist2)
))
stopifnot(identical(
intersect(grlist, grlist2),
mendoapply(intersect, grlist, grlist2)
))
stopifnot(identical(
setdiff(grlist, grlist2),
mendoapply(setdiff, grlist, grlist2)
))
## ---------------------------------------------------------------------
## B. PARALLEL SET OPERATIONS
## ---------------------------------------------------------------------
punion(x, shift(x, 6))
punion(x, shift(x, 7)) # will fail
punion(x, shift(x, 7), fill.gap=TRUE)
pintersect(x, shift(x, 6))
pintersect(x, shift(x, 7))
psetdiff(x, shift(x, 7))
## ---------------------------------------------------------------------
## C. MORE EXAMPLES
## ---------------------------------------------------------------------
## GRanges object:
gr <- GRanges(seqnames=c("chr2", "chr1", "chr1"),
ranges=IRanges(1:3, width = 12),
strand=Rle(strand(c("-", "*", "-"))))
## Parallel intersection of a GRanges and a GRangesList object
pintersect(gr, grlist)
pintersect(grlist, gr)
## For a fast 'mendoapply(intersect, grlist, as(gr, "GRangesList"))'
## call pintersect() with 'strict.strand=TRUE' and call reduce() on
## the result with 'drop.empty.ranges=TRUE':
reduce(pintersect(grlist, gr, strict.strand=TRUE),
drop.empty.ranges=TRUE)
## Parallel set difference of a GRanges and a GRangesList object
psetdiff(gr, grlist)
strand_utils()
Strand utilities
Description
A bunch of useful strand
and invertStrand
methods.
Usage
list(list("strand"), list("missing"))(x)
list(list("strand"), list("character"))(x)
list(list("strand"), list("factor"))(x)
list(list("strand"), list("integer"))(x)
list(list("strand"), list("logical"))(x)
list(list("strand"), list("Rle"))(x)
list(list("strand"), list("RleList"))(x)
list(list("strand"), list("DataTable"))(x)
list(list("strand"), list("DataTable,ANY"))(x) <- value
list(list("invertStrand"), list("character"))(x)
list(list("invertStrand"), list("factor"))(x)
list(list("invertStrand"), list("integer"))(x)
list(list("invertStrand"), list("logical"))(x)
list(list("invertStrand"), list("Rle"))(x)
list(list("invertStrand"), list("RleList"))(x)
Arguments
Argument | Description |
---|
|x
| The object from which to obtain a list("strand factor") , list("strand
", " factor ", list("Rle")) , or list("strand factor ", list("RleList")) object. Can be missing. See Details and Value sections below for more information. |
|value
| Replacement value for the strand. |
Details
All the strand
and invertStrand
methods documented
here return either a list("strand factor") , list("strand factor
", " ", list("Rle")) , or list("strand factor ", list("RleList")) object.
These are factor, factor- Rle , or factor- RleList
objects containing the "standard strand levels" (i.e. +
, -
,
and *
) and no NAs.
Value
All the strand
and invertStrand
methods documented here
return an object that is parallel to input object x
when
x
is a character, factor, integer, logical, Rle ,
or RleList object.
For the strand
methods:
If
x
is missing, returns an empty factor with the "standard strand levels" i.e.+
,-
, and*
.If
x
is a character vector or factor, it is coerced to a factor with the levels listed above.NA
values inx
are not accepted.If
x
is an integer vector, it is coerced to a factor with the levels listed above.1
,-1
, andNA
values inx
are mapped to the+
,-
, and*
levels respectively.If
x
is a logical vector, it is coerced to a factor with the levels listed above.FALSE
,TRUE
, andNA
values inx
are mapped to the+
,-
, and*
levels respectively.If
x
is a character-, factor-, integer-, or logical- Rle , it is transformed withrunValue(x) <- strand(runValue(x))
and returned.If
x
is an RleList object, each list element inx
is transformed by callingstrand()
on it and the resulting RleList object is returned. More precisely the returned object isendoapply(x, strand)
. Note that in addition to being parallel tox
, this object also has the same shape asx
(i.e. its list elements have the same lengths as inx
).If
x
inherits fromDataTable
, the"strand"
column is passed thrustrand()
and returned. Ifx
has no"strand"
column, this return value is populated with*
s.
Each invertStrand
method returns the same object as its corresponding
strand
method but with "+"
and "-"
switched.
Seealso
Author
M. Lawrence and H. Pagès
Examples
strand()
x1 <- c("-", "*", "*", "+", "-", "*")
x2 <- factor(c("-", "-", "+", "-"))
x3 <- c(-1L, NA, NA, 1L, -1L, NA)
x4 <- c(TRUE, NA, NA, FALSE, TRUE, NA)
strand(x1)
invertStrand(x1)
strand(x2)
invertStrand(x2)
strand(x3)
invertStrand(x3)
strand(x4)
invertStrand(x4)
strand(Rle(x1))
invertStrand(Rle(x1))
strand(Rle(x2))
invertStrand(Rle(x2))
strand(Rle(x3))
invertStrand(Rle(x3))
strand(Rle(x4))
invertStrand(Rle(x4))
x5 <- RleList(x1, character(0), as.character(x2))
strand(x5)
invertStrand(x5)
strand(DataFrame(score=2:-3))
strand(DataFrame(score=2:-3, strand=x3))
strand(DataFrame(score=2:-3, strand=Rle(x3)))
## Sanity checks:
target <- strand(x1)
stopifnot(identical(target, strand(x3)))
stopifnot(identical(target, strand(x4)))
stopifnot(identical(Rle(strand(x1)), strand(Rle(x1))))
stopifnot(identical(Rle(strand(x2)), strand(Rle(x2))))
stopifnot(identical(Rle(strand(x3)), strand(Rle(x3))))
stopifnot(identical(Rle(strand(x4)), strand(Rle(x4))))
tileGenome()
Put (virtual) tiles on a given genome
Description
tileGenome
returns a set of genomic regions that form a
partitioning of the specified genome. Each region is called a "tile".
Usage
tileGenome(seqlengths, ntile, tilewidth, cut.last.tile.in.chrom=FALSE)
Arguments
Argument | Description |
---|---|
seqlengths | Either a named numeric vector of chromosome lengths or a Seqinfo object. More precisely, if a named numeric vector, it must have a length >= 1, cannot contain NAs or negative values, and cannot have duplicated names. If a Seqinfo object, then it's first replaced with the vector of sequence lengths stored in the object (extracted from the object with the seqlengths getter), then the restrictions described previously apply to this vector. |
ntile | The number of tiles to generate. |
tilewidth | The desired tile width. The effective tile width might be slightly different but is guaranteed to never be more than the desired width. |
cut.last.tile.in.chrom | Whether or not to cut the last tile in each chromosome. This is set to FALSE by default. Can be set to TRUE only when tilewidth is specified. In that case, a tile will never overlap with more than 1 chromosome and a GRanges object is returned with one element (i.e. one genomic range) per tile. |
Value
If cut.last.tile.in.chrom
is FALSE
(the default),
a GRangesList object with one list element per tile, each of
them containing a number of genomic ranges equal to the number of
chromosomes it overlaps with. Note that when the tiles are small (i.e.
much smaller than the chromosomes), most of them only overlap with a
single chromosome.
If cut.last.tile.in.chrom
is TRUE
, a GRanges
object with one element (i.e. one genomic range) per tile.
Seealso
genomicvars for an example of how to compute the binned average of a numerical variable defined along a genome.
GRangesList and GRanges objects.
Seqinfo objects and the
seqlengths
getter.IntegerList objects.
Views objects.
Author
H. Pagès, based on a proposal by M. Morgan
Examples
## ---------------------------------------------------------------------
## A. WITH A TOY GENOME
## ---------------------------------------------------------------------
seqlengths <- c(chr1=60, chr2=20, chr3=25)
## Create 5 tiles:
tiles <- tileGenome(seqlengths, ntile=5)
tiles
elementNROWS(tiles) # tiles 3 and 4 contain 2 ranges
width(tiles)
## Use sum() on this IntegerList object to get the effective tile
## widths:
sum(width(tiles)) # each tile covers exactly 21 genomic positions
## Create 9 tiles:
tiles <- tileGenome(seqlengths, ntile=9)
elementNROWS(tiles) # tiles 6 and 7 contain 2 ranges
table(sum(width(tiles))) # some tiles cover 12 genomic positions,
# others 11
## Specify the tile width:
tiles <- tileGenome(seqlengths, tilewidth=20)
length(tiles) # 6 tiles
table(sum(width(tiles))) # effective tile width is <= specified
## Specify the tile width and cut the last tile in each chromosome:
tiles <- tileGenome(seqlengths, tilewidth=24,
cut.last.tile.in.chrom=TRUE)
tiles
width(tiles) # each tile covers exactly 24 genomic positions, except
# the last tile in each chromosome
## Partition a genome by chromosome ("natural partitioning"):
tiles <- tileGenome(seqlengths, tilewidth=max(seqlengths),
cut.last.tile.in.chrom=TRUE)
tiles # one tile per chromosome
## sanity check
stopifnot(all.equal(setNames(end(tiles), seqnames(tiles)), seqlengths))
## ---------------------------------------------------------------------
## B. WITH A REAL GENOME
## ---------------------------------------------------------------------
library(BSgenome.Scerevisiae.UCSC.sacCer2)
tiles <- tileGenome(seqinfo(Scerevisiae), ntile=20)
tiles
tiles <- tileGenome(seqinfo(Scerevisiae), tilewidth=50000,
cut.last.tile.in.chrom=TRUE)
tiles
## ---------------------------------------------------------------------
## C. AN APPLICATION: COMPUTE THE BINNED AVERAGE OF A NUMERICAL VARIABLE
## DEFINED ALONG A GENOME
## ---------------------------------------------------------------------
## See '?genomicvars' for an example of how to compute the binned
## average of a numerical variable defined along a genome.
tile_methods()
Generate windows for a GenomicRanges
Description
tile
and slidingWindows
methods for GenomicRanges . tile
partitions each range
into a set of tiles, which are defined in terms of their number or
width. slidingWindows
generates sliding windows of a specified
width and frequency.
Usage
list(list("tile"), list("GenomicRanges"))(x, n, width)
list(list("slidingWindows"), list("GenomicRanges"))(x, width, step=1L)
Arguments
Argument | Description |
---|---|
x | A GenomicRanges object, like a GRanges . |
n | The number of tiles to generate. See ? in the IRanges package for more information about this argument. |
width | The (maximum) width of each tile. See ? in the IRanges package for more information about this argument. |
step | The distance between the start positions of the sliding windows. |
Details
The tile
function splits x
into a GRangesList
,
each element of which corresponds
to a tile, or partition, of x
. Specify the tile geometry with either
n
or width
(not both). Passing n
creates n
tiles
of approximately equal width, truncated by sequence end, while passing
width
tiles the region with ranges of the given width, again truncated
by sequence end.
The slidingWindows
function generates sliding windows within
each range of x
, according to width
and step
,
returning a GRangesList
. If the sliding windows do not exactly
cover a range in x
, the last window is partial.
Value
A GRangesList
object, each element of which corresponds to a window.
Seealso
tile
in the IRanges package.
Author
M. Lawrence
Examples
gr <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges=IRanges(1:10, end=11),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
seqlengths=c(chr1=11, chr2=12, chr3=13))
# split every range in half
tiles <- tile(gr, n = 2L)
stopifnot(all(elementNROWS(tiles) == 2L))
# split ranges into subranges of width 2
# odd width ranges must contain one subrange of width 1
tiles <- tile(gr, width = 2L)
stopifnot(all(all(width(tiles) %in% c(1L, 2L))))
windows <- slidingWindows(gr, width=3L, step=2L)
width(windows[[1L]]) # last range is truncated