bioconductor v3.9.0 Rtracklayer
Extensible framework for interacting with multiple genome
Summary
Functions
BEDFile objects
Export to BAM Files
Class "Bed15TrackLine"
BigWig Import and Export
Selection of ranges and columns
Lists of BrowserView
Chain objects
FastaFile objects
GFFFile objects
GRanges for a Genome
Data on a Genome
Genomic data selection
Ranges on a Genome
Quickload Genome Access
Quickload Access
RTLFile objects
TabixFile Import/Export
Track Databases
2bit Files
Class "UCSCData"
UCSCFile objects
UCSC Schema
Querying UCSC Tables
WIG Import and Export
Accessing the active view
Coerce to BED structure
Coerce to GFF structure
Class "BasicTrackLine"
Get blocks/exons
Browse a genome
Class "BrowserSession"
Get a genome browser session
Class "BrowserView"
Getting browser views
Getting the browser views
CPNE1 SNP track
Import and export
Get available genome browsers
Load a sequence
Laying tracks
Lift intervals between genome builds
Reads a file in GFF format
microRNA target sites
Accessing track names
Get available genomes on UCSC
Class "UCSCSession"
Class "TrackLine"
Class "UCSCTrackModes"
Accessing UCSC track modes
Class "UCSCView"
Convert WIG to BigWig
Class "GraphTrackLine"
Functions
BEDFile_class()
BEDFile objects
Description
These functions support the import and export of the UCSC BED format and its variants, including BEDGraph.
Usage
## S4 method for signature 'BEDFile,ANY,ANY'
import(con, format, text, trackLine = TRUE,
       genome = NA, colnames = NULL,
       which = NULL, seqinfo = NULL, extraCols = character(),
       sep = c("\t", ""))
import.bed(con, ...)
import.bed15(con, ...)
import.bedGraph(con, ...)
## S4 method for signature 'ANY,BEDFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,BEDFile,ANY'
export(object, con, format,
       append = FALSE, index = FALSE,
       ignore.strand = FALSE, trackLine = NULL)
## S4 method for signature 'UCSCData,BEDFile,ANY'
export(object, con, format,
       trackLine = TRUE, ...)
export.bed(object, con, ...)
export.bed15(object, con, ...)
## S4 method for signature 'GenomicRanges,BED15File,ANY'
export(object, con, format,
       expNames = NULL, trackLine = NULL, ...)
export.bedGraph(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL, connection or BEDFile object. For the functions ending in .bed , .bedGraph and .bed15 , the file format is indicated by the function name. For the base export and import functions, the format must be indicated another way. If con is a path, URL or connection, either the file extension or the format argument needs to be one of bed , bed15 , bedGraph , bedpe , narrowPeak , or broadPeak . Compressed files ( gz , bz2 and xz ) are handled transparently. |
object | The object to export, should be a GRanges or something coercible to a GRanges . If targeting the BEDPE format, this should be something coercible to Pairs . If the object has a method for asBED (like GRangesList ), it is called prior to coercion. This makes it possible to export a GRangesList or TxDb in a way that preserves the hierarchical structure. For exporting multiple tracks, in the UCSC track line metaformat, pass a GenomicRangesList , or something coercible to one. |
format | If not missing, should be one of bed , bed15 , bedGraph , bedpe , narrowPeak or broadPeak . |
text | If con is missing, a character vector to use as the input. |
trackLine | For import, an imported track line will be stored in a TrackLine object, as part of the returned UCSCData . For the UCSCData method on export, whether to output the UCSC track line stored on the object, for the other export methods, the actual TrackLine object to export. |
genome | The identifier of a genome, or a Seqinfo , or NA if unknown. Typically, this is a UCSC identifier like hg19 . An attempt will be made to derive the seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. |
colnames | A character vector naming the columns to parse. These should name columns in the result, not those in the BED spec, so e.g. specify thick , instead of thickStart . |
which | A GRanges or other range-based object supported by findOverlaps . Only the intervals in the file overlapping the given ranges are returned. This is much more efficient when the file is indexed with the tabix utility. |
index | If TRUE , automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports a single track in a file. |
ignore.strand | Whether to output the strand when not required (by the existence of later fields). |
seqinfo | If not NULL , the Seqinfo object to set on the result. Ignored if genome is a Seqinfo object. If the genome argument is not NA , it must agree with genome(seqinfo) . |
extraCols | A character vector in the same form as colClasses from read.table . It should indicate the name and class of each extra/special column to read from the BED file. As BED does not encode column names, these are assumed to be the last columns in the file. This enables parsing of the various BEDX+Y formats. |
sep | A character vector with a single character indicating the field separator, like read.table. This defaults to "\t", as BEDtools requires, but BED files are also allowed to be whitespace separated ("") according to the UCSC spec. |
append | If TRUE , and con points to a file path, the data is appended to the file. Obviously, if con is a connection, the data is always appended. |
expNames | character vector naming columns in mcols(object) to export as data columns in the BED15 file. These correspond to the sample names in the experiment. If NULL (the default), there is an attempt to extract these from trackLine . If the attempt fails, no scores are exported. |
... | Arguments to pass down to other methods. For import, the flow eventually reaches the BEDFile method on import. When trackLine is TRUE or the target format is BED15, the arguments are passed through export.ucsc, so track line parameters are supported. |
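As a rough illustration of the extraCols convention described above, the following base R sketch reads a small BED3+2 table with read.table, naming and typing the trailing columns in the colClasses style. The column names signal and label and the sample records are made up for this example, and the code only demonstrates the convention; it is not how rtracklayer parses BED files internally.

```r
# Three made-up BED3+2 records; fields are tab-separated.
bed_lines <- c(
  "chr1\t100\t200\t5.5\tpeakA",
  "chr1\t300\t450\t7.25\tpeakB",
  "chr2\t10\t90\t1\tpeakC"
)

# The three required BED columns, followed by hypothetical extra columns
# given as a named character vector (name = class), as colClasses expects.
core_cols  <- c(chrom = "character", start = "integer", end = "integer")
extra_cols <- c(signal = "numeric", label = "character")
col_classes <- c(core_cols, extra_cols)

# BED has no header line, so the extra columns are taken to be the last ones.
bed <- read.table(text = bed_lines, sep = "\t", header = FALSE,
                  col.names = names(col_classes),
                  colClasses = unname(col_classes))
str(bed)
```

Because BED does not encode column names, the order of the entries in extra_cols has to match the order of the trailing columns in the file.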
Details
The BED format is a tab-separated table of intervals, with annotations like name, score and even sub-intervals for representing alignments and gene models. Official (UCSC) child formats currently include BED15 (adding a number matrix for e.g. expression data across multiple samples) and BEDGraph (a compressed means of storing a single score variable, e.g. coverage; overlapping features are not allowed). Many tools and organizations have extended the BED format with additional columns for particular use cases.
The advantage of BED is its balance between simplicity and expressiveness. It is also relatively scalable, because only the first three columns (chrom, start and end) are required. Thus, BED is best suited for representing simple features. For specialized cases, one is usually better off with another format. For example, genome-scale vectors belong in BigWig, alignments from high-throughput sequencing belong in BAM, and gene models are more richly expressed in GFF.
The following is the mapping of BED elements to a GRanges object. NA values are allowed only where indicated. These appear as "." in the file. Only the first three columns (chrom, start and end) are required. The other columns can only be included if all previous columns (to the left) are included. Upon export, default values are used to automatically pad the table, if necessary.
BED field | GRanges representation |
---|---|
chrom, start, end | The ranges component. |
name | Character vector (NA's allowed) in the name column; defaults to NA on export. |
score | Numeric vector in the score column, accessible via the score accessor. Defaults to 0 on export. This is the only column present in BEDGraph (besides chrom, start and end), and it is required. |
strand | Strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA on export. |
thickStart, thickEnd | IntegerRanges object in a column named thick; defaults to the ranges of the feature on export. |
itemRgb | An integer matrix of color codes, as returned by col2rgb, or any valid input to col2rgb, in the itemRgb column; default is NA on export, which translates to black. |
blockSizes, blockStarts, blockCounts | IntegerRangesList object in a column named blocks; defaults to empty upon BED15 export. |
For BED15 files, there should be a column of scores in mcols(object) for each sample in the experiment. The columns are named according to the expNames (found in the file, or passed as an argument during export). NA scores are stored as -10000 in the file.
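The column-padding rule from the BED mapping above (an optional column can appear only if every column to its left is present, and missing ones are filled with defaults) can be sketched in base R as follows. This is only an illustration under stated assumptions: pad_bed and bed_defaults are hypothetical names, only the name, score and strand columns are handled, and the code is not rtracklayer's export implementation.

```r
# Defaults taken from the mapping above: name NA, score 0, strand NA.
bed_defaults <- list(name = NA_character_, score = 0, strand = NA_character_)

# Hypothetical helper: pad a data.frame of features out to the right-most
# optional column that is present, then render NA as "." as in a BED file.
pad_bed <- function(df) {
  required <- c("chrom", "start", "end")
  optional <- names(bed_defaults)
  present  <- optional[optional %in% names(df)]
  if (length(present) == 0L) return(df[required])
  last <- max(match(present, optional))
  for (col in optional[seq_len(last)]) {
    if (!col %in% names(df)) df[[col]] <- bed_defaults[[col]]
  }
  out <- df[c(required, optional[seq_len(last)])]
  out[] <- lapply(out, function(x) ifelse(is.na(x), ".", as.character(x)))
  out
}

# Strand is given, but name and score are not, so both must be padded.
features <- data.frame(chrom = c("chr1", "chr2"),
                       start = c(10L, 20L),
                       end = c(50L, 80L),
                       strand = c("+", NA),
                       stringsAsFactors = FALSE)
padded <- pad_bed(features)
padded
```

Here the strand column forces the name and score columns into the output, which is why a strand-only object still produces a six-column table.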
Value
For a bedpe file, a Pairs object combining two GRanges. The name and score are carried over to the metadata columns.
Otherwise, a GRanges with the metadata columns described in the details.
Author
Michael Lawrence
References
http://genome.ucsc.edu/goldenPath/help/customTrack.html
http://bedtools.readthedocs.org/en/latest/content/general-usage.html
Examples
test_path <- system.file("tests", package = "rtracklayer")
test_bed <- file.path(test_path, "test.bed")
test <- import(test_bed)
test
test_bed_file <- BEDFile(test_bed)
import(test_bed_file)
test_bed_con <- file(test_bed)
import(test_bed_con, format = "bed")
import(test_bed, trackLine = FALSE)
import(test_bed, genome = "hg19")
import(test_bed, colnames = c("name", "strand", "thick"))
which <- GRanges("chr7:1-127473000")
import(test_bed, which = which)
bed15_file <- file.path(test_path, "test.bed15")
bed15 <- import(bed15_file)
test_bed_out <- file.path(tempdir(), "test.bed")
export(test, test_bed_out)
test_bed_out_file <- BEDFile(test_bed_out)
export(test, test_bed_out_file)
export(test, test_bed_out, name = "Alternative name")
test_bed_gz <- paste(test_bed_out, ".gz", sep = "")
export(test, test_bed_gz)
export(test, test_bed_out, index = TRUE)
export(test, test_bed_out, index = TRUE, trackLine = FALSE)
bed_text <- export(test, format = "bed")
test <- import(format = "bed", text = bed_text)
test_bed15_out <- file.path(tempdir(), "test.bed15")
export(bed15, test_bed15_out) # UCSCData knows the expNames
export(as(bed15, "GRanges"), test_bed15_out, # have to specify expNames
expNames=paste0("breast_", c("A", "B", "C")))
BamFile_methods()
Export to BAM Files
Description
Methods for import and export of GAlignments or GAlignmentPairs objects from and to BAM files, represented as BamFile objects.
Usage
## S4 method for signature 'BamFile,ANY,ANY'
import(con, format, text, paired = FALSE,
       use.names = FALSE,
       param = ScanBamParam(...),
       genome = NA_character_, ...)
## S4 method for signature 'ANY,BamFile,ANY'
export(object, con, format, ...)
Arguments
Argument | Description |
---|---|
object | The object to export, such as a GAlignments or GAlignmentPairs . |
con | A path, URL, connection or BamFile object. |
format | If not missing, should be bam . |
text | Not supported. |
paired | If TRUE , return a GAlignmentPairs object, otherwise a GAlignments. |
use.names | Whether to parse QNAME as the names on the result. |
param | The ScanBamParam object governing the import. |
genome | Single string or Seqinfo object identifying the genome. |
... | Arguments that are passed to ScanBamParam if param is missing. |
Details
BAM fields not formally present in the GAlignments[Pairs] object are extracted from the metadata columns, if present; otherwise, the missing value, ".", is output. The file is sorted and indexed. This can be useful for subsetting BAM files, although filterBam may eventually become flexible enough to be the favored alternative.
Seealso
The readGAlignments and readGAlignmentPairs functions for reading BAM files.
Author
Michael Lawrence
Examples
library(Rsamtools)
ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
gal <- import(ex1_file, param=ScanBamParam(what="flag"))
gal.minus <- gal[strand(gal) == "-"]
export(gal.minus, BamFile("ex1-minus.bam"))
Bed15TrackLine_class()
Class "Bed15TrackLine"
Description
A UCSC track line for graphical tracks.
Seealso
export.bed15 for exporting bed15 tracks.
Author
Michael Lawrence
References
Official documentation: http://genomewiki.ucsc.edu/index.php/Microarray_track .
BigWigFile()
BigWig Import and Export
Description
These functions support the import and export of the UCSC BigWig format, a compressed, binary form of WIG/BEDGraph with a spatial index and precomputed summaries. These functions do not work on Windows.
Usage
## S4 method for signature 'BigWigFile,ANY,ANY'
import(con, format, text,
       selection = BigWigSelection(which, ...),
       which = con,
       as = c("GRanges", "RleList", "NumericList"), ...)
import.bw(con, ...)
## S4 method for signature 'ANY,BigWigFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,BigWigFile,ANY'
export(object, con, format,
       dataFormat = c("auto", "variableStep", "fixedStep",
                      "bedGraph"), compress = TRUE, fixedSummaries = FALSE)
export.bw(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL or BigWigFile object. Connections are not supported. For the functions ending in .bw , the file format is indicated by the function name. For the export and import methods, the format must be indicated another way. If con is a path, or URL, either the file extension or the format argument needs to be bigWig or bw . |
object | The object to export, should be an RleList , IntegerList , NumericList , GRanges or something coercible to a GRanges . |
format | If not missing, should be bigWig or bw (case insensitive). |
text | Not supported. |
as | Specifies the class of the return object. Default is GRanges , which has one range per range in the file, and a score column holding the value for each range. For NumericList , one numeric vector is returned for each range in the selection argument. For RleList , there is one Rle per sequence, and that Rle spans the entire sequence. |
selection | A BigWigSelection object indicating the ranges to load. |
which | A range data structure coercible to IntegerRangesList, like a GRanges, or a BigWigFile. Only the intervals in the file overlapping the given ranges are returned. By default, the value is the BigWigFile itself. Its Seqinfo object is extracted and coerced to an IntegerRangesList that represents the entirety of the file. |
dataFormat | Probably best left to auto . Exists only for historical reasons. |
compress | If TRUE , compress the data. No reason to change this. |
fixedSummaries | If TRUE , compute summaries at fixed resolutions corresponding to the default zoom levels in the Ensembl genome browser (with some extrapolation): 30X, 65X, 130X, 260X, 450X, 648X, 950X, 1296X, 4800X, 19200X. Otherwise, the resolutions are dynamically determined by an algorithm that computes an initial summary size by initializing to 10X the size of the smallest feature and doubling the size as needed until the size of the summary is less than half that of the data (or there are no further gains). It then computes up to 10 more levels of summary, quadrupling the size each time, until the summaries start to exceed the sequence size. |
... | Arguments to pass down to other methods. For import, the flow eventually reaches the BigWigFile method on import. |
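The dynamic summary levels described for fixedSummaries = FALSE can be loosely illustrated in base R. The sketch below covers only the final quadrupling step: it starts from an initial summary size that is assumed to be already chosen, adds up to 10 further levels at four times the previous size, and stops once a level would exceed the sequence length. The function name zoom_levels and its arguments are hypothetical, the initial doubling search is left out, and this is not the code used by the BigWig writer.

```r
# Hypothetical helper: summary sizes obtained by repeated quadrupling.
# `initial` is the starting summary size (assumed already determined),
# `seq_length` is the sequence length, `max_more` caps the extra levels.
zoom_levels <- function(initial, seq_length, max_more = 10L) {
  levels <- initial
  while (length(levels) <= max_more) {
    nxt <- levels[length(levels)] * 4
    if (nxt > seq_length) break  # summaries would exceed the sequence
    levels <- c(levels, nxt)
  }
  levels
}

zoom_levels(initial = 100, seq_length = 1e6)
# 100 400 1600 6400 25600 102400 409600
```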
Value
A GRanges (default), RleList or NumericList.
A GRanges result holds the ranges with non-zero score values, with the values in a score metadata column. The length of the NumericList is the same as the length of the selection argument (one list element per range). The return order in the NumericList matches the order of the BigWigSelection object.
Seealso
wigToBigWig for converting a WIG file to BigWig.
Author
Michael Lawrence
Examples
if (.Platform$OS.type != "windows") {
test_path <- system.file("tests", package = "rtracklayer")
test_bw <- file.path(test_path, "test.bw")
## GRanges
## Returns ranges with non-zero scores.
gr <- import(test_bw)
gr
which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
import(test_bw, which = which)
## RleList
## Scores returned as an RleList is equivalent to the coverage.
## Best option when 'which' or 'selection' contain many small ranges.
mini <- narrow(unlist(tile(which, 50)), 2)
rle <- import(test_bw, which = mini, as = "RleList")
rle
## NumericList
## The 'which' is stored as metadata:
track <- import(test_bw, which = which, as = "NumericList")
metadata(track)
test_bw_out <- file.path(tempdir(), "test_out.bw")
export(gr, test_bw_out)
bwf <- BigWigFile(test_bw)
track <- import(bwf)
seqinfo(bwf)
summary(bwf) # for each sequence, average all values into one
summary(bwf, range(head(track))) # just average the first few features
summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction
summary(bwf, type = "min") # min instead of mean
summary(bwf, track, size = 10, as = "matrix") # each feature 10 windows
}
BigWigSelection_class()
Selection of ranges and columns
Description
A BigWigSelection represents a query against a BigWig file, see import.bw. It is simply a RangedSelection that requires its colnames parameter to be "score", if non-empty, as that is the only column supported by BigWig.
Author
Michael Lawrence
Examples
rl <- IRangesList(chr1 = IRanges::IRanges(c(1, 5), c(3, 6)))
BigWigSelection(rl)
as(rl, "BigWigSelection") # same as above
# do not select the 'score' column
BigWigSelection(rl, character())
BrowserViewList_class()
Lists of BrowserView
Description
A formal list of BrowserView objects. Extends and inherits all its methods from Vector. Usually generated by passing multiple ranges to the browserView function.
Author
Michael Lawrence
Chain_class()
Chain objects
Description
A Chain object represents a UCSC chain alignment, typically imported from a chain file, and is essentially a list of ChainBlock objects. Each ChainBlock has a corresponding chromosome (its name in the list) and is a run-length encoded alignment, mapping a set of intervals on that chromosome to intervals on the same or other chromosomes.
Seealso
liftOver for performing lift overs using a chain alignment.
Note
A chain file essentially details many local alignments, so it is possible for the "from" ranges to map to overlapping regions in the other sequence. The "from" ranges are guaranteed to be disjoint (but do not necessarily cover the entire "from" sequence).
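To make the idea of mapping through disjoint "from" ranges concrete, here is a minimal base R sketch. It assumes a made-up table of alignment blocks, each with a start, an end and an offset to the target coordinate system. The blocks data.frame and the lift_position function are hypothetical and do not reflect the internal layout of the ChainBlock class or the behavior of liftOver.

```r
# Made-up alignment blocks on one "from" chromosome. The ranges are
# disjoint and leave a gap (positions 51-100) that is not covered.
blocks <- data.frame(start  = c(1L, 101L),
                     end    = c(50L, 180L),
                     offset = c(1000L, -20L))

# Hypothetical helper: map one position through the block that contains
# it, or return NA when no block covers the position.
lift_position <- function(pos, blocks) {
  hit <- which(pos >= blocks$start & pos <= blocks$end)
  if (length(hit) == 0L) return(NA_integer_)
  pos + blocks$offset[hit[1L]]
}

lift_position(25L, blocks)   # 1025
lift_position(75L, blocks)   # NA, falls in the uncovered gap
lift_position(150L, blocks)  # 130
```

Because the "from" ranges are disjoint, at most one block can contain a given position, so taking the first hit loses nothing.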
Author
Michael Lawrence
FastaFile_class()
FastaFile objects
Description
These functions support the import and export of the Fasta sequence format, using the Biostrings package.
Usage
## S4 method for signature 'FastaFile,ANY,ANY'
import(con, format, text,
       type = c("DNA", "RNA", "AA", "B"), ...)
## S4 method for signature 'ANY,FastaFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'XStringSet,FastaFile,ANY'
export(object, con, format, ...)
Arguments
Argument | Description |
---|---|
con | A path or FastaFile object. URLs and connections are not supported. If con is not a FastaFile , either the file extension or the format argument needs to be fasta . Compressed files ( gz , bz2 and xz ) are handled transparently. |
object | The object to export, should be an XStringSet or something coercible to a DNAStringSet , like a character vector. |
format | If not missing, should be fasta . |
text | If con is missing, a character vector to use as the input. |
type | Type of biological sequence. |
... | Arguments to pass down to writeXStringSet (export) or the readDNAStringSet family of functions (import). |
Seealso
These functions are implemented by the Biostrings writeXStringSet (export) and the readDNAStringSet family of functions (import).
See export-methods in the BSgenome package for exporting a BSgenome object as a FASTA file.
Author
Michael Lawrence
GFFFile_class()
GFFFile objects
Description
These functions support the import and export of the GFF format, of which there are three versions and several flavors.
Usage
## S4 method for signature 'GFFFile,ANY,ANY'
import(con, format, text,
       version = c("", "1", "2", "3"),
       genome = NA, colnames = NULL, which = NULL,
       feature.type = NULL, sequenceRegionsAsSeqinfo = FALSE)
import.gff(con, ...)
import.gff1(con, ...)
import.gff2(con, ...)
import.gff3(con, ...)
## S4 method for signature 'ANY,GFFFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,GFFFile,ANY'
export(object, con, format,
       version = c("1", "2", "3"),
       source = "rtracklayer", append = FALSE, index = FALSE)
## S4 method for signature 'GenomicRangesList,GFFFile,ANY'
export(object, con, format, ...)
export.gff(object, con, ...)
export.gff1(object, con, ...)
export.gff2(object, con, ...)
export.gff3(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL, connection or GFFFile object. For the functions ending in .gff, .gff1, etc., the file format is indicated by the function name. For the base export and import functions, the format must be indicated another way. If con is a path, URL or connection, either the file extension or the format argument needs to be one of gff, gff1, gff2, gff3, gvf, or gtf. Compressed files (gz, bz2 and xz) are handled transparently. |
object | The object to export, should be a GRanges or something coercible to a GRanges . If the object has a method for asGFF , it is called prior to coercion. This makes it possible to export a GRangesList or TxDb in a way that preserves the hierarchical structure. For exporting multiple tracks, in the UCSC track line metaformat, pass a GenomicRangesList , or something coercible to one. |
format | If not missing, should be one of gff, gff1, gff2, gff3, gvf, or gtf. |
version | If the format is given as gff, i.e., it does not specify a version, then this should indicate the GFF version as one of "" (for import only, from the gff-version directive in the file or "1" if none), "1", "2" or "3". |
text | If con is missing, a character vector to use as the input. |
genome | The identifier of a genome, or a Seqinfo , or NA if unknown. Typically, this is a UCSC identifier like hg19 . An attempt will be made to derive the Seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. |
colnames | A character vector naming the columns to parse. These should name either fixed fields, like source or type , or, for GFF2 and GFF3, any attribute. |
which | A GRanges or other range-based object supported by findOverlaps . Only the intervals in the file overlapping the given ranges are returned. This is much more efficient when the file is indexed with the tabix utility. |
feature.type | NULL (the default) or a character vector of valid feature types. If not NULL , then only the features of the specified type(s) are imported. |
sequenceRegionsAsSeqinfo | If TRUE , attempt to infer the Seqinfo ( seqlevels and seqlengths ) from the ##sequence-region directives as specified by GFF3. |
source | The value for the source column in GFF. This is typically the name of the package or algorithm that generated the feature. |
index | If TRUE , automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports a single track in a file. |
append | If TRUE , and con points to a file path, the data is appended to the file. Obviously, if con is a connection, the data is always appended. |
... | Arguments to pass down to other methods. For import, the flow eventually reaches the GFFFile method on import. When trackLine is TRUE or the target format is BED15, the arguments are passed through export.ucsc, so track line parameters are supported. |
Details
The Generic Feature Format (GFF) is a tab-separated table of intervals. There are three different versions of GFF, and they all have the same number of columns. In GFF1, the last column is a grouping factor, whereas in the later versions the last column holds application-specific attributes, with some conventions defined for those commonly used. This attribute support facilitates specifying extensions to the format. These include GTF (Gene Transfer Format, an extension of GFF2) and GVF (Genome Variation Format, an extension of GFF3). The rtracklayer package recognizes the gtf and gvf extensions and parses the extra attributes into columns of the result; however, it does not perform any extension-specific processing. Both GFF1 and GFF2 have been proclaimed obsolete; however, the UCSC Genome Browser only supports GFF1 (and GTF), and GFF2 is still in broad use.
GFF is distinguished from the simpler BED format by its flexible attribute support and its hierarchical structure, as specified by the group column in GFF1 (only one level of grouping) and the Parent attribute in GFF3. GFF2 does not specify a convention for representing hierarchies, although its GTF extension provides this for gene structures. The combination of support for hierarchical data and arbitrary descriptive attributes makes GFF(3) the preferred format for representing gene models.
Although GFF features a score column, large quantitative data belong in a format like BigWig and alignments from high-throughput experiments belong in BAM. For variants, the VCF format (supported by the VariantAnnotation package) seems to be more widely adopted than the GVF extension.
A note on the UCSC track line metaformat: track lines are a means for passing hints to visualization tools like the UCSC Genome Browser and the Integrated Genome Browser (IGB), and they allow multiple tracks to be concatenated in the same file. Since GFF is not a UCSC format, it is not common to annotate GFF data with track lines, but rtracklayer still supports it. To export or import GFF data in the track line format, call export.ucsc or import.ucsc.
The following is the mapping of GFF elements to a GRanges object. NA values are allowed only where indicated. These appear as "." in the file. GFF requires that all columns are included, so export generates defaults for missing columns.
GFF field | GRanges representation |
---|---|
seqid, start, end | The ranges component. |
source | Character vector in the source column; defaults to rtracklayer on export. |
type | Character vector in the type column; defaults to sequence_feature in the output, i.e., SO:0000110. |
score | Numeric vector (NA's allowed) in the score column, accessible via the score accessor; defaults to NA upon export. |
strand | Strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA upon export. |
phase | Integer vector, either 0, 1 or 2 (NA's allowed); defaults to NA upon export. |
group | A factor (GFF1 only); defaults to the seqid (e.g., chromosome) on export. |
In GFF versions 2 and 3, attributes map to arbitrary columns in the result. In GFF3, some attributes (Parent, Alias, Note, DBxref and Ontology_term) can have multiple, comma-separated values; these columns are thus always CharacterList objects.
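To show why multi-valued attributes end up as list-like columns, the following base R sketch splits one GFF3 attribute string into key/value pairs and then splits each value on commas. The helper parse_gff3_attributes and the sample string are hypothetical. The sketch ignores URL-escaping and values that contain "=", and it is not the parser used by rtracklayer.

```r
# Hypothetical helper: turn a GFF3 attribute string into a named list,
# where each element holds one or more comma-separated values.
parse_gff3_attributes <- function(attr_string) {
  fields <- strsplit(attr_string, ";", fixed = TRUE)[[1]]
  pairs  <- strsplit(fields, "=", fixed = TRUE)
  keys   <- vapply(pairs, `[`, character(1), 1L)
  values <- lapply(pairs, function(p) strsplit(p[2L], ",", fixed = TRUE)[[1]])
  setNames(values, keys)
}

attrs <- parse_gff3_attributes("ID=exon1;Parent=tx1,tx2;Note=example")
attrs$ID      # "exon1"
attrs$Parent  # "tx1" "tx2"
```

A single exon can list several parent transcripts, so a plain character column could not hold the Parent values without losing that structure.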
Value
A GRanges with the metadata columns described in the details.
Author
Michael Lawrence
References
GFF1, GFF2: http://www.sanger.ac.uk/resources/software/gff/spec.html
GFF3: http://www.sequenceontology.org/gff3.shtml
GVF: http://www.sequenceontology.org/resources/gvf.html
GTF: http://mblab.wustl.edu/GTF22.html
Examples
test_path <- system.file("tests", package = "rtracklayer")
test_gff3 <- file.path(test_path, "genes.gff3")
## basic import
test <- import(test_gff3)
test
## import.gff functions
import.gff(test_gff3)
import.gff3(test_gff3)
## GFFFile derivatives
test_gff_file <- GFF3File(test_gff3)
import(test_gff_file)
test_gff_file <- GFFFile(test_gff3)
import(test_gff_file)
test_gff_file <- GFFFile(test_gff3, version = "3")
import(test_gff_file)
## from connection
test_gff_con <- file(test_gff3)
test <- import(test_gff_con, format = "gff")
## various arguments
import(test_gff3, genome = "hg19")
import(test_gff3, colnames = character())
import(test_gff3, colnames = c("type", "geneName"))
## 'which'
which <- GRanges("chr10:90000-93000")
import(test_gff3, which = which)
## 'append'
test_gff3_out <- file.path(tempdir(), "genes.gff3")
export(test[seqnames(test) == "chr10"], test_gff3_out)
export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE)
import(test_gff3_out)
## 'index'
export(test, test_gff3_out, index = TRUE)
test_gff3_gz <- paste(test_gff3_out, ".gz", sep = "")
import(test_gff3_gz, which = which)
GRangesForUCSCGenome()
GRanges for a Genome
Description
These functions assist in the creation of Seqinfo or GRanges for a genome.
Usage
GRangesForUCSCGenome(genome, chrom = NULL, ranges = NULL, ...)
GRangesForBSGenome(genome, chrom = NULL, ranges = NULL, ...)
SeqinfoForUCSCGenome(genome)
SeqinfoForBSGenome(genome)
Arguments
Argument | Description |
---|---|
genome | A string identifying a genome, usually one assigned by UCSC, like "hg19". |
chrom | A character vector of chromosome names, or NULL . |
ranges | An IntegerRanges object with the intervals. |
... | Additional arguments to pass to the GRanges constructor. |
Details
The genome ID is stored in the metadata of the ranges and is retrievable via the genome function. The sequence lengths are also properly initialized for the genome. This mitigates the possibility of accidentally storing intervals for the wrong genome.
GRangesForUCSCGenome obtains sequence information from the UCSC website, while GRangesForBSGenome looks for it in an installed BSgenome package. Using the latter is more efficient in the long run, but requires downloading and installing a potentially large genome package, or creating one from scratch if it does not yet exist for the genome of interest.
Value
For the GRangesFor* functions, a GRanges object, with the appropriate seqlengths and genome ID.
The SeqinfoFor* functions return a Seqinfo for the indicated genome.
Author
Michael Lawrence
GenomicData()
Data on a Genome
Description
The rtracklayer package adds convenience methods on top of GenomicRanges and IntegerRangesList to manipulate data on genomic ranges.
Author
Michael Lawrence and Patrick Aboyoun
Examples
range1 <- IRanges(c(1,2,3), c(5,2,8))
## with some data ##
filter <- c(1L, 0L, 1L)
score <- c(10L, 2L, NA)
strand <- factor(c("+", NA, "-"), levels = levels(strand()))
## GRanges instance
gr <- GenomicData(range1, score, chrom = "chr1", genome = "hg18")
mcols(gr)[["score"]]
strand(gr) ## all '*'
gr <- GenomicData(range1, score, filt = filter, strand = strand,
chrom = "chr1")
mcols(gr)[["filt"]]
strand(gr) ## equal to 'strand'
## coercion from data.frame ##
df <- as.data.frame(gr)
GenomicSelection()
Genomic data selection
Description
Convenience constructor of a RangedSelection object for selecting data on a per-chromosome basis for a given genome.
Usage
GenomicSelection(genome, chrom = NULL, colnames = character(0))
Arguments
Argument | Description |
---|---|
genome | A string identifying a genome. Should match the end of a BSgenome package name, e.g. "hg19". |
chrom | Character vector naming chromosomes to select. |
colnames | The column names to select from the dataset. |
Value
A RangedSelection object, selecting entire chromosomes.
Seealso
RangedSelection, BigWigSelection
Author
Michael Lawrence
Examples
# every chromosome from hg19
GenomicSelection("hg19")
# chr1 and 2 from hg19, with a score column
GenomicSelection("hg19", c("chr1", "chr2"), "score")
IntegerRangesList_methods()
Ranges on a Genome
Description
Genomic coordinates are often specified in terms of a genome identifier, chromosome name, start position and end position. The rtracklayer package adds convenience methods to IntegerRangesList for the manipulation of genomic ranges. The spaces (or names) of IntegerRangesList are the chromosome names. The universe slot indicates the genome, usually as given by UCSC (e.g. hg18).
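As a minimal sketch of the naming convention described above, the following builds an IRangesList (a concrete IntegerRangesList) whose names serve as chromosome names. The coordinates are made up for illustration, and the universe slot is deliberately not used, since it may not be available in every version.

```r
library(IRanges)
library(GenomicRanges)

# Hypothetical ranges on two chromosomes; the list names act as
# the chromosome names.
rl <- IRangesList(
  chr1 = IRanges(start = c(1, 50), end = c(10, 80)),
  chr2 = IRanges(start = 5, end = 25)
)

names(rl)                # the chromosome names

# Coercing to GRanges carries the list names over as seqnames.
gr <- as(rl, "GRanges")
seqnames(gr)
```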
Author
Michael Lawrence
QuickloadGenome_class()
Quickload Genome Access
Description
A Quickload data source is a collection of tracks and sequences, separated by genome. This class, QuickloadGenome, provides direct access to the data for one particular genome.
Author
Michael Lawrence
Examples
tests_dir <- system.file("tests", package = "rtracklayer")
ql <- Quickload(file.path(tests_dir, "quickload"))
qlg <- QuickloadGenome(ql, "T_species_Oct_2011")
seqinfo(qlg)
organism(qlg)
releaseDate(qlg)
names(qlg)
mcols(qlg)
if (.Platform$OS.type != "windows") { # temporary
qlg$bedData
}
## populating the test repository
ql <- Quickload(file.path(tests_dir, "quickload"), create = TRUE)
reference_seq <- import(file.path(tests_dir, "test.2bit"))
names(reference_seq) <- "test"
qlg <- QuickloadGenome(ql, "T_species_Oct_2011", create = TRUE,
seqinfo = seqinfo(reference_seq))
referenceSequence(qlg) <- reference_seq
test_bed <- import(file.path(tests_dir, "test.bed"))
names(test_bed) <- "test"
qlg$bedData <- test_bed
test_bedGraph <- import(file.path(tests_dir, "test.bedGraph"))
names(test_bedGraph) <- "test"
start(test_bedGraph) <- seq(1, 90, 10)
width(test_bedGraph) <- 10
track(qlg, "bedGraphData", format = "bw") <- test_bedGraph
Quickload_class()
Quickload Access
Description
The Quickload class represents a Quickload data source, essentially a directory layout separating tracks and sequences by genome, along with a few metadata files. This interface abstracts those details and provides access to a Quickload at any URL supported by R (HTTP, FTP, and local files). This is an easy way to make data accessible to the Integrated Genome Browser (IGB).
Author
Michael Lawrence
Examples
ql <- Quickload(system.file("tests", "quickload", package = "rtracklayer"))
uri(ql)
genome(ql)
ql$T_species_Oct_2011
RTLFile_class()
RTLFile objects
Description
An RTLFile object is the base class for classes representing files accessible with rtracklayer. It wraps a resource (either a path, URL or connection). We can represent a list of RTLFile objects with an RTLFileList.
Seealso
Implementing classes include: BigWigFile , TwoBitFile , BEDFile , GFFFile , and WIGFile .
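A short sketch of how one of the implementing classes wraps a resource, assuming a hypothetical path under tempdir(). Because the class itself identifies the format, no format argument is passed to export or import here.

```r
library(rtracklayer)

# Hypothetical output path; BEDFile() only wraps it, nothing is
# written until export() is called.
bed_path <- file.path(tempdir(), "rtlfile_example.bed")
bed_file <- BEDFile(bed_path)

class(bed_file)   # the file class encodes the format
path(bed_file)    # the wrapped resource (a path in this case)

# Illustrative ranges; the format is inferred from the file class.
gr <- GRanges("chr1", IRanges(c(1, 20), width = 5))
export(gr, bed_file)
imported <- import(bed_file)
imported
```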
Author
Michael Lawrence
TabixFile_methods()
TabixFile Import/Export
Description
These methods support the import and export of TabixFile objects (from the Rsamtools package). These are generally useful when working with tabix-indexed files that have a non-standard format (i.e., not BED nor GFF), as well as exporting an object with arbitrary columns (like a GRanges) to an indexed, tab-separated file. This relies on the tabix header, which indicates the columns in the file that correspond to the chromosome, start and end. The BED and GFF parsers handle tabix transparently.
Usage
## S4 method for signature 'TabixFile,ANY,ANY'
import(con, format, text,
which = if (is.na(genome)) NULL
else as(seqinfoForGenome(genome), "GenomicRanges"),
genome = NA, header = TRUE, ...)
exportToTabix(object, con, ...)
Arguments
Argument | Description |
---|---|
con | For import , a TabixFile object; for exportToTabix , a string naming the destination file. |
object | The object to export. It is coerced to a data.frame , written to a tab-separated file, and indexed with tabix for efficient range-based retrieval of the data using import . |
format | If any known format, like bed or gff (or one of their variants), then the appropriate parser is applied. If any other value, then the tabix header is consulted for the format. By default, this is taken from the file extension. |
text | Ignored. |
which | A range data structure coercible to IntegerRangesList , like a GRanges . Only the intervals in the file overlapping the given ranges are returned. The default is to use the range over the entire genome given by genome , if specified. |
genome | The identifier of a genome, or NA if unknown. Typically, this is a UCSC identifier like hg19 . An attempt will be made to derive the seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. |
header | If TRUE , then the header in the indexed file, which might include a track line, is sent to the parser. Otherwise, the initial lines are skipped, according to the skip field in the tabix index header. |
... | Extra arguments to pass to the underlying import routine, which for non-standard formats is read.table or write.table . |
Value
For import, a GRanges object.
For exportToTabix, a TabixFile object that is directly passable to import.
Seealso
scanTabix
and friends
Author
Michael Lawrence
TrackDb_class()
Track Databases
Description
The TrackDb class is an abstraction around a database of tracks. Implementations include BrowserSession derivatives and QuickloadGenome. Here, a track is defined as an interval dataset.
Author
Michael Lawrence
TwoBitFile_class()
2bit Files
Description
These functions support the import and export of the UCSC 2bit
compressed sequence format. The main advantage is speed of subsequence
retrieval, as it only loads the sequence in the requested
intervals. Compared to the FA format supported by Rsamtools, 2bit
offers the additional feature of masking and also has better support
in Java (and thus most genome browsers). The supporting
TwoBitFile
class is a reference to a TwoBit file.
Usage
## S4 method for signature 'TwoBitFile,ANY,ANY'
import(con, format, text,
which = as(seqinfo(con), "GenomicRanges"), ...)
## S4 method for signature 'TwoBitFile'
getSeq(x, which = as(seqinfo(x), "GenomicRanges"))
import.2bit(con, ...)
## S4 method for signature 'ANY,TwoBitFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'DNAStringSet,TwoBitFile,ANY'
export(object, con, format)
## S4 method for signature 'DNAStringSet,character,ANY'
export(object, con, format, ...)
export.2bit(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL or TwoBitFile object. Connections are not supported. For the functions ending in .2bit , the file format is indicated by the function name. For the export and import methods, the format must be indicated another way. If con is a path, or URL, either the file extension or the format argument needs to be twoBit or 2bit . |
object,x | The object to export, either a DNAStringSet or something coercible to a DNAStringSet , like a character vector. |
format | If not missing, should be twoBit or 2bit (case insensitive). |
text | Not supported. |
which | A range data structure coercible to IntegerRangesList, like a GRanges, or a TwoBitFile. Only the intervals in the file overlapping the given ranges are returned. By default, the value is the TwoBitFile itself. Its Seqinfo object is extracted and coerced to an IntegerRangesList that represents the entirety of the file. |
... | Arguments to pass down to other methods. For import, the flow eventually reaches the TwoBitFile method on import. For export, the TwoBitFile methods on export are the sink. |
Value
For import, a DNAStringSet.
Seealso
export-methods in the BSgenome package for exporting a BSgenome object as a twoBit file.
Note
The 2bit format only supports A, C, G, T and N (via an internal mask). To export sequences with additional IUPAC ambiguity codes, first pass the object through replaceAmbiguities from the Biostrings package.
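The note above can be sketched as follows, assuming a made-up sequence that contains the ambiguity codes R and Y. The sequence name and output path are hypothetical; replaceAmbiguities is used with its default replacement letter.

```r
library(Biostrings)
library(rtracklayer)

# Hypothetical sequence with the IUPAC ambiguity codes R and Y,
# which 2bit cannot represent.
dna <- DNAStringSet(c(demo = "ACGTRYN"))

# Replace every ambiguity code with N before exporting.
clean <- replaceAmbiguities(dna)
as.character(clean)

out_2bit <- file.path(tempdir(), "ambiguity_demo.2bit")
export(clean, out_2bit)
import(out_2bit)
```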
Author
Michael Lawrence
Examples
test_path <- system.file("tests", package = "rtracklayer")
test_2bit <- file.path(test_path, "test.2bit")
test <- import(test_2bit)
test
test_2bit_file <- TwoBitFile(test_2bit)
import(test_2bit_file) # the whole file
which_range <- IRanges(c(10, 40), c(30, 42))
which <- GRanges(names(test), which_range)
import(test_2bit, which = which)
seqinfo(test_2bit_file)
test_2bit_out <- file.path(tempdir(), "test_out.2bit")
export(test, test_2bit_out)
## just a character vector
test_char <- as.character(test)
export(test_char, test_2bit_out)
UCSCData_class()
Class "UCSCData"
Description
Each track in UCSC has an associated TrackLine that contains metadata on the track.
Seealso
import
and export
for reading and writing
tracks to and from connections (files), respectively.
Author
Michael Lawrence
UCSCFile_class()
UCSCFile objects
Description
These functions support the import and export of tracks embedded within the UCSC track line metaformat, whereby multiple tracks may be concatenated within a single file, along with metadata mostly oriented towards visualization. Any UCSCData object is automatically exported in this format, if the targeted format is known to be compatible. The BED and WIG import methods check for a track line, and delegate to these functions if one is found. Thus, calling this API directly is only necessary when importing embedded GFF (rare), or when one wants to create the track line during the export process.
Usage
## S4 method for signature 'UCSCFile,ANY,ANY'
import(con, format, text,
subformat = "auto", drop = FALSE,
genome = NA, ...)
import.ucsc(con, ...)
## S4 method for signature 'ANY,UCSCFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,UCSCFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRangesList,UCSCFile,ANY'
export(object, con, format,
append = FALSE, index = FALSE, ...)
## S4 method for signature 'UCSCData,UCSCFile,ANY'
export(object, con, format,
subformat = "auto", append = FALSE, index = FALSE, ...)
export.ucsc(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL, connection or UCSCFile object. For the functions ending in .ucsc , the file format is indicated by the function name. For the base export and import functions, ucsc must be passed as the format argument. |
object | The object to export, should be a GRanges or something coercible to a GRanges . For exporting multiple tracks pass a GenomicRangesList , or something coercible to one. |
format | If not missing, should be ucsc . |
text | If con is missing, a character vector to use as the input. |
subformat | The file format to use for the actual features, between the track lines. Must be a text-based format that is compatible with track lines (most are). If an RTLFile subclass other than UCSCFile is passed as con to import.ucsc or export.ucsc , the subformat is assumed to be the corresponding format of con . Otherwise it defaults to auto . The following describes the logic of the auto mode. For import, the subformat is taken as the type field in the track line. If none, the file extension is consulted. For export, if object is a UCSCData , the subformat is taken as the type in its track line, if present. Otherwise, the subformat is chosen based on whether object contains a score column. If there is a score, the target is either BEDGraph or WIG , depending on the structure of the ranges. Otherwise, BED is the target. |
genome | The identifier of a genome, or NA if unknown. Typically, this is a UCSC identifier like hg19 . An attempt will be made to derive the seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. This defaults to the db BED track line parameter, if any. |
drop | If TRUE , and there is only one track in the file, return the track object directly, rather than embedding it in a list. |
append | If TRUE , and con points to a file path, the data is appended to the file. Obviously, if con is a connection, the data is always appended. |
index | If TRUE , automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports a single track in a file. |
... | Should either specify track line parameters or arguments to pass down to the import and export routine for the subformat. |
Details
The UCSC track line metaformat permits the storage of multiple tracks in a single file by separating them with a so-called track line, a line beginning with the word track and containing various key=value pairs encoding metadata, most related to visualization. The standard fields in a track depend on the type of track being annotated. See TrackLine and its derivatives for how these lines are represented in R. The class UCSCData is an extension of GRanges with a formal slot for a TrackLine.
Each GRanges in the returned GenomicRangesList has the track line stored in its metadata, under the trackLine key.
For each track object to be exported, if the object is not a UCSCData, and there is no trackLine element in the metadata, then a new track line needs to be generated. This happens through the coercion of object to UCSCData. The track line is initialized to have the appropriate type parameter for the subformat, and the required name parameter is taken from the name of the track in the input list (if any). Otherwise, the default is simply R Track. The db parameter (specific to BED track lines) is taken as genome(object) if not NA. Additional arguments passed to the export routines override parameters in the provided track line.
If the subformat is either WIG or BEDGraph, and the features are stranded, a separate track will be output in the file for each strand. Neither of those formats encodes the strand, and both disallow overlapping features (which might occur upon destranding).
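The coercion step mentioned above can be explored directly. This is a sketch under the assumption that as(x, "UCSCData") is available for plain GRanges objects and that the track line lives in the trackLine slot, as the text describes; the ranges and scores are invented, and the concrete track line class chosen for each input may vary by version.

```r
library(rtracklayer)

# Illustrative inputs: one track without scores, one with scores.
plain  <- GRanges("chr1", IRanges(c(1, 11), width = 5))
scored <- GRanges("chr1", IRanges(c(1, 11), width = 5),
                  score = c(0.5, 2))

# Coercion generates a track line when none is stored in metadata().
ud_plain  <- as(plain, "UCSCData")
ud_scored <- as(scored, "UCSCData")

# Inspect which kind of track line was generated for each input.
class(ud_plain@trackLine)
class(ud_scored@trackLine)
```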
Value
A GenomicRangesList unless drop is TRUE and there is only a single track in the file. In that case, the first and only object is extracted from the list and returned. The structure of that object depends on the format of the data. The GenomicRangesList contains UCSCData objects.
Author
Michael Lawrence
UCSCSchema_class()
UCSC Schema
Description
This is a preliminary class that describes a table in the UCSC database. The description includes the table name, corresponding genome, row count, and a textual description of the format. In the future, we could provide more table information, like the links and sample data frame. This is awaiting a use-case.
Author
Michael Lawrence
Examples
session <- browserSession()
genome(session) <- "mm9"
query <- ucscTableQuery(session, "knownGene")
schema <- ucscSchema(query)
nrow(schema)
UCSCTableQuery_class()
Querying UCSC Tables
Description
The UCSC genome browser is backed by a large database,
which is exposed by the Table Browser web interface. Tracks are
stored as tables, so this is also the mechanism for retrieving tracks. The
UCSCTableQuery
class represents a query against the Table
Browser. Storing the query fields in a formal class facilitates
incremental construction and adjustment of a query.
Details
There are five supported fields for a table query:
Field | Description |
---|---|
session | The UCSCSession instance from which the tables are retrieved. Although all sessions are based on the same database, the set of user-uploaded tracks, which are represented as tables, is not the same, in general. |
trackName | The name of a track from which to retrieve a table. Each track can have multiple tables. Many times there is a primary table that is used to display the track, while the other tables are supplemental. Sometimes, tracks are displayed by aggregating multiple tables. If NULL, search for a primary table across all of the tracks (will not find secondary tables). |
tableName | The name of the specific table to retrieve. May be NULL, in which case the behavior depends on how the query is executed, see below. |
range | A genome identifier, a GRanges or an IntegerRangesList indicating the portion of the table to retrieve, in genome coordinates. Simply specifying the genome string is the easiest way to download data for the entire genome, and GRangesForUCSCGenome facilitates downloading data for e.g. an entire chromosome. |
names | Names/accessions of the desired features. |
A common workflow for querying the UCSC database is to create an instance of UCSCTableQuery using the ucscTableQuery constructor, invoke tableNames to list the available tables for a track, and finally to retrieve the desired table either as a data.frame via getTable or as a track via track. See the examples.
The reason for a formal query class is to facilitate multiple queries when the differences between the queries are small. For example, one might want to query multiple tables within the track and/or same genomic region, or query the same table for multiple regions. The UCSCTableQuery instance can be incrementally adjusted for each new query. Some caching is also performed, which enhances performance.
Author
Michael Lawrence
Examples
session <- browserSession()
genome(session) <- "mm9"
trackNames(session) ## list the track names
## choose the Conservation track for a portion of mm9 chr12
query <- ucscTableQuery(session, "Conservation",
GRangesForUCSCGenome("mm9", "chr12",
IRanges(57795963, 57815592)))
## list the table names
tableNames(query)
## get the phastCons30way track
tableName(query) <- "phastCons30way"
## retrieve the track data
track(query) # a GRanges object
## get a data.frame summarizing the multiple alignment
tableName(query) <- "multiz30waySummary"
getTable(query)
genome(session) <- "hg18"
query <- ucscTableQuery(session, "snp129",
names = c("rs10003974", "rs10087355", "rs10075230"))
ucscSchema(query)
getTable(query)
WIGFile_class()
WIG Import and Export
Description
These functions support the import and export of the UCSC WIG (Wiggle) format.
Usage
## S4 method for signature 'WIGFile,ANY,ANY'
import(con, format, text, genome = NA,
trackLine = TRUE, which = NULL, seqinfo = NULL, ...)
import.wig(con, ...)
## S4 method for signature 'ANY,WIGFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,WIGFile,ANY'
export(object, con, format,
dataFormat = c("auto", "variableStep", "fixedStep"),
writer = .wigWriter, append = FALSE, ...)
## S4 method for signature 'GenomicRangesList,WIGFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'UCSCData,WIGFile,ANY'
export(object, con, format,
trackLine = TRUE, ...)
export.wig(object, con, ...)
Arguments
Argument | Description |
---|---|
con | A path, URL, connection or WIGFile object. For the functions ending in .wig , the file format is indicated by the function name. For the base export and import functions, the format must be indicated another way. If con is a path, URL or connection, either the file extension or the format argument needs to be wig . Compressed files ( gz , bz2 and xz ) are handled transparently. |
object | The object to export, should be a GRanges or something coercible to a GRanges . For exporting multiple tracks, in the UCSC track line metaformat, pass a GenomicRangesList , or something coercible to one. |
format | If not missing, should be wig . |
text | If con is missing, a character vector to use as the input. |
trackLine | Whether to parse/output a UCSC track line. An imported track line will be stored in a TrackLine object, as part of the returned UCSCData . |
genome | The identifier of a genome, or NA if unknown. Typically, this is a UCSC identifier like hg19 . An attempt will be made to derive the seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. |
seqinfo | If not NULL , the Seqinfo object to set on the result. If the genome argument is not NA , it must agree with genome(seqinfo) . |
which | A range data structure like IntegerRangesList or GRanges . Only the intervals in the file overlapping the given ranges are returned. This is inefficient; use BigWig for efficient spatial queries. |
append | If TRUE , and con points to a file path, the data is appended to the file. Obviously, if con is a connection, the data is always appended. |
dataFormat | Probably best left to auto . Exists only for historical reasons. |
writer | Function for writing out the blocks; for internal use only. |
... | Arguments to pass down to other methods. For import, the flow eventually reaches the WIGFile method on import. When trackLine is TRUE, the arguments are passed through export.ucsc, so track line parameters are supported. |
Details
The WIG format is a text-based format for efficiently representing a dense genome-scale score vector. It encodes, for each feature, a range and score. Features from the same sequence (chromosome) are grouped together into a block, with a single block header line indicating the chromosome. There are two block formats: fixed step and variable step. For fixed step, the number of positions (or step) between intervals is the same across an entire block. For variable step, the start position is specified for each feature. For both fixed and variable step, the span (or width) is specified in the header and thus must be the same across all features. This requirement of uniform width dramatically limits the applicability of WIG. For scored features of variable width, consider BEDGraph or BigWig, which is generally preferred over both WIG and BEDGraph. To efficiently convert an existing WIG or BEDGraph file to BigWig, call wigToBigWig. None of WIG, BEDGraph, or BigWig allows overlapping features.
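To make the uniform-width requirement concrete, here is a minimal sketch that exports a small scored GRanges to WIG and reads it back. The ranges, scores and output path are invented for illustration; all features share a width of 10 and do not overlap, which is what the format requires.

```r
library(rtracklayer)

# Illustrative scored features: same chromosome, uniform width,
# non-overlapping.
gr <- GRanges("chr1",
              IRanges(start = c(1, 11, 21), width = 10),
              score = c(0.5, 1.2, 3.4))

wig_path <- file.path(tempdir(), "uniform_width_demo.wig")
export(gr, wig_path)

# The block header carries the span, so it appears once per block
# rather than once per feature.
readLines(wig_path)

back <- import(wig_path)
score(back)
```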
Value
A GRanges with the score values in the score metadata column, which is accessible via the score function.
Author
Michael Lawrence
References
http://genome.ucsc.edu/goldenPath/help/wiggle.html
Examples
test_path <- system.file("tests", package = "rtracklayer")
test_wig <- file.path(test_path, "step.wig")
## basic import calls
test <- import(test_wig)
test
import.wig(test_wig)
test_wig_file <- WIGFile(test_wig)
import(test_wig_file)
test_wig_con <- file(test_wig)
import(test_wig_con, format = "wig")
test_wig_con <- file(test_wig)
import(WIGFile(test_wig_con))
## various options
import(test_wig, genome = "hg19")
import(test_wig, trackLine = FALSE)
which <- as(test[3:4,], "IntegerRangesList")
import(test_wig, which = which)
## basic export calls
test_wig_out <- file.path(tempdir(), "test.wig")
export(test, test_wig_out)
export.wig(test, test_wig_out)
test_foo_out <- file.path(tempdir(), "test.foo")
export(test, test_foo_out, format = "wig")
test_wig_out_file <- WIGFile(test_wig_out)
export(test, test_wig_out_file)
## appending
test2 <- test
metadata(test2)$trackLine <- initialize(metadata(test)$trackLine,
name = "test2")
export(test2, test_wig_out_file, append = TRUE)
## passing track line parameters
export(test, test_wig_out, name = "test2")
## no track line
export(test, test_wig_out, trackLine = FALSE)
## gzip
test_wig_gz <- paste(test_wig_out, ".gz", sep = "")
export(test, test_wig_gz)
activeView_methods()
Accessing the active view
Description
Get the active view.
asBED()
Coerce to BED structure
Description
Coerce the structure of an object to one following BED-like conventions, i.e., with columns for blocks and thick regions.
Usage
asBED(x, ...)
## S4 method for signature 'GRangesList'
asBED(x)
## S4 method for signature 'GAlignments'
asBED(x)
Arguments
Argument | Description |
---|---|
x | Generally, a tabular object to structure as BED. |
... | Arguments to pass to methods. |
Details
The exact behavior depends on the class of x.
Class | Behavior |
---|---|
GRangesList | This treats x as if it were a list of transcripts, i.e., each element contains the exons of a transcript. The blockStarts and blockSizes columns are derived from the ranges in each element. Also, a name column is added from names(x). |
GAlignments | Converts to GRangesList via grglist and proceeds accordingly. |
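The GRangesList case can be tried without a TxDb package. The sketch below uses two invented transcripts (tx1 with two exons, tx2 with one); the exact names of the metadata columns holding the block information may differ between versions, so the code only prints them.

```r
library(rtracklayer)

# Hypothetical transcripts: each list element holds the exons of
# one transcript, on a single chromosome and strand.
tx <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(100, 300), c(150, 400)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(1000, 1200), strand = "-")
)

bed <- asBED(tx)
bed          # one range per transcript, spanning all of its exons
mcols(bed)   # name plus the block information
```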
Value
A GRanges, with the columns name, blockStarts and blockSizes added.
Author
Michael Lawrence
Examples
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
mcols(asBED(exons))
asGFF()
Coerce to GFF structure
Description
Coerce the structure of an object to one following GFF-like conventions, i.e., using the Parent GFF3 attribute to encode the hierarchical structure. This object is then suitable for export as GFF3.
Usage
asGFF(x, ...)
## S4 method for signature 'GRangesList'
asGFF(x, parentType = "mRNA", childType = "exon")
Arguments
Argument | Description |
---|---|
x | Generally, a tabular object to structure as GFF(3) |
parentType | The value to store in the type column for the top-level (e.g., transcript) ranges. |
childType | The value to store in the type column for the child (e.g., exon) ranges. |
... | Arguments to pass to methods |
Value
For the GRangesList method: A GRanges, with the columns: ID (unique identifier), Name (from names(x), and the names on each element of x, if any), type (as given by parentType and childType), and Parent (to relate each child range to its parent at the top-level).
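A small self-contained sketch of the GRangesList method, using two invented transcripts instead of a TxDb package. With the default parentType and childType, the result should contain one mRNA range per list element followed by one exon range per original range.

```r
library(rtracklayer)

# Hypothetical transcripts: tx1 has two exons, tx2 has one.
tx <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(100, 300), c(150, 400)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(1000, 1200), strand = "-")
)

gff <- asGFF(tx)
mcols(gff)        # type, ID, Name and Parent columns
table(gff$type)   # parents (mRNA) versus children (exon)
```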
Author
Michael Lawrence
Examples
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
mcols(asGFF(exons))
basicTrackLine_class()
Class "BasicTrackLine"
Description
The type of UCSC track line used to annotate most types of tracks (every type except Wiggle).
Seealso
GraphTrackLine for Wiggle/bedGraph tracks.
Author
Michael Lawrence
References
http://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK for the official documentation.
blocks_methods()
Get blocks/exons
Description
Obtains the block ranges (subranges, usually exons) from an object, such as a GRanges imported from a BED file.
Usage
blocks(x, ...)
Arguments
Argument | Description |
---|---|
x | The instance from which to obtain the block/exon information. Currently must be a GenomicRanges , with a metadata column of name blocks and of type IntegerRangesList . Such an object is returned by import.bed and asBED . |
... | Additional arguments for methods |
Value
A GRangesList with an element for each range in x. The original block ranges are relative to the start of the containing range, so the returned ranges are shifted to absolute coordinates. The seqname and strand are inherited from the containing range.
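The shift from relative to absolute coordinates can be checked with a round trip through asBED. This is a sketch with invented transcripts: asBED stores the exons relative to each transcript start, and blocks should recover the original exon coordinates.

```r
library(rtracklayer)

# Hypothetical transcripts with known exon coordinates.
tx <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(100, 300), c(150, 400)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(1000, 1200), strand = "-")
)

bed <- asBED(tx)      # blocks are stored relative to each range
exons <- blocks(bed)  # shifted back to absolute coordinates

start(exons[[1]])     # exon starts of the first transcript
end(exons[[1]])       # exon ends of the first transcript
```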
Seealso
import.bed for importing a track from BED, which can store block information; asBED for coercing a GenomicRanges into a BED-like structure that can be passed to this function.
Author
Michael Lawrence
browseGenome()
Browse a genome
Description
A generic function for launching a genome browser.
Usage
browseGenome(object, ...)
## S4 method for signature 'GenomicRanges_OR_GenomicRangesList'
browseGenome(object,
browser = "UCSC", range = base::range(object),
view = TRUE, trackParams = list(), viewParams = list(),
name = "customTrack", ...)
Arguments
Argument | Description |
---|---|
object | A GRanges object or a list of GRanges objects (e.g. a GenomicRangesList object). |
browser | The name of the genome browser. |
range | A genome identifier or a GRanges or IntegerRangesList to display in the initial view. |
view | Whether to open a view. |
trackParams | Named list of parameters to pass to track<- . |
viewParams | Named list of parameters to pass to browserView . |
name | The name for the track. |
... | Arguments passed to browserSession. |
Value
Returns a BrowserSession .
Seealso
BrowserSession and BrowserView , the two main classes for interfacing with genome browsers.
Author
Michael Lawrence
Examples
## open UCSC genome browser:
browseGenome()
## to view a specific range:
range <- GRangesForUCSCGenome("hg18", "chr22", IRanges(20000, 50000))
browseGenome(range = range)
## a slightly larger range:
browseGenome(range = range, end = 75000)
## with a track:
track <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
browseGenome(GRangesList(track))
browserSession_class()
Class "BrowserSession"
Description
An object representing a genome browser session. As a derivative of TrackDb , each session contains a set of loaded tracks. In addition, it has a set of views, in the form of BrowserView instances, on those tracks. Note that this is a virtual class; a concrete implementation is provided by each backend driver.
Seealso
browserSession for obtaining implementations of this class for a particular genome browser.
Author
Michael Lawrence
browserSession_methods()
Get a genome browser session
Description
Methods for getting browser sessions.
browserView_class()
Class "BrowserView"
Description
An object representing a genome browser view of a particular segment of a genome.
Seealso
browserView
for obtaining instances of this class.
Author
Michael Lawrence
browserView_methods()
Getting browser views
Description
Methods for creating and getting browser views.
Usage
browserView(object, range, track, ...)
Arguments
Argument | Description |
---|---|
object | The object from which to get the views. |
range | The GRanges or IntegerRangesList to display. If there are multiple elements, a view is created for each element and a BrowserViewList is returned. |
track | List of track names to make visible in the view. |
... | Arguments to pass to methods |
Examples
session <- browserSession()
browserView(session,
GRangesForUCSCGenome("hg19", "chr2", IRanges(20000, 50000)))
## only view "knownGene" track
browserView(session, track = "knownGene")
browserViews_methods()
Getting the browser views
Description
Methods for getting browser views.
Seealso
browserView
for creating a browser view.
Examples
session <- browseGenome()
browserViews(session)
cpneTrack()
CPNE1 SNP track
Description
A GRanges object (created by the GGtools package) with features from a subset of the SNPs on chromosome 20 from 60 HapMap founders in the CEU cohort. Each SNP has an associated data value indicating its association with the expression of the CPNE1 gene according to a Cochran-Armitage 1df test. The top 5000 scoring SNPs were selected for the track.
Format
Each feature (row) is a SNP. The association test scores are accessible via score.
Usage
data(cpneTrack)
Examples
data(cpneTrack)
plot(start(cpneTrack), score(cpneTrack))
export()
Import and export
Description
The functions import and export load and save objects from and to particular file formats. The rtracklayer package implements support for a number of annotation and sequence formats.
Usage
export(object, con, format, ...)
import(con, format, text, ...)
Arguments
Argument | Description |
---|---|
object | The object to export. |
con | The connection from which data is loaded or to which data is saved. If this is a character vector, it is assumed to be a filename and a corresponding file connection is created and then closed after exporting the object. If a RTLFile derivative, the data is loaded from or saved to the underlying resource. If missing, the function will return the output as a character vector, rather than writing to a connection. |
format | The format of the output. If missing and con is a filename, the format is derived from the file extension. This argument is unnecessary when con is a derivative of RTLFile . |
text | If con is missing, this can be a character vector directly providing the string data to import. |
... | Parameters to pass to the format-specific method. |
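The two text-based paths in the table can be sketched as a round trip: export without con returns the formatted lines as a character vector, and import with text parses such lines directly. The ranges are invented, and BED is chosen only as a convenient example format.

```r
library(rtracklayer)

# Illustrative stranded ranges.
gr <- GRanges("chr1",
              IRanges(c(11, 101), width = 20),
              strand = c("+", "-"))

# With con missing, export() returns the lines instead of writing
# them to a file.
bed_lines <- export(gr, format = "bed")
bed_lines

# With con missing and text supplied, import() parses the lines.
back <- import(text = bed_lines, format = "bed")
back
```

Note that BED stores 0-based starts, so the numbers in bed_lines differ by one from start(gr), while the imported object is back in R's 1-based coordinates.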
Details
The rtracklayer package supports a number of file formats for representing annotated genomic intervals. These are each represented as a subclass of RTLFile. Below, we list the major supported formats, with some advice for when a particular file format is appropriate:
list(" ", " ", list(list(list(list("GFF"))), list("The General Feature Format is ", " meant to represent any set of genomic features, with ", " application-specific columns represented as ", " ", list("attributes"), ". There are three principal versions (1, 2, and ", " 3). This is a good format for interoperating with other genomic ", " tools and is the most flexible format, in that a feature may have ", " any number of attributes (in version 2 and above). Version 3 ",
" (GFF3) is the preferred version. Its specification lays out
", " conventions for representing various types of data, including gene ", " models, for which it is the format of choice. For variants, ", " rtracklayer has rudimentary support for an extention of GFF3 ", " called GVF. UCSC supports GFF1, but it needs to be encapsulated in ", " the UCSC metaformat, i.e. ", list("export.ucsc(subformat = ", " "gff1")"), ". The BED format is typically preferred over GFF for ",
" interaction with UCSC. GFF files can be indexed with the tabix
", " utility for fast range-based queries via rtracklayer and
", " Rsamtools.
", " ")), "
", "
", " ", list(list(list(list("BED"))), list("The Browser Extended Display
", " format is for displaying qualitative tracks in a genome browser,
", " in particular UCSC. It finds a good balance between simplicity and
", " expressiveness. It is much simpler than GFF and yet can still
", " represent multi-exon gene structures. It is somewhat limited by
",
" its lack of the attribute support of GFF. To circumvent this, many
", " tools and organizations have extended BED with additional ", " columns. These are not officially valid BED files, and as such ", " rtracklayer does not yet support them (this will be addressed ", " soon). The rtracklayer package does support two official ", " extensions of BED: Bed15 and bedGraph, and the unofficial BEDPE ", " format, see below. BED files can be indexed with the tabix utility ",
" for fast range-based queries via rtracklayer and Rsamtools.
", " ")), " ", " ", " ", list(list(list(list("Bed15"))), list("An extension of BED with 15 ", " columns, Bed15 is meant to represent data from microarray ", " experiments. Multiple samples/columns are supported, and the data ", " is displayed in UCSC as a compact heatmap. Few other tools support ", " this format. With 15 columns per feature, this format is probably ", " too verbose for e.g. ChIP-seq coverage (use multiple BigWig tracks ",
" instead).")), "
", "
", " ", list(list(list(list("bedGraph"))), list("A variant of BED that
", " represents a score column more compactly than ", list("BED"), " and
", " especially ", list("Bed15"), ", although only one sample is
", " supported. The data is displayed in UCSC as a bar or line
", " graph. For large data (the typical case), ", list("BigWig"), " is
", " preferred.
", " ")), "
", "
", " ", list(list(list(list("BEDPE"))), list(
"A variant of BED that
", " represents pairs of genomic regions, such as interaction data or ", " chromosomal rearrangements. The data cannot be displayed in UCSC ", " directly but can be represented using the BED12 format. ", " ")), " ", " ", " ", list(list(list(list("WIG"))), list("The Wiggle format is meant for ", " storing dense numerical data, such as window-based GC and ", " conservation scores. The data is displayed in UCSC as a bar or ", " line graph. The WIG format only works for intervals with a uniform ",
" width. For non-uniform widths, consider ", list("bedGraph"), ". For large
", " data, consider ", list("BigWig"), ". ", " ")), " ", " ", " ", list(list(list(list("BigWig"))), list("The BigWig format is a ", " binary version of both ", list("bedGraph"), " and ", list("WIG"), " (which are ", " now somewhat obsolete). A BigWig file contains a spatial index for ", " fast range-based queries and also embeds summary statistics of the ", " scores at several zoom levels. Thus, it is ideal for visualization ",
" of and parallel computing on genome-scale vectors, like the
", " coverage from a high-throughput sequencing experiment.
", " ")), "
", " ")
In summary, for the typical use case of combining gene models with experimental data, GFF is preferred for gene models and BigWig is preferred for quantitative score vectors. Note that the Rsamtools package provides support for the BAM file format (for representing read alignments), among others. Based on this, the rtracklayer package provides an export method for writing GAlignments and GappedReads objects as BAM. For variants, consider VCF, supported by the VariantAnnotation package.
There is also support for reading and writing biological sequences, including the UCSC TwoBit format for compactly storing a genome sequence along with a mask. The files are binary, so they are efficiently queried for particular ranges. A similar format is FA, supported by Rsamtools.
Value
If con is missing, a character vector containing the string output. Otherwise, nothing is returned.
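The string-based behaviour described for a missing con, together with the text argument of import, can be sketched as a round trip that never touches the file system. This is a minimal illustration, not part of the package's own examples; the two toy ranges and the variable names are made up for the purpose.

```r
library(rtracklayer)

## Two toy ranges with a score column (hypothetical data for illustration).
gr <- GRanges("chr1",
              IRanges(start = c(1, 11), width = 5),
              strand = c("+", "-"),
              score = c(1.5, 2))

## With 'con' missing, export() returns the formatted lines as a
## character vector instead of writing to a connection.
bed_lines <- export(gr, format = "bed")

## With 'con' missing, import() parses the lines supplied via 'text'.
gr2 <- import(text = bed_lines, format = "bed")

## The coordinates survive the round trip.
stopifnot(identical(start(gr2), start(gr)),
          identical(end(gr2), end(gr)))
```

Because no file name is involved, format cannot be derived from an extension and has to be given explicitly in both calls.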
Seealso
Format-specific options for the popular formats: GFF, BED, Bed15, bedGraph, WIG, BigWig
Author
Michael Lawrence
Examples
track <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
export(track, "my.gff", version = "3")
## equivalently,
export(track, "my.gff3")
## or
con <- file("my.gff3")
export(track, con, "gff3")
## or as a string
export(track, format = "gff3")
genomeBrowsers()
Get available genome browsers
Description
Gets the identifiers of the loaded genome browser drivers.
Usage
genomeBrowsers(where = topenv(parent.frame()))
Arguments
Argument | Description |
---|---|
where | The environment in which to search for drivers. |
Details
This searches the specified environment for classes that extend BrowserSession . The prefix of the class name, e.g. "ucsc" in "UCSCSession", is returned for each driver.
Value
A character vector of driver identifiers.
Seealso
browseGenome and browserSession that create browserSession implementations given an identifier returned from this function.
Author
Michael Lawrence
laySequence_methods()
Load a sequence
Description
Methods for loading sequences.
layTrack_methods()
Laying tracks
Description
Methods for loading tracks into genome browsers.
Usage
track(object, ...) <- value
Arguments
Argument | Description |
---|---|
object | A BrowserSession into which the track is loaded. |
value | The track(s) to load. |
... | Arguments to pass on to methods. Can be: name, the name(s) of the track(s) being loaded; view, whether to create a view of the track after loading it. |
Seealso
track for getting a track from a session.
Examples
session <- browserSession()
track <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
track(session, "My Track") <- track
liftOver()
Lift intervals between genome builds
Description
A reimplementation of the UCSC liftOver tool for lifting features from one genome build to another. In our preliminary tests, it is significantly faster than the command line tool. Like the UCSC tool, it requires a chain file as input.
Usage
liftOver(x, chain, ...)
Arguments
Argument | Description |
---|---|
x | The intervals to lift-over, usually a GRanges . |
chain | A Chain object, usually imported with import.chain , or something coercible to one. |
... | Arguments for methods. |
Value
A GRangesList object. Each element contains the ranges mapped from the corresponding element in the input (may be one-to-many).
Author
Michael Lawrence
References
http://genome.ucsc.edu/cgi-bin/hgLiftOver
Examples
chain <- import.chain("hg19ToHg18.over.chain")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
tx_hg19 <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
tx_hg18 <- liftOver(tx_hg19, chain)
readGFF()
Reads a file in GFF format
Description
Reads a file in GFF format and creates a data frame or DataFrame object from it. This is a low-level function that should not be called by user code.
Usage
readGFF(filepath, version=0,
columns=NULL, tags=NULL, filter=NULL, nrows=-1,
raw_data=FALSE)
GFFcolnames(GFF1=FALSE)
Arguments
Argument | Description |
---|---|
filepath | A single string containing the path or URL to the file to read. Alternatively can be a connection. |
version | readGFF should do a pretty decent job of detecting the GFF version. Use this argument only if it doesn't, or if you want to force it to parse and import the file as if its 9th column were in a different format than it really is (e.g. specify version=1 on a GTF or GFF3 file to interpret its 9th column as the "group" column of a GFF1 file). Supported versions are 1, 2, and 3. |
columns | The standard GFF columns to load. All of them are loaded by default. |
tags | The tags to load. All of them are loaded by default. |
filter | |
nrows | -1 or the maximum number of rows to read in (after filtering). |
raw_data | |
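GFF1 | If TRUE, GFFcolnames names the last standard column "group" (as in GFF1) instead of "attributes". |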
Value
A DataFrame with columns corresponding to those in the GFF.
Seealso
makeGRangesFromDataFrame in the GenomicRanges package for making a GRanges object from a data frame or DataFrame object.
makeTxDbFromGFF in the GenomicFeatures package for importing a GFF file as a TxDb object.
The DataFrame class in the S4Vectors package.
Author
H. Pages
Examples
## Standard GFF columns.
GFFcolnames()
GFFcolnames(GFF1=TRUE) # "group" instead of "attributes"
tests_dir <- system.file("tests", package="rtracklayer")
test_gff3 <- file.path(tests_dir, "genes.gff3")
## Load everything.
df0 <- readGFF(test_gff3)
head(df0)
## Load some tags only (in addition to the standard GFF columns).
my_tags <- c("ID", "Parent", "Name", "Dbxref", "geneID")
df1 <- readGFF(test_gff3, tags=my_tags)
head(df1)
## Load no tags (in that case, the "attributes" standard column
## is loaded).
df2 <- readGFF(test_gff3, tags=character(0))
head(df2)
## Load some standard GFF columns only (in addition to all tags).
my_columns <- c("seqid", "start", "end", "strand", "type")
df3 <- readGFF(test_gff3, columns=my_columns)
df3
table(df3$seqid, df3$type)
makeGRangesFromDataFrame(df3, keep.extra.columns=TRUE)
## Combine use of 'columns' and 'tags' arguments.
readGFF(test_gff3, columns=my_columns, tags=c("ID", "Parent", "Name"))
readGFF(test_gff3, columns=my_columns, tags=character(0))
## Use the 'filter' argument to load only features of type "gene"
## or "mRNA" located on chr10.
my_filter <- list(type=c("gene", "mRNA"), seqid="chr10")
readGFF(test_gff3, filter=my_filter)
readGFF(test_gff3, columns=my_columns, tags=character(0), filter=my_filter)
targets()
microRNA target sites
Description
A data frame of human microRNA target sites retrieved from miRBase. This is a subset of the hsTargets data frame in the microRNA package. See the rtracklayer vignette for more details.
Format
A data frame with 2981 observations on the following 6 variables.
name: The miRBase ID of the microRNA.
target: The Ensembl ID of the targeted transcript.
chrom: The name of the chromosome for target site.
start: Target start position.
end: Target stop position.
strand: The strand of the target site, "+" or "-".
Usage
data(targets)
Examples
data(targets)
targetTrack <- with(targets,
GenomicData(IRanges::IRanges(start, end),
strand = strand, chrom = chrom))
tracks_methods()
Accessing track names
Description
Methods for getting and setting track names.
ucscGenomes()
Get available genomes on UCSC
Description
Get a data.frame describing the available UCSC genomes.
Usage
ucscGenomes(organism=FALSE)
Arguments
Argument | Description |
---|---|
organism | A logical(1) indicating whether scientific name should be appended. |
Details
To populate the organism column, the URL http://genome.ucsc.edu/cgi-bin is scraped for every assembly version to get the scientific name.
Value
A data.frame with the following columns:
*
Seealso
UCSCSession for details on specifying the genome.
Author
Michael Lawrence
Examples
ucscGenomes()
ucscSession_class()
Class "UCSCSession"
Description
An implementation of BrowserSession for the UCSC genome browser.
Seealso
browserSession for creating instances of this class.
Author
Michael Lawrence
ucscTrackLine_class()
Class "TrackLine"
Description
An object representing a "track line" in the UCSC format. There are two concrete types of track lines: BasicTrackLine (used for most types of tracks) and GraphTrackLine (used for graphical tracks). This class only declares the common elements between the two.
Seealso
BasicTrackLine (used for most types of tracks) and GraphTrackLine (used for Wiggle/bedGraph tracks).
Author
Michael Lawrence
References
http://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK for the official documentation.
ucscTrackModes_class()
Class "UCSCTrackModes"
Description
A vector of view modes ("hide", "dense", "full", "pack", "squish") for each track in a UCSC view.
Seealso
UCSCView on which track view modes may be set.
Author
Michael Lawrence
ucscTrackModes_methods()
Accessing UCSC track modes
Description
Generics for getting and setting UCSC track visibility modes ("hide", "dense", "full", "pack", "squish").
Seealso
trackNames and trackNames<- for just getting or setting which tracks are visible (not of mode "hide").
Examples
# Tracks "foo" and "bar" are fully shown, "baz" is hidden
modes <- ucscTrackModes(full = c("foo", "bar"), hide = "baz")
# Update the modes to hide track "bar"
modes2 <- ucscTrackModes(modes, hide = "bar")
ucscView_class()
Class "UCSCView"
Description
An object representing a view of a genome in the UCSC browser.
Seealso
browserView for creating instances of this class.
Author
Michael Lawrence
wigToBigWig()
Convert WIG to BigWig
Description
This function calls the Kent C library to efficiently convert a WIG file to a BigWig file, without loading the entire file into memory. This solves the problem where simple tools write out text WIG files, instead of more efficiently accessed binary BigWig files.
Usage
wigToBigWig(x, seqinfo,
dest = paste(file_path_sans_ext(x, TRUE), "bw", sep = "."),
clip = FALSE)
Arguments
Argument | Description |
---|---|
x | Path or URL to the WIG file. Connections are not supported. |
seqinfo | Seqinfo object, describing the genome of the data. All BigWig files must have this defined. |
dest | The path to which to write the BigWig file. Defaults to x with the extension changed to bw . |
clip | If TRUE , regions outside of seqinfo will be clipped, so that no error is thrown. |
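The arguments above can be put together in a small end-to-end sketch. It assumes a platform where rtracklayer's BigWig support is available (it has historically been missing on Windows); the three-value fixedStep WIG file, the chromosome length of 1000, and all variable names are invented for illustration.

```r
library(rtracklayer)

## Write a tiny fixedStep WIG file to a temporary location (toy data).
wig_path <- tempfile(fileext = ".wig")
writeLines(c("fixedStep chrom=chr1 start=1 step=10 span=10",
             "1", "2", "3"),
           wig_path)

## BigWig needs the sequence lengths; 1000 is an arbitrary assumption here.
si <- Seqinfo(seqnames = "chr1", seqlengths = 1000)

## Convert without loading the WIG data into R, naming 'dest' explicitly.
bw_path <- tempfile(fileext = ".bw")
wigToBigWig(wig_path, seqinfo = si, dest = bw_path)

## Read the result back; the format is derived from the ".bw" extension.
bw <- import(bw_path)
bw
```

Passing dest explicitly keeps the output path under the caller's control; when it is omitted, the file is written next to x with the extension changed to bw, as described in the table.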
Seealso
BigWig import and export support
Author
Michael Lawrence
wigTrackLine_class()
Class "GraphTrackLine"
Description
A UCSC track line for graphical tracks.
Seealso
export.wig, export.bedGraph for exporting graphical tracks.
Author
Michael Lawrence
References
Official documentation: http://genome.ucsc.edu/goldenPath/help/wiggle.html .