bioconductor v3.9.0 AnnotationDbi
Implements a user-friendly interface for querying SQLite-based annotation data packages.
Summary
Functions
AnnDbObj objects
Check the SQL data contained in an SQLite-based annotation package
AnnotationDb objects and their progeny, methods etc.
AnnotationDbi internals
Bimap objects and the Bimap interface
Methods for getting/setting the filters on a Bimap object
Formatting a Bimap as a list or character vector
Methods for getting/setting the direction of a Bimap object, and undirected methods for getting/counting/setting its keys
Environment-like API for Bimap objects
Methods for manipulating the keys of a Bimap object
Methods for manipulating a Bimap object in a data-frame style
Descriptions of available values for columns and keytypes for GO.db.
GOFrame and GOAllFrame objects
Class "GOTerms"
Descriptions of available values for columns and keytypes for inparanoid packages.
KEGGFrame objects
Descriptions of available values for columns and keytypes.
Creates a simple Bimap from a SQLite database in a situation that is external to AnnotationDbi
Convenience functions for mapping IDs through an appropriate set of annotation packages
A convenience function to generate graphs based on the GO.db package
Create GO to Entrez Gene maps for chip-based packages
Org package contained in annotation object
Print method for probetable objects
Convert a vector to a quoted string for use as a SQL value list
A replacement for unlist() that does not mangle the names
Functions
AnnDbObj_class()
AnnDbObj objects
Description
The AnnDbObj class is the most general container for storing any kind of SQLite-based annotation data.
Details
Many classes in AnnotationDbi inherit directly or indirectly from the AnnDbObj class. One important particular case is the AnnDbBimap class which is the lowest class in the AnnDbObj hierarchy to also inherit the Bimap interface.
Seealso
dbConnect, dbListTables, dbListFields, dbGetQuery, Bimap
Examples
library("hgu95av2.db")
dbconn(hgu95av2ENTREZID) # same as hgu95av2_dbconn()
dbfile(hgu95av2ENTREZID) # same as hgu95av2_dbfile()
dbmeta(hgu95av2_dbconn(), "ORGANISM")
dbmeta(hgu95av2_dbconn(), "DBSCHEMA")
dbmeta(hgu95av2_dbconn(), "DBSCHEMAVERSION")
library("DBI")
dbListTables(hgu95av2_dbconn()) #lists all tables on connection
## If you use dbSendQuery instead of dbGetQuery
## (NOTE: for ease of use, this is definitely NOT recommended)
## then you may need to know how to list results objects
dbListResults(hgu95av2_dbconn()) #for listing results objects
## You can also list the fields by using this connection
dbListFields(hgu95av2_dbconn(), "probes")
dbListFields(hgu95av2_dbconn(), "genes")
dbschema(hgu95av2ENTREZID) # same as hgu95av2_dbschema()
## According to the schema, the probes._id column references the genes._id
## column. Note that in all tables, the "_id" column is an internal id with
## no biological meaning (provided for allowing efficient joins between
## tables).
## The information about the probe to gene mapping is in probes:
dbGetQuery(hgu95av2_dbconn(), "SELECT * FROM probes LIMIT 10")
## This mapping is in fact the ENTREZID map:
toTable(hgu95av2ENTREZID)[1:10, ] # only relevant columns are retrieved
dbInfo(hgu95av2GO) # same as hgu95av2_dbInfo()
## Advanced example:
## Sometimes you may wish to join data from across multiple databases at
## once.
## In the following example we will attach the GO database to the
## hgu95av2 database, and then grab information from separate tables
## in each database that meet a common criterion.
library(hgu95av2.db)
library("GO.db")
attachSql <- paste('ATTACH "', GO_dbfile(), '" as go;', sep = "")
dbGetQuery(hgu95av2_dbconn(), attachSql)
sql <- 'SELECT DISTINCT a.go_id AS "hgu95av2.go_id",
a._id AS "hgu95av2._id",
g.go_id AS "GO.go_id", g._id AS "GO._id",
g.term, g.ontology, g.definition
FROM go_bp_all AS a, go.go_term AS g
WHERE a.go_id = g.go_id LIMIT 10;'
data <- dbGetQuery(hgu95av2_dbconn(), sql)
data
## For illustration purposes, the internal id "_id" and the "go_id"
## from both tables are included in the output. This makes it clear
## that the "go_ids" can be used to join these tables but the internal
## ids can NOT. The internal IDs (which are always shown as _id) are
## suitable for joins within a single database, but cannot be used
## across databases.
AnnDbPkg_checker()
Check the SQL data contained in an SQLite-based annotation package
Description
Check the SQL data contained in an SQLite-based annotation package.
Usage
checkMAPCOUNTS(pkgname)
Arguments
Argument | Description |
---|---|
pkgname | The name of the SQLite-based annotation package to check. |
Author
H. Pagès
Examples
checkMAPCOUNTS("org.Sc.sgd.db")
AnnotationDb_class()
AnnotationDb objects and their progeny, methods etc.
Description
AnnotationDb is the virtual base class for all annotation packages. It contains a database connection and is meant to be the parent for a set of classes in the Bioconductor annotation packages. These classes provide a means of dispatch for a widely available set of select methods and thus allow the easy extraction of data from the annotation packages.
select, columns and keys are used together to extract data from an AnnotationDb object (or any object derived from the parent class). Examples of classes derived from the AnnotationDb object include (but are not limited to): ChipDb, OrgDb, GODb, InparanoidDb and ReactomeDb.
columns shows which kinds of data can be returned for the AnnotationDb object.
keytypes allows the user to discover which keytypes can be passed to select or keys through the keytype argument.
keys returns keys for the database contained in the AnnotationDb object. This method is already documented in the keys manual page but is mentioned again here because its usage with select is so intimate. By default it will return the primary keys for the database, but if used with the keytype argument, it will return the keys from that keytype.
select will retrieve the data as a data.frame based on the keys, columns and keytype arguments. Be warned that if you call select and request columns that have multiple matches for your keys, select will return a data.frame with one row for each possible match. As a result, if you request multiple columns and some of them have a many-to-one relationship to the keys (several values per key), the number of rows will continue to multiply accordingly. So it is not a good idea to request a large number of columns unless you know that what you are asking for has a one-to-one relationship with the initial set of keys. In general, if you need to retrieve a column (like GO) that has a many-to-one relationship to the original keys, it is most useful to extract that separately.
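The row multiplication described above can be sketched without any annotation package. In the base R sketch below, the data frames sym, go and path and all their values are invented for illustration, and merge() merely stands in for the lookup; it is an assumption about the general shape of the result, not the package's actual implementation.

```r
## Toy stand-ins for three annotation columns keyed by the same IDs.
## All values are invented for illustration.
keys <- c("k1", "k2")
sym  <- data.frame(KEY = c("k1", "k2"), SYMBOL = c("A", "B"))
go   <- data.frame(KEY = c("k1", "k1", "k1", "k2"),
                   GO  = c("GO:1", "GO:2", "GO:3", "GO:4"))
path <- data.frame(KEY  = c("k1", "k1", "k2"),
                   PATH = c("p1", "p2", "p3"))

## One-to-one column: one row per key.
one_to_one <- merge(data.frame(KEY = keys), sym, by = "KEY")
nrow(one_to_one)

## Two multi-valued columns requested together: rows multiply per key
## (3 GO values x 2 PATH values for k1, 1 x 1 for k2).
combined <- merge(merge(one_to_one, go, by = "KEY"), path, by = "KEY")
nrow(combined)

## Extracting the multi-valued column separately keeps the result small.
go_only <- merge(data.frame(KEY = keys), go, by = "KEY")
nrow(go_only)
```

With real data the same effect is what makes a separate request for a multi-valued column such as GO easier to work with than one large combined request.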
mapIds gets the mapped ids (column) for a set of keys that are of a particular keytype. Usually returned as a named character vector.
saveDb will take an AnnotationDb object and save the database to the file specified by the path passed in to the file argument.
loadDb takes a .sqlite database file as an argument and uses data in the metadata table of that file to return an AnnotationDb style object of the appropriate type.
species shows the genus and species label currently attached to the AnnotationDb object's database.
dbfile gets the database file associated with an object.
dbconn gets the database connection associated with an object.
taxonomyId gets the taxonomy ID associated with an object (if available).
Usage
columns(x)
keytypes(x)
keys(x, keytype, ...)
select(x, keys, columns, keytype, ...)
mapIds(x, keys, column, keytype, ..., multiVals)
saveDb(x, file)
loadDb(file, packageName=NA)
Arguments
Argument | Description |
---|---|
x | the AnnotationDb object. But in practice this will mean an object derived from an AnnotationDb object such as an OrgDb or ChipDb object. |
keys | the keys to select records for from the database. All possible keys are returned by using the keys method. |
columns | the columns or kinds of things that can be retrieved from the database. As with keys, all possible columns are returned by using the columns method. |
keytype | the keytype that matches the keys used. For the select methods, this is used to indicate the kind of ID being used with the keys argument. For the keys method this is used to indicate which kind of keys are desired from keys. |
column | the column to search on (for mapIds). Different from columns in that it can only have a single element for the value. |
... | other arguments. These include: pattern: the pattern to match (used by keys). column: the column to search on. This is used by keys and is for when the thing you want to pattern match is different from the keytype, or when you simply want to get keys that have a value for the thing specified by the column argument. fuzzy: TRUE or FALSE value. Use fuzzy matching? (this is used with pattern by the keys method). |
multiVals | What should mapIds do when there are multiple values that could be returned? Options include: first: when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior. list: returns a list object to the end user. filter: removes all elements that contain multiple matches and will therefore return a shorter vector than what came in whenever some of the keys match more than one value. asNA: returns an NA value whenever there are multiple matches. CharacterList: returns a SimpleCharacterList object. FUN: you can also supply a function to the multiVals argument for custom behaviors. The function must take a single argument and return a single value. This function will be applied to all the elements and will serve as a 'rule' for which thing to keep when there is more than one element. For example, this function will always grab the last element in each result: last <- function(x){x[[length(x)]]} |
file | an sqlite file path. A string that represents the full name you want for your sqlite database and also where to put it. |
packageName | for internal use only |
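To make the multiVals options concrete, here is a minimal base R sketch that emulates them on a plain named list. The list hits and its values are invented, and the emulation only illustrates the documented behavior; it is not how mapIds is implemented.

```r
## Toy lookup result: each key maps to one or more values (invented data).
hits <- list(k1 = c("a", "b"), k2 = "c", k3 = c("d", "e", "f"))

## "first": keep only the first match for every key.
first_val <- vapply(hits, function(x) x[[1]], character(1))

## "filter": drop keys that have more than one match.
filtered <- unlist(hits[lengths(hits) == 1])

## "asNA": replace multi-match keys with NA.
as_na <- vapply(hits,
                function(x) if (length(x) > 1) NA_character_ else x[[1]],
                character(1))

## FUN: a custom rule, here the 'last' function from the argument table.
last <- function(x){x[[length(x)]]}
last_val <- vapply(hits, last, character(1))

first_val
filtered
as_na
last_val
```

With the real API the same choices are made through the multiVals argument of mapIds, as listed in the Usage section.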
Value
keys, columns and keytypes each return a character vector of possible values. select returns a data.frame.
Seealso
keys, dbConnect, dbListTables, dbListFields, dbGetQuery, Bimap
Author
Marc Carlson
Examples
require(hgu95av2.db)
## display the columns
columns(hgu95av2.db)
## get the 1st 6 possible keys
keys <- head( keys(hgu95av2.db) )
keys
## lookup gene symbol and unigene ID for the 1st 6 keys
select(hgu95av2.db, keys=keys, columns = c("SYMBOL","UNIGENE"))
## get keys based on unigene
keyunis <- head( keys(hgu95av2.db, keytype="UNIGENE") )
keyunis
## list supported key types
keytypes(hgu95av2.db)
## lookup gene symbol and unigene ID based on unigene IDs by setting
## the keytype to "UNIGENE" and passing in unigene keys:
select(hgu95av2.db, keys=keyunis, columns = c("SYMBOL","UNIGENE"),
keytype="UNIGENE")
keys <- head(keys(hgu95av2.db, 'ENTREZID'))
## get a default result (captures only the 1st element)
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID')
## or use a different option
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals="CharacterList")
## Or define your own function
last <- function(x){x[[length(x)]]}
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals=last)
## For other ways to access the DB, you can use dbfile() or dbconn() to extract
dbconn(hgu95av2.db)
dbfile(hgu95av2.db)
## Try to retrieve an associated taxonomyId
taxonomyId(hgu95av2.db)
AnnotationDbi_internals()
AnnotationDbi internals
Description
AnnotationDbi objects, classes and methods that are not intended to be used directly.
Bimap()
Bimap objects and the Bimap interface
Description
What we usually call "annotation maps" are in fact Bimap objects. In the following sections we present the bimap concept and the Bimap interface as it is defined in AnnotationDbi.
Seealso
Bimap-direction , Bimap-keys , Bimap-toTable , BimapFormatting , Bimap-envirAPI
Examples
library(hgu95av2.db)
ls(2)
hgu95av2GO # calls the "show" method
summary(hgu95av2GO)
hgu95av2GO2PROBE # calls the "show" method
summary(hgu95av2GO2PROBE)
BimapFiltering()
Methods for getting/setting the filters on a Bimap object
Description
These methods are part of the Bimap interface (see the Bimap page for a quick overview of the Bimap objects and their interface).
Some of these methods are for getting or setting the filtering status on a Bimap object so that the mapping object can toggle between displaying all probes, only single probes (the default) or only multiply matching probes.
Other methods are for viewing or setting the filter threshold value on an InpAnnDbBimap object.
Usage
## Making a Bimap object that does not prefilter to remove probes that
## match multiple genes:
toggleProbes(x, value)
hasMultiProbes(x) ##T/F test for exposure of multiply matched probes
hasSingleProbes(x) ##T/F test for exposure of single probes
## Looking at the SQL filter values for a Bimap
getBimapFilters(x)
## Setting the filter on an InpAnnDbBimap object
setInpBimapFilter(x,value)
Arguments
Argument | Description |
---|---|
x | A Bimap object. |
value | A character vector containing the new value that the Bimap should use as the filter. Or the value to toggle a probe mapping to: "all", "single", or "multiple". |
Details
toggleProbes(x) is a method for creating Bimaps that have an alternate filter for which probes get exposed based upon whether these probes map to multiple genes or not.
hasMultiProbes(x) and hasSingleProbes(x) are provided to give a quick test about whether or not such probes are exposed in a given mapping.
getBimapFilters(x) will list all the SQL filters applied to a Bimap object.
setInpBimapFilter(x) will allow you to pass a value as a character string which will be used as a filter. In order to be useful with the InpAnnDbBimap objects provided in the inparanoid packages, this value needs to be a two-digit number written as a percentage. So for example "80 %" or "95%" would be acceptable. This is owing to the nature of the inparanoid data set.
Value
A Bimap object of the same subtype as x for exposeAllProbes(x), maskMultiProbes(x) and maskSingleProbes(x).
A TRUE or FALSE value in the case of hasMultiProbes(x) and hasSingleProbes(x).
Seealso
Bimap ,
Bimap-keys ,
Bimap-direction ,
BimapFormatting ,
Bimap-envirAPI ,
nhit
Author
M. Carlson
Examples
## Make a Bimap that contains all the probes
require("hgu95av2.db")
mapWithMultiProbes <- toggleProbes(hgu95av2ENTREZID, "all")
count.mappedLkeys(hgu95av2ENTREZID)
count.mappedLkeys(mapWithMultiProbes)
## Check that it has both multiply and singly matching probes:
hasMultiProbes(mapWithMultiProbes)
hasSingleProbes(mapWithMultiProbes)
## Make it have Multi probes ONLY:
OnlyMultiProbes = toggleProbes(mapWithMultiProbes, "multiple")
hasMultiProbes(OnlyMultiProbes)
hasSingleProbes(OnlyMultiProbes)
## Convert back to a default map with only single probes exposed
OnlySingleProbes = toggleProbes(OnlyMultiProbes, "single")
hasMultiProbes(OnlySingleProbes)
hasSingleProbes(OnlySingleProbes)
## List the filters on the inparanoid mapping
# library(hom.Dm.inp.db)
# getBimapFilters(hom.Dm.inpANOGA)
## Here is how you can make a mapping with a
##different filter than the default:
# f80 = setInpBimapFilter(hom.Dm.inpANOGA, "80%")
# dim(hom.Dm.inpANOGA)
# dim(f80)
BimapFormatting()
Formatting a Bimap as a list or character vector
Description
These functions format a Bimap as a list or character vector.
Usage
## Formatting as a list
as.list(x, ...)
## Formatting as a character vector
#as.character(x, ...)
Arguments
Argument | Description |
---|---|
x | A Bimap object. |
... | Further arguments are ignored. |
Author
H. Pagès
Examples
library(hgu95av2.db)
as.list(hgu95av2CHRLOC)[1:9]
as.list(hgu95av2ENTREZID)[1:9]
as.character(hgu95av2ENTREZID)[1:9]
Bimap_direction()
Methods for getting/setting the direction of a Bimap object, and undirected methods for getting/counting/setting its keys
Description
These methods are part of the Bimap interface (see the Bimap page for a quick overview of the Bimap objects and their interface).
They are divided in 2 groups: (1) methods for getting or setting the direction of a Bimap object and (2) methods for getting, counting or setting the left or right keys (or mapped keys only) of a Bimap object. Note that all the methods in group (2) are undirected methods i.e. what they return does NOT depend on the direction of the map (more on this below).
Usage
## Getting or setting the direction of a Bimap object
direction(x)
direction(x) <- value
revmap(x, ...)
## Getting, counting or setting the left or right keys (or mapped
## keys only) of a Bimap object
Lkeys(x)
Rkeys(x)
Llength(x)
Rlength(x)
mappedLkeys(x)
mappedRkeys(x)
count.mappedLkeys(x)
count.mappedRkeys(x)
Lkeys(x) <- value
Rkeys(x) <- value
## S4 method for signature 'Bimap'
subset(x, Lkeys = NULL, Rkeys = NULL, drop.invalid.keys = FALSE)
## S4 method for signature 'AnnDbBimap'
subset(x, Lkeys = NULL, Rkeys = NULL, drop.invalid.keys = FALSE,
objName = NULL)
Arguments
Argument | Description |
---|---|
x | A Bimap object. |
value | A single integer or character string indicating the new direction in direction(x) <- value . A character vector containing the new keys (must be a subset of the current keys) in Lkeys(x) <- value and Rkeys(x) <- value . |
Lkeys, Rkeys, drop.invalid.keys, objName, ... | Extra arguments for revmap and subset. Extra argument for revmap can be: objName: The name to give to the reversed map (only supported if x is an AnnDbBimap object). Extra arguments for subset can be: Lkeys: The new Lkeys. Rkeys: The new Rkeys. drop.invalid.keys: If drop.invalid.keys=FALSE (the default), an error will be raised if the new Lkeys or Rkeys contain invalid keys i.e. keys that don't belong to the current Lkeys or Rkeys. If drop.invalid.keys=TRUE, invalid keys are silently dropped. objName: The name to give to the submap (only supported if x is an AnnDbBimap object). |
Details
All Bimap objects have a direction which can be left-to-right (i.e. the mapping goes from the left keys to the right keys) or right-to-left (i.e. the mapping goes from the right keys to the left keys). A Bimap object x that maps from left to right is considered to be a direct map. Otherwise it is considered to be an indirect map (when it maps from right to left).
direction returns 1 on a direct map and -1 otherwise.
The direction of x can be changed with direction(x) <- value where value must be 1 or -1. An easy way to reverse a map (i.e. to change its direction) is to do direction(x) <- - direction(x), or, even better, to use revmap(x) which is actually the recommended way for doing it.
The Lkeys and Rkeys methods return respectively the left and right keys of a Bimap object. Unlike the keys method (see the Bimap-keys page for more information), these methods are direction-independent i.e. what they return does NOT depend on the direction of the map. Such methods are also said to be "undirected methods" and methods like the keys method are said to be "directed methods".
All the methods described below are also "undirected methods".
Llength(x) and Rlength(x) are equivalent to (but more efficient than) length(Lkeys(x)) and length(Rkeys(x)), respectively.
The mappedLkeys (or mappedRkeys) method returns the left keys (or right keys) that are mapped to at least one right key (or one left key).
count.mappedLkeys(x) and count.mappedRkeys(x) are equivalent to (but more efficient than) length(mappedLkeys(x)) and length(mappedRkeys(x)), respectively. These functions give overall summaries; if you want to know how many Rkeys correspond to a given Lkey you can use the nhit function.
Lkeys(x) <- value and Rkeys(x) <- value are the undirected versions of keys(x) <- value (see the Bimap-keys page for more information) and subset(x, Lkeys=new_Lkeys, Rkeys=new_Rkeys) is provided as a convenient way to reduce the sets of left and right keys in one single function call.
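The difference between directed and undirected methods can be sketched with a toy model in base R. Everything below (the links table, the key vectors and the toy_* helper names) is hypothetical and only mimics the documented behavior; it does not use or reproduce the Bimap classes themselves.

```r
## Toy bimap: a table of links between left and right keys (invented data).
links <- data.frame(left  = c("p1", "p1", "p2"),
                    right = c("g1", "g2", "g1"),
                    stringsAsFactors = FALSE)
left_keys  <- c("p1", "p2", "p3")   # p3 is a valid but unmapped left key
right_keys <- c("g1", "g2", "g3")   # g3 is a valid but unmapped right key

## Directed accessor: the answer depends on the direction
## (1 = left-to-right, -1 = right-to-left).
toy_keys <- function(direction) if (direction == 1) left_keys else right_keys

## Undirected accessors: same answer whatever the direction.
toy_Lkeys <- function(direction) left_keys
toy_mappedLkeys <- function(direction) intersect(left_keys, links$left)

toy_keys(1)           # left keys on a direct map
toy_keys(-1)          # right keys once the map is reversed
toy_mappedLkeys(1)    # unchanged by the direction
toy_mappedLkeys(-1)
```

On a real Bimap the corresponding calls are keys(x) versus Lkeys(x) and mappedLkeys(x), applied to x and to revmap(x).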
Value
1L or -1L for direction.
A Bimap object of the same subtype as x for revmap and subset.
A character vector for Lkeys, Rkeys, mappedLkeys and mappedRkeys.
A single non-negative integer for Llength, Rlength, count.mappedLkeys and count.mappedRkeys.
Seealso
Bimap ,
Bimap-keys ,
BimapFormatting ,
Bimap-envirAPI ,
nhit
Author
H. Pagès
Examples
library(hgu95av2.db)
ls(2)
x <- hgu95av2GO
x
summary(x)
direction(x)
length(x)
Llength(x)
Rlength(x)
keys(x)[1:4]
Lkeys(x)[1:4]
Rkeys(x)[1:4]
count.mappedkeys(x)
count.mappedLkeys(x)
count.mappedRkeys(x)
mappedkeys(x)[1:4]
mappedLkeys(x)[1:4]
mappedRkeys(x)[1:4]
y <- revmap(x)
y
summary(y)
direction(y)
length(y)
Llength(y)
Rlength(y)
keys(y)[1:4]
Lkeys(y)[1:4]
Rkeys(y)[1:4]
## etc...
## Get rid of all unmapped keys (left and right)
z <- subset(y, Lkeys=mappedLkeys(y), Rkeys=mappedRkeys(y))
Bimap_envirAPI()
Environment-like API for Bimap objects
Description
These methods allow the user to manipulate any Bimap object as if it was an environment. This environment-like API is provided for backward compatibility with the traditional environment-based maps.
Usage
ls(name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
pattern, sorted = TRUE)
exists(x, where, envir, frame, mode, inherits)
get(x, pos, envir, mode, inherits)
#x[[i]]
#x$name
## Converting to a list
mget(x, envir, mode, ifnotfound, inherits)
eapply(env, FUN, ..., all.names, USE.NAMES)
#contents(object, all.names)
## Additional convenience method
sample(x, size, replace=FALSE, prob=NULL, ...)
Arguments
Argument | Description |
---|---|
name | A Bimap object for ls . A key as a literal character string or a name (possibly backtick quoted) for x$name . |
pos, all.names, USE.NAMES, where, frame, mode, inherits | Ignored. |
envir | Ignored for ls . A Bimap object for mget , get and exists . |
pattern | An optional regular expression. Only keys matching 'pattern' are returned. |
x | The key(s) to search for, for exists , get and mget . A Bimap object for [[ and x$name . A Bimap object or an environment for sample . |
i | Single key specifying the map element to extract. |
ifnotfound | A value to be used if the key is not found. Only NA is currently supported. |
env | A Bimap object. |
FUN | The function to be applied (see original eapply for environments for the details). |
... | Optional arguments to FUN . |
size | Non-negative integer giving the number of map elements to choose. |
replace | Should sampling be with replacement? |
prob | A vector of probability weights for obtaining the elements of the map being sampled. |
sorted | logical(1) . When TRUE (default), return primary keys in sorted order. |
Seealso
ls, exists, get, mget, eapply, contents, sample, BimapFormatting, Bimap
Examples
library(hgu95av2.db)
x <- hgu95av2CHRLOC
ls(x)[1:3]
exists(ls(x)[1], x)
exists("titi", x)
get(ls(x)[1], x)
x[[ls(x)[1]]]
x$titi # NULL
mget(ls(x)[1:3], x)
eapply(x, length)
contents(x)
sample(x, 3)
Bimap_keys()
Methods for manipulating the keys of a Bimap object
Description
These methods are part of the Bimap interface (see the Bimap page for a quick overview of the Bimap objects and their interface).
Usage
#length(x)
isNA(x)
mappedkeys(x)
count.mappedkeys(x)
keys(x) <- value
#x[i]
Arguments
Argument | Description |
---|---|
x | A Bimap object. If the method being called is keys(x) , then x can also be an AnnotationDb object or one of that object's progeny. |
value | A character vector containing the new keys (must be a subset of the current keys). |
i | A character vector containing the keys of the map elements to extract. |
Details
keys(x) returns the set of all valid keys for map x. For example, keys(hgu95av2GO) is the set of all probe set IDs for chip hgu95av2 from Affymetrix.
Please note that in addition to Bimap objects, keys(x) will also work for AnnotationDb objects and related objects such as OrgDb and ChipDb objects.
Note also that the double bracket operator [[ for Bimap objects is guaranteed to work only with a valid key and will raise an error if the key is invalid. (See the Bimap-envirAPI page for more information about this operator.)
length(x) is equivalent to (but more efficient than) length(keys(x)).
A valid key is not necessarily mapped ([[ will return an NA on an unmapped key).
isNA(x) returns a logical vector of the same length as x where the TRUE value is used to mark keys that are NOT mapped and the FALSE value to mark keys that ARE mapped.
mappedkeys(x) returns the subset of keys(x) where only mapped keys were kept.
count.mappedkeys(x) is equivalent to (but more efficient than) length(mappedkeys(x)).
Two (almost) equivalent forms of subsetting a Bimap object are provided: (1) by setting the keys explicitly and (2) by using the single bracket operator [ for Bimap objects. Let's say the user wants to restrict the mapping to the subset of valid keys stored in character vector mykeys. This can be done either with keys(x) <- mykeys (form (1)) or with y <- x[mykeys] (form (2)). Please note that form (1) alters object x in an irreversible way (the original keys are lost) so form (2) should be preferred.
All the methods described on this page are "directed methods" i.e. what they return DOES depend on the direction of the Bimap object that they are applied to (see the Bimap-direction page for more information about this).
Value
A character vector for keys and mappedkeys.
A single non-negative integer for length and count.mappedkeys.
A logical vector for isNA.
A Bimap object of the same subtype as x for x[i].
Seealso
Bimap ,
Bimap-envirAPI ,
Bimap-toTable ,
BimapFormatting ,
AnnotationDb ,
select ,
columns
Author
H. Pagès
Examples
library(hgu95av2.db)
x <- hgu95av2GO
x
length(x)
count.mappedkeys(x)
x[1:3]
links(x[1:3])
## Keep only the mapped keys
keys(x) <- mappedkeys(x)
length(x)
count.mappedkeys(x)
x # now it is a submap
## The above subsetting can also be achieved with
x <- hgu95av2GO[mappedkeys(hgu95av2GO)]
## mappedkeys() and count.mappedkeys() also work with an environment
## or a list
z <- list(k1=NA, k2=letters[1:4], k3="x")
mappedkeys(z)
count.mappedkeys(z)
## retrieve the set of primary keys for the ChipDb object named 'hgu95av2.db'
keys <- keys(hgu95av2.db)
head(keys)
Bimap_toTable()
Methods for manipulating a Bimap object in a data-frame style
Description
These methods are part of the Bimap interface (see the Bimap page for a quick overview of the Bimap objects and their interface).
Usage
## Extract all the columns of the map (links + right attributes)
toTable(x, ...)
nrow(x)
ncol(x)
#dim(x)
## S4 method for signature 'FlatBimap'
head(x, ...)
## S4 method for signature 'FlatBimap'
tail(x, ...)
## Extract only the links of the map
links(x)
count.links(x)
nhit(x)
## Col names and col metanames
colnames(x, do.NULL=TRUE, prefix="col")
colmetanames(x)
Lkeyname(x)
Rkeyname(x)
keyname(x)
tagname(x)
Rattribnames(x)
Rattribnames(x) <- value
Arguments
Argument | Description |
---|---|
x | A Bimap object (or a list or an environment for nhit ). |
... | Further arguments to be passed to or from other methods (see head or tail for the details). |
do.NULL | Ignored. |
prefix | Ignored. |
value | A character vector containing the names of the new right attributes (must be a subset of the current right attribute names) or NULL. |
Details
toTable(x) turns Bimap object x into a data frame (see section "Flat representation of a bimap" in the Bimap page for a short introduction to this concept). For simple maps (i.e. no tags and no right attributes), the resulting data frame has only 2 columns, one for the left keys and one for the right keys, and each row in the data frame represents a link (or edge) between a left and a right key. For maps with tagged links (i.e. a tag is associated to each link), toTable(x) has one additional column for the tags and there is still one row per link. For maps with right attributes (i.e. a set of attributes is associated to each right key), toTable(x) has one additional column per attribute. So for example if x has tagged links and 2 right attributes, toTable(x) will have 5 columns: one for the left keys, one for the right keys, one for the tags, and one for each right attribute (always the rightmost columns). Note that if at least one of the right attributes is multivalued then more than 1 row can be needed to represent the same link so the number of rows in toTable(x) can be strictly greater than the number of links in the map.
nrow(x) is equivalent to (but more efficient than) nrow(toTable(x)).
ncol(x) is equivalent to (but more efficient than) ncol(toTable(x)).
colnames(x) is equivalent to (but more efficient than) colnames(toTable(x)). Columns are named according to the names of the SQL columns where the data are coming from. An important consequence of this is that they are not necessarily unique.
colmetanames(x) returns the metanames for the columns of x that are not right attributes. Valid column metanames are "Lkeyname", "Rkeyname" and "tagname".
Lkeyname, Rkeyname, tagname and Rattribnames return the name of the column (or columns) containing the left keys, the right keys, the tags and the right attributes, respectively.
Like toTable(x), links(x) turns x into a data frame but the right attributes (if any) are dropped. Note that dropping the right attributes produces a data frame that can have fewer columns than toTable(x) and also fewer rows because now exactly 1 row is needed to represent 1 link.
count.links(x) is equivalent to (but more efficient than) nrow(links(x)).
nhit(x) returns a named integer vector indicating the number of "hits" for each key in x i.e. the number of links that start from each key.
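The relationship between the full table, the links and the per-key hit counts can be sketched in base R on a toy flat representation. The data frame flat, the key vector all_keys and the toy_* names are invented for illustration and only mimic what toTable, links and nhit are documented to return.

```r
## Toy flat representation: 'attr' plays the role of a multivalued
## right attribute, so one link can need several rows (invented data).
flat <- data.frame(left  = c("k1", "k1", "k1", "k2"),
                   right = c("r1", "r1", "r2", "r3"),
                   attr  = c("a1", "a2", "a1", "a1"),
                   stringsAsFactors = FALSE)
all_keys <- c("k1", "k2", "k3")   # k3 is valid but has no links

## Dropping the right attribute and de-duplicating gives one row per link.
toy_links <- unique(flat[, c("left", "right")])

## Number of links starting from each key, including keys with zero hits.
counts <- table(factor(toy_links$left, levels = all_keys))
toy_nhit <- setNames(as.integer(counts), names(counts))

nrow(flat)        # rows in the full table
nrow(toy_links)   # number of links
toy_nhit
sum(toy_nhit) == nrow(toy_links)
```

The last line mirrors the sanity check sum(nhit(y)) == count.links(y) used in the Examples of this page.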
Value
A data frame for toTable and links.
A single integer for nrow, ncol and count.links.
A character vector for colnames, colmetanames and Rattribnames.
A character string for Lkeyname, Rkeyname and tagname.
A named integer vector for nhit.
Seealso
Bimap , BimapFormatting , Bimap-envirAPI
Author
H. Pagès
Examples
library(GO.db)
x <- GOSYNONYM
x
toTable(x)[1:4, ]
toTable(x["GO:0007322"])
links(x)[1:4, ]
links(x["GO:0007322"])
nrow(x)
ncol(x)
dim(x)
colnames(x)
colmetanames(x)
Lkeyname(x)
Rkeyname(x)
tagname(x)
Rattribnames(x)
links(x)[1:4, ]
count.links(x)
y <- GOBPCHILDREN
nhy <- nhit(y) # 'nhy' is a named integer vector
identical(names(nhy), keys(y)) # TRUE
table(nhy)
sum(nhy == 0) # number of GO IDs with no children
names(nhy)[nhy == max(nhy)] # the GO ID(s) with the most direct children
## Some sanity check
sum(nhy) == count.links(y) # TRUE
## Changing the right attributes of the GOSYNONYM map (advanced
## users only)
class(x) # GOTermsAnnDbBimap
as.list(x)[1:3]
colnames(x)
colmetanames(x)
tagname(x) # untagged map
Rattribnames(x)
Rattribnames(x) <- Rattribnames(x)[3:1]
colnames(x)
class(x) # AnnDbBimap
as.list(x)[1:3]
GOColsAndKeytypes()
Descriptions of available values for columns and keytypes for GO.db.
Description
This manual page enumerates the kinds of data represented by the values returned when the user calls columns or keytypes.
Details
All the possible values for columns and keytypes are listed below.
GOID: GO Identifiers
DEFINITION: The definition of a GO Term
ONTOLOGY: Which of the three Gene Ontologies (BP, CC, or MF)
TERM: The actual GO term
To get the latest information about the date stamps and source URLs for the data used to make an annotation package, use the metadata method as shown in the example below.
Author
Marc Carlson
Examples
library(GO.db)
## List the possible values for columns
columns(GO.db)
## List the possible values for keytypes
keytypes(GO.db)
## get some values back
keys <- head(keys(GO.db))
keys
select(GO.db, keys=keys, columns=c("TERM","ONTOLOGY"),
keytype="GOID")
## More information about the dates and original sources for these data:
metadata(GO.db)
GOFrame()
GOFrame and GOAllFrame objects
Description
Each of these objects contains a data frame that must have exactly 3 columns: the 1st column holds GO IDs, the 2nd holds evidence codes, and the 3rd holds the gene IDs that match the GO IDs under those evidence codes. There is also a slot for the organism that these annotations pertain to.
Details
The GOAllFrame object can only be generated from a GOFrame object, and its constructor method does this automatically from a GOFrame argument. The purpose of these objects is to provide a safe way for GO annotation data from non-traditional sources to be used by analysis packages like GSEABase and eventually GOstats.
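Because the input data frame has a fixed layout, it can help to check it before constructing a GOFrame. The following is a minimal base-R sketch of such a pre-check. The helper `check_go_frame_data` is hypothetical and not part of AnnotationDbi, and the assumption that GO IDs look like "GO:" followed by 7 digits is ours, not a rule stated by this page.

```r
# Hypothetical pre-check (not an AnnotationDbi function): verifies the
# 3-column layout described above before the data is passed to GOFrame().
check_go_frame_data <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) == 3)
  go_ids <- as.character(df[[1]])
  # Assumed GO ID format: "GO:" followed by exactly 7 digits.
  bad <- !grepl("^GO:[0-9]{7}$", go_ids)
  if (any(bad)) {
    stop("malformed GO IDs: ", paste(go_ids[bad], collapse = ", "))
  }
  invisible(TRUE)
}

frameData <- data.frame(
  GOIds = c("GO:0008150", "GO:0008152", "GO:0001666"),
  evi   = c("ND", "IEA", "IDA"),
  genes = c("1", "10", "100"),
  stringsAsFactors = FALSE
)
check_go_frame_data(frameData)
```

Building the data frame column by column, as here, keeps each column's type explicit, whereas going through cbind() first converts every column to character.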
Examples
## Make up an example
genes = c(1,10,100)
evi = c("ND","IEA","IDA")
GOIds = c("GO:0008150","GO:0008152","GO:0001666")
frameData = data.frame(cbind(GOIds,evi,genes))
library(AnnotationDbi)
frame=GOFrame(frameData,organism="Homo sapiens")
allFrame=GOAllFrame(frame)
getGOFrameData(allFrame)
GOTerms_class()
Class "GOTerms"
Description
A class to represent Gene Ontology nodes
Seealso
makeGOGraph
shows how to make GO mappings into graphNEL objects.
Note
GOTerms objects are used to represent primary GO nodes in the SQLite-based annotation data package GO.db
Examples
gonode <- new("GOTerms", GOID="GO:1234567", Term="Test", Ontology="MF",
Definition="just for testing")
GOID(gonode)
Term(gonode)
Ontology(gonode)
##Or you can just use these methods on a GOTermsAnnDbBimap
##I want to show an ex., but don't want to require GO.db
require(GO.db)
FirstTenGOBimap <- GOTERM[1:10] ##grab the 1st ten
Term(FirstTenGOBimap)
##Or you can just use GO IDs directly
ids = keys(FirstTenGOBimap)
Term(ids)
InparanoidColsAndKeytypes()
Descriptions of available values for columns
and
keytypes
for inparanoid packages.
Description
When the user calls columns
or keytypes
for an inparanoid
package, the columns
and keytypes
methods will give the
full genus and species names of all the organisms that are available.
Details
All the possible values for columns
and keytypes
are listed
below.
Value | Description |
---|---|
ACYRTHOSIPHON_PISUM | the pea aphid |
AEDES_AEGYPTI | a mosquito that can spread the dengue fever, Chikungunya and yellow fever viruses, and other diseases |
ANOPHELES_GAMBIAE | a mosquito notorious as a vector for malaria |
APIS_MELLIFERA | the western honey bee |
ARABIDOPSIS_THALIANA | the thale cress |
ASPERGILLUS_FUMIGATUS | a fungus that causes disease in immunodeficient individuals |
BATRACHOCHYTRIUM_DENDROBATIDIS | a chytrid fungus that causes the disease chytridiomycosis |
BOMBYX_MORI | the silk worm |
BOS_TAURUS | domestic cattle |
BRANCHIOSTOMA_FLORIDAE | a lancelet (amphioxus) |
BRUGIA_MALAYI | a nematode (roundworm), one of the three causative agents of lymphatic filariasis |
CAENORHABDITIS_BRENNERI | a small nematode, closely related to the model organism Caenorhabditis elegans |
CAENORHABDITIS_BRIGGSAE | a small nematode, closely related to Caenorhabditis elegans |
CAENORHABDITIS_ELEGANS | a small nematode |
CAENORHABDITIS_JAPONICA | a gonochoristic (male-female) species related to C. elegans |
CAENORHABDITIS_REMANEI | a species of nematode (gonochoristic) |
CANDIDA_ALBICANS | a diploid fungus that grows both as yeast and filamentous cells and a causal agent of opportunistic oral and genital infections in humans |
CANDIDA_GLABRATA | a haploid yeast of the genus Candida |
CANIS_FAMILIARIS | domestic dog |
CAPITELLA_SPI | a polychaete worm |
CAVIA_PORCELLUS | Guinea pig |
CHLAMYDOMONAS_REINHARDTII | a single celled green alga |
CIONA_INTESTINALIS | a urochordata (sea squirt), a tunicate widely distributed in Northern European waters |
CIONA_SAVIGNYI | a urochordata (sea squirt) |
COCCIDIOIDES_IMMITIS | a pathogenic fungus that resides in the soil |
COPRINOPSIS_CINEREUS | a species of mushroom |
CRYPTOCOCCUS_NEOFORMANS | an encapsulated yeast that can live in both plants and animals |
CRYPTOSPORIDIUM_HOMINIS | an obligate parasite of humans that can colonize the gastrointestinal tract |
CRYPTOSPORIDIUM_PARVUM | one of several protozoal species that cause cryptosporidiosis, a parasitic disease of the mammalian intestinal tract |
CULEX_PIPIENS | the common house mosquito |
CYANIDIOSCHYZON_MEROLAE | an alga that is the main organism in red tide |
DANIO_RERIO | the zebrafish |
DAPHNIA_PULEX | the most common species of water flea |
DEBARYOMYCES_HANSENII | a yeast that tolerates high concentrations of salt and is related to yeasts that cause disease, including Candida albicans |
DICTYOSTELIUM_DISCOIDEUM | a species of soil-living amoeba, AKA a slime mold |
DROSOPHILA_ANANASSAE | a fruit fly |
DROSOPHILA_GRIMSHAWI | a fruit fly |
DROSOPHILA_MELANOGASTER | a fruit fly |
DROSOPHILA_MOJAVENSIS | a fruit fly |
DROSOPHILA_PSEUDOOBSCURA | a fruit fly |
DROSOPHILA_VIRILIS | a fruit fly |
DROSOPHILA_WILLISTONI | a fruit fly |
ENTAMOEBA_HISTOLYTICA | an anaerobic parasitic protozoan |
EQUUS_CABALLUS | domestic horse |
ESCHERICHIA_COLIK12 | a laboratory strain of coliform bacteria |
FUSARIUM_GRAMINEARUM | a fungus that attacks cereal grains |
GALLUS_GALLUS | domesticated chicken |
GASTEROSTEUS_ACULEATUS | three spined stickleback fish |
GIARDIA_LAMBLIA | a flagellated protozoan parasite |
HELOBDELLA_ROBUSTA | a leech |
IXODES_SCAPULARIS | the black legged deer tick, a vector for lyme disease |
KLUYVEROMYCES_LACTIS | yeast commonly used for genetic studies |
LEISHMANIA_MAJOR | a species of Leishmania, associated with zoonotic cutaneous leishmaniasis |
LOTTIA_GIGANTEA | a species of sea snail, a true limpet, a marine gastropod mollusc |
MACACA_MULATTA | the rhesus macaque |
MAGNAPORTHE_GRISEA | rice blast fungus |
MONODELPHIS_DOMESTICA | grey short tailed opossum |
MONOSIGA_BREVICOLLIS | a marine choanoflagellate |
MUS_MUSCULUS | lab mouse |
NASONIA_VITRIPENNIS | a small pteromalid parasitoid wasp |
NEMATOSTELLA_VECTENSIS | the starlet sea anemone |
NEUROSPORA_CRASSA | a type of red bread mould |
ORNITHORHYNCHUS_ANATINUS | the platypus |
ORYZA_SATIVA | rice |
ORYZIAS_LATIPES | medaka fish |
OSTREOCOCCUS_TAURI | a unicellular coccoid or spherically shaped green alga |
PAN_TROGLODYTES | chimp |
PEDICULUS_HUMANUS | a species of lice that infects humans |
PHYSCOMITRELLA_PATENS | a moss (Bryophyta) used as a model organism for studies on plant evolution |
PHYTOPHTHORA_RAMORUM | the oomycete plant pathogen (sudden oak death) |
PHYTOPHTHORA_SOJAE | an oomycete and a soil-borne plant pathogen that causes stem and root rot of soybean |
PLASMODIUM_FALCIPARUM | a protozoan parasite that causes malaria |
PLASMODIUM_VIVAX | a protozoal parasite and a human pathogen that causes a more benign malaria |
PONGO_PYGMAEUS | the Bornean orangutan |
POPULUS_TRICHOCARPA | black cottonwood; also known as western balsam poplar or California poplar |
PRISTIONCHUS_PACIFICUS | a diplogastrid nematode |
PUCCINIA_GRAMINIS | stem, black or cereal rusts |
RATTUS_NORVEGICUS | common lab rat |
RHIZOPUS_ORYZAE | a fungus that lives worldwide in dead organic matter. An opportunistic human pathogen |
SACCHAROMYCES_CEREVISIAE | brewers yeast |
SCHISTOSOMA_MANSONI | a significant parasite of humans, a trematode that is one of the major agents of the disease schistosomiasis |
SCHIZOSACCHAROMYCES_POMBE | fission yeast |
SCLEROTINIA_SCLEROTIORUM | an omnivorous fungal plant pathogen |
SORGHUM_BICOLOR | sorghum |
STAGONOSPORA_NODORUM | a fungal leaf spot disease |
STRONGYLOCENTROTUS_PURPURATUS | the purple sea urchin |
TAKIFUGU_RUBRIPES | Japanese pufferfish |
TETRAHYMENA_THERMOPHILA | a single celled ciliate |
TETRAODON_NIGROVIRIDIS | green spotted pufferfish (fresh water) |
THALASSIOSIRA_PSEUDONANA | a species of marine centric diatom |
THEILERIA_ANNULATA | a tickborne protozoan pathogen which is a major cause of livestock disease in sub-tropical regions |
THEILERIA_PARVA | a parasitic protozoan, that causes East Coast fever (theileriosis) in cattle |
TRIBOLIUM_CASTANEUM | the red flour beetle |
TRICHOMONAS_VAGINALIS | an anaerobic, flagellated protozoan |
TRICHOPLAX_ADHAERENS | Trichoplax adhaerens represents the simplest known animal, with the smallest known animal genome |
TRYPANOSOMA_CRUZI | a species of parasitic euglenoid trypanosomes. This species causes the trypanosomiasis diseases in humans and animals in America. |
USTILAGO_MAYDIS | a pathogenic plant fungus that causes smut disease on maize |
XENOPUS_TROPICALIS | Western clawed frog |
YARROWIA_LIPOLYTICA | Yarrowia lipolytica is a "non-conventional" species of yeast, often used in genetic research because it differs from other well-studied species |
To get the latest information about the date stamps and source URLs for the data used to make an annotation package, use the metadata method as shown in the example below.
Author
Marc Carlson
Examples
library(hom.Hs.inp.db)
## List the possible values for columns
columns(hom.Hs.inp.db)
## List the possible values for keytypes
keytypes(hom.Hs.inp.db)
## get some values back
keys <- head(keys(hom.Hs.inp.db, keytype="HOMO_SAPIENS"))
keys
select(hom.Hs.inp.db, keys=keys, columns=c("BOS_TAURUS","EQUUS_CABALLUS"),
keytype="HOMO_SAPIENS")
## More information about the dates and original sources for these data:
metadata(hom.Hs.inp.db)
KEGGFrame()
KEGGFrame objects
Description
Each of these objects contains a data frame that must have exactly
2 columns: the 1st column holds KEGG IDs and the 2nd holds the gene
IDs that match the KEGG IDs. There is also a slot for the organism
that these annotations pertain to. getKEGGFrameData
is
an accessor method that returns the data.frame contained in the
KEGGFrame object; it is mostly used internally by other code.
Details
The purpose of these objects is to create a safe way for annotation data about KEGG from non-traditional sources to be used for analysis packages like GSEABase and eventually Category.
Examples
## Make up an example
genes = c(2,9,9,10)
KEGGIds = c("04610","00232","00983","00232")
frameData = data.frame(cbind(KEGGIds,genes))
library(AnnotationDbi)
frame=KEGGFrame(frameData,organism="Homo sapiens")
getKEGGFrameData(frame)
colsAndKeytypes()
Descriptions of available values for columns
and keytypes
.
Description
This manual page enumerates the kinds of data represented by the
values returned when the user calls columns
or keytypes
.
Details
All the possible values for columns
and keytypes
are listed
below. Users will have to actually use these methods to learn which
of the following possible values actually apply in their case.
Value | Description |
---|---|
ACCNUM | GenBank accession numbers |
ALIAS | Commonly used gene symbols |
ARACYC | KEGG Identifiers for arabidopsis as indicated by aracyc |
ARACYCENZYME | Aracyc enzyme names as indicated by aracyc |
CHR | Chromosome (deprecated for Bioc > 3.1). For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXCHROM, EXONCHROM or CDSCHROM. This information can also be retrieved from these objects using an appropriate range based accessor like transcripts, transcriptsBy etc. |
CHRLOC | Chromosome and starting base of associated gene (deprecated for Bioc > 3.1). For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXSTART etc. or even better use the associated range based accessors like transcripts or transcriptsBy to get back GRanges objects. |
CHRLOCEND | Chromosome and ending base of associated gene (deprecated for Bioc > 3.1). For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXEND etc. or even better use the associated range based accessors like transcripts or transcriptsBy to get back GRanges objects. |
COMMON | Common name |
DESCRIPTION | The description of the associated gene |
ENSEMBL | The ensembl ID as indicated by ensembl |
ENSEMBLPROT | The ensembl protein ID as indicated by ensembl |
ENSEMBLTRANS | The ensembl transcript ID as indicated by ensembl |
ENTREZID | Entrez gene Identifiers |
ENZYME | Enzyme Commission numbers |
EVIDENCE | Evidence codes for GO associations with a gene of interest |
EVIDENCEALL | Evidence codes for GO (includes less specific terms) |
GENENAME | The full gene name |
GO | GO Identifiers associated with a gene of interest |
GOALL | GO Identifiers (includes less specific terms) |
INTERPRO | InterPro identifiers |
IPI | IPI accession numbers |
MAP | cytoband locations |
OMIM | Online Mendelian Inheritance in Man identifiers |
ONTOLOGY | For GO Identifiers, which Gene Ontology (BP, CC, or MF) |
ONTOLOGYALL | Which Gene Ontology (BP, CC, or MF), (includes less specific terms) |
ORF | Yeast ORF Identifiers |
PATH | KEGG Pathway Identifiers |
PFAM | PFAM Identifiers |
PMID | Pubmed Identifiers |
PROBEID | Probe or manufacturer Identifiers for a chip package |
PROSITE | Prosite Identifiers |
REFSEQ | Refseq Identifiers |
SGD | Saccharomyces Genome Database Identifiers |
SMART | Smart Identifiers |
SYMBOL | The official gene symbol |
TAIR | TAIR Identifiers |
UNIGENE | Unigene Identifiers |
UNIPROT | Uniprot Identifiers |
To get the latest information about the date stamps and source URLs for the data used to make an annotation package, use the metadata method as shown in the example below.
Unless otherwise indicated above, the majority of the data for any one package is taken from the source indicated either by its name (if it is an org package) or by the name of its associated org package. For example, org.Hs.eg.db uses "eg" in its name to indicate that most of the data in that package comes from NCBI Entrez Gene. Likewise, org.At.tair.db uses data that comes primarily from TAIR. For chip packages, the relevant information is the organism package that they depend on. For example, hgu95av2.db depends on org.Hs.eg.db, and is thus primarily based on NCBI Entrez Gene ID information.
Author
Marc Carlson
Examples
library(hgu95av2.db)
## List the possible values for columns
columns(hgu95av2.db)
## List the possible values for keytypes
keytypes(hgu95av2.db)
## get some values back
keys <- head(keys(hgu95av2.db))
keys
select(hgu95av2.db, keys=keys, columns=c("SYMBOL","PFAM"),
keytype="PROBEID")
## More information about the dates and original sources for these data:
metadata(hgu95av2.db)
createSimpleBimap()
Creates a simple Bimap from a SQLite database in a situation that is external to AnnotationDbi
Description
This function allows users to easily make a simple Bimap object for extra tables etc. that they may wish to add to their annotation packages. For most Bimaps, the definition is stored inside AnnotationDbi. This function helps ensure that this does not become a limitation, by allowing simple extra Bimaps to be defined outside AnnotationDbi. Usually, this will be done in the zzz.R source file of a package so that these extra mappings can be seamlessly integrated with the rest of the package. For now, this function assumes that users will want to use data from just one table.
Usage
createSimpleBimap(tablename, Lcolname, Rcolname, datacache, objName,
objTarget)
Arguments
Argument | Description |
---|---|
tablename | The name of the database table to grab the mapping information from. |
Lcolname | The field name from the database table. These will become the Lkeys in the final mapping. |
Rcolname | The field name from the database table. These will become the Rkeys in the final mapping. |
datacache | The datacache object should already exist for every standard Annotation package. It is not exported though, so you will have to access it with ::: . It is needed to provide the connection information to the function. |
objName | This is the name of the mapping. |
objTarget | This is the name of the thing the mapping goes with. For most uses, this will mean the package name that the mapping belongs with. |
Examples
##You simply have to call this function to create a new mapping. For
##example, you could have created a mapping between the gene_name and
##the symbols fields from the gene_info table contained in the hgu95av2
##package by doing this:
library(hgu95av2.db)
hgu95av2NAMESYMBOL <- createSimpleBimap("gene_info",
"gene_name",
"symbol",
hgu95av2.db:::datacache,
"NAMESYMBOL",
"hgu95av2.db")
inpIDMapper()
Convenience functions for mapping IDs through an appropriate set of annotation packages
Description
This is a set of convenience functions that take a list of IDs along with some additional information about what those IDs are, what type of ID you would like them to be, what species they are from, and what species you would like them to be from. The functions then attempt the simplest possible conversion using the organism packages and, where needed, the inparanoid annotation packages. By default, ambiguous matches are dropped from the results. Please see the Details section for more information about the parameters that can affect this. If a more complex treatment of multiple matches is required, then it is likely that a less convenient approach will be necessary.
Usage
inpIDMapper(ids, srcSpecies, destSpecies, srcIDType="UNIPROT",
destIDType="EG", keepMultGeneMatches=FALSE, keepMultProtMatches=FALSE,
keepMultDestIDMatches = TRUE)
intraIDMapper(ids, species, srcIDType="UNIPROT", destIDType="EG",
keepMultGeneMatches=FALSE)
idConverter(ids, srcSpecies, destSpecies, srcIDType="UNIPROT",
destIDType="EG", keepMultGeneMatches=FALSE, keepMultProtMatches=FALSE,
keepMultDestIDMatches = TRUE)
Arguments
Argument | Description |
---|---|
ids | a list or vector of original IDs to match |
srcSpecies | The original source species in inparanoid format: the first 3 letters of the genus followed by the first 2 letters of the species, in all caps. For example, 'HOMSA' stands for Homo sapiens. |
destSpecies | the destination species in inparanoid format |
species | the species involved |
srcIDType | The source ID type written exactly as it would be used in a mapping name for an eg package. So for example, 'UNIPROT' is how the uniprot mappings are always written, so we keep that convention here. |
destIDType | the destination ID, written the same way as you would write the srcIDType. By default this is set to "EG" for entrez gene IDs |
keepMultGeneMatches | Do you want to try and keep the 1st ID in those ambiguous cases where more than one gene is suggested? (You probably want to filter them out, hence the default is FALSE) |
keepMultProtMatches | Do you want to try and keep the 1st ID in those ambiguous cases where more than one protein is suggested? (default = FALSE) |
keepMultDestIDMatches | If you have mapped to a destination ID OTHER than an entrez gene ID, then it is possible that there may be multiple answers. Do you want to keep all of these or only return the 1st one? (default = TRUE) |
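The species code format described for srcSpecies can be sketched in base R as follows. The helper name `inp_species_code` is hypothetical and not part of AnnotationDbi, and the sketch assumes the input is a plain "Genus species" string.

```r
# Hypothetical helper: build a 5-letter inparanoid-style code from a
# binomial name (first 3 letters of the genus + first 2 of the species).
inp_species_code <- function(binomial) {
  parts <- strsplit(binomial, " ", fixed = TRUE)[[1]]
  toupper(paste0(substr(parts[1], 1, 3), substr(parts[2], 1, 2)))
}

inp_species_code("Homo sapiens")              # "HOMSA"
inp_species_code("Mus musculus")              # "MUSMU"
inp_species_code("Saccharomyces cerevisiae")  # "SACCE"
```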
Details
inpIDMapper - This is a convenience function for getting an ID from one species mapped to an ID type of your choice from another organism of your choice. The only mappings used to do this are the mappings that are scored as 100 according to the inparanoid algorithm. This function automatically tries to join IDs by using FIVE different mappings in the sequence that follows:
1) initial IDs -> src organism Entrez Gene IDs
2) src organism Entrez Gene IDs -> src organism Inparanoid ID
3) src organism Inparanoid ID -> dest organism Inparanoid ID
4) dest organism Inparanoid ID -> dest organism Entrez Gene ID
5) dest organism Entrez Gene ID -> final destination organism ID
You can simplify this mapping as a series of steps like this:
srcIDs --(1)--> srcEGs --(2)--> srcInp --(3)--> destInp --(4)--> destEGs --(5)--> destIDs
There are two steps in this process where multiple mappings can really interfere with getting a clear answer. It's no coincidence that these are also adjacent to the two places where we have to tie the identity to a single gene for each organism. When this happens, any ambiguity is confounding. Preceding step #2, it is critical that we only have ONE entrez gene ID per initial ID, and the parameter keepMultGeneMatches can be used to toggle whether to drop any ambiguous matches (the default) or to keep the 1st one in the hope of getting an additional hit. A similar thing is done preceding step #4, where we have to be sure that the protein IDs we are getting back have all mapped to only one gene. The keepMultProtMatches parameter lets you make the same kind of decision as for step #2; again, the default is to drop anything that is ambiguous.
intraIDMapper - This is a convenience function to map within an organism and so it has a much simpler job to do. It will either map through one mapping or two depending whether the source ID or destination ID is a central ID for the relevant organism package. If the answer is neither, then two mappings will be needed.
idConverter - This is mostly for convenient usage of these functions by developers. It is just a wrapper function that can pass along all the parameters to the appropriate function (intraIDMapper or inpIDMapper). It decides which function to call based on the source and destination organism. The disadvantage to using this function all the time is just that more of the parameters have to be filled out each time.
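The idea of chaining mappings while dropping (or keeping the first of) ambiguous matches can be sketched with plain named lists in base R. This is only an illustration of the principle, not the code used by inpIDMapper: the helper `map_step`, its `keep_first` argument and all IDs below are made up.

```r
# Hypothetical helper: push IDs through one mapping step.
# 'current' is a named character vector (names = original IDs,
# values = IDs at the current stage); 'mapping' is a named list
# whose elements are the matches for each ID.
map_step <- function(current, mapping, keep_first = FALSE) {
  hits <- lapply(current, function(id) mapping[[id]])  # NULL when no match
  n    <- lengths(hits)
  # Drop unmatched IDs; drop ambiguous ones unless keep_first is TRUE.
  keep <- if (keep_first) n >= 1 else n == 1
  vapply(hits[keep], function(h) h[[1]], character(1))
}

# Toy mappings: P2 is ambiguous (two genes), P4 has no match at all.
src_to_eg  <- list(P1 = "eg1", P2 = c("eg2", "eg3"), P3 = "eg4")
eg_to_dest <- list(eg1 = "d1", eg2 = "d2", eg4 = "d4")

ids <- c(P1 = "P1", P2 = "P2", P3 = "P3", P4 = "P4")

strict  <- map_step(ids, src_to_eg)                     # P1 -> eg1, P3 -> eg4
lenient <- map_step(ids, src_to_eg, keep_first = TRUE)  # also P2 -> eg2
final   <- map_step(strict, eg_to_dest)                 # P1 -> d1, P3 -> d4
final
```

Carrying the original IDs along as names means the result of the last step can be matched back to the input, at the cost of silently losing the IDs that were dropped on the way.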
Value
A list where the names of each element are the elements of the original list you passed in, and the values are the matching results. Elements that do not have a match are not returned. If you want the results to align with the input, you will need to do some bookkeeping.
Author
Marc Carlson
Examples
## This has to be in a dontrun block because otherwise I would have to
## expand the DEPENDS field for AnnotationDbi
## library("org.Hs.eg.db")
## library("org.Mm.eg.db")
## library("org.Sc.eg.db")
## library("hom.Hs.inp.db")
## library("hom.Mm.inp.db")
## library("hom.Sc.inp.db")
##Some IDs just for the example
library("org.Hs.eg.db")
ids = as.list(org.Hs.egUNIPROT)[10000:10500] ##get some ragged IDs
## Get entrez gene IDs (default) for uniprot IDs mapping from human to mouse.
MouseEGs = inpIDMapper(ids, "HOMSA", "MUSMU")
##Get yeast uniprot IDs in exchange for uniprot IDs from human
YeastUPs = inpIDMapper(ids, "HOMSA", "SACCE", destIDType="UNIPROT")
##Get yeast uniprot IDs but only return one ID per initial ID
YeastUPSingles = inpIDMapper(ids, "HOMSA", "SACCE", destIDType="UNIPROT", keepMultDestIDMatches = FALSE)
##Test out the intraIDMapper function:
HumanEGs = intraIDMapper(ids, species="HOMSA", srcIDType="UNIPROT",
destIDType="EG")
HumanPATHs = intraIDMapper(ids, species="HOMSA", srcIDType="UNIPROT",
destIDType="PATH")
##Test out the wrapper function
MousePATHs = idConverter(MouseEGs, srcSpecies="MUSMU", destSpecies="MUSMU",
srcIDType="EG", destIDType="PATH")
##Convert from Yeast uniprot IDs to Human entrez gene IDs.
HumanEGs = idConverter(YeastUPSingles, "SACCE", "HOMSA")
makeGOGraph()
A convenience function to generate graphs based on the GO.db package
Description
makeGOGraph
is a function to quickly convert any of the three Gene
Ontologies in GO.db into a graphNEL object where each edge is given a
weight of 1.
Usage
makeGOGraph(ont = c("bp","mf","cc"))
Arguments
Argument | Description |
---|---|
ont | Specifies the ontology: "cc", "bp" or "mf". |
Author
Marc Carlson
Examples
## makes a GO graph from the CC ontology
f <- makeGOGraph("cc")
make_eg_to_go_map()
Create GO to Entrez Gene maps for chip-based packages
Description
Create a new map object mapping Entrez ID to GO (or vice versa) given a chip annotation data package.
This is a temporary solution until a more general pluggable map solution comes online.
Usage
make_eg_to_go_map(chip)
Arguments
Argument | Description |
---|---|
chip | The name of the annotation data package. |
Value
Either a Go3AnnDbMap
or a RevGo3AnnDbMap
.
Author
Seth Falcon and Hervé Pagès
Examples
library("hgu95av2.db")
eg2go = make_eg_to_go_map("hgu95av2.db")
sample(eg2go, 2)
go2eg = make_go_to_eg_map("hgu95av2.db")
sample(go2eg, 2)
orgPackageName()
Org package contained in annotation object
Description
Get the name of the org package used by an annotation resource object.
NOTE: This man page is for the orgPackageName
S4 generic function defined in the AnnotationDbi package.
Bioconductor packages can define specific methods for annotation
objects not supported by the default method.
Usage
orgPackageName(x, ...)
Arguments
Argument | Description |
---|---|
x | An annotation resource object. |
... | Additional arguments. |
Value
A character(1)
vector indicating the org package name.
Specific methods defined in Bioconductor packages should behave as consistently as possible with the default method.
printprobetable()
Print method for probetable objects
Description
Prints class(x), nrow(x) and ncol(x), but not the elements of x.
The motivation for having this method is that methods from the package
base
such as
print.data.frame
will try to print the values of all elements of x
, which can
take an inconvenient amount of time and screen space if x
is large.
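A print method of this kind can be sketched in a few lines of base R. The class name `briefframe` below is hypothetical and used only for illustration; this is not the probetable method itself and it ignores the maxrows argument.

```r
# Hypothetical S3 class "briefframe": print only a one-line summary
# instead of every element of a potentially large data frame.
print.briefframe <- function(x, ...) {
  cat("Object of class", paste(class(x), collapse = ", "),
      "with", nrow(x), "rows and", ncol(x), "columns\n")
  invisible(x)
}

a <- as.data.frame(matrix(runif(20), ncol = 4))
class(a) <- c("briefframe", class(a))
print(a)  # prints the summary line only
```

Because "data.frame" stays in the class vector, all other data frame operations keep working; only printing is overridden.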
Usage
## S3 method for class 'probetable'
print(x, maxrows, ...)
Arguments
Argument | Description |
---|---|
x | an object of S3-class probetable . |
maxrows | maximum number of rows to print. |
... | further arguments that get ignored. |
Examples
a = as.data.frame(matrix(runif(1e6), ncol=1e3))
class(a) = c("probetable", class(a))
print(a)
print(as.matrix(a[2:3, 4:6]))
toSQLStringSet()
Convert a vector to a quoted string for use as a SQL value list
Description
Given a vector, this function returns a string with each element of the input coerced to character, quoted, and separated by ",".
Usage
toSQLStringSet(names)
Arguments
Argument | Description |
---|---|
names | A vector of values to quote |
Details
If names
is a character vector with elements containing single
quotes, these quotes will be doubled so as to escape the quote in SQL.
Value
A character vector of length one that represents the input vector as a SQL value list. Each element is single quoted and elements are comma separated.
Note
Do not use sQuote
for generating SQL as that function is
intended for display purposes only. In some locales, sQuote
will generate fancy quotes which will break your SQL.
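The quoting rule described above (coerce to character, double any single quotes, wrap each element in single quotes, join with commas) can be sketched in base R. The function `sql_value_list` is a hypothetical stand-in written for illustration, not the actual source of toSQLStringSet.

```r
# Hypothetical re-implementation of the quoting rule, for illustration only.
sql_value_list <- function(values) {
  values  <- as.character(values)
  escaped <- gsub("'", "''", values, fixed = TRUE)  # escape quotes for SQL
  paste0("'", escaped, "'", collapse = ",")
}

sql_value_list(letters[1:4])
# "'a','b','c','d'"
sql_value_list(c("'foo'", "ab'cd", "bar"))
# "'''foo''','ab''cd','bar'"
```

Plain ASCII quotes are written out literally here, which avoids the locale-dependent fancy quotes that sQuote can produce.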
Author
Hervé Pagès
Examples
toSQLStringSet(letters[1:4])
toSQLStringSet(c("'foo'", "ab'cd", "bar"))
unlist2()
A replacement for unlist() that does not mangle the names
Description
unlist2
is a replacement for base::unlist()
that
does not mangle the names.
Usage
unlist2(x, recursive=TRUE, use.names=TRUE, what.names="inherited")
Arguments
Argument | Description |
---|---|
x, recursive, use.names | See ?unlist . |
what.names | "inherited" or "full" . |
Details
Use this function if you don't like the mangled names returned
by the standard unlist
function from the base package.
Using unlist
with annotation data is dangerous and it is
highly recommended to use unlist2
instead.
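The name mangling done by base unlist can be reproduced without any annotation package. The sketch below uses made-up probe IDs and shows one simple base-R way to keep the outer names for a flat list of unnamed vectors. It is not the implementation of unlist2, which also handles nested lists and inner names.

```r
# Toy list keyed by Entrez-like IDs; the probe IDs are made up.
egids2pbids <- list("10" = c("probeA", "probeB"), "100" = "probeC")

# Base unlist() appends an index to the outer name of multi-element
# entries, producing names that look like different IDs.
names(unlist(egids2pbids))
# "101" "102" "100"

# Repeating each outer name once per element keeps the names intact.
flat <- setNames(
  unlist(egids2pbids, use.names = FALSE),
  rep(names(egids2pbids), lengths(egids2pbids))
)
names(flat)
# "10"  "10"  "100"
```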
Author
Hervé Pagès
Examples
x <- list(A=c(b=-4, 2, b=7), B=3:-1, c(a=1, a=-2), C=list(c(2:-1, d=55), e=99))
unlist(x)
unlist2(x)
library(hgu95av2.db)
egids <- c("10", "100", "1000")
egids2pbids <- mget(egids, revmap(hgu95av2ENTREZID))
egids2pbids
unlist(egids2pbids) # 1001, 1002, 10001 and 10002 are not real
# Entrez ids but are the result of unlist()
# mangling the names!
unlist2(egids2pbids) # much cleaner! yes the names are not unique
# but at least they are correct...