bioconductor v3.9.0 AnnotationDbi

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Summary

Functions

AnnDbObj objects

Check the SQL data contained in an SQLite-based annotation package

AnnotationDb objects and their progeny, methods etc.

AnnotationDbi internals

Bimap objects and the Bimap interface

Methods for getting/setting the filters on a Bimap object

Formatting a Bimap as a list or character vector

Methods for getting/setting the direction of a Bimap object, and undirected methods for getting/counting/setting its keys

Environment-like API for Bimap objects

Methods for manipulating the keys of a Bimap object

Methods for manipulating a Bimap object in a data-frame style

Descriptions of available values for columns and keytypes for GO.db.

GOFrame and GOAllFrame objects

Class "GOTerms"

Descriptions of available values for columns and keytypes for inparanoid packages.

KEGGFrame objects

Descriptions of available values for columns and keytypes.

Creates a simple Bimap from a SQLite database in a situation that is external to AnnotationDbi

Convenience functions for mapping IDs through an appropriate set of annotation packages

A convenience function to generate graphs based on the GO.db package

Create GO to Entrez Gene maps for chip-based packages

Org package contained in annotation object

Print method for probetable objects

Convert a vector to a quoted string for use as a SQL value list

A replacement for unlist() that does not mangle the names

Functions


AnnDbObj_class()

AnnDbObj objects

Description

The AnnDbObj class is the most general container for storing any kind of SQLite-based annotation data.

Details

Many classes in AnnotationDbi inherit directly or indirectly from the AnnDbObj class. One important particular case is the AnnDbBimap class which is the lowest class in the AnnDbObj hierarchy to also inherit the Bimap interface.

Seealso

dbConnect, dbListTables, dbListFields, dbGetQuery, Bimap

Examples

library("hgu95av2.db")

dbconn(hgu95av2ENTREZID)              # same as hgu95av2_dbconn()
dbfile(hgu95av2ENTREZID)              # same as hgu95av2_dbfile()

dbmeta(hgu95av2_dbconn(), "ORGANISM")
dbmeta(hgu95av2_dbconn(), "DBSCHEMA")
dbmeta(hgu95av2_dbconn(), "DBSCHEMAVERSION")

library("DBI")
dbListTables(hgu95av2_dbconn())       #lists all tables on connection

## If you use dbSendQuery instead of dbGetQuery
## (NOTE: for ease of use, this is definitely NOT recommended)
## Then you may need to know how to list results objects
dbListResults(hgu95av2_dbconn())      #for listing results objects


## You can also list the fields by using this connection
dbListFields(hgu95av2_dbconn(), "probes")
dbListFields(hgu95av2_dbconn(), "genes")
dbschema(hgu95av2ENTREZID)        # same as hgu95av2_dbschema()
## According to the schema, the probes._id column references the genes._id
## column. Note that in all tables, the "_id" column is an internal id with
## no biological meaning (provided for allowing efficient joins between
## tables).
## The information about the probe to gene mapping is in probes:
dbGetQuery(hgu95av2_dbconn(), "SELECT * FROM probes LIMIT 10")
## This mapping is in fact the ENTREZID map:
toTable(hgu95av2ENTREZID)[1:10, ] # only relevant columns are retrieved

dbInfo(hgu95av2GO)                # same as hgu95av2_dbInfo()

##Advanced example:
##Sometimes you may wish to join data from across multiple databases at
##once:
## In the following example we will attach the GO database to the
## hgu95av2 database, and then grab information from separate tables
## in each database that meet a common criterion.
library(hgu95av2.db)
library("GO.db")
attachSql <- paste('ATTACH "', GO_dbfile(), '" as go;', sep = "")
dbGetQuery(hgu95av2_dbconn(), attachSql)
sql <- 'SELECT  DISTINCT a.go_id AS "hgu95av2.go_id",
a._id AS "hgu95av2._id",
g.go_id AS "GO.go_id", g._id AS "GO._id",
g.term, g.ontology, g.definition
FROM go_bp_all AS a, go.go_term AS g
WHERE a.go_id = g.go_id LIMIT 10;'
data <- dbGetQuery(hgu95av2_dbconn(), sql)
data
## For illustration purposes, the internal id "_id" and the "go_id"
## from both tables are included in the output.  This makes it clear
## that the "go_ids" can be used to join these tables but the internal
## ids can NOT.  The internal IDs (which are always shown as _id) are
## suitable for joins within a single database, but cannot be used
## across databases.

AnnDbPkg_checker()

Check the SQL data contained in an SQLite-based annotation package

Description

Check the SQL data contained in an SQLite-based annotation package.

Usage

checkMAPCOUNTS(pkgname)

Arguments

| Argument | Description |
| --- | --- |
| pkgname | The name of the SQLite-based annotation package to check. |

Seealso

AnnDbPkg-maker

Author

H. Pagès

Examples

checkMAPCOUNTS("org.Sc.sgd.db")

AnnotationDb_class()

AnnotationDb objects and their progeny, methods etc.

Description

AnnotationDb is the virtual base class for all annotation packages. It contains a database connection and is meant to be the parent for a set of classes in the Bioconductor annotation packages. These classes provide a means of dispatch for a widely available set of select methods and thus allow the easy extraction of data from the annotation packages.

select, columns and keys are used together to extract data from an AnnotationDb object (or any object derived from the parent class). Examples of classes derived from the AnnotationDb object include (but are not limited to): ChipDb, OrgDb, GODb, InparanoidDb and ReactomeDb.

columns shows which kinds of data can be returned for the AnnotationDb object.

keytypes allows the user to discover which keytypes can be passed to select or keys through the keytype argument.

keys returns keys for the database contained in the AnnotationDb object. This method is already documented in the keys manual page but is mentioned again here because its usage with select is so intimate. By default it will return the primary keys for the database, but if used with the keytype argument, it will return the keys from that keytype.

select will retrieve the data as a data.frame based on the keys, columns and keytype arguments. Be warned that if you call select and request columns that have multiple matches for your keys, select will return a data.frame with one row for each possible match. As a result, if you request multiple columns and some of them have a many to one relationship to the keys, the number of rows will continue to multiply accordingly. So it's not a good idea to request a large number of columns unless you know that what you are asking for should have a one to one relationship with the initial set of keys. In general, if you need to retrieve a column (like GO) that has a many to one relationship to the original keys, it is most useful to extract that separately.
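The row multiplication described for select can be illustrated with a toy sketch in base R. This is not how select is implemented; the three data frames below are hypothetical stand-ins for one single-valued column and two columns that can hold several values per key, and merge is used only to mimic the effect on the number of rows.

```r
# Toy lookup tables standing in for annotation columns.
# (Hypothetical data; not taken from any annotation package.)
symbols <- data.frame(key = c("k1", "k2"), SYMBOL = c("A", "B"),
                      stringsAsFactors = FALSE)
go <- data.frame(key = c("k1", "k1", "k1", "k2"),
                 GO = c("GO:1", "GO:2", "GO:3", "GO:4"),
                 stringsAsFactors = FALSE)
alias <- data.frame(key = c("k1", "k1", "k2"),
                    ALIAS = c("a1", "a2", "b1"),
                    stringsAsFactors = FALSE)

# A single-valued column gives one row per key.
one_to_one <- symbols

# Adding one multi-valued column gives one row per (key, GO) match.
with_go <- merge(symbols, go, by = "key")

# Adding a second multi-valued column multiplies the rows again:
# k1 contributes 3 GO x 2 ALIAS rows, k2 contributes 1 x 1 row.
with_both <- merge(with_go, alias, by = "key")

c(nrow(one_to_one), nrow(with_go), nrow(with_both))
```

Retrieving the GO-like and ALIAS-like columns in two separate lookups would keep each result at one row per match instead of one row per combination.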

mapIds gets the mapped ids (column) for a set of keys that are of a particular keytype. Usually returned as a named character vector.

saveDb will take an AnnotationDb object and save the database to the file specified by the path passed in to the file argument.

loadDb takes a .sqlite database file as an argument and uses data in the metadata table of that file to return an AnnotationDb style object of the appropriate type.

species shows the genus and species label currently attached to the AnnotationDb objects database.

dbfile gets the database file associated with an object.

dbconn gets the database connection associated with an object.

taxonomyId gets the taxonomy ID associated with an object (if available).

Usage

columns(x)
  keytypes(x)
  keys(x, keytype, ...)
  select(x, keys, columns, keytype, ...)
  mapIds(x, keys, column, keytype, ..., multiVals)
  saveDb(x, file)
  loadDb(file, packageName=NA)

Arguments

| Argument | Description |
| --- | --- |
| x | the AnnotationDb object. In practice this will be an object derived from AnnotationDb, such as an OrgDb or ChipDb object. |
| keys | the keys to select records for from the database. All possible keys are returned by using the keys method. |
| columns | the columns or kinds of things that can be retrieved from the database. As with keys, all possible columns are returned by using the columns method. |
| keytype | the keytype that matches the keys used. For the select methods, this is used to indicate the kind of ID being used with the keys argument. For the keys method this is used to indicate which kind of keys are desired from keys. |
| column | the column to search on (for mapIds). Different from columns in that it can only have a single element for the value. |
| ... | other arguments. These include: pattern: the pattern to match (used by keys). column: the column to search on. This is used by keys and is for when the thing you want to pattern match is different from the keytype, or when you simply want to get keys that have a value for the thing specified by the column argument. fuzzy: TRUE or FALSE value. Use fuzzy matching? (this is used with pattern by the keys method). |
| multiVals | What should mapIds do when there are multiple values that could be returned? Options include: first: when there are multiple matches, only the 1st thing that comes back will be returned. This is the default behavior. list: this will just return a list object to the end user. filter: this will remove all elements that contain multiple matches and will therefore return a shorter vector than what came in whenever some of the keys match more than one value. asNA: this will return an NA value whenever there are multiple matches. CharacterList: this just returns a SimpleCharacterList object. FUN: you can also supply a function to the multiVals argument for custom behaviors. The function must take a single argument and return a single value. This function will be applied to all the elements and will serve as a 'rule' for which thing to keep when there is more than one element. For example, this function will always grab the last element in each result: last <- function(x){x[[length(x)]]} |
| file | an sqlite file path. A string that represents the full name you want for your sqlite database and also where to put it. |
| packageName | for internal use only |
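The multiVals options can be pictured with a small base R sketch on a plain named list. This only mimics the documented behaviors under the assumption that each key has at least one value; it does not call mapIds, and the list `hits` is hypothetical data.

```r
# Hypothetical lookup result: each key maps to one or more values.
hits <- list(k1 = c("a", "b"), k2 = "c", k3 = c("d", "e", "f"))

# "first": keep only the first value for each key.
first_vals <- vapply(hits, function(v) v[[1]], character(1))

# "asNA": keys with more than one value become NA.
as_na <- vapply(hits,
                function(v) if (length(v) > 1) NA_character_ else v[[1]],
                character(1))

# "filter": drop keys with more than one value (result may be shorter).
filtered <- unlist(hits[lengths(hits) == 1])

# A custom rule, like the `last` function from the FUN description.
last <- function(x) x[[length(x)]]
last_vals <- vapply(hits, last, character(1))
```

Each result except `filtered` is a named character vector with one element per key, which matches the "named character vector" return shape described for mapIds.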

Value

keys, columns and keytypes each return a character vector of possible values. select returns a data.frame.

Seealso

keys, dbConnect, dbListTables, dbListFields, dbGetQuery, Bimap

Author

Marc Carlson

Examples

require(hgu95av2.db)
## display the columns
columns(hgu95av2.db)
## get the 1st 6 possible keys
keys <- head( keys(hgu95av2.db) )
keys
## lookup gene symbol and unigene ID for the 1st 6 keys
select(hgu95av2.db, keys=keys, columns = c("SYMBOL","UNIGENE"))

## get keys based on unigene
keyunis <- head( keys(hgu95av2.db, keytype="UNIGENE") )
keyunis
## list supported key types
keytypes(hgu95av2.db)
## lookup gene symbol and unigene ID based on unigene IDs by setting
## the keytype to "UNIGENE" and passing in unigene keys:
select(hgu95av2.db, keys=keyunis, columns = c("SYMBOL","UNIGENE"),
keytype="UNIGENE")

keys <- head(keys(hgu95av2.db, 'ENTREZID'))
## get a default result (captures only the 1st element)
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID')
## or use a different option
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals="CharacterList")
## Or define your own function
last <- function(x){x[[length(x)]]}
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals=last)

## For other ways to access the DB, you can use dbfile() or dbconn() to extract
dbconn(hgu95av2.db)
dbfile(hgu95av2.db)

## Try to retrieve an associated taxonomyId
taxonomyId(hgu95av2.db)

AnnotationDbi_internals()

AnnotationDbi internals

Description

AnnotationDbi objects, classes and methods that are not intended to be used directly.

Bimap objects and the Bimap interface

Description

What we usually call "annotation maps" are in fact Bimap objects. In the following sections we present the bimap concept and the Bimap interface as it is defined in AnnotationDbi.

Seealso

Bimap-direction, Bimap-keys, Bimap-toTable, BimapFormatting, Bimap-envirAPI

Examples

library(hgu95av2.db)
ls(2)
hgu95av2GO # calls the "show" method
summary(hgu95av2GO)
hgu95av2GO2PROBE # calls the "show" method
summary(hgu95av2GO2PROBE)

BimapFiltering()

Methods for getting/setting the filters on a Bimap object

Description

These methods are part of the Bimap interface (see ?Bimap for a quick overview of the Bimap objects and their interface).

Some of these methods are for getting or setting the filtering status on a Bimap object so that the mapping object can toggle between displaying all probes, only single probes (the default) or only multiply matching probes.

Other methods are for viewing or setting the filter threshold value on an InpAnnDbBimap object.

Usage

## Making a Bimap object that does not prefilter to remove probes that
  ## match multiple genes:
  toggleProbes(x, value)
  hasMultiProbes(x) ##T/F test for exposure of multiply matched probes
  hasSingleProbes(x) ##T/F test for exposure of single probes
  ## Looking at the SQL filter values for a Bimap
  getBimapFilters(x)
  ## Setting the filter on an InpAnnDbBimap object
  setInpBimapFilter(x,value)

Arguments

| Argument | Description |
| --- | --- |
| x | A Bimap object. |
| value | A character vector containing the new value that the Bimap should use as the filter. Or the value to toggle a probe mapping to: "all", "single", or "multiple". |

Details

toggleProbes(x) is a method for creating Bimaps that have an alternate filter for which probes get exposed, based upon whether these probes map to multiple genes or not.

hasMultiProbes(x) and hasSingleProbes(x) are provided to give a quick test about whether or not such probes are exposed in a given mapping.

getBimapFilters(x) will list all the SQL filters applied to a Bimap object.

setInpBimapFilter(x, value) will allow you to pass a value as a character string which will be used as a filter. In order to be useful with the InpAnnDbBimap objects provided in the inparanoid packages, this value needs to be a two-digit number written as a percentage. So for example "80%" or "95%" would be acceptable. This is owing to the nature of the inparanoid data set.

Value

A Bimap object of the same subtype as x for exposeAllProbes(x) , maskMultiProbes(x) and maskSingleProbes(x) .

A TRUE or FALSE value in the case of hasMultiProbes(x) and hasSingleProbes(x) .

Seealso

Bimap, Bimap-keys, Bimap-direction, BimapFormatting, Bimap-envirAPI, nhit

Author

M. Carlson

Examples

## Make a Bimap that contains all the probes
require("hgu95av2.db")
mapWithMultiProbes <- toggleProbes(hgu95av2ENTREZID, "all")
count.mappedLkeys(hgu95av2ENTREZID)
count.mappedLkeys(mapWithMultiProbes)

## Check that it has both multiply and singly matching probes:
hasMultiProbes(mapWithMultiProbes)
hasSingleProbes(mapWithMultiProbes)

## Make it have Multi probes ONLY:
OnlyMultiProbes = toggleProbes(mapWithMultiProbes, "multiple")
hasMultiProbes(OnlyMultiProbes)
hasSingleProbes(OnlyMultiProbes)

## Convert back to a default map with only single probes exposed
OnlySingleProbes = toggleProbes(OnlyMultiProbes, "single")
hasMultiProbes(OnlySingleProbes)
hasSingleProbes(OnlySingleProbes)


## List the filters on the inparanoid mapping
# library(hom.Dm.inp.db)
# getBimapFilters(hom.Dm.inpANOGA)

## Here is how you can make a mapping with a
##different filter than the default:
# f80 = setInpBimapFilter(hom.Dm.inpANOGA, "80%")
# dim(hom.Dm.inpANOGA)
# dim(f80)

BimapFormatting()

Formatting a Bimap as a list or character vector

Description

These functions format a Bimap as a list or character vector.

Usage

## Formatting as a list
  as.list(x, ...)
  ## Formatting as a character vector
  #as.character(x, ...)

Arguments

| Argument | Description |
| --- | --- |
| x | A Bimap object. |
| ... | Further arguments are ignored. |

Seealso

Bimap, Bimap-envirAPI

Author

H. Pagès

Examples

library(hgu95av2.db)
as.list(hgu95av2CHRLOC)[1:9]
as.list(hgu95av2ENTREZID)[1:9]
as.character(hgu95av2ENTREZID)[1:9]

Bimap_direction()

Methods for getting/setting the direction of a Bimap object, and undirected methods for getting/counting/setting its keys

Description

These methods are part of the Bimap interface (see ?Bimap for a quick overview of the Bimap objects and their interface).

They are divided in 2 groups: (1) methods for getting or setting the direction of a Bimap object and (2) methods for getting, counting or setting the left or right keys (or mapped keys only) of a Bimap object. Note that all the methods in group (2) are undirected methods i.e. what they return does NOT depend on the direction of the map (more on this below).

Usage

## Getting or setting the direction of a Bimap object
direction(x)
direction(x) <- value
revmap(x, ...)
## Getting, counting or setting the left or right keys (or mapped
## keys only) of a Bimap object
Lkeys(x)
Rkeys(x)
Llength(x)
Rlength(x)
mappedLkeys(x)
mappedRkeys(x)
count.mappedLkeys(x)
count.mappedRkeys(x)
Lkeys(x) <- value
Rkeys(x) <- value
## S4 method for signature 'Bimap'
subset(x, Lkeys = NULL, Rkeys = NULL, drop.invalid.keys = FALSE)
## S4 method for signature 'AnnDbBimap'
subset(x, Lkeys = NULL, Rkeys = NULL, drop.invalid.keys = FALSE,
    objName = NULL)

Arguments

| Argument | Description |
| --- | --- |
| x | A Bimap object. |
| value | A single integer or character string indicating the new direction in direction(x) <- value. A character vector containing the new keys (must be a subset of the current keys) in Lkeys(x) <- value and Rkeys(x) <- value. |
| Lkeys, Rkeys, drop.invalid.keys, objName, ... | Extra arguments for revmap and subset. Extra argument for revmap can be: objName: The name to give to the reversed map (only supported if x is an AnnDbBimap object). Extra arguments for subset can be: Lkeys: The new Lkeys. Rkeys: The new Rkeys. drop.invalid.keys: If drop.invalid.keys=FALSE (the default), an error will be raised if the new Lkeys or Rkeys contain invalid keys i.e. keys that don't belong to the current Lkeys or Rkeys. If drop.invalid.keys=TRUE, invalid keys are silently dropped. objName: The name to give to the submap (only supported if x is an AnnDbBimap object). |

Details

All Bimap objects have a direction which can be left-to-right (i.e. the mapping goes from the left keys to the right keys) or right-to-left (i.e. the mapping goes from the right keys to the left keys). A Bimap object x that maps from left to right is considered to be a direct map. Otherwise it is considered to be an indirect map (when it maps from right to left).

direction returns 1 on a direct map and -1 otherwise.

The direction of x can be changed with direction(x) <- value where value must be 1 or -1 . An easy way to reverse a map (i.e. to change its direction) is to do direction(x) <- - direction(x) , or, even better, to use revmap(x) which is actually the recommended way for doing it.

The Lkeys and Rkeys methods return respectively the left and right keys of a Bimap object. Unlike the keys method (see ?keys for more information), these methods are direction-independent i.e. what they return does NOT depend on the direction of the map. Such methods are also said to be "undirected methods" and methods like the keys method are said to be "directed methods".

All the methods described below are also "undirected methods".

Llength(x) and Rlength(x) are equivalent to (but more efficient than) length(Lkeys(x)) and length(Rkeys(x)) , respectively.

The mappedLkeys (or mappedRkeys ) method returns the left keys (or right keys) that are mapped to at least one right key (or one left key).

count.mappedLkeys(x) and count.mappedRkeys(x) are equivalent to (but more efficient than) length(mappedLkeys(x)) and length(mappedRkeys(x)) , respectively. These functions give overall summaries, if you want to know how many Rkeys correspond to a given Lkey you can use the nhit function.

Lkeys(x) <- value and Rkeys(x) <- value are the undirected versions of keys(x) <- value (see ?keys for more information) and subset(x, Lkeys=new_Lkeys, Rkeys=new_Rkeys) is provided as a convenient way to reduce the sets of left and right keys in one single function call.
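The ideas of left keys, mapped keys and a reversed direction can be sketched on a plain named list in base R. This is only a conceptual illustration with hypothetical data; it does not use the Bimap classes, and unlike a real Bimap the reversed list below simply drops keys that have no mapping.

```r
# A toy left-to-right map as a named list: left key -> right keys.
# NA marks a left key with no mapping. (Hypothetical data.)
l2r <- list(p1 = c("g1", "g2"), p2 = "g1", p3 = NA_character_)

# Left keys are the names; "mapped" left keys have at least one right key.
left_keys <- names(l2r)
mapped_left <- left_keys[!vapply(l2r, function(v) all(is.na(v)), logical(1))]

# Flatten to (left, right) links, leaving out the unmapped key.
left  <- rep(names(l2r), lengths(l2r))
right <- unlist(l2r, use.names = FALSE)
keep  <- !is.na(right)

# Reversing the direction: group the left keys by right key.
r2l <- split(left[keep], right[keep])
```

Here `left_keys` is the same whichever way the map is read, which is the sense in which the Lkeys-style accessors are "undirected", while `l2r` and `r2l` are the two directed views of the same set of links.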

Value

1L or -1L for direction .

A Bimap object of the same subtype as x for revmap and subset .

A character vector for Lkeys , Rkeys , mappedLkeys and mappedRkeys .

A single non-negative integer for Llength , Rlength , count.mappedLkeys and count.mappedRkeys .

Seealso

Bimap, Bimap-keys, BimapFormatting, Bimap-envirAPI, nhit

Author

H. Pagès

Examples

library(hgu95av2.db)
ls(2)
x <- hgu95av2GO
x
summary(x)
direction(x)

length(x)
Llength(x)
Rlength(x)

keys(x)[1:4]
Lkeys(x)[1:4]
Rkeys(x)[1:4]

count.mappedkeys(x)
count.mappedLkeys(x)
count.mappedRkeys(x)

mappedkeys(x)[1:4]
mappedLkeys(x)[1:4]
mappedRkeys(x)[1:4]

y <- revmap(x)
y
summary(y)
direction(y)

length(y)
Llength(y)
Rlength(y)

keys(y)[1:4]
Lkeys(y)[1:4]
Rkeys(y)[1:4]

## etc...

## Get rid of all unmapped keys (left and right)
z <- subset(y, Lkeys=mappedLkeys(y), Rkeys=mappedRkeys(y))

Bimap_envirAPI()

Environment-like API for Bimap objects

Description

These methods allow the user to manipulate any Bimap object as if it was an environment. This environment-like API is provided for backward compatibility with the traditional environment-based maps.

Usage

ls(name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
     pattern, sorted = TRUE)
  exists(x, where, envir, frame, mode, inherits)
  get(x, pos, envir, mode, inherits)
  #x[[i]]
  #x$name
  ## Converting to a list
  mget(x, envir, mode, ifnotfound, inherits)
  eapply(env, FUN, ..., all.names, USE.NAMES)
  #contents(object, all.names)
  ## Additional convenience method
  sample(x, size, replace=FALSE, prob=NULL, ...)

Arguments

| Argument | Description |
| --- | --- |
| name | A Bimap object for ls. A key as a literal character string or a name (possibly backtick quoted) for x$name. |
| pos, all.names, USE.NAMES, where, frame, mode, inherits | Ignored. |
| envir | Ignored for ls. A Bimap object for mget, get and exists. |
| pattern | An optional regular expression. Only keys matching 'pattern' are returned. |
| x | The key(s) to search for, for exists, get and mget. A Bimap object for [[ and x$name. A Bimap object or an environment for sample. |
| i | Single key specifying the map element to extract. |
| ifnotfound | A value to be used if the key is not found. Only NA is currently supported. |
| env | A Bimap object. |
| FUN | The function to be applied (see original eapply for environments for the details). |
| ... | Optional arguments to FUN. |
| size | Non-negative integer giving the number of map elements to choose. |
| replace | Should sampling be with replacement? |
| prob | A vector of probability weights for obtaining the elements of the map being sampled. |
| sorted | logical(1). When TRUE (default), return primary keys in sorted order. |

Seealso

ls, exists, get, mget, eapply, contents, sample, BimapFormatting, Bimap

Examples

library(hgu95av2.db)
x <- hgu95av2CHRLOC

ls(x)[1:3]
exists(ls(x)[1], x)
exists("titi", x)
get(ls(x)[1], x)
x[[ls(x)[1]]]
x$titi # NULL

mget(ls(x)[1:3], x)
eapply(x, length)
contents(x)

sample(x, 3)

Methods for manipulating the keys of a Bimap object

Description

These methods are part of the Bimap interface (see ?Bimap for a quick overview of the Bimap objects and their interface).

Usage

#length(x)
  isNA(x)
  mappedkeys(x)
  count.mappedkeys(x)
  keys(x) <- value
  #x[i]

Arguments

| Argument | Description |
| --- | --- |
| x | A Bimap object. If the method being called is keys(x), then x can also be an AnnotationDb object or one of that object's progeny. |
| value | A character vector containing the new keys (must be a subset of the current keys). |
| i | A character vector containing the keys of the map elements to extract. |

Details

keys(x) returns the set of all valid keys for map x . For example, keys(hgu95av2GO) is the set of all probe set IDs for chip hgu95av2 from Affymetrix.

Please note that in addition to Bimap objects, keys(x) will also work for AnnotationDb objects and related objects such as OrgDb and ChipDb objects.

Note also that the double bracket operator [[ for Bimap objects is guaranteed to work only with a valid key and will raise an error if the key is invalid. (See ?`Bimap-envirAPI` for more information about this operator.)

length(x) is equivalent to (but more efficient than) length(keys(x)).

A valid key is not necessarily mapped ([[ will return an NA on an unmapped key).

isNA(x) returns a logical vector of the same length as x where the TRUE value is used to mark keys that are NOT mapped and the FALSE value to mark keys that ARE mapped.

mappedkeys(x) returns the subset of keys(x) where only mapped keys were kept.

count.mappedkeys(x) is equivalent to (but more efficient than) length(mappedkeys(x)).

Two (almost) equivalent forms of subsetting a Bimap object are provided: (1) by setting the keys explicitly and (2) by using the single bracket operator [ for Bimap objects. Let's say the user wants to restrict the mapping to the subset of valid keys stored in character vector mykeys. This can be done either with keys(x) <- mykeys (form (1)) or with y <- x[mykeys] (form (2)). Please note that form (1) alters object x in an irreversible way (the original keys are lost) so form (2) should be preferred.

All the methods described on this page are "directed methods" i.e. what they return DOES depend on the direction of the Bimap object that they are applied to (see ?direction for more information about this).

Value

A character vector for keys and mappedkeys.

A single non-negative integer for length and count.mappedkeys.

A logical vector for isNA.

A Bimap object of the same subtype as x for x[i].

Seealso

Bimap, Bimap-envirAPI, Bimap-toTable, BimapFormatting, AnnotationDb, select, columns

Author

H. Pagès

Examples

library(hgu95av2.db)
x <- hgu95av2GO
x
length(x)
count.mappedkeys(x)
x[1:3]
links(x[1:3])

## Keep only the mapped keys
keys(x) <- mappedkeys(x)
length(x)
count.mappedkeys(x)
x # now it is a submap

## The above subsetting can also be achieved with
x <- hgu95av2GO[mappedkeys(hgu95av2GO)]

## mappedkeys() and count.mappedkeys() also work with an environment
## or a list
z <- list(k1=NA, k2=letters[1:4], k3="x")
mappedkeys(z)
count.mappedkeys(z)

## retrieve the set of primary keys for the ChipDb object named 'hgu95av2.db'
keys <- keys(hgu95av2.db)
head(keys)


Bimap_toTable()

Methods for manipulating a Bimap object in a data-frame style

Description

These methods are part of the Bimap interface (see ?Bimap for a quick overview of the Bimap objects and their interface).

Usage

## Extract all the columns of the map (links + right attributes)
  toTable(x, ...)
  nrow(x)
  ncol(x)
  #dim(x)
  ## S4 method for signature 'FlatBimap'
  head(x, ...)
  ## S4 method for signature 'FlatBimap'
  tail(x, ...)
  ## Extract only the links of the map
  links(x)
  count.links(x)
  nhit(x)
  ## Col names and col metanames
  colnames(x, do.NULL=TRUE, prefix="col")
  colmetanames(x)
  Lkeyname(x)
  Rkeyname(x)
  keyname(x)
  tagname(x)
  Rattribnames(x)
  Rattribnames(x) <- value

Arguments

| Argument | Description |
| --- | --- |
| x | A Bimap object (or a list or an environment for nhit). |
| ... | Further arguments to be passed to or from other methods (see head or tail for the details). |
| do.NULL | Ignored. |
| prefix | Ignored. |
| value | A character vector containing the names of the new right attributes (must be a subset of the current right attribute names) or NULL. |

Details

toTable(x) turns Bimap object x into a data frame (see section "Flat representation of a bimap" in ?Bimap for a short introduction to this concept). For simple maps (i.e. no tags and no right attributes), the resulting data frame has only 2 columns, one for the left keys and one for the right keys, and each row in the data frame represents a link (or edge) between a left and a right key. For maps with tagged links (i.e. a tag is associated to each link), toTable(x) has one additional column for the tags and there is still one row per link. For maps with right attributes (i.e. a set of attributes is associated to each right key), toTable(x) has one additional column per attribute. So for example if x has tagged links and 2 right attributes, toTable(x) will have 5 columns: one for the left keys, one for the right keys, one for the tags, and one for each right attribute (always the rightmost columns). Note that if at least one of the right attributes is multivalued then more than 1 row can be needed to represent the same link, so the number of rows in toTable(x) can be strictly greater than the number of links in the map.

nrow(x) is equivalent to (but more efficient than) nrow(toTable(x)) .

ncol(x) is equivalent to (but more efficient than) ncol(toTable(x)) .

colnames(x) is equivalent to (but more efficient than) colnames(toTable(x)). Columns are named according to the names of the SQL columns where the data are coming from. An important consequence of this is that they are not necessarily unique.

colmetanames(x) returns the metanames for the columns of x that are not right attributes. Valid column metanames are "Lkeyname", "Rkeyname" and "tagname".

Lkeyname , Rkeyname , tagname and Rattribnames return the name of the column (or columns) containing the left keys, the right keys, the tags and the right attributes, respectively.

Like toTable(x), links(x) turns x into a data frame but the right attributes (if any) are dropped. Note that dropping the right attributes produces a data frame that may have fewer columns than toTable(x) and may also have fewer rows, because now exactly 1 row is needed to represent 1 link.

count.links(x) is equivalent to (but more efficient than) nrow(links(x)) .

nhit(x) returns a named integer vector indicating the number of "hits" for each key in x i.e. the number of links that start from each key.
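The relationship between per-key hit counts and the total number of links can be sketched in base R on a plain named list. This mimics the described behavior with hypothetical data and does not call nhit or count.links; NA is assumed here to mark an unmapped key.

```r
# Toy map: key -> linked values; NA marks an unmapped key (hypothetical data).
m <- list(k1 = NA_character_, k2 = c("a", "b", "c"), k3 = "x")

# Number of links starting from each key; an unmapped key has 0 hits.
hits <- vapply(m, function(v) sum(!is.na(v)), integer(1))

# The total number of hits equals the number of links in the map.
n_links <- sum(hits)

# Keys with no link at all.
unmapped <- names(hits)[hits == 0]
```

The result `hits` is a named integer vector with one element per key, the same shape the text describes for nhit.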

Value

A data frame for toTable and links .

A single integer for nrow , ncol and count.links .

A character vector for colnames , colmetanames and Rattribnames .

A character string for Lkeyname , Rkeyname and tagname .

A named integer vector for nhit .

Seealso

Bimap, BimapFormatting, Bimap-envirAPI

Author

H. Pagès

Examples

library(GO.db)
x <- GOSYNONYM
x
toTable(x)[1:4, ]
toTable(x["GO:0007322"])
links(x)[1:4, ]
links(x["GO:0007322"])

nrow(x)
ncol(x)
dim(x)
colnames(x)
colmetanames(x)
Lkeyname(x)
Rkeyname(x)
tagname(x)
Rattribnames(x)

links(x)[1:4, ]
count.links(x)

y <- GOBPCHILDREN
nhy <- nhit(y) # 'nhy' is a named integer vector
identical(names(nhy), keys(y)) # TRUE
table(nhy)
sum(nhy == 0) # number of GO IDs with no children
names(nhy)[nhy == max(nhy)] # the GO ID(s) with the most direct children

## Some sanity check
sum(nhy) == count.links(y) # TRUE

## Changing the right attributes of the GOSYNONYM map (advanced
## users only)
class(x) # GOTermsAnnDbBimap
as.list(x)[1:3]
colnames(x)
colmetanames(x)
tagname(x) # untagged map
Rattribnames(x)
Rattribnames(x) <- Rattribnames(x)[3:1]
colnames(x)
class(x) # AnnDbBimap
as.list(x)[1:3]

GOColsAndKeytypes()

Descriptions of available values for columns and keytypes for GO.db.

Description

This manual page enumerates the kinds of data represented by the values returned when the user calls columns or keytypes.

Details

All the possible values for columns and keytypes are listed below.

GOID: GO Identifiers
DEFINITION: The definition of a GO Term
ONTOLOGY: Which of the three Gene Ontologies (BP, CC, or MF)
TERM: The actual GO term

To get the latest information about the date stamps and source URLs for the data used to make an annotation package, please use the metadata method as shown in the example below.

Author

Marc Carlson

Examples

library(GO.db)
## List the possible values for columns
columns(GO.db)
## List the possible values for keytypes
keytypes(GO.db)
## get some values back
keys <- head(keys(GO.db))
keys
select(GO.db, keys=keys, columns=c("TERM","ONTOLOGY"),
       keytype="GOID")

## More information about the dates and original sources for these data:
metadata(GO.db)

GOFrame and GOAllFrame objects

Description

These objects each contain a data frame, which must be composed of 3 columns. The 1st column holds GO IDs, the 2nd holds evidence codes, and the 3rd holds the gene IDs that match the GO IDs using those evidence codes. There is also a slot for the organism that these annotations pertain to.

Details

The GOAllFrame object can only be generated from a GOFrame object, and its constructor method does this automatically from a GOFrame argument. The purpose of these objects is to create a safe way for annotation data about GO from non-traditional sources to be used by analysis packages like GSEABase and eventually GOstats.

Examples

## Make up an example
genes = c(1,10,100)
evi = c("ND","IEA","IDA")
GOIds = c("GO:0008150","GO:0008152","GO:0001666")
frameData = data.frame(cbind(GOIds,evi,genes))

library(AnnotationDbi)
frame=GOFrame(frameData,organism="Homo sapiens")
allFrame=GOAllFrame(frame)

getGOFrameData(allFrame)
Link to this function

GOTerms_class()

Class "GOTerms"

Description

A class to represent Gene Ontology nodes

Seealso

makeGOGraph shows how to make GO mappings into graphNEL objects.

Note

GOTerms objects are used to represent primary GO nodes in the SQLite-based annotation data package GO.db

References

http://www.geneontology.org/

Examples

gonode <- new("GOTerms", GOID="GO:1234567", Term="Test", Ontology="MF",
Definition="just for testing")
GOID(gonode)
Term(gonode)
Ontology(gonode)

##Or you can just use these methods on a GOTermsAnnDbBimap
##I want to show an ex., but don't want to require GO.db
require(GO.db)
FirstTenGOBimap <- GOTERM[1:10] ##grab the 1st ten
Term(FirstTenGOBimap)

##Or you can just use GO IDs directly
ids = keys(FirstTenGOBimap)
Term(ids)
Link to this function

InparanoidColsAndKeytypes()

Descriptions of available values for columns and keytypes for inparanoid packages.

Description

When the user calls columns or keytypes for an inparanoid package, the columns and keytypes methods will give the full genus and species names of all the organisms that are available.

Details

All the possible values for columns and keytypes are listed below.

ACYRTHOSIPHON_PISUM: the pea aphid
AEDES_AEGYPTI: a mosquito that can spread the dengue fever, Chikungunya and yellow fever viruses, and other diseases
ANOPHELES_GAMBIAE: a mosquito notorious as a vector for malaria
APIS_MELLIFERA: the western honey bee
ARABIDOPSIS_THALIANA: the thale cress
ASPERGILLUS_FUMIGATUS: a fungus that causes disease in immunodeficient individuals
BATRACHOCHYTRIUM_DENDROBATIDIS: a chytrid fungus that causes the disease chytridiomycosis
BOMBYX_MORI: the silk worm
BOS_TAURUS: domestic cattle
BRANCHIOSTOMA_FLORIDAE: a lancelet (amphioxus)
BRUGIA_MALAYI: a nematode (roundworm), one of the three causative agents of lymphatic filariasis
CAENORHABDITIS_BRENNERI: a small nematode, closely related to the model organism Caenorhabditis elegans
CAENORHABDITIS_BRIGGSAE: a small nematode, closely related to Caenorhabditis elegans
CAENORHABDITIS_ELEGANS: a small nematode
CAENORHABDITIS_JAPONICA: a gonochoristic (male-female) species related to C. elegans
CAENORHABDITIS_REMANEI: a species of nematode (gonochoristic)
CANDIDA_ALBICANS: a diploid fungus that grows both as yeast and filamentous cells and a causal agent of opportunistic oral and genital infections in humans
CANDIDA_GLABRATA: a haploid yeast of the genus Candida
CANIS_FAMILIARIS: domestic dog
CAPITELLA_SPI: a polychaete worm
CAVIA_PORCELLUS: Guinea pig
CHLAMYDOMONAS_REINHARDTII: a single celled green alga
CIONA_INTESTINALIS: a urochordata (sea squirt), a tunicate widely distributed in Northern European waters
CIONA_SAVIGNYI: a urochordata (sea squirt)
COCCIDIOIDES_IMMITIS: a pathogenic fungus that resides in the soil
COPRINOPSIS_CINEREUS: a species of mushroom
CRYPTOCOCCUS_NEOFORMANS: an encapsulated yeast that can live in both plants and animals
CRYPTOSPORIDIUM_HOMINIS: an obligate parasite of humans that can colonize the gastrointestinal tract
CRYPTOSPORIDIUM_PARVUM: one of several protozoal species that cause cryptosporidiosis, a parasitic disease of the mammalian intestinal tract
CULEX_PIPIENS: the common house mosquito
CYANIDIOSCHYZON_MEROLAE: an alga that is the main organism in red tide
DANIO_RERIO: the zebrafish
DAPHNIA_PULEX: the most common species of water flea
DEBARYOMYCES_HANSENII: a yeast that tolerates high concentrations of salt and is related to yeasts that cause disease, including Candida albicans
DICTYOSTELIUM_DISCOIDEUM: a species of soil-living amoeba, AKA a slime mold
DROSOPHILA_ANANASSAE: a fruit fly
DROSOPHILA_GRIMSHAWI: a fruit fly
DROSOPHILA_MELANOGASTER: a fruit fly
DROSOPHILA_MOJAVENSIS: a fruit fly
DROSOPHILA_PSEUDOOBSCURA: a fruit fly
DROSOPHILA_VIRILIS: a fruit fly
DROSOPHILA_WILLISTONI: a fruit fly
ENTAMOEBA_HISTOLYTICA: an anaerobic parasitic protozoan
EQUUS_CABALLUS: domestic horse
ESCHERICHIA_COLIK12: a laboratory strain of coliform bacteria
FUSARIUM_GRAMINEARUM: a fungus that attacks cereal grains
GALLUS_GALLUS: domesticated chicken
GASTEROSTEUS_ACULEATUS: three spined stickleback fish
GIARDIA_LAMBLIA: a flagellated protozoan parasite
HELOBDELLA_ROBUSTA: a leech
IXODES_SCAPULARIS: the black legged deer tick, a vector for lyme disease
KLUYVEROMYCES_LACTIS: yeast commonly used for genetic studies
LEISHMANIA_MAJOR: a species of Leishmania, associated with zoonotic cutaneous leishmaniasis
LOTTIA_GIGANTEA: a species of sea snail, a true limpet, a marine gastropod mollusc
MACACA_MULATTA: the rhesus Macaque
MAGNAPORTHE_GRISEA: rice blast fungus
MONODELPHIS_DOMESTICA: grey short tailed opossum
MONOSIGA_BREVICOLLIS: a marine choanoflagellate
MUS_MUSCULUS: lab mouse
NASONIA_VITRIPENNIS: a small pteromalid parasitoid wasp
NEMATOSTELLA_VECTENSIS: the starlet sea anemone
NEUROSPORA_CRASSA: a type of red bread mould
ORNITHORHYNCHUS_ANATINUS: the platypus
ORYZA_SATIVA: rice
ORYZIAS_LATIPES: medaka fish
OSTREOCOCCUS_TAURI: a unicellular coccoid or spherically shaped green alga
PAN_TROGLODYTES: chimp
PEDICULUS_HUMANUS: a species of lice that infects humans
PHYSCOMITRELLA_PATENS: a moss (Bryophyta) used as a model organism for studies on plant evolution
PHYTOPHTHORA_RAMORUM: the oomycete plant pathogen (sudden oak death)
PHYTOPHTHORA_SOJAE: an oomycete and a soil-borne plant pathogen that causes stem and root rot of soybean
PLASMODIUM_FALCIPARUM: a protozoan parasite that causes malaria
PLASMODIUM_VIVAX: a protozoal parasite and a human pathogen that causes a more benign malaria
PONGO_PYGMAEUS: the Bornean orangutan
POPULUS_TRICHOCARPA: black cottonwood; also known as western balsam poplar or California poplar
PRISTIONCHUS_PACIFICUS: a diplogastrid nematode
PUCCINIA_GRAMINIS: stem, black or cereal rusts
RATTUS_NORVEGICUS: common lab rat
RHIZOPUS_ORYZAE: a fungus that lives worldwide in dead organic matter. An opportunistic human pathogen
SACCHAROMYCES_CEREVISIAE: brewers yeast
SCHISTOSOMA_MANSONI: a significant parasite of humans, a trematode that is one of the major agents of the disease schistosomiasis
SCHIZOSACCHAROMYCES_POMBE: fission yeast
SCLEROTINIA_SCLEROTIORUM: an omnivorous fungal plant pathogen
SORGHUM_BICOLOR: sorghum
STAGONOSPORA_NODORUM: a fungal leaf spot disease
STRONGYLOCENTROTUS_PURPURATUS: the purple sea urchin
TAKIFUGU_RUBRIPES: Japanese pufferfish
TETRAHYMENA_THERMOPHILA: a single celled ciliate
TETRAODON_NIGROVIRIDIS: green spotted pufferfish (fresh water)
THALASSIOSIRA_PSEUDONANA: a species of marine centric diatom
THEILERIA_ANNULATA: a tickborne protozoan pathogen which is a major cause of livestock disease in sub-tropical regions
THEILERIA_PARVA: a parasitic protozoan, that causes East Coast fever (theileriosis) in cattle
TRIBOLIUM_CASTANEUM: the red flour beetle
TRICHOMONAS_VAGINALIS: an anaerobic, flagellated protozoan
TRICHOPLAX_ADHAERENS: Trichoplax adhaerens represents the simplest known animal, with the smallest known animal genome
TRYPANOSOMA_CRUZI: a species of parasitic euglenoid trypanosomes. This species causes the trypanosomiasis diseases in humans and animals in America.
USTILAGO_MAYDIS: a pathogenic plant fungus that causes smut disease on maize
XENOPUS_TROPICALIS: Western clawed frog
YARROWIA_LIPOLYTICA: Yarrowia lipolytica is a "non-conventional" species of yeast, often used in genetic research because it differs from other well-studied species

To get the latest information about the date stamps and source URLs for the data used to make an annotation package, please use the metadata method as shown in the example below.

Author

Marc Carlson

Examples

library(hom.Hs.inp.db)
## List the possible values for columns
columns(hom.Hs.inp.db)
## List the possible values for keytypes
keytypes(hom.Hs.inp.db)
## get some values back
keys <- head(keys(hom.Hs.inp.db, keytype="HOMO_SAPIENS"))
keys
select(hom.Hs.inp.db, keys=keys, columns=c("BOS_TAURUS","EQUUS_CABALLUS"),
       keytype="HOMO_SAPIENS")

## More information about the dates and original sources for these data:
metadata(hom.Hs.inp.db)

KEGGFrame objects

Description

These objects each contain a data frame, which must be composed of 2 columns. The 1st column holds KEGG IDs and the 2nd holds the gene IDs that match those KEGG IDs. There is also a slot for the organism that these annotations pertain to. getKEGGFrameData is an accessor method that returns the data.frame contained in the KEGGFrame object; it is mostly used internally by other code.

Details

The purpose of these objects is to create a safe way for annotation data about KEGG from non-traditional sources to be used for analysis packages like GSEABase and eventually Category.

Examples

## Make up an example
genes = c(2,9,9,10)
KEGGIds = c("04610","00232","00983","00232")
frameData = data.frame(cbind(KEGGIds,genes))

library(AnnotationDbi)
frame=KEGGFrame(frameData,organism="Homo sapiens")

getKEGGFrameData(frame)
Link to this function

colsAndKeytypes()

Descriptions of available values for columns and keytypes.

Description

This manual page enumerates the kinds of data represented by the values returned when the user calls columns or keytypes.

Details

All the possible values for columns and keytypes are listed below. Users will have to actually use these methods to learn which of the following possible values actually apply in their case.

ACCNUM: GenBank accession numbers
ALIAS: Commonly used gene symbols
ARACYC: KEGG Identifiers for arabidopsis as indicated by aracyc
ARACYCENZYME: Aracyc enzyme names as indicated by aracyc
CHR: Chromosome (deprecated for Bioc > 3.1) For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXCHROM, EXONCHROM or CDSCHROM. This information can also be retrieved from these objects using an appropriate range based accessor like transcripts, transcriptsBy etc.
CHRLOC: Chromosome and starting base of associated gene (deprecated for Bioc > 3.1) For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXSTART etc. or even better use the associated range based accessors like transcripts or transcriptsBy to get back GRanges objects.
CHRLOCEND: Chromosome and ending base of associated gene (deprecated for Bioc > 3.1) For this information you should look at a TxDb or OrganismDb object and search for an appropriate field like TXEND etc. or even better use the associated range based accessors like transcripts or transcriptsBy to get back GRanges objects.
COMMON: Common name
DESCRIPTION: The description of the associated gene
ENSEMBL: The ensembl ID as indicated by ensembl
ENSEMBLPROT: The ensembl protein ID as indicated by ensembl
ENSEMBLTRANS: The ensembl transcript ID as indicated by ensembl
ENTREZID: Entrez gene Identifiers
ENZYME: Enzyme Commission numbers
EVIDENCE: Evidence codes for GO associations with a gene of interest
EVIDENCEALL: Evidence codes for GO (includes less specific terms)
GENENAME: The full gene name
GO: GO Identifiers associated with a gene of interest
GOALL: GO Identifiers (includes less specific terms)
INTERPRO: InterPro identifiers
IPI: IPI accession numbers
MAP: cytoband locations
OMIM: Online Mendelian Inheritance in Man identifiers
ONTOLOGY: For GO Identifiers, which Gene Ontology (BP, CC, or MF)
ONTOLOGYALL: Which Gene Ontology (BP, CC, or MF), (includes less specific terms)
ORF: Yeast ORF Identifiers
PATH: KEGG Pathway Identifiers
PFAM: PFAM Identifiers
PMID: Pubmed Identifiers
PROBEID: Probe or manufacturer Identifiers for a chip package
PROSITE: Prosite Identifiers
REFSEQ: Refseq Identifiers
SGD: Saccharomyces Genome Database Identifiers
SMART: Smart Identifiers
SYMBOL: The official gene symbol
TAIR: TAIR Identifiers
UNIGENE: Unigene Identifiers
UNIPROT: Uniprot Identifiers

To get the latest information about the date stamps and source URLs for the data used to make an annotation package, please use the metadata method as shown in the example below.

Unless otherwise indicated above, the majority of the data for any one package is taken from the source indicated either by its name (if it is an org package) or by the name of its associated org package. For example, org.Hs.eg.db uses "eg" in the name to indicate that most of the data in that package comes from NCBI Entrez Gene, and org.At.tair.db uses data that primarily comes from TAIR. For chip packages, the relevant information is the organism package that they depend on. For example, hgu95av2.db depends on org.Hs.eg.db, and is thus primarily based on NCBI Entrez Gene ID information.

Author

Marc Carlson

Examples

library(hgu95av2.db)
## List the possible values for columns
columns(hgu95av2.db)
## List the possible values for keytypes
keytypes(hgu95av2.db)
## get some values back
keys <- head(keys(hgu95av2.db))
keys
select(hgu95av2.db, keys=keys, columns=c("SYMBOL","PFAM"),
       keytype="PROBEID")

## More information about the dates and original sources for these data:
metadata(hgu95av2.db)
Link to this function

createSimpleBimap()

Creates a simple Bimap from a SQLite database in a situation that is external to AnnotationDbi

Description

This function allows users to easily make a simple Bimap object for extra tables etc. that they may wish to add to their annotation packages. For most Bimaps, the definition is stored inside AnnotationDbi. This function helps ensure that this does not become a limitation, by allowing simple extra Bimaps to be defined outside AnnotationDbi. Usually, this will be done in the zzz.R source file of a package so that these extra mappings can be seamlessly integrated with the rest of the package. For now, this function assumes that users will want to use data from just one table.

Usage

createSimpleBimap(tablename, Lcolname, Rcolname, datacache, objName,
  objTarget)

Arguments

Argument: Description
tablename: The name of the database table to grab the mapping information from.
Lcolname: The field name from the database table. These will become the Lkeys in the final mapping.
Rcolname: The field name from the database table. These will become the Rkeys in the final mapping.
datacache: The datacache object should already exist for every standard Annotation package. It is not exported though, so you will have to access it with ::: . It is needed to provide the connection information to the function.
objName: This is the name of the mapping.
objTarget: This is the name of the thing the mapping goes with. For most uses, this will mean the package name that the mapping belongs with.

Examples

##You simply have to call this function to create a new mapping.  For
##example, you could have created a mapping between the gene_name and
##the symbols fields from the gene_info table contained in the hgu95av2
##package by doing this:
library(hgu95av2.db)
hgu95av2NAMESYMBOL <- createSimpleBimap("gene_info",
                                        "gene_name",
                                        "symbol",
                                        hgu95av2.db:::datacache,
                                        "NAMESYMBOL",
                                        "hgu95av2.db")

Convenience functions for mapping IDs through an appropriate set of annotation packages

Description

These convenience functions take a list of IDs along with some additional information: what those IDs are, what type of ID you would like them to become, what species they are from, and what species you would like them to be mapped to. They then attempt the simplest possible conversion using the organism packages and, where needed, the inparanoid annotation packages. By default, ambiguous matches are dropped from the results. Please see the Details section for more information about the parameters that affect this. If a more complex treatment of multiple matches is required, then a less convenient approach will likely be necessary.

Usage

inpIDMapper(ids, srcSpecies, destSpecies, srcIDType="UNIPROT",
  destIDType="EG", keepMultGeneMatches=FALSE, keepMultProtMatches=FALSE,
  keepMultDestIDMatches = TRUE)

intraIDMapper(ids, species, srcIDType="UNIPROT", destIDType="EG",
  keepMultGeneMatches=FALSE)

idConverter(ids, srcSpecies, destSpecies, srcIDType="UNIPROT",
  destIDType="EG", keepMultGeneMatches=FALSE, keepMultProtMatches=FALSE,
  keepMultDestIDMatches = TRUE)

Arguments

Argument: Description
ids: a list or vector of original IDs to match
srcSpecies: The original source species in inparanoid format. In other words, the 3 letters of the genus followed by 2 letters of the species, in all caps. E.g. 'HOMSA' is for Homo sapiens.
destSpecies: the destination species in inparanoid format
species: the species involved
srcIDType: The source ID type, written exactly as it would be used in a mapping name for an eg package. So for example, 'UNIPROT' is how the uniprot mappings are always written, so we keep that convention here.
destIDType: the destination ID, written the same way as you would write the srcIDType. By default this is set to "EG" for entrez gene IDs
keepMultGeneMatches: Do you want to try and keep the 1st ID in those ambiguous cases where more than one protein is suggested? (You probably want to filter them out, hence the default is FALSE)
keepMultProtMatches: Do you want to try and keep the 1st ID in those ambiguous cases where more than one protein is suggested? (default = FALSE)
keepMultDestIDMatches: If you have mapped to a destination ID OTHER than an entrez gene ID, then it is possible that there may be multiple answers. Do you want to keep all of these or only return the 1st one? (default = TRUE)

Details

inpIDMapper - This is a convenience function for getting an ID from one species mapped to an ID type of your choice from another organism of your choice. The only mappings used to do this are the mappings that are scored as 100 according to the inparanoid algorithm. This function automatically tries to join IDs by using FIVE different mappings in the sequence that follows:

1) initial IDs -> src organism Entrez Gene IDs
2) src organism Entrez Gene IDs -> src organism Inparanoid ID
3) src organism Inparanoid ID -> dest organism Inparanoid ID
4) dest organism Inparanoid ID -> dest organism Entrez Gene ID
5) dest organism Entrez Gene ID -> final destination organism ID

You can simplify this mapping as a series of steps like this:

srcIDs ---> srcEGs ---> srcInp ---> destInp ---> destEGs ---> destIDs
       (1)         (2)         (3)          (4)          (5)

There are two steps in this process where multiple mappings can really interfere with getting a clear answer. It's no coincidence that these are also adjacent to the two places where we have to tie the identity to a single gene for each organism. When this happens, any ambiguity is confounding. Preceding step #2, it is critical that we only have ONE entrez gene ID per initial ID, and the parameter keepMultGeneMatches can be used to toggle whether to drop any ambiguous matches (the default) or to keep the 1st one in the hope of getting an additional hit. A similar thing is done preceding step #4, where we have to be sure that the protein IDs we are getting back have all mapped to only one gene. We allow you to use the keepMultProtMatches parameter to make the same kind of decision as in step #2, again, the default is to drop anything that is ambiguous.
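
The two choices described above (drop ambiguous matches, or keep only the 1st one) can be pictured on a plain named list. This is a minimal base R sketch; `resolve_ambiguous`, its `keep_first` argument and the IDs are hypothetical and are not part of AnnotationDbi, whose internal handling may differ.

```r
# Hypothetical helper: decide what to do with IDs that map to
# more than one candidate.
resolve_ambiguous <- function(mapping, keep_first = FALSE) {
  ambiguous <- lengths(mapping) > 1
  if (keep_first) {
    # Keep every ID, but reduce ambiguous ones to their 1st match.
    mapping[ambiguous] <- lapply(mapping[ambiguous], function(v) v[1])
    mapping
  } else {
    # Default behaviour in this sketch: drop anything ambiguous.
    mapping[!ambiguous]
  }
}

# Made-up mapping in which "P2" has two candidate matches.
m <- list(P1 = "101", P2 = c("202", "203"), P3 = "304")

dropped <- resolve_ambiguous(m)                     # "P2" is removed
kept    <- resolve_ambiguous(m, keep_first = TRUE)  # "P2" keeps only "202"
```

Keeping the 1st match trades certainty for coverage, which is why dropping is the safer default.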

intraIDMapper - This is a convenience function to map within an organism and so it has a much simpler job to do. It will either map through one mapping or two depending whether the source ID or destination ID is a central ID for the relevant organism package. If the answer is neither, then two mappings will be needed.

idConverter - This is mostly for convenient usage of these functions by developers. It is just a wrapper function that can pass along all the parameters to the appropriate function (intraIDMapper or inpIDMapper). It decides which function to call based on the source and destination organism. The disadvantage to using this function all the time is just that more of the parameters have to be filled out each time.

Value

A list in which the name of each element is an element of the original list you passed in, and the values are the matching results. Elements that do not have a match are not returned, so aligning the results with the input requires some bookkeeping.
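
One way to do that bookkeeping is to re-index the result by the names of the input, so that unmatched entries reappear as NULL. The sketch below is base R only, and both `ids` and `res` are made-up stand-ins for an input list and a mapper result.

```r
ids <- list(A = "P1", B = "P2", C = "P3")   # made-up input
res <- list(A = "101", C = "304")           # made-up result: "B" had no match

# Index by the input names; missing names come back as NULL elements.
aligned <- setNames(res[names(ids)], names(ids))

# Logical vector marking which inputs actually produced a match.
matched <- !vapply(aligned, is.null, logical(1))
```

`aligned` now has the same length and order as `ids`, and `matched` can be used to subset either list.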

Author

Marc Carlson

Examples

## This has to be in a dontrun block because otherwise I would have to
## expand the DEPENDS field for AnnotationDbi
## library("org.Hs.eg.db")
## library("org.Mm.eg.db")
## library("org.Sc.eg.db")
## library("hom.Hs.inp.db")
## library("hom.Mm.inp.db")
## library("hom.Sc.inp.db")

##Some IDs just for the example
library("org.Hs.eg.db")
ids = as.list(org.Hs.egUNIPROT)[10000:10500] ##get some ragged IDs
## Get entrez gene IDs (default) for uniprot IDs mapping from human to mouse.
MouseEGs = inpIDMapper(ids, "HOMSA", "MUSMU")
##Get yeast uniprot IDs in exchange for uniprot IDs from human
YeastUPs = inpIDMapper(ids, "HOMSA", "SACCE", destIDType="UNIPROT")
##Get yeast uniprot IDs but only return one ID per initial ID
YeastUPSingles = inpIDMapper(ids, "HOMSA", "SACCE", destIDType="UNIPROT", keepMultDestIDMatches = FALSE)

##Test out the intraIDMapper function:
HumanEGs = intraIDMapper(ids, species="HOMSA", srcIDType="UNIPROT",
destIDType="EG")
HumanPATHs = intraIDMapper(ids, species="HOMSA", srcIDType="UNIPROT",
destIDType="PATH")

##Test out the wrapper function
MousePATHs = idConverter(MouseEGs, srcSpecies="MUSMU", destSpecies="MUSMU",
srcIDType="EG", destIDType="PATH")
##Convert from Yeast uniprot IDs to Human entrez gene IDs.
HumanEGs = idConverter(YeastUPSingles, "SACCE", "HOMSA")

A convenience function to generate graphs based on the GO.db package

Description

makeGOGraph is a function to quickly convert any of the three Gene Ontologies in GO.db into a graphNEL object where each edge is given a weight of 1.

Usage

makeGOGraph(ont = c("bp","mf","cc"))

Arguments

Argument: Description
ont: Specifies the ontology: "cc", "bp" or "mf".

Seealso

GOTerms

Author

Marc Carlson

Examples

## makes a GO graph from the CC ontology
f <- makeGOGraph("cc")
Link to this function

make_eg_to_go_map()

Create GO to Entrez Gene maps for chip-based packages

Description

Create a new map object mapping Entrez ID to GO (or vice versa) given a chip annotation data package.

This is a temporary solution until a more general pluggable map solution comes online.
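
What "or vice versa" means for a one-to-many map can be shown with a plain named list. This is a base R sketch with placeholder IDs, not the implementation used by make_eg_to_go_map or make_go_to_eg_map.

```r
# Made-up Entrez ID -> GO ID mapping; "GO:A" and "GO:B" are placeholders.
eg2go <- list("10" = c("GO:A", "GO:B"), "20" = "GO:B")

# Flatten into two parallel vectors, one entry per link ...
eg <- rep(names(eg2go), lengths(eg2go))
go <- unlist(eg2go, use.names = FALSE)

# ... then regroup by the other side to reverse the direction.
go2eg <- split(eg, go)
```

Every link survives the reversal; only the grouping key changes.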

Usage

make_eg_to_go_map(chip)

Arguments

Argument: Description
chip: The name of the annotation data package.

Value

Either a Go3AnnDbMap or a RevGo3AnnDbMap .

Author

Seth Falcon and Hervé Pagès

Examples

library("hgu95av2.db")

eg2go = make_eg_to_go_map("hgu95av2.db")
sample(eg2go, 2)

go2eg = make_go_to_eg_map("hgu95av2.db")
sample(go2eg, 2)
Link to this function

orgPackageName()

Org package contained in annotation object

Description

Get the name of the org package used by an annotation resource object.

NOTE: This man page is for the orgPackageName S4 generic function defined in the AnnotationDbi package. Bioconductor packages can define specific methods for annotation objects not supported by the default method.

Usage

orgPackageName(x, ...)

Arguments

Argument: Description
x: An annotation resource object.
...: Additional arguments.

Value

A character(1) vector indicating the org package name.

Specific methods defined in Bioconductor packages should behave as consistently as possible with the default method.

Link to this function

printprobetable()

Print method for probetable objects

Description

Prints class(x), nrow(x) and ncol(x), but not the elements of x. The motivation for having this method is that methods from the package base, such as print.data.frame, will try to print the values of all elements of x, which can take an inconvenient amount of time and screen space if x is large.
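
The general pattern is an S3 print method that reports only the class and dimensions. The sketch below uses a hypothetical class name, `big_table`, so that it does not depend on AnnotationDbi; it is not the package's own method and its output format is an assumption.

```r
# Hypothetical S3 class; "big_table" is not defined by AnnotationDbi.
print.big_table <- function(x, ...) {
  cat("<", class(x)[1], "> with ", nrow(x), " rows and ", ncol(x),
      " columns\n", sep = "")
  invisible(x)   # print methods conventionally return their argument
}

a <- as.data.frame(matrix(0, nrow = 1000, ncol = 50))
class(a) <- c("big_table", class(a))
print(a)   # a single summary line instead of 1000 rows
```

Because "big_table" comes first in the class vector, dispatch reaches this method before print.data.frame.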

Usage

## S3 method for class 'probetable'
print(x, maxrows, ...)

Arguments

Argument: Description
x: an object of S3-class probetable.
maxrows: maximum number of rows to print.
...: further arguments that get ignored.

Seealso

print.data.frame

Examples

a = as.data.frame(matrix(runif(1e6), ncol=1e3))
class(a) = c("probetable", class(a))
print(a)
print(as.matrix(a[2:3, 4:6]))
Link to this function

toSQLStringSet()

Convert a vector to a quoted string for use as a SQL value list

Description

Given a vector, this function returns a string with each element of the input coerced to character, quoted, and separated by ",".

Usage

toSQLStringSet(names)

Arguments

Argument: Description
names: A vector of values to quote

Details

If names is a character vector with elements containing single quotes, these quotes will be doubled so as to escape the quote in SQL.

Value

A character vector of length one that represents the input vector as a SQL value list. Each element is single quoted and elements are comma separated.

Note

Do not use sQuote for generating SQL as that function is intended for display purposes only. In some locales, sQuote will generate fancy quotes which will break your SQL.
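
The quoting rule described above (double every embedded single quote, wrap each element in plain single quotes, join with commas) can be written in a few lines of base R. `sql_value_list` is a hypothetical stand-in used only to illustrate the rule; it is not the AnnotationDbi implementation of toSQLStringSet.

```r
# Hypothetical stand-in; NOT the AnnotationDbi implementation.
sql_value_list <- function(x) {
  # Escape embedded single quotes by doubling them.
  x <- gsub("'", "''", as.character(x), fixed = TRUE)
  # Wrap each element in plain ASCII quotes and join with commas.
  paste0("'", x, "'", collapse = ",")
}

plain   <- sql_value_list(letters[1:4])
escaped <- sql_value_list(c("'foo'", "ab'cd", "bar"))
print(plain)
print(escaped)
```

Using literal "'" characters rather than sQuote keeps the result independent of the locale.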

Author

Hervé Pagès

Examples

toSQLStringSet(letters[1:4])
toSQLStringSet(c("'foo'", "ab'cd", "bar"))

A replacement for unlist() that does not mangle the names

Description

unlist2 is a replacement for base::unlist() that does not mangle the names.

Usage

unlist2(x, recursive=TRUE, use.names=TRUE, what.names="inherited")

Arguments

Argument: Description
x, recursive, use.names: See ?unlist.
what.names: "inherited" or "full".

Details

Use this function if you don't like the mangled names returned by the standard unlist function from the base package. Using unlist with annotation data is dangerous and it is highly recommended to use unlist2 instead.
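
The mangling is easy to reproduce without any annotation package. The base R sketch below uses a made-up list and shows one way to keep the parent names for a flat list of unnamed vectors; it only illustrates the idea of inherited names and is not how unlist2 is implemented.

```r
lst <- list("10" = c("a", "b"), "100" = "c")   # made-up data

# base::unlist() pastes element indices onto the names,
# producing names that look like different IDs.
mangled <- unlist(lst)
print(names(mangled))

# Repeating each parent name once per value keeps the names intact
# (they are then no longer unique, but they are correct).
kept <- setNames(unlist(lst, use.names = FALSE),
                 rep(names(lst), lengths(lst)))
print(names(kept))
```

With ID-like names such as "10", the suffixed names produced by unlist() are indistinguishable from real IDs, which is the danger this page warns about.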

Seealso

unlist

Author

Hervé Pagès

Examples

x <- list(A=c(b=-4, 2, b=7), B=3:-1, c(a=1, a=-2), C=list(c(2:-1, d=55), e=99))
unlist(x)
unlist2(x)

library(hgu95av2.db)
egids <- c("10", "100", "1000")
egids2pbids <- mget(egids, revmap(hgu95av2ENTREZID))
egids2pbids

unlist(egids2pbids)   # 1001, 1002, 10001 and 10002 are not real
                      # Entrez ids but are the result of unlist()
                      # mangling the names!

unlist2(egids2pbids)  # much cleaner! yes the names are not unique
                      # but at least they are correct...