bioconductor v3.9.0 Oligo

A package to analyze oligonucleotide arrays

Summary

Functions

Accessors for PM, MM or background probes indices.

Accessors and replacement methods for the intensity/PM/MM/BG matrices.

MA plots

Probe Sequences

Sequence Base Contents

Simplified interface to PLM.

Simplified interface to RMA.

Boxplot

Accessor for chromosome information

Create set of colors, interpolating through a set of preferred colors.

Accessors for physical array coordinates.

Genotype Calls

Tool to fit Probe Level Models.

Estimate affinity coefficients.

Compute and plot nucleotide profile.

Get container information for NimbleGen Tiling Arrays.

Function to get CRLMM summaries saved to disk

NetAffx Biological Annotations

Helper function to extract color information for filenames on NimbleGen arrays.

Retrieve Platform Design object

Probe information selector.

Density estimate

Display a pseudo-image of a microarray chip

Summarization of SNP data

List XYS files

Defunct Functions in Package 'oligo'

Class "oligoPLM"

The oligo package: a tool for low-level analysis of oligonucleotide arrays

Methods for P/A Calls

Methods for Log-Ratio plotting

Access the allele information for PM probes.

Access the fragment length for PM probes.

Accessor to position information

Accessor to the strand information

Tools for microarray preprocessing.

Accessor to feature names

Read summaries generated by crlmm

Parser to CEL files

Parser to XYS files

RMA - Robust Multichip Average algorithm

Date of scan

Create design matrix for sequences

Preprocessing SNP Arrays

Functions

Index_methods()

Accessors for PM, MM or background probes indices.

Description

Extracts the indexes for PM, MM or background probes.

Usage

mmindex(object, ...)
pmindex(object, ...)
bgindex(object, ...)

Arguments

Argument  Description
object    FeatureSet or DBPDInfo object
...       Extra arguments, not yet implemented

Details

The indices are ordered by 'fid', i.e. they follow the order that the probes appear in the CEL/XYS files.

Value

A vector of integers representing the rows of the intensity matrix that correspond to PM, MM or background probes.

Examples

## How pm() works
x <- read.celfiles(list.celfiles())
pms0 <- pm(x)
pmi <- pmindex(x)
pms1 <- exprs(x)[pmi,]
identical(pms0, pms1)

IntensityMatrix_methods()

Accessors and replacement methods for the intensity/PM/MM/BG matrices.

Description

Accessors and replacement methods for the PM/MM/BG matrices.

Usage

intensity(object)
mm(object, subset = NULL, target='core')
pm(object, subset = NULL, target='core')
bg(object, subset = NULL)
mm(object, subset = NULL, target='core')<-value
pm(object, subset = NULL, target='core')<-value
bg(object)<-value

Arguments

Argument  Description
object    FeatureSet object.
subset    Not implemented yet.
value     matrix object.
target    One of 'probeset', 'core', 'full', 'extended'. This is ignored if the array design is something other than Gene ST or Exon ST.

Details

For all objects but TilingFeatureSet, these methods will return matrices. In case of TilingFeatureSet objects, the value is a 3-dimensional array (probes x samples x channels).

intensity will return the whole intensity matrix associated to the object. pm, mm, bg will return the respective PM/MM/BG matrix.

When applied to ExonFeatureSet or GeneFeatureSet objects, pm will return the PM matrix at the transcript level ('core' probes) by default. The user should set the target argument accordingly if something else is desired. The valid values are: 'probeset' (Exon and Gene arrays), 'core' (Exon and Gene arrays), 'full' (Exon arrays) and 'extended' (Exon arrays).

The target argument has no effects when used on designs other than Gene and Exon ST.
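
To make the two return shapes concrete, the sketch below builds a toy array in base R with the layout described above for TilingFeatureSet objects (probes x samples x channels). The object toy and its dimension names are made up for illustration; nothing here calls oligo itself.

```r
# Toy stand-in for the value pm() returns on a TilingFeatureSet:
# 5 probes x 3 samples x 2 channels (all values are simulated).
set.seed(1)
toy <- array(2^rnorm(30, mean = 8), dim = c(5, 3, 2),
             dimnames = list(paste0("probe", 1:5),
                             paste0("sample", 1:3),
                             c("channel1", "channel2")))

# Each channel is an ordinary probes x samples matrix.
ch1 <- toy[, , "channel1"]
ch2 <- toy[, , "channel2"]

# Per-probe log-ratio between the two channels.
logRatio <- log2(ch1) - log2(ch2)
dim(logRatio)
```

Slicing on the third dimension is what turns the array back into the kind of matrix returned for single-channel designs.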

Examples

## Run only if both the example data and the annotation package are installed
if (require(maqcExpression4plex) && require(pd.hg18.60mer.expr)){
xysPath <- system.file("extdata", package="maqcExpression4plex")
xysFiles <- list.xysfiles(xysPath, full.names=TRUE)
ngsExpressionFeatureSet <- read.xysfiles(xysFiles)
## First 10 rows of the PM matrix
pm(ngsExpressionFeatureSet)[1:10,]
}

MAplot_methods()

MA plots

Description

Create MA plots using a reference array (if one channel) or using channel2 as reference (if two channel).

Usage

MAplot(object, ...)

## S4 method for signature 'FeatureSet'
MAplot(object, what=pm, transfo=log2, groups,
       refSamples, which, pch=".", summaryFun=rowMedians,
       plotFun=smoothScatter, main="vs pseudo-median reference chip",
       pairs=FALSE, ...)

## S4 method for signature 'TilingFeatureSet'
MAplot(object, what=pm, transfo=log2, groups,
       refSamples, which, pch=".", summaryFun=rowMedians,
       plotFun=smoothScatter, main="vs pseudo-median reference chip",
       pairs=FALSE, ...)

## S4 method for signature 'PLMset'
MAplot(object, what=coefs, transfo=identity, groups,
       refSamples, which, pch=".", summaryFun=rowMedians,
       plotFun=smoothScatter, main="vs pseudo-median reference chip",
       pairs=FALSE, ...)

## S4 method for signature 'matrix'
MAplot(object, what=identity, transfo=identity,
       groups, refSamples, which, pch=".", summaryFun=rowMedians,
       plotFun=smoothScatter, main="vs pseudo-median reference chip",
       pairs=FALSE, ...)

## S4 method for signature 'ExpressionSet'
MAplot(object, what=exprs, transfo=identity,
       groups, refSamples, which, pch=".", summaryFun=rowMedians,
       plotFun=smoothScatter, main="vs pseudo-median reference chip",
       pairs=FALSE, ...)

Arguments

Argument    Description
object      FeatureSet, PLMset or ExpressionSet object.
what        function to be applied on object that will extract the statistics of interest, from which log-ratios and average log-intensities will be computed.
transfo     function to transform the data prior to plotting.
groups      factor describing groups of samples that will be combined prior to plotting. If missing, MvA plots are done per sample.
refSamples  integers (indexing samples) to define which subjects will be used to compute the reference set. If missing, a pseudo-reference chip is estimated using summaryFun.
which       integer (indexing samples) describing which samples are to be plotted.
pch         same as pch in plot
summaryFun  function that operates on a matrix and returns a vector that will be used to summarize data belonging to the same group (or reference) on the computation of grouped-stats.
plotFun     function to be used for plotting. Usually smoothScatter, plot or points.
main        string to be used in title.
pairs       logical flag to determine if a matrix of MvA plots is to be generated
...         Other arguments to be passed downstream, like plot arguments.

Details

MAplot will take the following extra arguments:

  • subset : indices of elements to be plotted, to reduce the impact of plotting hundreds of thousands of points (if pairs=FALSE only);

  • span : see loess ;

  • family.loess : see loess ;

  • addLoess : logical flag (default TRUE) to add a loess estimate;

  • parParams : list of params to be passed to par() (if pairs=TRUE only);
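
The quantities behind an MvA plot can be sketched in base R. The code below is an illustration on simulated log2 data, not the code MAplot runs internally: it assumes the reference is the per-probe median across all samples (the "pseudo-median reference chip" of the default title), and it uses apply(x, 1, median) where the package default for summaryFun is rowMedians.

```r
# Simulated log2 intensities: 1000 probes x 4 samples.
set.seed(1)
x <- matrix(rnorm(4000, mean = 8), ncol = 4,
            dimnames = list(NULL, paste0("sample", 1:4)))

# Pseudo-median reference chip: per-probe median across samples.
ref <- apply(x, 1, median)

# M = log-ratio against the reference, A = average log-intensity.
M <- x - ref
A <- (x + ref) / 2

# One MvA panel for the first sample, with a loess trend added.
d <- data.frame(A = A[, 1], M = M[, 1])
plot(d$A, d$M, pch = ".", xlab = "A", ylab = "M",
     main = "sample1 vs pseudo-median reference chip")
trend <- loess(M ~ A, data = d)
ord <- order(d$A)
lines(d$A[ord], fitted(trend)[ord], col = "red")
abline(h = 0, lty = 2)
```

A flat loess curve close to M = 0 suggests that the sample has no intensity-dependent bias relative to the reference.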

Value

Plot

See Also

plot, smoothScatter

Author

Benilton Carvalho - based on Ben Bolstad's original MAplot function.

Examples

if (require(oligoData) && require(pd.hg18.60mer.expr)){
data(nimbleExpressionFS)
nimbleExpressionFS
groups <- factor(rep(c('brain', 'UnivRef'), each=3))
data.frame(sampleNames(nimbleExpressionFS), groups)
MAplot(nimbleExpressionFS, pairs=TRUE, ylim=c(-.5, .5), groups=groups)
}

Sequences_methods()

Probe Sequences

Description

Accessor to the (PM/MM/background) probe sequences.

Usage

mmSequence(object)
pmSequence(object, ...)
bgSequence(object, ...)

Arguments

Argument  Description
object    FeatureSet, AffySNPPDInfo or DBPDInfo object
...       additional arguments

Value

A DNAStringSet containing the PM/MM/background probe sequence associated to the array.

Sequence Base Contents

Description

Function to compute the amounts of each nucleotide in a sequence.

Usage

basecontent(seq)

Arguments

Argument  Description
seq       character vector of length n containing a valid sequence (A/T/C/G)

Value

matrix with n rows and 4 columns with the counts for each base.
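
For readers who want to see the shape of the result without loading oligo, here is a base-R sketch that produces the same kind of n x 4 count matrix. The helper countBases is hypothetical, and its column order (A, C, G, T) is an assumption that may differ from what basecontent returns.

```r
# Count A, C, G and T in each sequence using base R only.
countBases <- function(seqs, bases = c("A", "C", "G", "T")) {
  counts <- t(vapply(strsplit(seqs, ""), function(chars) {
    # table() on a factor keeps zero counts for absent bases
    as.integer(table(factor(chars, levels = bases)))
  }, integer(length(bases))))
  colnames(counts) <- bases
  counts
}

res <- countBases(c("ATATATCCCCG", "TTTCCGAGC"))
res
```

Each row sums to the length of the corresponding sequence, which is a quick sanity check when sequences may contain unexpected characters.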

Examples

sequences <- c("ATATATCCCCG", "TTTCCGAGC")
basecontent(sequences)

Simplified interface to PLM.

Description

Simplified interface to PLM.

Usage

basicPLM(pmMat, pnVec, normalize = TRUE, background = TRUE, transfo =
  log2, method = c('plm', 'plmr', 'plmrr', 'plmrc'), verbose = TRUE)

Arguments

Argument    Description
pmMat       Matrix of intensities to be processed.
pnVec       Probeset names
normalize   Logical flag: normalize?
background  Logical flag: background adjustment?
transfo     function: function to be used for data transformation prior to summarization.
method      Name of the model-fitting method to be used. 'plm' is the usual PLM model; 'plmr' is the (row and column) robust version of PLM; 'plmrr' is the row-robust version of PLM; 'plmrc' is the column-robust version of PLM.
verbose     Logical flag: verbose.

Value

A list whose components include Estimates, StdErrors and Residuals (the names used in the Examples).

See Also

rcModelPLM, rcModelPLMr, rcModelPLMrr, rcModelPLMrc, basicRMA

Note

Currently, only RMA-bg-correction and quantile normalization are allowed.
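
As background for the note above, quantile normalization can be sketched in a few lines of base R. This is a simplified illustration, not the routine the package calls: the helper quantileNormalize is hypothetical, and it breaks ties by order of appearance, which production implementations may handle differently.

```r
# Quantile normalization: force every column to share one distribution.
quantileNormalize <- function(x) {
  # Mean of the k-th smallest value across columns, for every rank k.
  target <- rowMeans(apply(x, 2, sort))
  # Replace each value with the target quantile at its within-column rank.
  normalized <- apply(x, 2, function(col) target[rank(col, ties.method = "first")])
  dimnames(normalized) <- dimnames(x)
  normalized
}

set.seed(1)
pms <- 2^matrix(rnorm(1000), ncol = 20)
normalized <- quantileNormalize(pms)

# After normalization, all columns have the same sorted values.
sameDist <- apply(normalized, 2, function(col)
  isTRUE(all.equal(sort(col), sort(normalized[, 1]))))
all(sameDist)
```

The within-column ordering of probes is preserved; only the values attached to each rank change.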

Author

Benilton Carvalho

Examples

set.seed(1)
pms <- 2^matrix(rnorm(1000), ncol=20)
colnames(pms) <- paste("sample", 1:20, sep="")
pns <- rep(letters[1:10], each=5)
res <- basicPLM(pms, pns, TRUE, TRUE)
res[['Estimates']][1:4, 1:3]
res[['StdErrors']][1:4, 1:3]
res[['Residuals']][1:20, 1:3]

Simplified interface to RMA.

Description

Simple interface to RMA.

Usage

basicRMA(pmMat, pnVec, normalize = TRUE, background = TRUE, bgversion = 2, destructive = FALSE, verbose = TRUE, ...)

Arguments

Argument     Description
pmMat        Matrix of intensities to be processed.
pnVec        Probeset names.
normalize    Logical flag: normalize?
background   Logical flag: background adjustment?
bgversion    Version of background correction.
destructive  Logical flag: use destructive methods?
verbose      Logical flag: verbose.
...          Not currently used.

Value

Matrix.

Examples

set.seed(1)
pms <- 2^matrix(rnorm(1000), ncol=20)
colnames(pms) <- paste("sample", 1:20, sep="")
pns <- rep(letters[1:10], each=5)
res <- basicRMA(pms, pns, TRUE, TRUE)
res[, 1:3]

Boxplot

Description

Boxplot for observed (log-)intensities in a FeatureSet-like object (ExpressionFeatureSet, ExonFeatureSet, SnpFeatureSet, TilingFeatureSet) and ExpressionSet.

Usage

## S4 method for signature 'FeatureSet'
boxplot(x, which=c("pm", "mm", "bg", "both", "all"), transfo=log2, nsample=10000, ...)

## S4 method for signature 'ExpressionSet'
boxplot(x, which, transfo=identity, nsample=10000, ...)

Arguments

Argument  Description
x         a FeatureSet-like object or ExpressionSet object.
which     character defining what probe types are to be used in the plot.
transfo   a function to transform the data before plotting. See 'Details'.
nsample   number of units to sample and build the plot.
...       arguments to be passed to the default boxplot method.

Details

The 'transfo' argument will set the transformation to be used. For raw data, 'transfo=log2' is a common practice. For summarized data (which are often in log2-scale), no transformation is needed (therefore 'transfo=identity').

See Also

hist, image, sample, set.seed

Note

The boxplot methods for FeatureSet and ExpressionSet use a sample (via sample) of the probes/probesets to produce the plot. Therefore, users interested in reproducibility are advised to call set.seed beforehand.
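
The effect of the seed on the subsample can be seen with base R alone. In this sketch the array size nProbes and the seed value are arbitrary choices for illustration; only nsample mirrors the default shown in Usage.

```r
# Hypothetical number of probes on an array, and the default nsample.
nProbes <- 250000
nsample <- 10000

# Without a fixed seed, two draws generally select different probes.
idxA <- sample(nProbes, nsample)
idxB <- sample(nProbes, nsample)
identical(idxA, idxB)

# Fixing the seed before each draw reproduces the same subset.
set.seed(2024)
idx1 <- sample(nProbes, nsample)
set.seed(2024)
idx2 <- sample(nProbes, nsample)
identical(idx1, idx2)
```

In practice this means calling set.seed immediately before the plotting call whose output needs to be reproducible.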

Accessor for chromosome information

Description

Returns chromosome information.

Usage

chromosome(object)
pmChr(object)

Arguments

Argument  Description
object    TilingFeatureSet or SnpCallSet object

Details

chromosome() returns the chromosomal information for all probes and pmChr() subsets the output to the PM probes only (if a TilingFeatureSet object).

Value

Vector with chromosome information.

Create set of colors, interpolating through a set of preferred colors.

Description

Create set of colors, interpolating through a set of preferred colors.

Usage

darkColors(n)
seqColors(n)
seqColors2(n)
divColors(n)

Arguments

Argument  Description
n         integer determining number of colors to be generated

Details

darkColors is based on the Dark2 palette in RColorBrewer, therefore useful to describe qualitative features of the data.

seqColors is based on Blues and generates a gradient of blues, therefore useful to describe quantitative features of the data. seqColors2 behaves similarly, but it is based on OrRd (white-orange-red).

divColors is based on the RdBu palette in RColorBrewer, therefore useful to describe quantitative features ranging on two extremes.
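
The idea of interpolating through a set of preferred colors can be tried with colorRampPalette from grDevices. This is only a sketch of the technique: the anchor colors below are illustrative picks rather than the RColorBrewer definitions, and the way oligo builds its palettes internally is not shown here.

```r
# Anchor colors are illustrative picks, not the RColorBrewer definitions.
blueAnchors <- c("#F7FBFF", "#6BAED6", "#08306B")
divergingAnchors <- c("#B2182B", "#F7F7F7", "#2166AC")

# colorRampPalette() returns a function that interpolates n colors.
makeBlues <- colorRampPalette(blueAnchors)
makeDiverging <- colorRampPalette(divergingAnchors)

blues <- makeBlues(10)          # sequential gradient
diverging <- makeDiverging(11)  # two extremes around a neutral midpoint

blues
diverging
```

A sequential ramp suits values that run from low to high, while a diverging ramp suits values that spread in two directions from a midpoint.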

Examples

x <- 1:10
y <- 1:10
cols1 <- darkColors(10)
cols2 <- seqColors(10)
cols3 <- divColors(10)
cols4 <- seqColors2(10)
plot(x, y, col=cols1, xlim=c(1, 13), pch=19, cex=3)
points(x+1, y, col=cols2, pch=19, cex=3)
points(x+2, y, col=cols3, pch=19, cex=3)
points(x+3, y, col=cols4, pch=19, cex=3)
abline(0, 1, lty=2)
abline(-1, 1, lty=2)
abline(-2, 1, lty=2)
abline(-3, 1, lty=2)

Accessors for physical array coordinates.

Description

Accessors for physical array coordinates.

Usage

getX(object, type)
getY(object, type)

Arguments

Argument  Description
object    FeatureSet object
type      'character' defining the type of the probes to be queried. Valid options are 'pm', 'mm', 'bg'

Value

A vector with the requested coordinates.

Examples

x <- read.celfiles(list.celfiles())
theXpm <- getX(x, "pm")
theYpm <- getY(x, "pm")

Genotype Calls

Description

Performs genotype calls via CRLMM (Corrected Robust Linear Model with Maximum-likelihood based distances).

Usage

crlmm(filenames, outdir, batch_size=40000, balance=1.5,
      minLLRforCalls=c(5, 1, 5), recalibrate=TRUE,
      verbose=TRUE, pkgname, reference=TRUE)
justCRLMM(filenames, batch_size = 40000, minLLRforCalls = c(5, 1, 5),
recalibrate = TRUE, balance = 1.5, phenoData = NULL, verbose = TRUE,
pkgname = NULL, tmpdir=tempdir())

Arguments

Argument        Description
filenames       character vector with the filenames.
outdir          directory where the output files (and some temporary files) will be saved.
batch_size      integer defining how many SNPs should be processed at a time.
recalibrate     Logical - should recalibration be performed?
balance         Control parameter to balance homozygotes and heterozygotes calls.
minLLRforCalls  Minimum thresholds for genotype calls.
verbose         Logical.
phenoData       phenoData object or NULL
pkgname         alternative pdInfo package to be used
reference       logical, defaulting to TRUE ...
tmpdir          Directory where temporary files will be stored.

Value

SnpCallSetPlus object.

fitProbeLevelModel()

Tool to fit Probe Level Models.

Description

Fits robust Probe Level linear Models to all the (meta)probesets in a FeatureSet. This is carried out on a (meta)probeset by (meta)probeset basis.

Usage

fitProbeLevelModel(object, background=TRUE, normalize=TRUE, target="core", method="plm", verbose=TRUE, S4=TRUE, ...)

Arguments

Argument    Description
object      FeatureSet object.
background  Do background correction?
normalize   Do normalization?
target      character vector describing the summarization target. Valid values are: 'probeset', 'core' (Gene/Exon), 'full' (Exon), 'extended' (Exon).
method      summarization method to be used.
verbose     verbosity flag.
S4          return final value as an S4 object (oligoPLM) if TRUE. If FALSE, final value is returned as a list.
...         subset to be passed down to getProbeInfo for subsetting. See subset for details.

Value

fitProbeLevelModel returns an oligoPLM object if S4=TRUE; otherwise, it returns a list.

See Also

rma, summarizationMethods, subset

Note

This is the initial port of fitPLM to oligo. Some features found in the original work by Ben Bolstad (in the affyPLM package) may not be available yet. If you find one of these missing features, please contact Benilton Carvalho.

Author

This is a simplified port from Ben Bolstad's work implemented in the affyPLM package. Problems with the implementation in oligo should be reported to Benilton Carvalho.

References

Bolstad, BM (2004) Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization. PhD Dissertation. University of California, Berkeley.

Examples

if (require(oligoData)){
data(nimbleExpressionFS)
fit <- fitProbeLevelModel(nimbleExpressionFS)
image(fit)
NUSE(fit)
RLE(fit)
}

getAffinitySplineCoefficients()

Estimate affinity coefficients.

Description

Estimate affinity coefficients using sequence information and splines.

Usage

getAffinitySplineCoefficients(intensities, sequences)

Arguments

Argument     Description
intensities  Intensity matrix
sequences    Probe sequences

Value

Matrix with estimated coefficients.

See Also

getBaseProfile

getBaseProfile()

Compute and plot nucleotide profile.

Description

Computes and, optionally, plots the nucleotide profile, describing the sequence effect on intensities.

Usage

getBaseProfile(coefs, probeLength = 25, plot = FALSE, ...)

Arguments

Argument     Description
coefs        affinity spline coefficients.
probeLength  length of probes
plot         logical. Plots profile?
...          arguments to be passed to matplot.

Value

Invisibly returns a matrix with estimated effects.

Get container information for NimbleGen Tiling Arrays.

Description

Get container information for NimbleGen Tiling Arrays. This is useful for better identification of control probes.

Usage

getContainer(object, probeType)

Arguments

Argument   Description
object     A TilingFeatureSet object.
probeType  String describing which probes to query ('pm', 'bg')

Value

'character' vector with container information.

getCrlmmSummaries()

Function to get CRLMM summaries saved to disk

Description

This will read the summaries written to disk and return them to the user as a SnpCallSetPlus or SnpCnvCallSetPlus object.

Usage

getCrlmmSummaries(tmpdir)

Arguments

Argument  Description
tmpdir    directory where CRLMM saved the results to.

Value

If the data were from SNP 5.0 or 6.0 arrays, the function will return a SnpCnvCallSetPlus object. It will return a SnpCallSetPlus object, otherwise.

NetAffx Biological Annotations

Description

Gets NetAffx Biological Annotations saved in the annotation package (Exon and Gene ST Affymetrix arrays).

Usage

getNetAffx(object, type = "probeset")

Arguments

Argument  Description
object    'ExpressionSet' object (e.g., result of rma())
type      Either 'probeset' or 'transcript', depending on what type of summaries were obtained.

Details

This retrieves the NetAffx annotation saved in the (pd) annotation package, annotation(object). It is only available for Exon ST and Gene ST arrays.

The 'type' argument should match the summarization target used to generate 'object'. The 'rma' method allows for two targets: 'probeset' (target='probeset') and 'transcript' (target='core', target='full', target='extended').

Value

'AnnotatedDataFrame' that can be used as featureData(object)

Author

Benilton Carvalho

getNgsColorsInfo()

Helper function to extract color information for filenames on NimbleGen arrays.

Description

This function will (try to) extract the color information for NimbleGen arrays. This is useful when using read.xysfiles2 to parse XYS files for Tiling applications.

Usage

getNgsColorsInfo(path = ".", pattern1 = "_532", pattern2 = "_635", ...)

Arguments

Argument  Description
path      path where to look for files
pattern1  pattern to match files supposed to go to the first channel
pattern2  pattern to match files supposed to go to the second channel
...       extra arguments for list.xysfiles

Details

Many NimbleGen samples are identified following the pattern sampleID_532.XYS / sampleID_635.XYS.

The function suggests sample names if all the filenames follow the standard above.
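
The naming convention can be illustrated with a base-R sketch that pairs files by channel and derives sample names. The file names are invented, and the matching logic is an assumption about how such pairing could be done, not a copy of what getNgsColorsInfo does internally.

```r
# Hypothetical file names following sampleID_532.XYS / sampleID_635.XYS.
files <- c("brain1_532.XYS", "brain1_635.XYS",
           "liver1_532.XYS", "liver1_635.XYS")

# Split the files by channel pattern and sort so that pairs line up.
channel1 <- sort(grep("_532", files, value = TRUE))
channel2 <- sort(grep("_635", files, value = TRUE))

# Strip the channel suffix to recover candidate sample names.
id1 <- sub("_532\\.XYS$", "", channel1, ignore.case = TRUE)
id2 <- sub("_635\\.XYS$", "", channel2, ignore.case = TRUE)

info <- data.frame(channel1 = channel1, channel2 = channel2,
                   stringsAsFactors = FALSE)
# Suggest sample names only when every pair agrees on the sample ID.
if (identical(id1, id2)) info$sampleNames <- id1
info
```

When the file names do not follow the convention, the sketch simply leaves out the sampleNames column, mirroring the behavior described for the return value.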

Value

A data.frame with, at least, two columns: 'channel1' and 'channel2'. A third column, 'sampleNames', is returned if the filenames follow the sampleID_532.XYS / sampleID_635.XYS standard.

Author

Benilton Carvalho bcarvalh@jhsph.edu

getPlatformDesign()

Retrieve Platform Design object

Description

Retrieve platform design object.

Usage

getPlatformDesign(object)
getPD(object)

Arguments

Argument  Description
object    FeatureSet object

Details

Retrieve platform design object.

Value

platformDesign or PDInfo object.

Probe information selector.

Description

A tool to simplify the selection of probe information, so that the user does not need to write SQL queries.

Usage

getProbeInfo(object, field, probeType = "pm", target = "core", sortBy = c("fid", "man_fsetid", "none"), ...)

Arguments

Argument   Description
object     FeatureSet object.
field      character string with names of field(s) of interest to be obtained from database.
probeType  character string: 'pm' or 'mm'
target     Used only for Exon or Gene ST arrays: 'core', 'full', 'extended', 'probeset'.
sortBy     Field to be used for sorting.
...        Arguments to be passed to subset

Value

A data.frame with the probe level information.

Note

The code allows for querying info on MM probes, however it has been used mostly on PM probes.

Author

Benilton Carvalho

Examples

if (require(oligoData)){
data(affyGeneFS)
availProbeInfo(affyGeneFS)
probeInfo <- getProbeInfo(affyGeneFS, c('fid', 'x', 'y', 'chrom'))
head(probeInfo)
## Selecting antigenomic background probes
agenGene <- getProbeInfo(affyGeneFS, field=c('fid', 'fsetid', 'type'), target='probeset', subset= type == 'control->bgp->antigenomic')
head(agenGene)
}

Density estimate

Description

Plot the density estimates for each sample

Usage

## S4 method for signature 'FeatureSet'
hist(x, transfo=log2, which=c("pm", "mm", "bg", "both", "all"),
     nsample=10000, ...)

## S4 method for signature 'ExpressionSet'
hist(x, transfo=identity, nsample=10000, ...)

Arguments

Argument  Description
x         FeatureSet or ExpressionSet object
transfo   a function to transform the data before plotting. See 'Details'.
nsample   number of units to sample and build the plot.
which     set of probes to be plotted ("pm", "mm", "bg", "both", "all").
...       arguments to be passed to matplot

Details

The 'transfo' argument will set the transformation to be used. For raw data, 'transfo=log2' is a common practice. For summarized data (which are often in log2-scale), no transformation is needed (therefore 'transfo=identity').

Note

The hist methods for FeatureSet and ExpressionSet use a sample (via sample) of the probes/probesets to produce the plot (unless nsample > nrow(x)). Therefore, users interested in reproducibility are advised to call set.seed beforehand.

Display a pseudo-image of a microarray chip

Description

Produces a pseudo-image (graphics::image) for each sample.

Usage

## S4 method for signature 'FeatureSet'
image(x, which, transfo=log2, ...)

## S4 method for signature 'PLMset'
image(x, which=0,
      type=c("weights","resids", "pos.resids","neg.resids","sign.resids"),
      use.log=TRUE, add.legend=FALSE, standardize=FALSE,
      col=NULL, main, ...)

Arguments

Argument     Description
x            FeatureSet object
which        integer indices of samples to be plotted (optional).
transfo      function to be applied to the data prior to plotting.
type         Type of statistics to be used.
use.log      Use log.
add.legend   Add legend.
standardize  Standardize residuals.
col          Colors to be used.
main         Main title.
...          parameters to be passed to image

Examples

if (require(oligoData) && require(pd.hg18.60mer.expr)){
data(nimbleExpressionFS)
par(mfrow=c(1, 2))
image(nimbleExpressionFS, which=4)
##  fit <- fitPLM(nimbleExpressionFS)
##  image(fit, which=4)
plot(1) ## while fixing fitPLM TODO
}

Summarization of SNP data

Description

This function implements the SNPRMA method for summarization of SNP data. It works directly with the CEL files, saving memory.

Usage

justSNPRMA(filenames, verbose = TRUE, phenoData = NULL, normalizeToHapmap = TRUE)

Arguments

Argument           Description
filenames          character vector with the filenames.
verbose            logical flag for verbosity.
phenoData          a phenoData object or NULL
normalizeToHapmap  Normalize to Hapmap? Should always be TRUE, but it's kept here for future use.

Value

A SnpQSet or a SnpCnvQSet, depending on the array type.

Examples

## snprmaResults <- justSNPRMA(list.celfiles())

List XYS files

Description

Lists the XYS files.

Usage

list.xysfiles(...)

Arguments

Argument  Description
...       parameters to be passed to list.files

Details

This function is an interface to list.files; see that function for further details.

Value

Character vector with the filenames.

See Also

list.files

Examples

list.xysfiles()

Defunct Functions in Package 'oligo'

Description

The functions or variables listed here are no longer part of 'oligo'

Usage

fitPLM(...)
coefs(...)
resids(...)

Arguments

Argument  Description
...       Arguments.

Details

fitPLM was replaced by fitProbeLevelModel, allowing faster execution and providing more specific models. fitPLM was based on the code written by Ben Bolstad in the affyPLM package. However, all the model-fitting functions are now in the package preprocessCore, on which fitProbeLevelModel depends.

coefs and resids, like fitPLM, were inherited from the affyPLM package. They were replaced respectively by coef and residuals, because this is how these statistics are called everywhere else in R.

oligoPLM_class()

Class "oligoPLM"

Description

A class to represent Probe Level Models.

See Also

rma, summarize

Author

This is a port from Ben Bolstad's work implemented in the affyPLM package. Problems with the implementation in oligo should be reported to the package's maintainer.

References

Bolstad, BM (2004) Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization. PhD Dissertation. University of California, Berkeley.

Examples

## TODO: review code and fix broken
if (require(oligoData)){
data(nimbleExpressionFS)
fit <- fitProbeLevelModel(nimbleExpressionFS)
image(fit)
NUSE(fit)
RLE(fit)
}

oligo_package()

The oligo package: a tool for low-level analysis of oligonucleotide arrays

Description

The oligo package provides tools to preprocess different types of oligonucleotide arrays: expression, tiling, SNP and exon chips. The supported manufacturers are Affymetrix and NimbleGen.

It offers support for large datasets (when the bigmemory package is loaded) and can execute preprocessing tasks in parallel (if, in addition to bigmemory, the snow package is also loaded).

Details

The package will read the raw intensity files (CEL for Affymetrix; XYS for NimbleGen) and allow the user to perform analyses starting at the feature-level.

Reading in the intensity files requires the existence of data packages that contain the chip-specific information (X/Y coordinates; feature types; sequence). These data packages are built using the pdInfoBuilder package.

For Affymetrix SNP arrays, users are asked to download the already built annotation packages from Bioconductor. This is because these packages contain metadata that are not automatically created. The following annotation packages are available:

  • 50K Xba - pd.mapping50kxba.240

  • 50K Hind - pd.mapping50khind.240

  • 250K Sty - pd.mapping250k.sty

  • 250K Nsp - pd.mapping250k.nsp

  • GenomeWideSnp 5 (SNP 5.0) - pd.genomewidesnp.5

  • GenomeWideSnp 6 (SNP 6.0) - pd.genomewidesnp.6

For users interested in genotype calls for SNP 5.0 and 6.0 arrays, we strongly recommend the use of the crlmm package, which implements a more efficient version of CRLMM.

Author

Benilton Carvalho - carvalho@bclab.org

References

Carvalho, B.; Bengtsson, H.; Speed, T. P. & Irizarry, R. A. Exploration, Normalization, and Genotype Calls of High Density Oligonucleotide SNP Array Data. Biostatistics, 2006.

Methods for P/A Calls

Description

Methods for Present/Absent Calls are meant to provide means of assessing whether or not each of the (PM) intensities are compatible with observations generated by background probes.

Usage

paCalls(object, method, ..., verbose=TRUE)

## S4 method for signature 'ExonFeatureSet'
paCalls(object, method, verbose = TRUE)

## S4 method for signature 'GeneFeatureSet'
paCalls(object, method, verbose = TRUE)

## S4 method for signature 'ExpressionFeatureSet'
paCalls(object, method, ..., verbose = TRUE)

Arguments

Argument  Description
object    Exon/Gene/Expression-FeatureSet object.
method    String defining what method to use. See 'Details'.
...       Additional arguments passed to MAS5. See 'Details'
verbose   Logical flag for verbosity.

Details

For Whole Transcript arrays (Exon/Gene) the valid options for method are 'DABG' (p-values for each probe) and 'PSDABG' (p-values for each probeset). For Expression arrays, the only option currently available for method is 'MAS5'.

ABOUT MAS5 CALLS:

The additional arguments that can be passed to MAS5 are:

  • alpha1 : a significance threshold in (0, alpha2);

  • alpha2 : a significance threshold in (alpha1, 0.5);

  • tau : a small positive constant;

  • ignore.saturated : if TRUE, do the saturation correction described in the paper, with a saturation level of 46000;

This function performs the hypothesis test:

H0: median(Ri) = tau, corresponding to absence of transcript
H1: median(Ri) > tau, corresponding to presence of transcript

where Ri = (PMi - MMi) / (PMi + MMi) for each i a probe-pair in the probe-set represented by data.

The p-value that is returned estimates the usual quantity:

Pr(observing a more "present looking" probe-set than data | data is absent)

So that small p-values imply presence while large ones imply absence of transcript. The detection call is computed by thresholding the p-value as in:

call "P" if p-value < alpha1
call "M" if alpha1 <= p-value < alpha2
call "A" if alpha2 <= p-value
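
The test and the thresholding rule can be sketched in base R for a single probe set. This is an illustration only: the helpers detectionPValue and detectionCall are hypothetical, the values tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are assumptions chosen for the example, and the p-value here comes from wilcox.test with a normal approximation, which may differ numerically from what paCalls computes.

```r
# One-sided Wilcoxon signed-rank test of H0: median(R) = tau.
detectionPValue <- function(pm, mm, tau = 0.015) {
  r <- (pm - mm) / (pm + mm)  # discrimination score per probe pair
  wilcox.test(r, mu = tau, alternative = "greater", exact = FALSE)$p.value
}

# Threshold the p-value into Present / Marginal / Absent.
detectionCall <- function(p, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(p < alpha1, "P", ifelse(p < alpha2, "M", "A"))
}

# Simulated probe set with 11 probe pairs and a clear PM > MM signal.
set.seed(1)
mm <- 2^rnorm(11, mean = 7, sd = 0.3)
pm <- mm * 2^rnorm(11, mean = 1.5, sd = 0.3)

p <- detectionPValue(pm, mm)
detectionCall(p)

# The rule applied to three fixed p-values, one per outcome.
detectionCall(c(0.01, 0.05, 0.50))
```

Because the score Ri is bounded between -1 and 1, tau acts as a small offset that keeps probe sets with negligible PM/MM differences from being called present.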

Value

A matrix (of dimension dim(PM) if method="DABG" or "MAS5"; of dimension length(unique(probeNames(object))) x ncol(object) if method="PSDABG") with p-values for P/A Calls.

Author

Benilton Carvalho

References

Clark et al. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol (2007) vol. 8 (4) pp. R64

Liu, W. M. and Mei, R. and Di, X. and Ryder, T. B. and Hubbell, E. and Dee, S. and Webster, T. A. and Harrington, C. A. and Ho, M. H. and Baid, J. and Smeekens, S. P. (2002) Analysis of high density expression microarrays with signed-rank call algorithms, Bioinformatics, 18(12), pp. 1593-1599.

Liu, W. and Mei, R. and Bartell, D. M. and Di, X. and Webster, T. A. and Ryder, T. (2001) Rank-based algorithms for analysis of microarrays, Proceedings of SPIE, Microarrays: Optical Technologies and Informatics, 4266.

Affymetrix (2002) Statistical Algorithms Description Document, Affymetrix Inc., Santa Clara, CA, whitepaper. http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf

Examples

if (require(oligoData) && require(pd.huex.1.0.st.v2)){
data(affyExonFS)
## Get only 2 samples for example
dabgP <- paCalls(affyExonFS[, 1:2])
dabgPS <- paCalls(affyExonFS[, 1:2], "PSDABG")
head(dabgP) ## for probe
head(dabgPS) ## for probeset
}

plotM_methods()

Methods for Log-Ratio plotting

Description

The plotM methods are meant to plot log-ratios for different classes of data.

Access the allele information for PM probes.

Description

Accessor to the allelic information for PM probes.

Usage

pmAllele(object)

Arguments

Argument  Description
object    SnpFeatureSet or PDInfo object.

pmFragmentLength()

Access the fragment length for PM probes.

Description

Accessor to the fragment length for PM probes.

Usage

pmFragmentLength(object, enzyme, type=c('snp', 'cn'))

Arguments

Argument  Description
object    PDInfo or SnpFeatureSet object.
enzyme    Enzyme to be used for query. If missing, all enzymes are used.
type      Type of probes to be used: 'snp' for SNP probes; 'cn' for Copy Number probes.

Value

A list of length equal to the number of enzymes used for digestion. Each element of the list is a data.frame containing:

  • row : the row used to link to the PM matrix;

  • length : expected fragment length.

Note

There is not a 1:1 relationship between probes and expected fragment lengths. For one enzyme, a given probe may be associated with multiple fragment lengths. Therefore, the number of rows in the data.frame may not match the number of PM probes, and the 'row' column should be used to match the fragment lengths with the PM matrix.
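
The matching described in this note can be sketched with mock data. The objects below (pm_mat, frag) are hypothetical stand-ins built by hand, not output of pmFragmentLength; they only mimic the documented layout (columns 'row' and 'length') to show why indexing must go through the 'row' column.

```r
# Mock PM matrix (hypothetical data): 5 probes x 2 samples.
pm_mat <- matrix(c(10, 11, 12, 13, 14,
                   20, 21, 22, 23, 24), ncol = 2,
                 dimnames = list(NULL, c("s1", "s2")))

# Mock data.frame for one enzyme in the documented layout (row, length).
# Probe 2 maps to two fragments and probe 4 to none, so nrow(frag)
# equals nrow(pm_mat) only by coincidence.
frag <- data.frame(row = c(1, 2, 2, 3, 5),
                   length = c(350, 410, 980, 620, 275))

# Index the PM matrix through the 'row' column, not by position in 'frag'.
matched <- data.frame(row = frag$row,
                      length = frag$length,
                      s1 = pm_mat[frag$row, "s1"])
print(matched)

# One possible way (an arbitrary choice for this sketch) to reduce to a
# single value per probe: the mean fragment length.
mean_len <- tapply(frag$length, frag$row, mean)
print(mean_len)
```

How to handle probes with several fragment lengths (mean, minimum, keeping all rows) is an analysis decision; the mean is used here only for illustration.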

Accessor to position information

Description

pmPosition will return the genomic position for the (PM) probes.

Usage

pmPosition(object)
pmOffset(object)

Arguments

Argument  Description
object    AffySNPPDInfo, TilingFeatureSet or SnpCallSet object

Details

pmPosition will return genomic position for PM probes on a tiling array.

pmOffset will return the offset information for PM probes on SNP arrays.

Accessor to the strand information

Description

Returns the strand information for PM probes (0 - sense / 1 - antisense).

Usage

pmStrand(object)

Arguments

Argument  Description
object    AffySNPPDInfo or TilingFeatureSet object

preprocessTools()

Tools for microarray preprocessing.

Description

These are tools to preprocess microarray data. They include background correction, normalization and summarization methods.

Usage

backgroundCorrectionMethods()
normalizationMethods()
summarizationMethods()
backgroundCorrect(object, method=backgroundCorrectionMethods(), copy=TRUE, extra, subset=NULL, target='core', verbose=TRUE)
summarize(object, probes=rownames(object), method="medianpolish", verbose=TRUE, ...)
## S4 method for signature 'FeatureSet'
normalize(object, method=normalizationMethods(), copy=TRUE, subset=NULL, target='core', verbose=TRUE, ...)
## S4 method for signature 'matrix'
normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...)
## S4 method for signature 'ff_matrix'
normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...)
normalizeToTarget(object, targetDist, method="quantile", copy=TRUE, verbose=TRUE)

Arguments

Argument    Description
object      Object containing probe intensities to be preprocessed.
method      String determining which method to use at that preprocessing step.
targetDist  Vector with the target distribution.
probes      Character vector that identifies the name of the probes represented by the rows of object.
copy        Logical flag determining if data must be copied before processing (TRUE), or if data can be overwritten (FALSE).
subset      Not yet implemented.
target      One of the following values: 'core', 'full', 'extended', 'probeset'. Used only with Gene ST and Exon ST designs.
extra       Extra arguments to be passed to other methods.
verbose     Logical flag for verbosity.
...         Arguments to be passed to methods.

Details

The number of rows of object must match the length of probes.

Value

backgroundCorrectionMethods and normalizationMethods will return a character vector with the methods implemented currently.

backgroundCorrect, normalize and normalizeToTarget will return a matrix with the same dimensions as the input matrix. If they are applied to a FeatureSet object, the PM matrix will be used as input.

The summarize method will return a matrix with length(unique(probes)) rows and ncol(object) columns.
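
The shape of the summarized matrix can be illustrated without oligo. The following is a minimal base R sketch, not the package's implementation: it assumes the intensities are already on the log2 scale and uses stats::medpolish per probeset, taking the overall effect plus the column effects as the per-sample summary. The names x, ids, summarize_one and summ are made up for this example.

```r
set.seed(1)
n_probesets <- 4
probes_per_set <- 5
n_samples <- 3

# Simulated log2-scale intensities: one row per probe, one column per sample.
x <- matrix(rnorm(n_probesets * probes_per_set * n_samples, mean = 10, sd = 1),
            ncol = n_samples)
# Probeset label for every row of x (length must equal nrow(x)).
ids <- rep(paste0("ps", seq_len(n_probesets)), each = probes_per_set)

# Median polish within one probeset; the per-sample summary is taken as
# the overall effect plus the column (sample) effect.
summarize_one <- function(rows) {
  fit <- stats::medpolish(x[rows, , drop = FALSE], trace.iter = FALSE)
  fit$overall + fit$col
}

# Row indices grouped by probeset, then one summary row per probeset.
groups <- split(seq_len(nrow(x)), ids)
summ <- t(vapply(groups, summarize_one, numeric(n_samples)))
dim(summ)
```

The result has length(unique(ids)) rows and ncol(x) columns, which is the relationship stated above for summarize.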

Examples

## Simulated data: 100 samples, 1000 probesets, 10 probes per probeset
ns <- 100
nps <- 1000
np <- 10
intensities <- matrix(rnorm(ns*nps*np, 8000, 400), ncol=ns)
ids <- rep(as.character(1:nps), each=np)
bgCorrected <- backgroundCorrect(intensities)
normalized <- normalize(bgCorrected)
summarizationMethods()
expression <- summarize(normalized, probes=ids)
intensities[1:20, 1:3]
expression[1:20, 1:3]
## Normalize towards a user-supplied target distribution
target <- rnorm(np*nps)
normalizedToTarget <- normalizeToTarget(intensities, target)

if (require(oligoData) && require(pd.hg18.60mer.expr)){
  ## Example of normalization with real data
  data(nimbleExpressionFS)
  boxplot(nimbleExpressionFS, main='Original')
  for (mtd in normalizationMethods()){
    message('Normalizing with ', mtd)
    res <- normalize(nimbleExpressionFS, method=mtd, verbose=FALSE)
    boxplot(res, main=mtd)
  }
}

Accessor to feature names

Description

Accessors to featureset names.

Usage

probeNames(object, subset = NULL, ...)
probesetNames(object, ...)

Arguments

Argument  Description
object    FeatureSet or DBPDInfo
subset    Not implemented yet.
...       Arguments (like 'target') passed to downstream methods.

Value

probeNames returns a character vector with the probeset name for each probe on the array. probesetNames, on the other hand, returns the unique probeset names.


readSummaries()

Read summaries generated by crlmm

Description

This function reads the different summaries generated by crlmm.

Usage

readSummaries(type, tmpdir)

Arguments

Argument  Description
type      Type of summary (character): 'alleleA', 'alleleB', 'alleleA-sense', 'alleleA-antisense', 'alleleB-sense', 'alleleB-antisense', 'calls', 'llr', 'conf'.
tmpdir    Directory containing the output saved by crlmm.

Details

On the 50K and 250K arrays, given a SNP, there are probes on both strands (sense and antisense). For this reason, the options 'alleleA-sense', 'alleleA-antisense', 'alleleB-sense' and 'alleleB-antisense' should be used only with such arrays (XBA, HIND, NSP or STY).

On the SNP 5.0 and SNP 6.0 platforms, this distinction does not exist in terms of algorithm (note that the actual strand could be queried from the annotation package). For these arrays, options 'alleleA', 'alleleB' are the ones to be used.

The options 'calls', 'llr' and 'conf' will return, respectively, the matrices of CRLMM calls, log-likelihood ratios (for development purposes only) and CRLMM call confidences.

Value

Matrix with values of summaries.

Parser to CEL files

Description

Reads CEL files.

Usage

read.celfiles(..., filenames, pkgname, phenoData, featureData,
experimentData, protocolData, notes, verbose=TRUE, sampleNames,
rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE)
read.celfiles2(channel1, channel2, pkgname, phenoData, featureData,
experimentData, protocolData, notes, verbose=TRUE, sampleNames,
rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE)

Arguments

Argument        Description
...             names of files to be read.
filenames       a character vector with the CEL filenames.
channel1        a character vector with the CEL filenames for the first 'channel' on a Tiling application
channel2        a character vector with the CEL filenames for the second 'channel' on a Tiling application
pkgname         alternative data package to be loaded.
phenoData       phenoData
featureData     featureData
experimentData  experimentData
protocolData    protocolData
notes           notes
verbose         logical
sampleNames     character vector with sample names (usually better descriptors than the filenames)
rm.mask         logical. Read masked?
rm.outliers     logical. Remove outliers?
rm.extra        logical. Remove extra?
checkType       logical. Check type of each file? This can be time consuming.

Details

When using 'affyio' to read in CEL files, the user can read compressed CEL files (CEL.gz). Additionally, 'affyio' is much faster than 'affxparser'.

The function guesses which annotation package to use from the header of the CEL file. The user can also provide the name of the annotation package to be used (via the pkgname argument). If the annotation package cannot be loaded, the function returns an error. If the annotation package is not available from Bioconductor, one can use the pdInfoBuilder package to build one.

Value

*

Seealso

list.celfiles, read.xysfiles

Examples

if (require(pd.mapping50k.xba240) && require(hapmap100kxba)){
  ## Locate the example CEL files shipped with hapmap100kxba
  celPath <- system.file("celFiles", package="hapmap100kxba")
  celFiles <- list.celfiles(celPath, full.names=TRUE)
  affySnpFeatureSet <- read.celfiles(celFiles)
}

Parser to XYS files

Description

NimbleGen provides XYS files which are read by this function.

Usage

read.xysfiles(..., filenames, pkgname, phenoData, featureData,
experimentData, protocolData, notes, verbose=TRUE, sampleNames,
checkType=TRUE)
read.xysfiles2(channel1, channel2, pkgname, phenoData, featureData,
experimentData, protocolData, notes, verbose=TRUE, sampleNames,
checkType=TRUE)

Arguments

Argument        Description
...             file names
filenames       character vector with filenames.
channel1        a character vector with the XYS filenames for the first 'channel' on a Tiling application
channel2        a character vector with the XYS filenames for the second 'channel' on a Tiling application
pkgname         character vector with alternative PD Info package name
phenoData       phenoData
featureData     featureData
experimentData  experimentData
protocolData    protocolData
notes           notes
verbose         verbose
sampleNames     character vector with sample names (usually better descriptors than the filenames)
checkType       logical. Check type of each file? This can be time consuming.

Details

The function will read the XYS files provided by NimbleGen Systems and return an object of class FeatureSet.

The function guesses which annotation package to use from the header of the XYS file. The user can also provide the name of the annotation package to be used (via the pkgname argument). If the annotation package cannot be loaded, the function returns an error. If the annotation package is not available from Bioconductor, one can use the pdInfoBuilder package to build one.

Value

*

Seealso

list.xysfiles, read.celfiles

Examples

if (require(maqcExpression4plex) && require(pd.hg18.60mer.expr)){
  ## Locate the example XYS files shipped with maqcExpression4plex
  xysPath <- system.file("extdata", package="maqcExpression4plex")
  xysFiles <- list.xysfiles(xysPath, full.names=TRUE)
  ngsExpressionFeatureSet <- read.xysfiles(xysFiles)
}

RMA - Robust Multichip Average algorithm

Description

Robust Multichip Average preprocessing methodology. This strategy allows background subtraction, quantile normalization and summarization (via median-polish).
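
To make the quantile normalization step concrete, here is a minimal base R sketch. It is not the code used by rma: the function name quantile_normalize is made up for this example, ties are broken by order of appearance rather than averaged, and missing values are not handled.

```r
quantile_normalize <- function(x) {
  # Rank of every value within its own column (ties broken by order of appearance).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across columns.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of the same rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy intensity matrix: 4 probes (rows) x 3 arrays (columns).
x <- cbind(a = c(5, 2, 3, 4), b = c(4, 1, 4, 2), c = c(3, 4, 6, 8))
qn <- quantile_normalize(x)
print(qn)

# After normalization every column holds the same set of values.
print(apply(qn, 2, sort))
```

The within-column ordering of each array is preserved; only the values are replaced, so all arrays end up with an identical empirical distribution.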

Usage

## S4 method for signature 'ExonFeatureSet'
rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core")
## S4 method for signature 'HTAFeatureSet'
rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core")
## S4 method for signature 'ExpressionFeatureSet'
rma(object, background=TRUE, normalize=TRUE, subset=NULL)
## S4 method for signature 'GeneFeatureSet'
rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core")
## S4 method for signature 'SnpCnvFeatureSet'
rma(object, background=TRUE, normalize=TRUE, subset=NULL)

Arguments

Argument    Description
object      Exon/HTA/Expression/Gene/SnpCnv-FeatureSet object.
background  Logical - perform RMA background correction?
normalize   Logical - perform quantile normalization?
subset      To be implemented.
target      Level of summarization (only for Exon/Gene arrays)

Seealso

snprma

References

Rafael A. Irizarry, Benjamin M. Bolstad, Francois Collin, Leslie M. Cope, Bridget Hobbs and Terence P. Speed (2003), Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15

Bolstad, B.M., Irizarry R. A., Astrand M., and Speed, T.P. (2003), A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193

Irizarry, RA, Hobbs, B, Collin, F, Beazer-Barclay, YD, Antonellis, KJ, Scherf, U, Speed, TP (2003) Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics. Vol. 4, Number 2: 249-264

Examples

if (require(maqcExpression4plex) && require(pd.hg18.60mer.expr)){
  xysPath <- system.file("extdata", package="maqcExpression4plex")
  xysFiles <- list.xysfiles(xysPath, full.names=TRUE)
  ngsExpressionFeatureSet <- read.xysfiles(xysFiles)
  ## Background-correct, normalize and summarize in one call
  summarized <- rma(ngsExpressionFeatureSet)
  show(summarized)
}

Date of scan

Description

Retrieves date information in CEL/XYS files.

Usage

runDate(object)

Arguments

Argument  Description
object    'FeatureSet' object.

sequenceDesignMatrix()

Create design matrix for sequences

Description

Creates design matrix for sequences.

Usage

sequenceDesignMatrix(seqs)

Arguments

Argument  Description
seqs      character vector of 25-mers.

Details

This assumes all sequences are 25bp long.

The design matrix is often used when the objective is to adjust intensities by sequence.

Value

Matrix with length(seqs) rows and 75 columns.
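
One way a 75-column matrix can arise from 25-mers is 25 positions times 3 indicator columns, with the fourth base acting as the reference level. The sketch below builds such a matrix in base R under that assumption. The helper toy_design_matrix is hypothetical and is not oligo's implementation; the basis, choice of reference base and column order used by sequenceDesignMatrix may differ.

```r
# Hypothetical helper, NOT oligo's implementation: position-specific
# 0/1 indicators for three bases, with the remaining base as reference.
toy_design_matrix <- function(seqs, bases = c("A", "C", "G")) {
  stopifnot(all(nchar(seqs) == 25))
  # Split every sequence into letters: rows = sequences, columns = positions.
  letters_mat <- do.call(rbind, strsplit(seqs, ""))
  # For each base, a 0/1 matrix flagging the positions where it occurs.
  blocks <- lapply(bases, function(b) {
    m <- (letters_mat == b) * 1L
    colnames(m) <- paste0(b, "_", seq_len(ncol(m)))
    m
  })
  do.call(cbind, blocks)
}

set.seed(42)
seqs <- replicate(4, paste(sample(c("A", "C", "G", "T"), 25, replace = TRUE),
                           collapse = ""))
X <- toy_design_matrix(seqs)
dim(X)
```

With this coding, a position holding the reference base contributes zeros in all three of its columns, so each row sums to at most 25.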

Examples

## Generate one random 25-mer (the argument x is ignored)
genSequence <- function(x)
  paste(sample(c("A", "T", "C", "G"), 25, replace=TRUE), collapse="")
seqs <- sapply(1:10, genSequence)
X <- sequenceDesignMatrix(seqs)
## Simulated, mean-centered response
Y <- rnorm(10, mean=12, sd=2)
Ydemean <- Y-mean(Y)
X[1:10, 1:3]
fit <- lm(Ydemean~X)
coef(fit)

Preprocessing SNP Arrays

Description

This function preprocesses SNP arrays.

Usage

snprma(object, verbose = TRUE, normalizeToHapmap = TRUE)

Arguments

Argument           Description
object             SnpFeatureSet object
verbose            Verbosity flag. logical
normalizeToHapmap  internal

Value

A SnpQSet object.