# bioconductor v3.9.0 Vsn

The package implements a method for normalising microarray intensities, and works

# Link to this section Summary

## Functions

Class to contain result of a vsn fit

Class to contain input data and parameters for vsn functions

Wrapper functions for vsn

Intensity data for one cDNA slide with two adjacent tissue samples from a nephrectomy (kidney)

Intensity data for 8 cDNA slides with CLL and DLBL samples from the Alizadeh et al. paper in Nature 2000

Plot row standard deviations versus row means

Wrapper for vsn to be used as a normalization method with expresso

Simulate data and assess vsn's parameter estimation

The transformation that is applied to the scaling parameter of the vsn model

Fit the vsn model

Apply the vsn transformation to data

Calculate the log likelihood and its gradient for the vsn model

vsn

# Link to this section Functions

# classvsn()

Class to contain result of a vsn fit

## Description

Class to contain result of a vsn fit

## Seealso

## Author

Wolfgang Huber

## Examples

```
data("kidney")
v = vsn2(kidney)
show(v)
dim(v)
v[1:10, ]
```

# classvsnInput()

Class to contain input data and parameters for vsn functions

## Description

Class to contain input data and parameters for vsn functions

## Seealso

## Author

Wolfgang Huber

# justvsn()

Wrapper functions for vsn

## Description

`justvsn`

is equivalent to calling
list("
", " fit = vsn2(x, ...)
", " nx = predict(fit, newdata=x, useDataInFit = TRUE)
")
`vsnrma`

is a wrapper around `vsn2`

and `rma`

.

## Usage

```
justvsn(x, ...)
vsnrma(x, ...)
```

## Arguments

Argument | Description |
---|---|

`x` | For `justvsn` , any kind of object for which `vsn2` methods exist. For `vsnrma` , an `AffyBatch` . |

`list()` | Further arguments that get passed on to `vsn2` . |

## Details

`vsnrma`

does probe-wise
background correction and between-array normalization by calling
`vsn2`

on the perfect match (PM) values only. Probeset
summaries are calculated with the medianpolish algorithm of
`rma`

.

## Value

`justvsn`

returns the vsn-normalised intensities in
an object generally of the same class as its first
argument (see the man page of `predict`

for
details). It preserves the metadata.

`vsnrma`

returns an `ExpressionSet`

.

## Seealso

## Author

Wolfgang Huber

## Examples

```
##--------------------------------------------------
## use "vsn2" to produce a "vsn" object
##--------------------------------------------------
data("kidney")
fit = vsn2(kidney)
nkid = predict(fit, newdata=kidney)
##--------------------------------------------------
## justvsn on ExpressionSet
##--------------------------------------------------
nkid2 = justvsn(kidney)
stopifnot(identical(exprs(nkid), exprs(nkid2)))
##--------------------------------------------------
## justvsn on RGList
##--------------------------------------------------
rg = new("RGList", list(R=exprs(kidney)[,1,drop=FALSE], G=exprs(kidney)[,2,drop=FALSE]))
erge = justvsn(rg)
```

# kidney()

Intensity data for one cDNA slide with two adjacent tissue samples from a nephrectomy (kidney)

## Description

Intensity data for one cDNA slide with two adjacent tissue samples from a nephrectomy (kidney)

## Format

`kidney`

is an
`ExpressionSet`

containing the data from one cDNA
chip. The 8704x2 matrix `exprs(kidney)`

contains the
spot intensities for the red (635 nm) and green color channels
(532 nm) respectively. For each spot, a background estimate from a
surrounding region was subtracted.

## Usage

`data(kidney)`

## Details

The chip was produced in 2001 by Holger Sueltmann at the Division of Molecular Genome Analysis at the German Cancer Research Center in Heidelberg.

## References

Huber W, Boer JM, von Heydebreck A, Gunawan B, Vingron M, Fuzesi L, Poustka A, Sueltmann H. Transcription profiling of renal cell carcinoma. Verh Dtsch Ges Pathol. 2002;86:153-64. PMID: 12647365

## Examples

```
data("kidney")
plot(exprs(kidney), pch = ".", log = "xy")
abline(a = 0, b = 1, col = "blue")
```

# lymphoma()

Intensity data for 8 cDNA slides with CLL and DLBL samples from the Alizadeh et al. paper in Nature 2000

## Description

8 cDNA chips from Alizadeh lymphoma paper

## Format

`lymphoma`

is an `ExpressionSet`

containing the data from 8 chips
from the lymphoma data set by Alizadeh et al. (see references). Each
chip represents two samples: on color channel 1 (CH1, Cy3, green) the
common reference sample, and on color channel 2 (CH2, Cy5, red) the
various disease samples. See `pData(lymphoma)`

. The 9216x16
matrix `exprs(lymphoma)`

contains the background-subtracted spot
intensities (CH1I-CH1B and CH2I-CH2B, respectively).

## Usage

`data(lymphoma)`

## Details

The chip intensity files were downloaded from the Stanford
microarray database. Starting from the link below, this was done by
following the links list("Published Data") ->
list("Alizadeh AA, et al. (2000) Nature 403(6769):503-11") ->
list("Data in SMD") -> list("Display Data") , and selecting the following
8 slides:
list(list("l"), list("
", "lc7b019", list(), "
", "lc7b047", list(), "
", "lc7b048", list(), "
", "lc7b056", list(), "
", "lc7b057", list(), "
", "lc7b058", list(), "
", "lc7b069", list(), "
", "lc7b070
"))
Then, the script `makedata.R`

from the `scripts`

subdirectory
of this package was run to generate the list() data object.

## References

A. Alizadeh et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503-11, Feb 3, 2000.

## Examples

```
data("lymphoma")
lymphoma
pData(lymphoma)
```

# meanSdPlot()

Plot row standard deviations versus row means

## Description

Methods for objects of classes
`matrix`

,
`ExpressionSet`

,
vsn and
`MAList`

to plot row standard deviations versus row means.

## Usage

```
meanSdPlot(x,
ranks = TRUE,
xlab = ifelse(ranks, "rank(mean)", "mean"),
ylab = "sd",
pch,
plot = TRUE,
bins = 50,
list())
```

## Arguments

Argument | Description |
---|---|

`x` | An object of class `matrix` , `ExpressionSet` , vsn or `MAList` . |

`ranks` | Logical, indicating whether the x-axis (means) should be plotted on the original scale ( `FALSE` ) or on the rank scale ( `TRUE` ). The latter distributes the data more evenly along the x-axis and allows a better visual assessment of the standard deviation as a function of the mean. |

`xlab` | Character, label for the x-axis. |

`ylab` | Character, label for the y-axis. |

`pch` | Ignored - exists for backward compatibility. |

`plot` | Logical. If `TRUE` (default), a plot is produced. Calling the function with `plot=FALSE` can be useful if only its return value is of interest. |

`bins` | Gets passed on to `stat_binhex` . |

`list()` | Further arguments that get passed on to `stat_binhex` . |

## Details

Standard deviation and mean are calculated row-wise from the
expression matrix (in) `x`

. The scatterplot of these versus each other
allows you to visually verify whether there is a dependence of the standard
deviation (or variance) on the mean.
The red line depicts the running median estimator (window-width 10%).
If there is no variance-mean dependence, then the line should be approximately horizontal.

## Value

A named list with five components: its elements `px`

and
`py`

are the x- and y-coordinates of the individual data points
in the plot; its first and second element are the x-coordinates and values of
the running median estimator (the red line in the plot).
Its element `gg`

is the plot object (see examples).
Depending on the value of `plot`

, the method can (and by default does) have a side effect,
which is to print `gg`

on the active graphics device.

## Author

Wolfgang Huber

## Examples

```
data("kidney")
log.na <- function(x) log(ifelse(x>0, x, NA))
exprs(kidney) <- log.na(exprs(kidney))
msd <- meanSdPlot(kidney)
## The `ggplot` object is returned in list element `gg`, here is an example of how to modify the plot
library("ggplot2")
msd$gg + ggtitle("Hello world") + scale_fill_gradient(low = "yellow", high = "darkred") + scale_y_continuous(limits = c(0, 7))
## Try this out with not log-transformed data, vsn2-transformed data, the lymphoma data, your data ...
```

# normalizeAffyBatchvsn()

Wrapper for vsn to be used as a normalization method with expresso

## Description

Wrapper for `vsn2`

to be used as a normalization
method with the expresso function of the package affy. The expresso
function is deprecated, consider using `justvsn`

instead. The normalize.AffyBatch.vsn can still be useful on its own,
as it provides some additional control of the normalization process
(fitting on subsets, alternate transform parameters).

## Usage

```
normalize.AffyBatch.vsn(
abatch,
reference,
strata = NULL,
subsample = if (nrow(exprs(abatch))>30000L) 30000L else 0L,
subset,
log2scale = TRUE,
log2asymp=FALSE,
...)
```

## Arguments

Argument | Description |
---|---|

`abatch` | An object of type `AffyBatch` . |

`reference` | Optional, a 'vsn' object from a previous fit. If this argument is specified, the data in 'x' are normalized "towards" an existing set of reference arrays whose parameters are stored in the object 'reference'. If this argument is not specified, then the data in 'x' are normalized "among themselves". See `vsn2` for details. |

`strata` | The 'strata' functionality is not supported, the parameter is ignored. |

`subsample` | Is passed on to `vsn2` . |

`subset` | This allows the specification of a subset of expression measurements to be used for the vsn fit. The transformation with the parameters of this fit is then, however, applied to the whole dataset. This is useful for excluding expression measurements that are known to be differentially expressed or control probes that may not match the vsn model, thus avoiding that they influence the normalization process. This operates at the level of probesets, not probes. Both 'subset' and 'subsample' can be used together. |

`log2scale` | If TRUE, this will perform a global affine transform on the data to put them on a similar scale as the original non-transformed data. Many users prefer this. Fold-change estimates are not affected by this transform. In some situations, however, it may be helpful to turn this off, e.g., when comparing independently normalized subsets of the data. |

`log2asymp` | If TRUE, this will perform a global affine transform on the data to make the generalized log (asinh) transform be asymptotically identical to a log base 2 transform. Some people find this helpful. Only one of 'log2scale' or 'log2asymp' can be set to TRUE. Fold-change estimates are not affected by this transform. |

`...` | Further parameters for `vsn2` . |

## Details

Please refer to the Details and References
sections of the
man page for `vsn2`

for more details about this method.

Important note : after calling `vsn2`

, the function
`normalize.AffyBatch.vsn`

exponentiates the data (base 2).
This is done in order to make the behavior of this function similar to the
other normalization methods in affy. That packages uses the convention
of taking the logarithm to base in
subsequent analysis steps (e.g. in `medpolish`

).

## Value

An object of class `AffyBatch`

.
The `vsn`

object returned, which can be used as `reference`

for
subsequent fits, is provided by
`description(abatch)@preprocessing$vsnReference`

.

## Seealso

## Author

D. P. Kreil http://bioinf.boku.ac.at/ , Wolfgang Huber

## Examples

`## Please see vignette.`

# sagmbSimulateData()

Simulate data and assess vsn's parameter estimation

## Description

Functions to validate and assess the performance of vsn through simulation of data.

## Usage

```
sagmbSimulateData(n=8064, d=2, de=0, up=0.5, nrstrata=1, miss=0, log2scale=FALSE)
sagmbAssess(h1, sim)
```

## Arguments

Argument | Description |
---|---|

`n` | Numeric. Number of probes (rows). |

`d` | Numeric. Number of arrays (columns). |

`de` | Numeric. Fraction of differentially expressed genes. |

`up` | Numeric. Fraction of up-regulated genes among the differentially expressed genes. |

`nrstrata` | Numeric. Number of probe strata. |

`miss` | Numeric. Fraction of data points that is randomly sampled and set to `NA` . |

`log2scale` | Logical. If `TRUE` , glog on base 2 is used, if `FALSE` , (the default), then base e. |

`h1` | Matrix. Calibrated and transformed data, according, e.g., to vsn |

`sim` | List. The output of a previous call to `sagmbSimulateData` , see Value |

## Details

Please see the vignette.

## Value

For `sagmbSimulateData`

, a list with four components:
`hy`

, an `n x d`

matrix with the true (=simulated)
calibrated, transformed data;
`y`

, an `n x d`

matrix with the simulated
uncalibrated raw data - this is intended to be fed into
`vsn2`

;
`is.de`

, a logical vector of length `n`

, specifying
which probes are simulated to be differentially expressed.
`strata`

, a factor of length `n`

.

For `sagmbSimulateData`

, a number: the root mean squared
difference between true and estimated transformed data.

## Author

Wolfgang Huber

## References

Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron (2003) "Parameter estimation for the calibration and variance stabilization of microarray data", Statistical Applications in Genetics and Molecular Biology: Vol. 2: No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3

## Examples

```
sim <- sagmbSimulateData(nrstrata = 4)
ny <- vsn2(sim$y, strata = sim$strata)
res <- sagmbAssess(exprs(ny), sim)
res
```

# scalingFactorTransformation()

The transformation that is applied to the scaling parameter of the vsn model

## Description

The transformation that is applied to the scaling parameter of the vsn model

## Usage

`scalingFactorTransformation(b)`

## Arguments

Argument | Description |
---|---|

`b` | Real vector. |

## Value

A real vector of same length as b, with transformation `f`

applied (see
vignette Likelihood Calculations for vsn ).

## Author

Wolfgang Huber

## Examples

```
b = seq(-3, 2, length=20)
fb = scalingFactorTransformation(b)
if(interactive())
plot(b, fb, type="b", pch=16)
```

# vsn2()

Fit the vsn model

## Description

`vsn2`

fits the vsn model to the data
in `x`

and returns a list("vsn") object with
the fit parameters and the transformed data matrix.
The data are, typically, feature intensity readings from a
microarray, but this function may also be useful for other kinds of
intensity data that obey an additive-multiplicative error model.
To obtain an object of the same class as `x`

, containing
the normalised data and the same metdata as `x`

, use
list("
", " fit = vsn2(x, ...)
", " nx = predict(fit, newdata=x)
", " ")
or the wrapper `justvsn`

.
Please see the vignette list("Introduction to vsn") .

## Usage

```
vsnMatrix(x,
reference,
strata,
lts.quantile = 0.9,
subsample = 0L,
verbose = interactive(),
returnData = TRUE,
calib = "affine",
pstart,
minDataPointsPerStratum = 42L,
optimpar = list(),
defaultpar = list(factr=5e7, pgtol=2e-4, maxit=60000L,
trace=0L, cvg.niter=7L, cvg.eps=0))
list(list("vsn2"), list("ExpressionSet"))(x, reference, strata, ...)
list(list("vsn2"), list("AffyBatch"))(x, reference, strata, subsample, ...)
list(list("vsn2"), list("NChannelSet"))(x, reference, strata, backgroundsubtract=FALSE,
foreground=c("R","G"), background=c("Rb", "Gb"), ...)
list(list("vsn2"), list("RGList"))(x, reference, strata, ...)
```

## Arguments

Argument | Description |
---|---|

`x` | An object containing the data to which the model is fitted. |

`reference` | Optional, a vsn object from a previous fit. If this argument is specified, the data in `x` are normalized "towards" an existing set of reference arrays whose parameters are stored in the object `reference` . If this argument is not specified, then the data in `x` are normalized "among themselves". See Details for a more precise explanation. |

`strata` | Optional, a `factor` or `integer` whose length is `nrow(x)` . It can be used for stratified normalization (i.e. separate offsets $a$ and factors $b$ for each level of `strata` ). If missing, all rows of `x` are assumed to come from one stratum. If `strata` is an integer, its values must cover the range $1,ldots,n$ , where $n$ is the number of strata. |

`lts.quantile` | Numeric of length 1. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1. A value of 1 corresponds to ordinary least sum of squares regression. |

`subsample` | Integer of length 1. If its value is greater than 0, the model parameters are estimated from a subsample of the data of size `subsample` only, yet the fitted transformation is then applied to all data. For large datasets, this can substantially reduce the CPU time and memory consumption at a negligible loss of precision. Note that the `AffyBatch` method of `vsn2` sets a value of `30000` for this parameter if it is missing from the function call - which is different from the behaviour of the other methods. |

`backgroundsubtract` | Logical of length 1: should local background estimates be subtracted before fitting vsn? |

`foreground, background` | Aligned character vectors of the same length, naming the channels of `x` that should be used as foreground and background values. |

`verbose` | Logical. If TRUE, some messages are printed. |

`returnData` | Logical. If TRUE, the transformed data are returned in a slot of the resulting vsn object. Setting this option to `FALSE` allows saving memory if the data are not needed. |

`calib` | Character of length 1. Allowed values are `affine` and `none` . The default, `affine` , corresponds to the behaviour in package versions <= 3.9, and to what is described in references [1] and [2]. The option `none` is an experimental new feature, in which no affine calibration is performed and only two global variance stabilisation transformation parameters `a` and `b` are fitted. This functionality might be useful in conjunction with other calibration methods, such as quantile normalisation - see the vignette Introduction to vsn . |

`pstart` | Optional, a three-dimensional numeric array that specifies start values for the iterative parameter estimation algorithm. If not specified, the function tries to guess useful start values. The first dimension corresponds to the levels of `strata` , the second dimension to the columns of `x` and the third dimension must be 2, corresponding to offsets and factors. |

`minDataPointsPerStratum` | The minimum number of data points per stratum. Normally there is no need for the user to change this; refer to the vignette for further documentation. |

`optimpar` | Optional, a list with parameters for the likelihood optimisation algorithm. Default parameters are taken from `defaultpar` . See details. |

`defaultpar` | The default parameters for the likelihood optimisation algorithm. Values in `optimpar` take precedence over those in `defaultpar` . The purpose of this argument is to expose the default values in this manual page - it is not intended to be changed, please use `optimpar` for that. |

`...` | Arguments that get passed on to `vsnMatrix` . |

## Value

An object of class vsn .

## Seealso

## Author

Wolfgang Huber

## References

[1] Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

[2] Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.

[3] L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization, C. Zhu, R.H. Byrd, P. Lu and J. Nocedal, Technical Report, Northwestern University (1996).

[4] Package vignette: Likelihood Calculations for vsn

## Examples

```
data("kidney")
fit = vsn2(kidney) ## fit
nkid = predict(fit, newdata=kidney) ## apply fit
plot(exprs(nkid), pch=".")
abline(a=0, b=1, col="red")
```

# vsn2trsf()

Apply the vsn transformation to data

## Description

Apply the vsn transformation to data.

## Usage

`list(list("predict"), list("vsn"))(object, newdata, strata=object@strata, log2scale=TRUE, useDataInFit=FALSE)`

## Arguments

Argument | Description |
---|---|

`object` | An object of class vsn that contains transformation parameters and strata information, typically this is the result of a previous call to `vsn2` . |

`newdata` | Object of class `ExpressionSet` , `NChannelSet` , `AffyBatch` (from the `affy` package), `RGList` (from the `limma` package), `matrix` or `numeric` , with the data to which the fit is to be applied to. |

`strata` | Optional, a `factor` or `integer` that aligns with the rows of `newdata` ; see the `strata` argument of `vsn2` . |

`log2scale` | If `TRUE` , the data are returned on the glog scale to base 2, and an overall offset c is added (see Value section of the `vsn2` manual page). If `FALSE` , the data are returned on the glog scale to base e, and no offset is added. |

|`useDataInFit`

| If `TRUE`

, then no transformation is attempted and the data stored in `object`

is transferred appropriately into resulting object, which otherwise preserves the class and metadata of `newdata`

. This option exists to increase performance in constructs like list("
", " fit = vsn2(x, ...)
", " nx = predict(fit, newdata=x)
", " ") and is used, for example, in the `justvsn`

function. |

## Value

An object typically of the same class as `newdata`

. There are two
exceptions: if `newdata`

is an
`RGList`

, the return value is an
`NChannelSet`

, and
if `newdata`

is numeric, the return value is a `matrix`

with 1
column.

## Author

Wolfgang Huber

## Examples

```
data("kidney")
## nb: for random subsampling, the 'subsample' argument of vsn
## provides an easier way to do this
fit = vsn2(kidney[sample(nrow(kidney), 500), ])
tn = predict(fit, newdata=exprs(kidney))
```

# vsnLikelihood()

Calculate the log likelihood and its gradient for the vsn model

## Description

`logLik`

calculates the log likelihood and its gradient
for the vsn model. `plotVsnLogLik`

makes a false color plot for
a 2D section of the likelihood landscape.

## Usage

```
list(list("logLik"), list("vsnInput"))(object, p, mu = numeric(0), sigsq=as.numeric(NA), calib="affine")
plotVsnLogLik(object,
p,
whichp = 1:2,
expand = 1,
ngrid = 31L,
fun = logLik,
main = "log likelihood",
...)
```

## Arguments

Argument | Description |
---|---|

`object` | A `vsnInput` object. |

`p` | For `plotVsnLogLik` , a vector or a 3D array with the point in parameter space around which to plot the likelihood. For `logLik` , a matrix whose columns are the set of parameters at which the likelihoods are to be evaluated. |

`mu` | Numeric vector of length 0 or `nrow(object)` . If the length is 0, there is no reference and `sigsq` must be `NA` (the default value). See `vsn2` . |

`sigsq` | Numeric scalar. |

`calib` | as in `vsn2` . |

`whichp` | Numeric vector of length 2, with the indices of those two parameters in `p` along which the section is to be taken. |

`expand` | Numeric vector of length 1 or 2 with expansion factors for the plot range. The range is auto-calculated using a heuristic, but manual adjustment can be useful; see example. |

`ngrid` | Integer scalar, the grid size. |

`fun` | Function to use for log-likelihood calculation. This parameter is exposed only for testing purposes. |

`main` | This parameter is passed on `levelplot` . |

`...` | Arguments that get passed on to `fun` , use this for `mu` , `sigsq` , `calib` . |

## Details

`logLik`

is an R interface to the likelihood computations in vsn (which are done in C).

## Value

For `logLik`

, a numeric matrix of size `nrow(p)+1`

by `ncol(p)`

.
Its columns correspond to the columns of `p`

.
Its first row are the likelihood values, its rows `2...nrow(p)+1`

contain the gradients.
If `mu`

and `sigsq`

are
specified, the ordinary negative log likelihood is calculated using these
parameters as given. If they are not specified, the profile negative log likelihood
is calculated.

For `plotVsnLogLik`

, a dataframe with the 2D grid coordinates and
log likelihood values.

## Seealso

## Author

Wolfgang Huber

## Examples

```
data("kidney")
v = new("vsnInput", x=exprs(kidney),
pstart=array(as.numeric(NA), dim=c(1, ncol(kidney), 2)))
fit = vsn2(kidney)
print(coef(fit))
p = sapply(seq(-1, 1, length=31), function(f) coef(fit)+c(0,0,f,0))
ll = logLik(v, p)
plot(p[3, ], ll[1, ], type="l", xlab=expression(b[1]), ylab=expression(-log(L)))
abline(v=coef(fit)[3], col="red")
plotVsnLogLik(v, coef(fit), whichp=c(1,3), expand=0.2)
```

# vsn_package()

vsn

## Description

vsn

## Details

The main function of the package is `vsn2`

.
Interesting for its applications are also
`predict`

and the wrapper function `justvsn`

.

`vsn2`

can be applied to objects of class
`ExpressionSet`

,
`NChannelSet`

,
`AffyBatch`

(from the `affy`

package) and
`RGList`

(from the `limma`

package),
`matrix`

and `vector`

. It returns an object of class
list("vsn") , which contains the results of fitting the
`vsn`

model to the data.

The most common use case is that you will want to construct a new data object with the vsn-normalized data whose class is the same as that of the input data and which preserves the metadata. This can be achieved by

list(" ", " fit = vsn2(x, ...) ", " nx = predict(fit, newdata=x) ", " ")

To simplify this, there exists also a simple wrapper
`justvsn`

.

## Author

Wolfgang Huber