bioconductor v3.9.0 Sva
The sva package contains functions for removing batch
Link to this section Summary
Functions
Adjust for batch effects using an empirical Bayes framework
A function for estimating the probability that each gene is an empirical control
A function for quickly calculating f statistic p-values for use in sva
A function for quickly calculating f statistics for use in sva
A function for performing frozen surrogate variable analysis as proposed in Parker, Corrada Bravo and Leek 2013
A function for estimating surrogate variables by estimating empirical control probes
A function for calculating the number of surrogate variables to estimate in a model
A function for estimating surrogate variables with the two step approach of Leek and Storey 2007
A function for computing quality surrogate variables (qSVs)
A function for reading in coverage data from degradation-susceptible regions
A function for estimating surrogate variables using a supervised approach
sva: a package for removing artifacts from microarray and sequencing data
A function to adjust gene expression data before network inference
A function for post-hoc checking of an sva object to check for degenerate cases.
A function for estimating surrogate variables for count based RNA-seq data.
A function for estimating surrogate variables with the two step approach of Leek and Storey 2007
Link to this section Functions
ComBat()
Adjust for batch effects using an empirical Bayes framework
Description
ComBat allows users to adjust for batch effects in datasets where the batch covariate is known, using methodology described in Johnson et al. 2007. It uses either parametric or non-parametric empirical Bayes frameworks for adjusting data for batch effects. Users are returned an expression matrix that has been corrected for batch effects. The input data are assumed to be cleaned and normalized before batch effect removal.
Usage
ComBat(dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALSE,
mean.only = FALSE, ref.batch = NULL, BPPARAM = bpparam("SerialParam"))
Arguments
Argument | Description |
---|---|
dat | Genomic measure matrix (dimensions probe x sample) - for example, expression matrix |
batch | Batch covariate (only one batch allowed) |
mod | Model matrix for outcome of interest and other covariates besides batch |
par.prior | (Optional) TRUE indicates parametric adjustments will be used, FALSE indicates non-parametric adjustments will be used |
prior.plots | (Optional) TRUE give prior plots with black as a kernel estimate of the empirical batch effect density and red as the parametric |
mean.only | (Optional) FALSE If TRUE ComBat only corrects the mean of the batch effect (no scale adjustment) |
ref.batch | (Optional) NULL If given, will use the selected batch as a reference for batch adjustment. |
BPPARAM | (Optional) BiocParallelParam for parallel operation |
Value
data A probe x sample genomic measure matrix, adjusted for batch effects.
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
batch = pheno$batch
mod = model.matrix(~as.factor(cancer), data=pheno)
# parametric adjustment
combat_edata1 = ComBat(dat=edata, batch=batch, mod=NULL, par.prior=TRUE, prior.plots=FALSE)
# non-parametric adjustment, mean-only version
combat_edata2 = ComBat(dat=edata, batch=batch, mod=NULL, par.prior=FALSE, mean.only=TRUE)
# reference-batch version, with covariates
combat_edata3 = ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE, ref.batch=3)
empiricalcontrols()
A function for estimating the probability that each gene is an empirical control
Description
This function uses the iteratively reweighted surrogate variable analysis approach to estimate the probability that each gene is an empirical control.
Usage
empirical.controls(dat, mod, mod0 = NULL, n.sv, B = 5, type = c("norm",
"counts"))
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
n.sv | The number of surogate variables to estimate |
B | The number of iterations of the irwsva algorithm to perform |
type | If type is norm then standard irwsva is applied, if type is counts, then the moderated log transform is applied first |
Value
pcontrol A vector of probabilites that each gene is a control.
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.sv = num.sv(edata,mod,method="leek")
pcontrol <- empirical.controls(edata,mod,mod0=NULL,n.sv=n.sv,type="norm")
fpvalue()
A function for quickly calculating f statistic p-values for use in sva
Description
This function does simple linear algebra to calculate f-statistics for each row of a data matrix comparing the nested models defined by the design matrices for the alternative (mod) and and null (mod0) cases. The columns of mod0 must be a subset of the columns of mod.
Usage
f.pvalue(dat, mod, mod0)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
Value
p A vector of F-statistic p-values one for each row of dat.
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
pValues = f.pvalue(edata,mod,mod0)
qValues = p.adjust(pValues,method="BH")
fstats()
A function for quickly calculating f statistics for use in sva
Description
This function does simple linear algebra to calculate f-statistics for each row of a data matrix comparing the nested models defined by the design matrices for the alternative (mod) and and null (mod0) cases. The columns of mod0 must be a subset of the columns of mod.
Usage
fstats(dat, mod, mod0)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
Value
fstats A vector of F-statistics one for each row of dat.
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
fs <- fstats(edata, mod, mod0)
fsva()
A function for performing frozen surrogate variable analysis as proposed in Parker, Corrada Bravo and Leek 2013
Description
This function performs frozen surrogate variable analysis as described in Parker, Corrada Bravo and Leek 2013.
The approach uses a training database to create surrogate variables which are then used
to remove batch effects both from the training database and a new data set for prediction purposes.
For inferential analysis see sva
, svaseq
, with low level functionality available
through irwsva.build
and ssva
.
Usage
fsva(dbdat, mod, sv, newdat = NULL, method = c("fast", "exact"))
Arguments
Argument | Description |
---|---|
dbdat | A m genes by n arrays matrix of expression data from the database/training data |
mod | The model matrix for the terms included in the analysis for the training data |
sv | The surrogate variable object created by running sva on dbdat using mod. |
newdat | (optional) A set of test samples to be adjusted using the training database |
method | If method ="fast" then the SVD is calculated using an online approach, this may introduce slight bias. If method="exact" the exact SVD is calculated, but will be slower |
Value
db An adjusted version of the training database where the effect of batch/expression heterogeneity has been removed
new An adjusted version of the new samples, adjusted one at a time using the fsva methodology.
newsv Surrogate variables for the new samples
Examples
library(bladderbatch)
library(pamr)
data(bladderdata)
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
set.seed(1234)
trainIndicator = sample(1:57,size=30,replace=FALSE)
testIndicator = (1:57)[-trainIndicator]
trainData = edata[,trainIndicator]
testData = edata[,testIndicator]
trainPheno = pheno[trainIndicator,]
testPheno = pheno[testIndicator,]
mydata = list(x=trainData,y=trainPheno$cancer)
mytrain = pamr.train(mydata)
table(pamr.predict(mytrain,testData,threshold=2),testPheno$cancer)
trainMod = model.matrix(~cancer,data=trainPheno)
trainMod0 = model.matrix(~1,data=trainPheno)
trainSv = sva(trainData,trainMod,trainMod0)
fsvaobj = fsva(trainData,trainMod,trainSv,testData)
mydataSv = list(x=fsvaobj$db,y=trainPheno$cancer)
mytrainSv = pamr.train(mydataSv)
table(pamr.predict(mytrainSv,fsvaobj$new,threshold=1),testPheno$cancer)
irwsvabuild()
A function for estimating surrogate variables by estimating empirical control probes
Description
This function is the implementation of the iteratively re-weighted least squares
approach for estimating surrogate variables. As a buy product, this function
produces estimates of the probability of being an empirical control. See the function
empirical.controls
for a direct estimate of the empirical controls.
Usage
irwsva.build(dat, mod, mod0 = NULL, n.sv, B = 5)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
n.sv | The number of surogate variables to estimate |
B | The number of iterations of the irwsva algorithm to perform |
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b A vector of the posterior probabilities each gene is affected by mod
n.sv The number of significant surrogate variables
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.sv = num.sv(edata,mod,method="leek")
res <- irwsva.build(edata, mod, mod0 = NULL,n.sv,B=5)
numsv()
A function for calculating the number of surrogate variables to estimate in a model
Description
This function estimates the number of surrogate variables that should be included in a differential expression model. The default approach is based on a permutation procedure originally prooposed by Buja and Eyuboglu 1992. The function also provides an interface to the asymptotic approach proposed by Leek 2011 Biometrics.
Usage
num.sv(dat, mod, method = c("be", "leek"), vfilter = NULL, B = 20,
seed = NULL)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
method | One of "be" or "leek" as described in the details section |
vfilter | You may choose to filter to the vfilter most variable rows before performing the analysis |
B | The number of permutaitons to use if method = "be" |
seed | Set a seed when using the permutation approach |
Value
n.sv The number of surrogate variables to use in the sva software
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.sv = num.sv(edata,mod,method="leek")
psva()
A function for estimating surrogate variables with the two step approach of Leek and Storey 2007
Description
This function is the implementation of the two step approach for estimating surrogate
variables proposed by Leek and Storey 2007 PLoS Genetics. This function is primarily
included for backwards compatibility. Newer versions of the sva algorithm are available
through sva
, svaseq
, with low level functionality available
through irwsva.build
and ssva
.
Usage
psva(dat, batch, ...)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
batch | A factor variable giving the known batch levels |
... | Other arguments to the sva function. |
Value
psva.D Data with batch effect removed but biological heterogeneity preserved
Author
Elana J. Fertig
Examples
library(bladderbatch)
library(limma)
data(bladderdata)
dat <- bladderEset[1:50,]
pheno = pData(dat)
edata = exprs(dat)
batch = pheno$batch
batch.fac = as.factor(batch)
psva_data <- psva(edata,batch.fac)
qsva()
A function for computing quality surrogate variables (qSVs)
Description
This function computes quality surrogate variables (qSVs) from the library-size- and read-length-normalized degradation matrix for subsequent RNA quality correction
Usage
qsva(degradationMatrix, mod = matrix(1, ncol = 1, nrow =
ncol(degradationMatrix)))
Arguments
Argument | Description |
---|---|
degradationMatrix | the normalized degradation matrix, region by sample |
mod | (Optional) statistical model used in DE analysis |
Value
the qSV adjustment variables
Examples
## Find files
bwPath <- system.file('extdata', 'bwtool', package = 'sva')
## Read the data
degCovAdj = read.degradation.matrix(
covFiles = list.files(bwPath,full.names=TRUE),
sampleNames = list.files(bwPath), readLength = 76,
totalMapped = rep(100e6,5),type="bwtool")
## Input data
head(degCovAdj)
## Results
qsva(degCovAdj)
readdegradationmatrix()
A function for reading in coverage data from degradation-susceptible regions
Description
This function reads in degradation regions to form a library-size- and read-length-normalized degradation matrix for subsequent RNA quality correction
Usage
read.degradation.matrix(covFiles, sampleNames, totalMapped, readLength = 100,
normFactor = 8e+07, type = c("bwtool", "region_matrix_single",
"region_matrix_all"), BPPARAM = bpparam())
Arguments
Argument | Description |
---|---|
covFiles | coverage file(s) for degradation regions |
sampleNames | sample names; creates column names of degradation matrix |
totalMapped | how many reads per sample (library size normalization) |
readLength | read length in base pairs (read length normalization) |
normFactor | common library size to normalize to; 80M reads as default |
type | whether input are individual bwtool output, region_matrix run on individual samples, or region_matrix run on all samples together |
BPPARAM | (Optional) BiocParallelParam for parallel operation |
Value
the normalized degradation matrix, region by sample
Examples
# bwtool
bwPath = system.file('extdata', 'bwtool', package = 'sva')
degCovAdj = read.degradation.matrix(
covFiles = list.files(bwPath,full.names=TRUE),
sampleNames = list.files(bwPath), readLength = 76,
totalMapped = rep(100e6,5),type="bwtool")
head(degCovAdj)
# region_matrix: each sample
r1Path = system.file('extdata', 'region_matrix_one', package = 'sva')
degCovAdj1 = read.degradation.matrix(
covFiles = list.files(r1Path,full.names=TRUE),
sampleNames = list.files(r1Path), readLength = 76,
totalMapped = rep(100e6,5),type="region_matrix_single")
head(degCovAdj1)
r2Path = system.file('extdata', 'region_matrix_all', package = 'sva')
degCovAdj2 = read.degradation.matrix(
covFiles = list.files(r2Path,full.names=TRUE),
sampleNames = list.files(r1Path), readLength = 76,
totalMapped = rep(100e6,5),type="region_matrix_all")
head(degCovAdj2)
ssva()
A function for estimating surrogate variables using a supervised approach
Description
This function implements a supervised surrogate variable analysis approach
where genes/probes known to be affected by artifacts but not by the biological
variables of interest are assumed to be known in advance. This supervised sva
approach can be called through the sva
and svaseq
functions
by specifying controls.
Usage
ssva(dat, controls, n.sv)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
controls | A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control. |
n.sv | The number of surogate variables to estimate |
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity (exactly equal to controls for ssva)
pprob.b A vector of the posterior probabilities each gene is affected by mod (always null for ssva)
n.sv The number of significant surrogate variables
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.sv = num.sv(edata,mod,method="leek")
set.seed(1234)
controls <- runif(nrow(edata))
ssva_res <- ssva(edata,controls,n.sv)
sva()
sva: a package for removing artifacts from microarray and sequencing data
Description
sva has functionality to estimate and remove artifacts from high dimensional data
the sva
function can be used to estimate artifacts from microarray data
the svaseq
function can be used to estimate artifacts from count-based
RNA-sequencing (and other sequencing) data. The ComBat
function can be
used to remove known batch effecs from microarray data. The fsva
function
can be used to remove batch effects for prediction problems.
This function is the implementation of the iteratively re-weighted least squares
approach for estimating surrogate variables. As a by product, this function
produces estimates of the probability of being an empirical control. See the function
empirical.controls
for a direct estimate of the empirical controls.
Usage
sva(dat, mod, mod0 = NULL, n.sv = NULL, controls = NULL,
method = c("irw", "two-step", "supervised"), vfilter = NULL, B = 5,
numSVmethod = "be")
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
n.sv | The number of surogate variables to estimate |
controls | A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control. |
method | For empirical estimation of control probes use "irw". If control probes are known use "supervised" |
vfilter | You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised" |
B | The number of iterations of the irwsva algorithm to perform |
numSVmethod | If n.sv is NULL, sva will attempt to estimate the number of needed surrogate variables. This should not be adapted by the user unless they are an expert. |
Details
A vignette is available by typing browseVignettes("sva")
in the R prompt.
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b A vector of the posterior probabilities each gene is affected by mod
n.sv The number of significant surrogate variables
Author
Jeffrey T. Leek, W. Evan Johnson, Hilary S. Parker, Andrew E. Jaffe, John D. Storey, Yuqing Zhang
References
For the package: Leek JT, Johnson WE, Parker HS, Jaffe AE, and Storey JD. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics DOI:10.1093/bioinformatics/bts034
For sva: Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences , 105: 18718-18723.
For sva: Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by `Surrogate Variable Analysis'. PLoS Genetics, 3: e161.
For Combat: Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8 (1), 118-127
For svaseq: Leek JT (2014) svaseq: removing batch and other artifacts from count-based sequencing data. bioRxiv doi: TBD
For fsva: Parker HS, Bravo HC, Leek JT (2013) Removing batch effects for prediction problems with frozen surrogate variable analysis arXiv:1301.3947
For psva: Parker HS, Leek JT, Favorov AV, Considine M, Xia X, Chavan S, Chung CH, Fertig EJ (2014) Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction Bioinformatics doi: 10.1093/bioinformatics/btu375
## Examples
r library(bladderbatch) data(bladderdata) dat <- bladderEset[1:5000,] pheno = pData(dat) edata = exprs(dat) mod = model.matrix(~as.factor(cancer), data=pheno) mod0 = model.matrix(~1,data=pheno) n.sv = num.sv(edata,mod,method="leek") svobj = sva(edata,mod,mod0,n.sv=n.sv)
sva_network()
A function to adjust gene expression data before network inference
Description
This function corrects a gene expression matrix prior to network inference by returning the residuals after regressing out the top principal components. The number of principal components to remove can be determined using a permutation-based approach using the "num.sv" function with method = "be"
Usage
sva_network(dat, n.pc)
Arguments
Argument | Description |
---|---|
dat | The raw gene expression data matrix (with variables in rows and samples in columns) |
n.pc | The number of principal components to remove |
Value
dat.adjusted Cleaned gene expression data matrix with the top prinicpal components removed
Examples
library(bladderbatch)
data(bladderdata)
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.pc = num.sv(edata, mod, method="be")
dat.adjusted = sva_network(edata, n.pc)
svacheck()
A function for post-hoc checking of an sva object to check for degenerate cases.
Description
This function is designed to check for degenerate cases in the sva fit and fix the sva object where possible.
Usage
sva.check(svaobj, dat, mod, mod0)
Arguments
Argument | Description |
---|---|
svaobj | The transformed data matrix with the variables in rows and samples in columns |
dat | The data set that was used to build the surrogate variables |
mod | The model matrix being used to fit the data |
mod0 | The null model matrix being used to fit the data |
Details
empirical.controls
for a direct estimate of the empirical controls.
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b A vector of the posterior probabilities each gene is affected by mod
n.sv The number of significant surrogate variables
Examples
library(bladderbatch)
data(bladderdata)
#dat <- bladderEset
dat <- bladderEset[1:5000,]
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
n.sv = num.sv(edata,mod,method="leek")
svobj = sva(edata,mod,mod0,n.sv=n.sv)
svacheckobj = sva.check(svobj,edata,mod,mod0)
svaseq()
A function for estimating surrogate variables for count based RNA-seq data.
Description
This function is the implementation of the iteratively re-weighted least squares
approach for estimating surrogate variables. As a by product, this function
produces estimates of the probability of being an empirical control. This function first
applies a moderated log transform as described in Leek 2014 before calculating the surrogate
variables. See the function empirical.controls
for a direct estimate of the empirical controls.
Usage
svaseq(dat, mod, mod0 = NULL, n.sv = NULL, controls = NULL,
method = c("irw", "two-step", "supervised"), vfilter = NULL, B = 5,
numSVmethod = "be", constant = 1)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
mod0 | The null model being compared when fitting the data |
n.sv | The number of surogate variables to estimate |
controls | A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control. |
method | For empirical estimation of control probes use "irw". If control probes are known use "supervised" |
vfilter | You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised" |
B | The number of iterations of the irwsva algorithm to perform |
numSVmethod | If n.sv is NULL, sva will attempt to estimate the number of needed surrogate variables. This should not be adapted by the user unless they are an expert. |
constant | The function takes log(dat + constant) before performing sva. By default constant = 1, all values of dat + constant should be positive. |
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b A vector of the posterior probabilities each gene is affected by mod
n.sv The number of significant surrogate variables
Examples
library(zebrafishRNASeq)
data(zfGenes)
filter = apply(zfGenes, 1, function(x) length(x[x>5])>=2)
filtered = zfGenes[filter,]
genes = rownames(filtered)[grep("^ENS", rownames(filtered))]
controls = grepl("^ERCC", rownames(filtered))
group = as.factor(rep(c("Ctl", "Trt"), each=3))
dat0 = as.matrix(filtered)
mod1 = model.matrix(~group)
mod0 = cbind(mod1[,1])
svseq = svaseq(dat0,mod1,mod0,n.sv=1)$sv
plot(svseq,pch=19,col="blue")
twostepsvabuild()
A function for estimating surrogate variables with the two step approach of Leek and Storey 2007
Description
This function is the implementation of the two step approach for estimating surrogate
variables proposed by Leek and Storey 2007 PLoS Genetics. This function is primarily
included for backwards compatibility. Newer versions of the sva algorithm are available
through sva
, svaseq
, with low level functionality available
through irwsva.build
and ssva
.
Usage
twostepsva.build(dat, mod, n.sv)
Arguments
Argument | Description |
---|---|
dat | The transformed data matrix with the variables in rows and samples in columns |
mod | The model matrix being used to fit the data |
n.sv | The number of surogate variables to estimate |
Value
sv The estimated surrogate variables, one in each column
pprob.gam: A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b A vector of the posterior probabilities each gene is affected by mod (this is always null for the two-step approach)
n.sv The number of significant surrogate variables
Examples
library(bladderbatch)
library(limma)
data(bladderdata)
dat <- bladderEset
pheno = pData(dat)
edata = exprs(dat)
mod = model.matrix(~as.factor(cancer), data=pheno)
n.sv = num.sv(edata,mod,method="leek")
svatwostep <- twostepsva.build(edata,mod,n.sv)