bioconductor v3.9.0 Qvalue

This package takes a list of p-values resulting from the

Summary

Functions

Calculate p-values from a set of observed test statistics and simulated null test statistics

P-values and test-statistics from the Hedenfalk et al. (2001) gene expression dataset

Histogram of p-values

Estimate local False Discovery Rate (FDR)

Proportion of true null p-values

Plotting function for q-value object

Estimate the q-values for a given set of p-values

Display q-value object

Write results to file

Functions

Calculate p-values from a set of observed test statistics and simulated null test statistics

Description

Calculates p-values from a set of observed test statistics and simulated null test statistics

Usage

empPvals(stat, stat0, pool = TRUE)

Arguments

Argument    Description
stat        A vector of calculated test statistics.
stat0       A vector or matrix of simulated or data-resampled null test statistics.
pool        If FALSE, stat0 must be a matrix with the number of rows equal to the length of stat. Default is TRUE.

Details

The argument stat must be such that larger values are more extreme, i.e., deviate more from the null hypothesis. Examples include an F-statistic or the absolute value of a t-statistic. The argument stat0 should be calculated analogously on data that represent observations from the null hypothesis distribution. Each p-value is calculated as the proportion of values from stat0 that are greater than or equal to the corresponding value from stat. If pool=TRUE is selected, then all of stat0 is used in calculating the p-value for a given entry of stat. If pool=FALSE, then it is assumed that stat0 is a matrix, where stat0[i,] is used to calculate the p-value for stat[i]. The function empPvals calculates "pooled" p-values faster than using a for-loop.

See page 18 of the Supporting Information in Storey et al. (2005) PNAS (http://www.pnas.org/content/suppl/2005/08/26/0504609102.DC1/04609SuppAppendix.pdf) for an explanation as to why calculating p-values from pooled empirical null statistics and then estimating FDR on these p-values is equivalent to directly thresholding the test statistics themselves and utilizing an analogous FDR estimator.
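The calculation described above can be sketched in base R. This is only an illustration of the stated definition (the proportion of null statistics greater than or equal to each observed statistic), not the package's own implementation, which may differ in details such as speed and the handling of zero p-values. The toy values and the names p_pooled and p_specific are made up for this example.

```r
# Toy data (hypothetical): 3 observed statistics, 4 null statistics per test
stat  <- c(3.1, 0.4, 1.7)
stat0 <- matrix(c(0.2, 1.9, 0.5,
                  3.5, 0.1, 1.0,
                  0.8, 2.2, 0.3,
                  1.2, 0.6, 2.0), nrow = 3)  # filled column-wise; row i belongs to stat[i]

# Pooled: compare each observed statistic against ALL null statistics
p_pooled <- sapply(stat, function(s) mean(stat0 >= s))

# Test-specific: compare stat[i] only against its own row stat0[i, ]
p_specific <- sapply(seq_along(stat), function(i) mean(stat0[i, ] >= stat[i]))

print(p_pooled)    # 1/12, 9/12, 4/12
print(p_specific)  # 1/4, 3/4, 1/4
```

With only four null statistics per test, the test-specific p-values can take just five distinct values, which is one practical reason pooling is attractive when the null distributions are comparable across tests.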

Value

A vector of p-values calculated as described above.

See also

qvalue

Author

John D. Storey

References

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW. (2005) Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences, 102 (36): 12837-12842. http://www.pnas.org/content/102/36/12837.full.pdf?with-ds=yes

Examples

# import data
data(hedenfalk)
stat <- hedenfalk$stat
stat0 <- hedenfalk$stat0 # null test statistics (a matrix with one row per test, as pool=FALSE requires)

# calculate p-values
p.pooled <- empPvals(stat=stat, stat0=stat0)
p.testspecific <- empPvals(stat=stat, stat0=stat0, pool=FALSE)

# compare pooled to test-specific p-values
qqplot(p.pooled, p.testspecific); abline(0,1)

P-values and test-statistics from the Hedenfalk et al. (2001) gene expression dataset

Description

The data from the breast cancer gene expression study of Hedenfalk et al. (2001) were obtained and analyzed. A comparison was made between 3,226 genes of two mutation types, BRCA1 (7 arrays) and BRCA2 (8 arrays). The data included here are p-values, test statistics, and permutation null test statistics obtained from a two-sample t-test analysis on a set of 3,170 genes, as described in Storey and Tibshirani (2003).

Usage

data(hedenfalk)

Value

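A list called hedenfalk containing:

* p: the p-values
* stat: the observed test statistics
* stat0: the permutation null test statistics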

See also

qvalue, empPvals

References

Hedenfalk I et al. (2001) Gene expression profiles in hereditary breast cancer. New England Journal of Medicine, 344: 539-548.

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Examples

# import data
data(hedenfalk)
stat <- hedenfalk$stat
stat0 <- hedenfalk$stat0 # null test statistics (a matrix with one row per test, as pool=FALSE requires)

# calculate p-values
p.pooled <- empPvals(stat=stat, stat0=stat0)
p.testspecific <- empPvals(stat=stat, stat0=stat0, pool=FALSE)

# compare pooled to test-specific p-values
qqplot(p.pooled, p.testspecific); abline(0,1)

# calculate q-values and view results
qobj <- qvalue(p.pooled)
summary(qobj)
hist(qobj)
plot(qobj)

Histogram of p-values

Description

Histogram of p-values

Usage

## S3 method for class 'qvalue'
hist(x, ...)

Arguments

Argument    Description
x           A q-value object.
...         Additional arguments, currently unused.

Details

This function allows one to view a histogram of the p-values along with line plots of the q-values and local FDR values versus p-values. The $\pi_0$ estimate is also displayed.

Value

Nothing of interest.

See also

qvalue, plot.qvalue, summary.qvalue

Author

Andrew J. Bass

References

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498. http://onlinelibrary.wiley.com/doi/10.1111/1467-9868.00346/abstract

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187-205. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2004.00439.x/abstract

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p

# make histogram
qobj <- qvalue(p)
hist(qobj)

Estimate local False Discovery Rate (FDR)

Description

Estimate the local FDR values from p-values.

Usage

lfdr(p, pi0 = NULL, trunc = TRUE, monotone = TRUE, transf = c("probit",
  "logit"), adj = 1.5, eps = 10^-8, ...)

Arguments

Argument    Description
p           A vector of p-values (only necessary input).
pi0         Estimated proportion of true null p-values. If NULL, then pi0est is called.
trunc       If TRUE, local FDR values >1 are set to 1. Default is TRUE.
monotone    If TRUE, local FDR values are non-decreasing with increasing p-values. Default is TRUE; this is recommended.
transf      Either a "probit" or "logit" transformation is applied to the p-values so that a local FDR estimate can be formed that does not involve edge effects of the [0,1] interval in which the p-values lie.
adj         Numeric value that is applied as a multiple of the smoothing bandwidth used in the density estimation. Default is adj=1.5, as shown in Usage.
eps         Numeric value that is the threshold for the tails of the empirical p-value distribution. Default is 10^-8.
...         Additional arguments, passed to pi0est.

Details

It is assumed that null p-values follow a Uniform(0,1) distribution. The estimated proportion of true null hypotheses $\hat{\pi}_0$ is either a user-provided value or the value calculated via pi0est. This function works by forming an estimate of the marginal density of the observed p-values, say $\hat{f}(p)$. Then the local FDR is estimated as ${\rm lFDR}(p) = \hat{\pi}_0/\hat{f}(p)$, with adjustments for monotonicity and to guarantee that ${\rm lFDR}(p) \leq 1$. See the Storey (2011) reference below for a concise mathematical definition of local FDR.
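As a rough illustration of the estimator described above, the following base R sketch computes the ratio of an assumed null proportion to a kernel density estimate on the probit scale, then applies the truncation and monotonicity adjustments. It is a simplified sketch, not the code used by lfdr: the simulated p-values, the fixed pi0 of 0.8, and all variable names are assumptions made for this example.

```r
set.seed(1)
# Hypothetical data: 800 null p-values (uniform) plus 200 small "alternative" p-values
p   <- c(runif(800), rbeta(200, 0.2, 5))
pi0 <- 0.8    # assumed known here; in practice it would be estimated
eps <- 1e-8

p_clip <- pmin(pmax(p, eps), 1 - eps)       # keep the probit transform finite
z      <- qnorm(p_clip)                     # probit transform of the p-values
dens   <- density(z, adjust = 1.5)          # kernel density on the probit scale
f_z    <- approx(dens$x, dens$y, xout = z)$y  # density evaluated at each z

# Change of variables: f(p) = f_z(z) / dnorm(z), so pi0 / f(p) = pi0 * dnorm(z) / f_z(z)
lfdr_est <- pi0 * dnorm(z) / f_z

lfdr_est <- pmin(lfdr_est, 1)               # truncation: cap values at 1
ord <- order(p)
lfdr_est[ord] <- cummax(lfdr_est[ord])      # monotone: non-decreasing in p

summary(lfdr_est)
```

Working on the probit scale avoids the boundary problems a kernel density estimate would have on the interval [0,1], which is the motivation given for the transf argument.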

Value

A vector of estimated local FDR values, with each entry corresponding to the entries of the input p-value vector p .

See also

qvalue, pi0est, hist.qvalue

Author

John D. Storey

References

Efron B, Tibshirani R, Storey JD, and Tusher V. (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96: 1151-1160. http://www.tandfonline.com/doi/abs/10.1198/016214501753382129

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p
lfdrVals <- lfdr(p)

# plot local FDR values
qobj <- qvalue(p)
hist(qobj)

Proportion of true null p-values

Description

Estimates the proportion of true null p-values, i.e., those following the Uniform(0,1) distribution.

Usage

pi0est(p, lambda = seq(0.05, 0.95, 0.05), pi0.method = c("smoother",
  "bootstrap"), smooth.df = 3, smooth.log.pi0 = FALSE, ...)

Arguments

Argument        Description
p               A vector of p-values (only necessary input).
lambda          The value of the tuning parameter to estimate $\pi_0$. Must be in [0,1). Optional, see Storey (2002).
pi0.method      Either "smoother" or "bootstrap"; the method for automatically choosing the tuning parameter in the estimation of $\pi_0$, the proportion of true null hypotheses.
smooth.df       Number of degrees-of-freedom to use when estimating $\pi_0$ with a smoother. Optional.
smooth.log.pi0  If TRUE and pi0.method = "smoother", $\pi_0$ will be estimated by applying a smoother to a scatterplot of $\log(\pi_0)$ estimates against the tuning parameter $\lambda$. Optional.
...             Arguments passed from the qvalue function.

Details

If no options are selected, then the method used to estimate $\pi_0$ is the smoother method described in Storey and Tibshirani (2003). The bootstrap method is described in Storey, Taylor & Siegmund (2004). A closed form solution of the bootstrap method is used in the package and is significantly faster.
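The smoother idea can be sketched in base R as follows. For each value of the tuning parameter, the proportion of p-values above it is rescaled by the width of the remaining interval; a smoothing spline is then fit across the grid and evaluated at the largest grid value. This is a hedged sketch of the approach in Storey and Tibshirani (2003), not the code of pi0est, and the simulated p-values and variable names are assumptions for illustration.

```r
set.seed(1)
# Hypothetical data: roughly 80% of the p-values come from the null (uniform)
p <- c(runif(800), rbeta(200, 0.2, 5))

lambda <- seq(0.05, 0.95, 0.05)

# Raw estimate at each lambda: share of p-values at or above lambda, scaled by (1 - lambda)
pi0_lambda <- sapply(lambda, function(l) mean(p >= l) / (1 - l))

# Smooth the raw estimates over lambda (3 degrees of freedom, as in smooth.df = 3)
fit        <- smooth.spline(lambda, pi0_lambda, df = 3)
pi0_smooth <- predict(fit, x = lambda)$y

# Take the smoothed value at the largest lambda and cap it at 1
pi0_hat <- min(pi0_smooth[length(lambda)], 1)
pi0_hat
```

Large values of the tuning parameter reduce bias because few alternative p-values fall that high, but they leave fewer p-values to count, which is why the raw estimates are smoothed rather than read off directly.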

Value

Returns a list:

*

See also

qvalue

Author

John D. Storey

References

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498. http://onlinelibrary.wiley.com/doi/10.1111/1467-9868.00346/abstract

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187-205. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2004.00439.x/abstract

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p

# proportion of null p-values
nullRatio <- pi0est(p)
nullRatioS <- pi0est(p, lambda=seq(0.40, 0.95, 0.05), smooth.log.pi0=TRUE)
nullRatioM <- pi0est(p, pi0.method="bootstrap")

# check behavior of estimate over lambda
# also, pi0est arguments can be passed to qvalue
qobj <- qvalue(p, lambda=seq(0.05, 0.95, 0.1), smooth.log.pi0=TRUE)
hist(qobj)
plot(qobj)

Plotting function for q-value object

Description

Graphical display of the q-value object

Usage

## S3 method for class 'qvalue'
plot(x, rng = c(0, 0.1), ...)

Arguments

Argument    Description
x           A q-value object.
rng         Range of q-values to show. Optional.
...         Additional arguments. Currently unused.

Details

The function plot allows one to view several plots:

  • The estimated $\pi_0$ versus the tuning parameter $\lambda$.

  • The q-values versus the p-values.

  • The number of significant tests versus each q-value cutoff.

  • The number of expected false positives versus the number of significant tests.

This function makes four plots. The first is a plot of the estimate of $\pi_0$ versus its tuning parameter $\lambda$. In most cases, as $\lambda$ gets larger, the bias of the estimate decreases, yet the variance increases. Various methods exist for balancing this bias-variance trade-off (Storey 2002, Storey & Tibshirani 2003, Storey, Taylor & Siegmund 2004). Comparing your estimate of $\pi_0$ to this plot allows you to gauge its quality. The remaining three plots show how many tests are called significant and how many false positives to expect for each q-value cut-off. A thorough discussion of these plots can be found in Storey & Tibshirani (2003).
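The quantities behind the last two plots can be approximated by hand. The base R sketch below counts the tests called significant at a grid of q-value cut-offs and multiplies each count by its cut-off to get a rough expected number of false positives. It uses p.adjust with method "BH" as a stand-in for q-values (equivalent to assuming that all hypotheses are null when estimating the null proportion), and the simulated data and variable names are assumptions, so the numbers will not match the package's plots exactly.

```r
set.seed(1)
# Hypothetical p-values: 800 nulls and 200 alternatives
p <- c(runif(800), rbeta(200, 0.2, 5))
q <- p.adjust(p, method = "BH")    # stand-in for q-values (null proportion taken as 1)

cutoffs <- seq(0, 0.1, by = 0.01)  # same range as the default rng = c(0, 0.1)

# Number of tests called significant at each q-value cut-off
n_sig <- sapply(cutoffs, function(cut) sum(q <= cut))

# Rough expected number of false positives: cut-off times number of significant tests
exp_fp <- cutoffs * n_sig

data.frame(cutoff = cutoffs, significant = n_sig, expected_false_pos = exp_fp)
```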

Value

Nothing of interest.

See also

qvalue, write.qvalue, summary.qvalue

Author

John D. Storey, Andrew J. Bass

References

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498. http://onlinelibrary.wiley.com/doi/10.1111/1467-9868.00346/abstract

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187-205. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2004.00439.x/abstract

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p
qobj <- qvalue(p)

plot(qobj, rng=c(0.0, 0.3))

Estimate the q-values for a given set of p-values

Description

Estimate the q-values for a given set of p-values. The q-value of a test measures the proportion of false positives incurred (called the false discovery rate) when that particular test is called significant.

Usage

qvalue(p, fdr.level = NULL, pfdr = FALSE, lfdr.out = TRUE, pi0 = NULL,
  ...)

Arguments

Argument    Description
p           A vector of p-values (only necessary input).
fdr.level   A level at which to control the FDR. Must be in (0,1]. Optional; if this is selected, a vector of TRUE and FALSE is returned that specifies whether each q-value is less than fdr.level or not.
pfdr        Logical indicating whether to make the estimate more robust for small p-values and to use a direct finite-sample estimate of pFDR. Optional; default is FALSE.
lfdr.out    If TRUE then local false discovery rates are returned. Default is TRUE.
pi0         It is recommended to not input an estimate of pi0. Experienced users can use their own methodology to estimate the proportion of true nulls or set it equal to 1 for the BH procedure.
...         Additional arguments passed to pi0est and lfdr.

Details

The function pi0est is called internally and calculates the estimate of $\pi_0$, the proportion of true null hypotheses. The function lfdr is also called internally and calculates the estimated local FDR values. Arguments for these functions can be included via ... and will be utilized in the internal calls made in qvalue. See http://genomine.org/papers/Storey_FDR_2011.pdf for a brief introduction to FDRs and q-values.
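To make the role of the null proportion concrete, here is a minimal base R sketch of a step-up q-value calculation for a given pi0. It is an illustration under stated assumptions, not the body of qvalue: the helper name qvalue_sketch is hypothetical, and the pfdr adjustment and local FDR output are omitted. With pi0 set to 1 the result should agree with the Benjamini-Hochberg adjustment from p.adjust, matching the note on the pi0 argument.

```r
# Hypothetical helper: q-values from p-values for a fixed pi0
qvalue_sketch <- function(p, pi0 = 1) {
  m   <- length(p)
  ord <- order(p, decreasing = TRUE)      # largest p-value first
  rnk <- m:1                              # rank of each sorted p-value (m down to 1)
  # pi0 * m * p / rank, then a running minimum so q-values are monotone in p
  q_sorted <- pmin(1, cummin(pi0 * m * p[ord] / rnk))
  q <- numeric(m)
  q[ord] <- q_sorted                      # restore the original ordering
  q
}

set.seed(1)
p <- c(runif(800), rbeta(200, 0.2, 5))    # hypothetical p-values

q_bh  <- qvalue_sketch(p, pi0 = 1)        # reduces to the BH procedure
q_pi0 <- qvalue_sketch(p, pi0 = 0.8)      # smaller pi0 gives smaller q-values

all.equal(q_bh, p.adjust(p, method = "BH"))
```

Because the q-values scale with pi0, any estimate below 1 yields q-values no larger than the BH-adjusted values, which is where the gain in power comes from.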

Value

A list of object type "qvalue" containing:

*

See also

pi0est, lfdr, summary.qvalue, plot.qvalue, hist.qvalue, write.qvalue

Author

John D. Storey

References

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498. http://onlinelibrary.wiley.com/doi/10.1111/1467-9868.00346/abstract

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187-205. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2004.00439.x/abstract

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p

# get q-value object
qobj <- qvalue(p)
plot(qobj)
hist(qobj)

# options available
qobj <- qvalue(p, lambda=0.5, pfdr=TRUE)
qobj <- qvalue(p, fdr.level=0.05, pi0.method="bootstrap", adj=1.2)

Display q-value object

Description

Display summary information for a q-value object.

Usage

## S3 method for class 'qvalue'
summary(object, cuts = c(1e-04, 0.001, 0.01, 0.025, 0.05,
  0.1, 1), digits = getOption("digits"), ...)

Arguments

Argument    Description
object      A q-value object.
cuts        Vector of significance values to use for table (optional).
digits      Significant digits to display (optional).
...         Additional arguments; currently unused.

Details

summary shows the original call, estimated proportion of true null hypotheses, and a table comparing the number of significant calls for the p-values, estimated q-values, and estimated local FDR values using a set of cutoffs given by cuts.
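A table of this kind can be approximated in base R by counting how many values fall below each cutoff. The sketch below is an assumption-laden illustration rather than the output of summary: it uses p.adjust with method "BH" as a stand-in for q-values, omits the local FDR row, assumes a strict less-than comparison, and uses simulated data.

```r
set.seed(1)
p <- c(runif(800), rbeta(200, 0.2, 5))   # hypothetical p-values
q <- p.adjust(p, method = "BH")          # stand-in for estimated q-values

cuts <- c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 1)

# One row per quantity, one column per cutoff: number of values below the cutoff
counts <- rbind(
  "p-value" = sapply(cuts, function(cut) sum(p < cut)),
  "q-value" = sapply(cuts, function(cut) sum(q < cut))
)
colnames(counts) <- paste0("<", cuts)
print(counts)
```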

Value

Invisibly returns the original object.

See also

qvalue, plot.qvalue, write.qvalue

Author

John D. Storey, Andrew J. Bass, Alan Dabney

References

Storey JD. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64: 479-498. http://onlinelibrary.wiley.com/doi/10.1111/1467-9868.00346/abstract

Storey JD and Tibshirani R. (2003) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440-9445. http://www.pnas.org/content/100/16/9440.full

Storey JD. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31: 2013-2035. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1074290335

Storey JD, Taylor JE, and Siegmund D. (2004) Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B, 66: 187-205. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2004.00439.x/abstract

Storey JD. (2011) False discovery rates. In International Encyclopedia of Statistical Science. http://genomine.org/papers/Storey_FDR_2011.pdf http://www.springer.com/statistics/book/978-3-642-04897-5

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p

# get summary results from q-value object
qobj <- qvalue(p)
summary(qobj, cuts=c(0.01, 0.05))

Write results to file

Description

Write the results of the q-value object to a file.

Usage

write.qvalue(x, file = NULL, sep = " ", eol = "\n", na = "NA",
  row.names = FALSE, col.names = TRUE)

Arguments

Argument    Description
x           A q-value object.
file        Output filename (optional).
sep         Separation between columns.
eol         Character to print at the end of each line.
na          String to use when there are missing values.
row.names   Logical. Specify whether row names are to be printed.
col.names   Logical. Specify whether column names are to be printed.

Details

The output file includes: (i) p-values, (ii) q-values, (iii) local FDR values, and (iv) the estimate of $\pi_0$, one per line. If an FDR significance level was specified in the call to qvalue, the significance level is printed and an indicator of significance is included.
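The formatting arguments have the same names as those of base R's write.table. Assuming they behave the same way (an assumption, since the internals of write.qvalue are not described here), the effect of sep, eol, na, row.names, and col.names can be previewed with a small stand-in table. The data frame and its column names below are hypothetical and do not reproduce the exact columns written by write.qvalue.

```r
# Hypothetical stand-in for the per-test results, including a missing value
res <- data.frame(pvalue = c(0.01, 0.20, NA),
                  qvalue = c(0.03, 0.30, NA))

out_file <- tempfile(fileext = ".txt")   # temporary file so nothing is overwritten

# Same formatting arguments and defaults as listed in Usage above
write.table(res, file = out_file, sep = " ", eol = "\n", na = "NA",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

readLines(out_file)                       # inspect the raw lines that were written
back <- read.table(out_file, header = TRUE)
```

Reading the file back with read.table confirms that the header row and the missing-value string round-trip as expected.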

Value

Nothing of interest.

See also

qvalue, plot.qvalue, summary.qvalue

Author

John D. Storey, Andrew J. Bass

Examples

# import data
data(hedenfalk)
p <- hedenfalk$p

# write q-value object
qobj <- qvalue(p)
write.qvalue(qobj, file="myresults.txt")