bioconductor v3.9.0 Minfi

Tools to analyze & visualize Illumina Infinium methylation arrays.

Summary

Functions

Stubs for internal functions

GenomicMethylSet instances

GenomicRatioSet instances

Class IlluminaMethylationAnnotation

Class "IlluminaMethylationManifest"

MethylSet instances

Class "RGChannelSet"

RatioSet instances

Finds blocks of methylation differences for Illumina methylation arrays

Methods for function bumphunter in Package minfi

A method for combining different types of methylation arrays into a virtual array.

Estimates A/B compartments from Illumina methylation arrays

Plot control probe signals.

A method for converting a type of methylation arrays into a virtual array of another type.

Collapse methylation values of adjacent CpGs into a summary value.

Density bean plots of methylation Beta values.

Density plots of methylation Beta values.

Detection p-values for all probed genomic positions.

Find differentially methylated positions

Cell Proportion Estimation

Fix methylation outliers

Find gap signals in 450k data

Accessing annotation for Illumina methylation objects

Reading Illumina methylation array data from GEO.

Estimate sample-specific quality control (QC) for methylation data

Estimating sample sex based on methylation data

logit in base 2.

Make a GenomicRatioSet from a matrix

Mapping methylation data to the genome

Multi-dimensional scaling plots giving an overview of similarities and differences between samples.

easy one-step QC of methylation object

Defunct functions in package minfi

Deprecated functions in package minfi

Analyze Illumina's methylation arrays

Plot the overall distribution of beta values and the distributions of the Infinium I and II probe types.

Plot methylation values at a single genomic position

Functional normalization for Illumina 450k arrays

Perform preprocessing as Genome Studio.

The Noob/ssNoob preprocessing method for Infinium methylation microarrays.

Stratified quantile normalization for an Illumina methylation array.

Creation of a MethylSet without normalization

Subset-quantile Within Array Normalisation for Illumina Infinium HumanMethylation450 BeadChips

QC report for Illumina Infinium Human Methylation 450k arrays

Converting methylation signals to ratios (Beta or M-values)

Read in Unmethylated and Methylated signals from a GEO raw file.

Read in tab-delimited file in the TCGA format

Parsing IDAT files from Illumina methylation arrays.

Reads an entire metharray experiment using a sample sheet

Reading an Illumina methylation sample sheet

Subset an RGChannelSet by CpG loci.

Various utilities

Functions

DelayedArray_utils()

Stubs for internal functions

Description

Stubs for internal functions


GenomicMethylSet_class()

GenomicMethylSet instances

Description

This class holds preprocessed data for Illumina methylation microarrays, mapped to a genomic location.

Usage

## Constructor
GenomicMethylSet(gr = GRanges(), Meth = new("matrix"),
                 Unmeth = new("matrix"), annotation = "",
                 preprocessMethod = "", ...)
## Data extraction / Accessors
## S4 methods for signature 'GenomicMethylSet'
getMeth(object)
getUnmeth(object)
getBeta(object, type = "", offset = 0, betaThreshold = 0)
getM(object, type = "", ...)
getCN(object, ...)
pData(object)
sampleNames(object)
featureNames(object)
annotation(object)
preprocessMethod(object)
mapToGenome(object, ...)

Arguments

| Argument | Description |
|---|---|
| object | A GenomicMethylSet. |
| gr | A GRanges object. |
| Meth | A matrix of methylation values (between zero and infinity), with each row a methylation locus and each column a sample. |
| Unmeth | See the Meth argument. |
| annotation | An annotation character string. |
| preprocessMethod | A preprocess method character string. |
| type | How are the values calculated? For getBeta, setting type="Illumina" sets offset=100 as per Genome Studio. For getM, setting type="" computes M-values as the logarithm of Meth / Unmeth; otherwise they are computed as the logit of getBeta(object). |
| offset | Offset in the beta ratio; see Details. |
| betaThreshold | Constrains the beta values to lie in the interval between betaThreshold and 1 - betaThreshold. |
| ... | For the constructor, additional arguments to be passed to SummarizedExperiment; of particular interest are colData and metadata. For getM, these values get passed on to getBeta. For mapToGenome, this is ignored. |

Details

For a detailed discussion of getBeta and getM, see the Details section of MethylSet.

Value

An object of class GenomicMethylSet for the constructor.

Seealso

RangedSummarizedExperiment in the SummarizedExperiment package for the basic class structure. Objects of this class are typically created by using the function mapToGenome on a MethylSet .

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

showClass("GenomicMethylSet")

GenomicRatioSet_class()

GenomicRatioSet instances

Description

This class holds preprocessed data for Illumina methylation microarrays, mapped to a genomic location.

Usage

## Constructor
GenomicRatioSet(gr = GRanges(), Beta = NULL, M = NULL,
                CN = NULL, annotation = "",
                preprocessMethod = "", ...)
## Data extraction / Accessors
## S4 methods for signature 'GenomicRatioSet'
getBeta(object)
getM(object)
getCN(object)
pData(object)
sampleNames(object)
featureNames(object)
annotation(object)
preprocessMethod(object)
mapToGenome(object, ...)

Arguments

| Argument | Description |
|---|---|
| object | A GenomicRatioSet. |
| gr | A GRanges object. |
| Beta | A matrix of beta values (optional, see Details). |
| M | A matrix of M values (optional, see Details). |
| CN | A matrix of copy number values. |
| annotation | An annotation character string. |
| preprocessMethod | A preprocess method character string. |
| ... | For the constructor, additional arguments to be passed to SummarizedExperiment; of particular interest are colData and metadata. For mapToGenome, this is ignored. |

Details

This class holds M or Beta values (or both) together with associated genomic coordinates. It is not possible to get Meth or Unmeth values from this object. The intention is to use this kind of object as an analysis end point.

In case one of M or Beta is missing, the other is computed on the fly. For example, M is computed from Beta as the logit (base 2) of the Beta values.

Value

An object of class GenomicRatioSet for the constructor.

Seealso

RangedSummarizedExperiment in the SummarizedExperiment package for the basic class structure.

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

showClass("GenomicRatioSet")

IlluminaMethylationAnnotation_class()

Class IlluminaMethylationAnnotation

Description

This is a class for representing annotation associated with an Illumina methylation microarray. Annotation is transient in the sense that it may change over time, whereas the information stored in the IlluminaMethylationManifest class depends only on the array design.

Usage

## Constructor
IlluminaMethylationAnnotation(objectNames, annotation = "",
                              defaults = "", packageName = "")
## Data extraction
## S4 method for signature 'IlluminaMethylationAnnotation'
getManifest(object)

Arguments

| Argument | Description |
|---|---|
| object | An object of class IlluminaMethylationAnnotation. |
| annotation | An annotation character. |
| defaults | A vector of default choices for getAnnotation(what = "everything"). |
| objectNames | A character with object names used in the package. |
| packageName | The name of the package this object will be contained in. |

Value

An object of class IlluminaMethylationAnnotation .

Seealso

IlluminaMethylationManifest

Author

Kasper Daniel Hansen khansen@jhsph.edu .


IlluminaMethylationManifest_class()

Class "IlluminaMethylationManifest"

Description

This is a class for representing an Illumina methylation microarray design, i.e. the physical location and the probe sequences. This information should be independent of genome build and annotation.

Usage

## Constructor
IlluminaMethylationManifest(TypeI = new("DataFrame"),
                            TypeII = new("DataFrame"),
                            TypeControl = new("DataFrame"),
                            TypeSnpI = new("DataFrame"),
                            TypeSnpII = new("DataFrame"),
                            annotation = "")
## Data extraction
## S4 method for signature 'IlluminaMethylationManifest'
getManifest(object)
## S4 method for signature 'character'
getManifest(object)
getProbeInfo(object, type = c("I", "II", "Control",
                              "I-Green", "I-Red", "SnpI", "SnpII"))
getManifestInfo(object, type = c("nLoci", "locusNames"))
getControlAddress(object, controlType = c("NORM_A", "NORM_C",
                                          "NORM_G", "NORM_T"),
                  asList = FALSE)

Arguments

| Argument | Description |
|---|---|
| object | Either an object of class IlluminaMethylationManifest or class character for getManifest. For getProbeInfo, getManifestInfo and getControlAddress, an object of class RGChannelSet or IlluminaMethylationManifest. |
| TypeI | A DataFrame of type I probes. |
| TypeII | A DataFrame of type II probes. |
| TypeControl | A DataFrame of control probes. |
| TypeSnpI | A DataFrame of SNP type I probes. |
| TypeSnpII | A DataFrame of SNP type II probes. |
| annotation | An annotation character. |
| type | A single character describing what kind of information should be returned. For getProbeInfo it selects one of the following subtypes of probes on the array: Type I, Type II, Controls, Type I probes with methylation measured in the Green channel (I-Green), Type I probes with methylation measured in the Red channel (I-Red), and the SNP probes of type I and type II (SnpI, SnpII). For getManifestInfo it selects either the number of methylation loci (approx. number of CpGs) on the array or the locus names. |
| controlType | A character vector of control types. |
| asList | If TRUE, the return object is a list with one component for each controlType. |

Value

An object of class IlluminaMethylationManifest for the constructor.

Seealso

IlluminaMethylationAnnotation for annotation information for the array (information depending on a specific genome build).

Author

Kasper Daniel Hansen khansen@jhsph.edu .

Examples

if(require(IlluminaHumanMethylation450kmanifest)) {

show(IlluminaHumanMethylation450kmanifest)
head(getProbeInfo(IlluminaHumanMethylation450kmanifest, type = "I"))
head(IlluminaHumanMethylation450kmanifest@data$TypeI)
head(IlluminaHumanMethylation450kmanifest@data$TypeII)
head(IlluminaHumanMethylation450kmanifest@data$TypeControl)

}

MethylSet_class()

MethylSet instances

Description

This class holds preprocessed data for Illumina methylation microarrays.

Usage

## Constructor
MethylSet(Meth = new("matrix"), Unmeth = new("matrix"),
          annotation = "", preprocessMethod = "", ...)
## Data extraction / Accessors
## S4 methods for signature 'MethylSet'
getMeth(object)
getUnmeth(object)
getBeta(object, type = "", offset = 0, betaThreshold = 0)
getM(object, type = "", ...)
getCN(object, ...)
getManifest(object)
preprocessMethod(object)
annotation(object)
pData(object)
sampleNames(object)
featureNames(object)
## Utilities
dropMethylationLoci(object, dropRS = TRUE, dropCH = TRUE)

Arguments

| Argument | Description |
|---|---|
| object | A MethylSet. |
| Meth | A matrix of methylation values (between zero and infinity), with each row a methylation locus and each column a sample. |
| Unmeth | See the Meth argument. |
| annotation | An annotation string, optional. |
| preprocessMethod | A character, optional. |
| type | How are the values calculated? For getBeta, setting type="Illumina" sets offset=100 as per Genome Studio. For getM, setting type="" computes M-values as the logarithm of Meth / Unmeth; otherwise they are computed as the logit of getBeta(object). |
| offset | Offset in the beta ratio; see Details. |
| betaThreshold | Constrains the beta values to lie in the interval between betaThreshold and 1 - betaThreshold. |
| dropRS | Should SNP probes be dropped? |
| dropCH | Should CH probes be dropped? |
| ... | For the constructor, additional arguments to be passed to SummarizedExperiment; of particular interest are colData, rowData and metadata. For getM, these values get passed on to getBeta. |

Details

This class inherits from eSet . Essentially the class is a representation of a Meth matrix and a Unmeth matrix linked to a pData data frame.

In addition, an annotation and a preprocessMethod slot is present. The annotation slot describes the type of array and also which annotation package to use. The preprocessMethod slot describes the kind of preprocessing that resulted in this dataset.

A MethylSet stores Meth and Unmeth. From these it is easy to compute Beta values, defined as

$$\beta = \frac{\textrm{Meth}}{\textrm{Meth} + \textrm{Unmeth} + \textrm{offset}}$$

The offset is chosen to avoid dividing by small values. Illumina uses a default of 100. M-values (an unfortunate bad name) are defined as

$$M = \textrm{logit}_2(\beta) = \log_2\left(\frac{\textrm{Meth}}{\textrm{Unmeth}}\right)$$

where the second equality holds when the offset is zero. This formula has problems if either Meth or Unmeth is zero. For this reason, we can use betaThreshold to make sure Beta is neither 0 nor 1 before taking the logit. What makes sense for the offset and betaThreshold depends crucially on how the data was preprocessed. Do not expect the default values to be particularly good.
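The Beta and M-value definitions above can be tried out on a toy example. The following is a minimal base R sketch, not minfi's implementation: the helper names beta_from_signals and m_from_beta and all intensity values are hypothetical, and the base-2 logarithm is assumed for the M-values.

```r
# Toy intensities: rows are loci, columns are samples (made-up values).
meth <- matrix(c(0, 1200, 300, 5000), nrow = 2,
               dimnames = list(c("cg01", "cg02"), c("s1", "s2")))
unmeth <- matrix(c(800, 0, 2700, 5000), nrow = 2,
                 dimnames = dimnames(meth))

# Beta = Meth / (Meth + Unmeth + offset), then constrained to the
# interval [betaThreshold, 1 - betaThreshold].
beta_from_signals <- function(meth, unmeth, offset = 0, betaThreshold = 0) {
  beta <- meth / (meth + unmeth + offset)
  pmin(pmax(beta, betaThreshold), 1 - betaThreshold)
}

# M = log2(Beta / (1 - Beta)), i.e. the base-2 logit of Beta.
m_from_beta <- function(beta) log2(beta / (1 - beta))

beta_raw <- beta_from_signals(meth, unmeth)                 # offset = 0
beta_ill <- beta_from_signals(meth, unmeth, offset = 100)   # Illumina-style offset
beta_thr <- beta_from_signals(meth, unmeth, betaThreshold = 0.001)

m_raw <- m_from_beta(beta_raw)  # infinite where Meth or Unmeth is zero
m_thr <- m_from_beta(beta_thr)  # finite everywhere thanks to the threshold
```

In this sketch the unthresholded M-values are infinite for the loci where one of the two signals is zero, which is the problem the betaThreshold argument is meant to avoid.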

Value

An object of class MethylSet for the constructor.

Seealso

eSet for the basic class structure. Objects of this class are typically created from an RGChannelSet using preprocessRaw or another preprocessing function.

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

showClass("MethylSet")

RGChannelSet_class()

Class "RGChannelSet"

Description

These classes represent raw (unprocessed) data from a two-color microarray; specifically an Illumina methylation array.

Usage

## Constructors
RGChannelSet(Green = new("matrix"), Red = new("matrix"),
             annotation = "", ...)
RGChannelSetExtended(Green = new("matrix"), Red = new("matrix"),
                    GreenSD = new("matrix"), RedSD = new("matrix"),
                    NBeads = new("matrix"), annotation = "", ...)
## Accessors
## S4 methods for signature 'RGChannelSet'
annotation(object)
pData(object)
sampleNames(object)
featureNames(object)
getBeta(object, ...)
getGreen(object)
getRed(object)
getNBeads(object)
## S4 method for signature 'RGChannelSet'
getManifest(object)
## Convenience functions
getOOB(object)
getSnpBeta(object)

Arguments

| Argument | Description |
|---|---|
| object | An RGChannelSet (or RGChannelSetExtended). |
| Green | A matrix of Green channel values (between zero and infinity), with each row a methylation locus and each column a sample. |
| Red | See the Green argument, but for the Red channel. |
| GreenSD | See the Green argument, but for standard deviations of the Green channel summaries. |
| RedSD | See the Green argument, but for standard deviations of the Red channel summaries. |
| NBeads | See the Green argument, but contains the number of beads used to summarize the Green and Red channels. |
| annotation | An annotation string, optional. |
| ... | For the constructor(s), additional arguments to be passed to SummarizedExperiment; of particular interest are colData, rowData and metadata. For getBeta, these values get passed on to getBeta. |

Value

An object of class RGChannelSet or RGChannelSetExtended for the constructors.

Seealso

See SummarizedExperiment for the basic class that is used as a building block for "RGChannelSet(Extended)" . See IlluminaMethylationManifest for a class representing the design of the array.

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

showClass("RGChannelSet")

RatioSet_class()

RatioSet instances

Description

This class holds preprocessed data for Illumina methylation microarrays.

Usage

## Constructor
RatioSet(Beta = NULL, M = NULL, CN = NULL,
        annotation = "", preprocessMethod = "", ...)
## Data extraction / Accessors
## S4 methods for signature 'RatioSet'
getBeta(object)
getM(object)
getCN(object)
preprocessMethod(object)
annotation(object)
pData(object)
sampleNames(object)
featureNames(object)

Arguments

| Argument | Description |
|---|---|
| object | A RatioSet. |
| Beta | A matrix of beta values (between zero and one), with each row a methylation locus and each column a sample. |
| M | A matrix of log-ratios (between minus infinity and infinity), with each row a methylation locus and each column a sample. |
| CN | An optional matrix of copy number estimates, with each row a methylation locus and each column a sample. |
| annotation | An annotation string, optional. |
| preprocessMethod | A character, optional. |
| ... | For the constructor, additional arguments to be passed to SummarizedExperiment; of particular interest are colData, rowData and metadata. For getM, these values get passed on to getBeta. |

Details

This class inherits from eSet . Essentially the class is a representation of a Beta matrix and/or a M matrix and optionally a CN (copy number) matrix linked to a pData data frame.

In addition, an annotation and a preprocessMethod slot is present. The annotation slot describes the type of array and also which annotation package to use. The preprocessMethod slot describes the kind of preprocessing that resulted in this dataset.

For a RatioSet , M-values are defined as logit2 of the Beta-values if the M-values are not present in the object. Similarly, if only M-values are present in the object, Beta-values are ilogit2 of the M-values.
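The relationship between the two scales can be sketched in base R. The helpers below are local stand-ins written for illustration (the names logit2_sketch and ilogit2_sketch are hypothetical, and the Beta values are made up); they are assumed to mirror the base-2 logit and its inverse described above.

```r
# Base-2 logit and its inverse, written out for illustration.
logit2_sketch <- function(x) log2(x) - log2(1 - x)
ilogit2_sketch <- function(x) 2^x / (1 + 2^x)

beta <- c(0.1, 0.5, 0.9)       # made-up Beta values
m <- logit2_sketch(beta)       # M-values derived from Beta
beta_back <- ilogit2_sketch(m) # Beta values recovered from M
```

The round trip returns the original Beta values, and a Beta of 0.5 maps to an M-value of 0, so the two representations carry the same information as long as Beta is strictly between 0 and 1.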

Value

An object of class RatioSet for the constructor.

Seealso

eSet for the basic class structure. Objects of this class are typically created from a MethylSet using ratioConvert .

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

showClass("RatioSet")

Finds blocks of methylation differences for Illumina methylation arrays

Description

Finds blocks (large scale regions) of methylation differences for Illumina methylation arrays

Usage

blockFinder(object, design, coef = 2, what = c("Beta", "M"),
                        cluster = NULL, cutoff = NULL,
                        pickCutoff = FALSE, pickCutoffQ = 0.99,
                        nullMethod = c("permutation","bootstrap"),
                        smooth = TRUE, smoothFunction = locfitByCluster,
                        B = ncol(permutations), permutations = NULL,
                        verbose = TRUE, bpSpan = 2.5*10^5, ...)

Arguments

| Argument | Description |
|---|---|
| object | An object of class GenomicRatioSet. |
| design | Design matrix with rows representing samples and columns representing covariates. Regression is applied to each row of mat. |
| coef | An integer denoting the column of the design matrix containing the covariate of interest. The hunt for bumps will only be done for the estimate of this coefficient. |
| what | Should blockfinding be performed on M-values or Beta values? |
| cluster | The clusters of locations that are to be analyzed together. In the case of microarrays, the clusters are often supplied by the manufacturer. If not available, the function clusterMaker can be used to cluster nearby locations. |
| cutoff | A numeric value. Values of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions. It is possible to give two separate values (upper and lower bounds). If one value is given, the lower bound is minus the value. |
| pickCutoff | Should a cutoff be picked automatically? |
| pickCutoffQ | The quantile used for picking the cutoff using the permutation distribution. |
| nullMethod | Method used to generate null candidate regions; must be one of bootstrap or permutation (defaults to permutation). However, if covariates in addition to the outcome of interest are included in the design matrix (ncol(design)>2), the permutation approach is not recommended. See the vignette and the original paper for more information. |
| smooth | A logical value. If TRUE, the estimated profile will be smoothed with the smoother defined by smoothFunction. |
| smoothFunction | A function to be used for smoothing the estimate of the genomic profile. Two functions are provided by the package: loessByCluster and runmedByCluster. |
| B | An integer denoting the number of resamples to use when computing null distributions. This defaults to 0. If permutations is supplied, that defines the number of permutations/bootstraps and B is ignored. |
| permutations | A matrix with columns providing indexes to be used to scramble the data and create a null distribution. If this matrix is not supplied and B > 0, these indexes are created using the function sample. |
| verbose | Should the function be verbose? |
| bpSpan | Smoothing span. Note that this defaults to a large value because we are searching for large scale changes. |
| ... | Further arguments sent to bumphunterEngine. |
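The cutoff rule described in the argument list can be illustrated with a few lines of base R. This is only a sketch of the thresholding idea, not the code used by blockFinder or bumphunterEngine: the function candidate_index and the profile values are hypothetical, and strict inequalities are assumed.

```r
# 'profile' stands in for the (smoothed) estimate of the coefficient of
# interest at consecutive locations; the values are made up.
profile <- c(0.05, 0.30, 0.40, -0.10, -0.35, 0.10, 0.28)

candidate_index <- function(profile, cutoff) {
  # One value: the lower bound is minus the value.
  # Two values: they are used as lower and upper bounds.
  bounds <- if (length(cutoff) == 1) c(-abs(cutoff), abs(cutoff)) else range(cutoff)
  list(up = which(profile > bounds[2]),
       down = which(profile < bounds[1]))
}

sym <- candidate_index(profile, 0.25)             # symmetric cutoff
asym <- candidate_index(profile, c(-0.30, 0.35))  # separate bounds
```

With the symmetric cutoff, locations 2, 3 and 7 exceed the upper bound and location 5 falls below the lower bound; with the separate bounds, only location 3 remains above.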

Details

The approximately 170,000 open sea probes on the 450k can be used to detect long-range changes in methylation status. These large scale changes that can range up to several Mb have typically been identified only through whole-genome bisulfite sequencing. blockFinder groups the averaged methylation values in open-sea probe clusters (See cpgCollapse ) into large regions in which the bumphunter procedure is applied with a large (250KB+) smoothing window.

Note that estimating the precise boundaries of these blocks is constrained by the resolution of the array.

Value

FIXME

Seealso

cpgCollapse , and bumphunter

Methods for function bumphunter in Package minfi

Description

Estimate regions for which a genomic profile deviates from its baseline value. Originally implemented to detect differentially methylated genomic regions between two populations, but can be applied to any CpG-level coefficient of interest.

Usage

## S4 method for signature 'GenomicRatioSet'
bumphunter(object, design, cluster=NULL,
          coef=2,  cutoff=NULL, pickCutoff=FALSE, pickCutoffQ=0.99,
          maxGap=500,  nullMethod=c("permutation","bootstrap"),
          smooth=FALSE, smoothFunction=locfitByCluster,
          useWeights=FALSE,   B=ncol(permutations), permutations=NULL,
          verbose=TRUE, type = c("Beta","M"), ...)

Arguments

| Argument | Description |
|---|---|
| object | An object of class GenomicRatioSet. |
| design | Design matrix with rows representing samples and columns representing covariates. Regression is applied to each row of mat. |
| cluster | The clusters of locations that are to be analyzed together. In the case of microarrays, the clusters are often supplied by the manufacturer. If not available, the function clusterMaker can be used to cluster nearby locations. |
| coef | An integer denoting the column of the design matrix containing the covariate of interest. The hunt for bumps will only be done for the estimate of this coefficient. |
| cutoff | A numeric value. Values of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions. It is possible to give two separate values (upper and lower bounds). If one value is given, the lower bound is minus the value. |
| pickCutoff | Should bumphunter attempt to pick a cutoff using the permutation distribution? |
| pickCutoffQ | The quantile used for picking the cutoff using the permutation distribution. |
| maxGap | If cluster is not provided, this maximum location gap will be used to define clusters via the clusterMaker function. |
| nullMethod | Method used to generate null candidate regions; must be one of bootstrap or permutation (defaults to permutation). However, if covariates in addition to the outcome of interest are included in the design matrix (ncol(design)>2), the permutation approach is not recommended. See the vignette and the original paper for more information. |
| smooth | A logical value. If TRUE, the estimated profile will be smoothed with the smoother defined by smoothFunction. |
| smoothFunction | A function to be used for smoothing the estimate of the genomic profile. Two functions are provided by the package: loessByCluster and runmedByCluster. |
| useWeights | A logical value. If TRUE, the standard errors of the point-wise estimates of the profile function will be used as weights in the loess smoother loessByCluster. If the runmedByCluster smoother is used, this argument is ignored. |
| B | An integer denoting the number of resamples to use when computing null distributions. This defaults to 0. If permutations is supplied, that defines the number of permutations/bootstraps and B is ignored. |
| permutations | A matrix with columns providing indexes to be used to scramble the data and create a null distribution when nullMethod is set to permutation. If the bootstrap approach is used, this argument is ignored. If this matrix is not supplied and B > 0, these indexes are created using the function sample. |
| verbose | A logical value. If TRUE, it writes out some messages indicating progress. If FALSE, nothing should be printed. |
| type | Should bumphunting be performed on M-values ("M") or Beta values ("Beta")? |
| ... | Further arguments to be passed to the smoother functions. |
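The roles of design, coef and permutations can be made concrete with base R alone. The following is a hedged sketch with made-up sample labels; it only builds the inputs and does not call bumphunter, and the way the permutation matrix is generated here is an assumption based on the argument description rather than a copy of the package's code.

```r
set.seed(1)

# Six hypothetical samples, three per group.
status <- factor(rep(c("normal", "cancer"), each = 3),
                 levels = c("normal", "cancer"))

# Column 1 is the intercept, column 2 the covariate of interest,
# so coef = 2 selects the cancer-versus-normal effect.
design <- model.matrix(~ status)
coef <- 2

# One permutation of the sample indexes per column; the number of
# columns plays the role of B.
B <- 4
permutations <- replicate(B, sample(nrow(design)))
```

Each column of permutations is a reordering of the sample indexes 1 to 6, which matches the default B = ncol(permutations) in the usage above.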

Details

See the help file for the bumphunter method in the bumphunter package for details.

Value

An object of class bumps. See the help file for bumphunter in the bumphunter package for a description of its components.

Seealso

bumphunter

Author

Rafael A. Irizarry, Martin J. Aryee and Kasper D. Hansen

References

AE Jaffe, P Murakami, H Lee, JT Leek, MD Fallin, AP Feinberg, and RA Irizarry. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. International Journal of Epidemiology (2012) 41(1):200-209. doi: 10.1093/ije/dyr238

Examples

if(require(minfiData)) {
gmSet <- preprocessQuantile(MsetEx)
design <- model.matrix(~ gmSet$status)
bumps <- bumphunter(gmSet, design = design, B = 0,
type = "Beta", cutoff = 0.25)
}

combineArrays()

A method for combining different types of methylation arrays into a virtual array.

Description

A method for combining different types of methylation arrays into a virtual array. The three generations of Illumina methylation arrays are supported: the 27k, the 450k and the EPIC arrays. Specifically, the 450k array and the EPIC array share many probes in common. This function combines data from the two different array types and outputs a data object of the user-specified type. Essentially, this new object will be like (for example) an EPIC array with many probes missing.

Usage

## S4 method for signature 'RGChannelSet,RGChannelSet'
combineArrays(object1, object2,
                  outType = c("IlluminaHumanMethylation450k",
                              "IlluminaHumanMethylationEPIC"),
                  verbose = TRUE)
## S4 method for signature 'MethylSet,MethylSet'
combineArrays(object1, object2,
                  outType = c("IlluminaHumanMethylation450k",
                              "IlluminaHumanMethylationEPIC",
                              "IlluminaHumanMethylation27k"),
                  verbose = TRUE)
## S4 method for signature 'RatioSet,RatioSet'
combineArrays(object1, object2,
                  outType = c("IlluminaHumanMethylation450k",
                              "IlluminaHumanMethylationEPIC",
                              "IlluminaHumanMethylation27k"),
                  verbose = TRUE)
## S4 method for signature 'GenomicMethylSet,GenomicMethylSet'
combineArrays(object1, object2,
                  outType = c("IlluminaHumanMethylation450k",
                              "IlluminaHumanMethylationEPIC",
                              "IlluminaHumanMethylation27k"),
                  verbose = TRUE)
## S4 method for signature 'GenomicRatioSet,GenomicRatioSet'
combineArrays(object1, object2,
                  outType = c("IlluminaHumanMethylation450k",
                              "IlluminaHumanMethylationEPIC",
                              "IlluminaHumanMethylation27k"),
                  verbose = TRUE)

Arguments

| Argument | Description |
|---|---|
| object1 | The first object. |
| object2 | The second object. |
| outType | The array type of the output. |
| verbose | Should the function be verbose? |

Details

FIXME: describe the RCChannelSet combination.

Value

The output object has the same class as the two input objects, that is either an RGChannelSet , a MethylSet , a RatioSet , a GenomicMethylSet or a GenomicRatioSet , with the type of the array given by the outType argument.
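At the CpG level, the idea of a virtual array built from shared probes can be pictured with plain matrices. This base R sketch is an illustration under simplifying assumptions, not what combineArrays does internally: it assumes that only probes present on both arrays are kept, and the probe names, sample names and Beta values are invented.

```r
# Made-up Beta matrices from two array types; rows are probes.
beta_450k <- matrix(c(0.1, 0.8, 0.5), ncol = 1,
                    dimnames = list(c("cg01", "cg02", "cg03"), "sampleA"))
beta_epic <- matrix(c(0.2, 0.7, 0.4), ncol = 1,
                    dimnames = list(c("cg02", "cg03", "cg04"), "sampleB"))

# Keep the probes the two arrays have in common and bind the samples.
common <- intersect(rownames(beta_450k), rownames(beta_epic))
combined <- cbind(beta_450k[common, , drop = FALSE],
                  beta_epic[common, , drop = FALSE])
```

The result has one row per shared probe and one column per sample from either array, which is the sense in which the combined object resembles an array with many probes missing.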

Author

Jean-Philippe Fortin and Kasper D. Hansen.

Examples

if(require(minfiData) && require(minfiDataEPIC)) {
data(RGsetEx.sub)
data(RGsetEPIC)
rgSet <- combineArrays(RGsetEPIC, RGsetEx.sub)
rgSet
}

Estimates A/B compartments from Illumina methylation arrays

Description

Estimates A/B compartments as revealed by Hi-C by computing the first eigenvector on a binned probe correlation matrix.

Usage

compartments(object, resolution=100*1000, what = "OpenSea", chr="chr22",
                  method = c("pearson", "spearman"), keep=TRUE)

Arguments

ArgumentDescription
objectAn object of class (Genomic)MethylSet or (Genomic)RatioSet
resolutionAn integer specifying the binning resolution
whatWhich subset of probes should be used?
chrThe chromosome to be analyzed.
methodMethod of correlation.
keepShould the correlation matrix be stored or not?

Details

This function extracts A/B compartments from Illumina methylation microarrays. Analysis of Hi-C data has shown that the genome can be divided into two compartments (A/B compartments) that are cell-type specific and are associated with open and closed chromatin respectively. The approximately 170,000 open sea probes on the 450k array can be used to estimate these compartments by computing the first eigenvector on a binned correlation matrix. The binning resolution can be specified by resolution , and by default is set to 100 kb. We do not recommend higher resolutions because of the low-resolution probe design of the 450k array.
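The core computation, a first eigenvector of a bin-by-bin correlation matrix, can be sketched on simulated data in base R. This is a simplified illustration, not the compartments implementation: the data are simulated, the binning step is assumed to have already happened, and deciding which sign corresponds to which compartment would need external information that the sketch does not model.

```r
set.seed(42)
n_bins <- 20
n_samples <- 30

# Simulate two groups of bins whose methylation co-varies across samples
# in opposite directions, plus independent noise.
driver <- rnorm(n_samples)
group <- rep(c(1, -1), each = n_bins / 2)
binned <- outer(group, driver) +
  matrix(rnorm(n_bins * n_samples, sd = 0.3), n_bins, n_samples)

# Correlation between bins, computed across samples.
cmat <- cor(t(binned), method = "pearson")

# First eigenvector (largest eigenvalue); its sign splits the bins
# into two groups.
ev <- eigen(cmat, symmetric = TRUE)$vectors[, 1]
split <- sign(ev)
```

In this simulation the first ten bins share one sign and the last ten share the other, recovering the two groups that were built into the data.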

Value

An object of class GRanges containing the correlation matrix, the compartment eigenvector and the compartment labels (A or B) as metadata.

Author

Jean-Philippe Fortin jfortin@jhsph.edu , Kasper D. Hansen kasperdanielhansen@gmail.com

References

JP Fortin and KD Hansen. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data . bioRxiv (2015). doi: 10.1101/019000 .

Examples

if (require(minfiData)) {
GMset <- mapToGenome(MsetEx)
## compartments at 1 Mb resolution; we recommend 100 kb.
comps <- compartments(GMset, resolution = 10^6)
}

controlStripPlot()

Plot control probe signals.

Description

Strip plots are produced for each control probe type specified.

Usage

controlStripPlot(rgSet, controls = c("BISULFITE CONVERSION I",
    "BISULFITE CONVERSION II"), sampNames = NULL, xlim = c(5, 17))

Arguments

| Argument | Description |
|---|---|
| rgSet | An RGChannelSet. |
| controls | A vector of control probe types to plot. |
| sampNames | Sample names to be used for labels. |
| xlim | x-axis limits. |

Details

This function produces the control probe signal plot component of the QC report.

Value

No return value. Plots are produced as a side-effect.

Seealso

qcReport , mdsPlot , densityPlot , densityBeanPlot

Author

Martin Aryee aryee@jhu.edu .

Examples

if (require(minfiData)) {

names <- pData(RGsetEx)$Sample_Name
controlStripPlot(RGsetEx, controls=c("BISULFITE CONVERSION I"), sampNames=names)

}

A method for converting a type of methylation arrays into a virtual array of another type.

Description

A method for converting a type of methylation array into an array of another type. The three generations of Illumina methylation arrays are supported: the 27k, the 450k and the EPIC arrays. Specifically, the 450k array and the EPIC array share many probes in common. For RGChannelSet, this function will convert an EPIC array into a 450k array (or vice versa) by dropping probes that differ between the two arrays. Because most of the probes on the 27k array have a different chemistry than the 450k and EPIC probes, converting a 27k RGChannelSet into another array is not supported. Each array can be converted into another array at the CpG site level; that is, any MethylSet and RatioSet (or GenomicMethylSet and GenomicRatioSet) can be converted to a 27k, 450k or EPIC array. The output array is specified by the outType argument.

Usage

## S4 method for signature 'RGChannelSet'
convertArray(object,
             outType = c("IlluminaHumanMethylation450k",
                         "IlluminaHumanMethylationEPIC"),
             verbose = TRUE)
## S4 method for signature 'MethylSet'
convertArray(object,
             outType = c("IlluminaHumanMethylation450k",
                         "IlluminaHumanMethylationEPIC",
                         "IlluminaHumanMethylation27k"),
             verbose = TRUE)
## S4 method for signature 'RatioSet'
convertArray(object,
             outType = c("IlluminaHumanMethylation450k",
                         "IlluminaHumanMethylationEPIC",
                         "IlluminaHumanMethylation27k"),
             verbose = TRUE)
## S4 method for signature 'GenomicMethylSet'
convertArray(object,
             outType = c("IlluminaHumanMethylation450k",
                         "IlluminaHumanMethylationEPIC",
                         "IlluminaHumanMethylation27k"),
             verbose = TRUE)
## S4 method for signature 'GenomicRatioSet'
convertArray(object,
             outType = c("IlluminaHumanMethylation450k",
                         "IlluminaHumanMethylationEPIC",
                         "IlluminaHumanMethylation27k"),
             verbose = TRUE)

Arguments

Argument: Description
object: The input object.
outType: The array type of the output.
verbose: Should the function be verbose?

Details

FIXME: describe the RGChannelSet conversion.

Value

The output object has the same class as the input object, that is either an RGChannelSet, a MethylSet, a RatioSet, a GenomicMethylSet or a GenomicRatioSet, with the type of the array given by the outType argument.

Author

Jean-Philippe Fortin and Kasper D. Hansen.

Examples

if(require(minfiData)) {
data(RGsetEx.sub)
rgSet <- convertArray(RGsetEx.sub, outType = "IlluminaHumanMethylationEPIC")
rgSet
}

Collapse methylation values of adjacent CpGs into a summary value.

Description

This function groups adjacent loci into clusters with a specified maximum gap between CpGs in the cluster, and a specified maximum cluster width. The loci within each cluster are summarized resulting in a single methylation estimate per cluster.

Usage

cpgCollapse(object, what = c("Beta", "M"), maxGap = 500,
            blockMaxGap = 2.5 * 10^5, maxClusterWidth = 1500,
            dataSummary = colMeans, na.rm = FALSE,
            returnBlockInfo = TRUE, islandAnno = NULL, verbose = TRUE,
            ...)

Arguments

Argument: Description
object: An object of class [Genomic]MethylSet or [Genomic]RatioSet.
what: Should the operation be performed on the M-scale or Beta-scale?
maxGap: Maximum gap between CpGs in a cluster.
blockMaxGap: Maximum block gap.
maxClusterWidth: Maximum cluster width.
dataSummary: Function used to summarize methylation across CpGs in the cluster.
na.rm: Should NAs be removed when summarizing? Passed on to the dataSummary function.
returnBlockInfo: Should the block annotation table be returned in addition to the block table?
islandAnno: Which Island annotation should be used. NULL indicates the default. This argument is only useful if the annotation object contains more than one island annotation.
verbose: Should the function be verbose?
...: Passed on to getMethSignal and getCN. Can be used to specify

Details

This function is used as the first step of block-finding. It groups adjacent loci into clusters with a default maximum gap of 500bp and a maximum cluster width of 1,500bp. The loci within each cluster are then summarized (using the mean by default) resulting in a single methylation estimate per cluster. Cluster estimates from open-sea probes are used in block-finding.
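The grouping and summarizing steps described above can be sketched in base R. The helper cluster_positions is hypothetical and only illustrates one simple reading of the two constraints (maximum gap and maximum cluster width); the clustering routine actually used by cpgCollapse may split over-wide clusters differently. Positions and Beta values are invented.

```r
# Hypothetical helper: assign sorted positions on one chromosome to clusters
cluster_positions <- function(pos, maxGap = 500, maxClusterWidth = 1500) {
  stopifnot(!is.unsorted(pos))
  id      <- integer(length(pos))
  current <- 1L
  start   <- pos[1]
  id[1]   <- current
  for (i in seq_along(pos)[-1]) {
    # Start a new cluster if the gap or the total width would be exceeded
    if (pos[i] - pos[i - 1] > maxGap || pos[i] - start > maxClusterWidth) {
      current <- current + 1L
      start   <- pos[i]
    }
    id[i] <- current
  }
  id
}

pos  <- c(100, 300, 700, 1500, 1900, 5000, 5200)
beta <- cbind(s1 = c(0.1, 0.2, 0.3, 0.8, 0.9, 0.5, 0.7),
              s2 = c(0.2, 0.2, 0.2, 0.6, 0.8, 0.4, 0.6))

cl <- cluster_positions(pos)

# One summary value per cluster and sample (the mean, as in the default dataSummary)
collapsed <- rowsum(beta, cl) / as.vector(table(cl))
collapsed
```

Here the seven loci collapse into three clusters, each represented by a single methylation estimate per sample.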

Value

If returnBlockInfo is FALSE : a GenomicRatioSet of collapsed CpG clusters.

If returnBlockInfo is TRUE :

*

Seealso

blockFinder

Author

Rafael Irizarry


densityBeanPlot()

Density bean plots of methylation Beta values.

Description

Density bean plots of methylation Beta values, primarily for QC.

Usage

densityBeanPlot(dat, sampGroups = NULL, sampNames = NULL, main = NULL,
    pal = brewer.pal(8, "Dark2"), numPositions = 10000)

Arguments

Argument: Description
dat: An RGChannelSet, a MethylSet or a matrix. We either use the getBeta function to get Beta values (for the first two) or we assume the matrix contains Beta values.
sampGroups: Optional sample group labels. See details.
sampNames: Optional sample names. See details.
main: Plot title.
pal: Color palette.
numPositions: The density calculation uses numPositions randomly selected CpG positions. If NULL use all positions.

Details

This function produces the density bean plot component of the QC report. If sampGroups is specified, group-specific colors will be used. For speed reasons the plots are produced using a random subset of CpG positions. The number of positions used is specified by the numPositions option.

Value

No return value. Plots are produced as a side-effect.

Seealso

qcReport , mdsPlot , controlStripPlot , densityPlot

Author

Martin Aryee aryee@jhu.edu .

References

P Kampstra. Beanplot: A boxplot alternative for visual comparison of distributions. Journal of Statistical Software 28, (2008). http://www.jstatsoft.org/v28/c01

Examples

if (require(minfiData)) {

names <- pData(RGsetEx)$Sample_Name
groups <- pData(RGsetEx)$Sample_Group
par(mar=c(5,6,4,2))
densityBeanPlot(RGsetEx, sampNames=names, sampGroups=groups)

}

Density plots of methylation Beta values.

Description

Density plots of methylation Beta values, primarily for QC.

Usage

densityPlot(dat, sampGroups = NULL, main = "", xlab = "Beta",
    pal = brewer.pal(8, "Dark2"), xlim, ylim, add = TRUE, legend = TRUE,
    ...)

Arguments

Argument: Description
dat: An RGChannelSet, a MethylSet or a matrix. We either use the getBeta function to get Beta values (for the first two) or we assume the matrix contains Beta values.
sampGroups: Optional sample group labels. See details.
main: Plot title.
xlab: x-axis label.
pal: Color palette.
xlim: x-axis limits.
ylim: y-axis limits.
add: Start a new plot?
legend: Plot legend.
...: Additional options to be passed to the plot command.

Details

This function produces the density plot component of the QC report. If sampGroups is specified, group-specific colors will be used.

Value

No return value. Plots are produced as a side-effect.

Seealso

qcReport , mdsPlot , controlStripPlot , densityBeanPlot

Author

Martin Aryee aryee@jhu.edu .

Examples

if (require(minfiData)) {

groups <- pData(RGsetEx)$Sample_Group
densityPlot(RGsetEx, sampGroups=groups)

}

Detection p-values for all probed genomic positions.

Description

This function identifies failed positions defined as both the methylated and unmethylated channel reporting background signal levels.

Usage

detectionP(rgSet, type = "m+u")

Arguments

Argument: Description
rgSet: An RGChannelSet.
type: How to calculate p-values. Only m+u is currently implemented (see details).

Details

A detection p-value is returned for every genomic position in every sample. Small p-values indicate a good position. Positions with non-significant p-values (typically >0.01) should not be trusted.

The m+u method compares the total DNA signal (Methylated + Unmethylated) for each position to the background signal level. The background is estimated using negative control positions, assuming a normal distribution. Calculations are performed on the original (non-log) scale.

This function is different from the detection routine in Genome Studio.
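The m+u comparison can be illustrated with a base R sketch. This is not the code used by detectionP: the negative-control intensities are simulated, and the way the background mean and standard deviation are estimated here (sample mean and variance, with the two channels treated as independent) is an assumption made for illustration.

```r
set.seed(1)
# Simulated negative-control intensities on the original (non-log) scale
neg_meth   <- rnorm(600, mean = 200, sd = 50)
neg_unmeth <- rnorm(600, mean = 200, sd = 50)

# Background model for the summed signal, assuming independent normal channels
bg_mean <- mean(neg_meth) + mean(neg_unmeth)
bg_sd   <- sqrt(var(neg_meth) + var(neg_unmeth))

# Total signal (Methylated + Unmethylated) for three hypothetical positions
total <- c(good = 5000, weak = 700, failed = 380)

# Upper-tail probability of a total this large under the background model
detP <- pnorm(total, mean = bg_mean, sd = bg_sd, lower.tail = FALSE)
detP > 0.01    # TRUE marks positions that would be treated as failed
```

A position whose total signal sits inside the background distribution gets a large p-value, which matches the rule of thumb above that p-values above 0.01 should not be trusted.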

Value

A matrix with detection p-values.

Author

Martin Aryee aryee@jhu.edu .

Examples

if (require(minfiData)) {
detP <- detectionP(RGsetEx.sub)
failed <- detP>0.01
colMeans(failed) # Fraction of failed positions per sample
sum(rowMeans(failed)>0.5) # How many positions failed in >50% of samples?
}

Find differentially methylated positions

Description

Identify CpGs where methylation is associated with a continuous or categorical phenotype.

Usage

dmpFinder(dat, pheno, type = c("categorical", "continuous"),
    qCutoff = 1, shrinkVar = FALSE)

Arguments

Argument: Description
dat: A MethylSet or a matrix.
pheno: The phenotype to be tested for association with methylation.
type: Is the phenotype continuous or categorical?
qCutoff: DMPs with an FDR q-value greater than this will not be returned.
shrinkVar: Should variance shrinkage be used? See details.

Details

This function tests each genomic position for association between methylation and a phenotype. Continuous phenotypes are tested with linear regression, while an F-test is used for categorical phenotypes.

Variance shrinkage ( shrinkVar=TRUE ) is recommended when sample sizes are small (<10). The sample variances are squeezed by computing empirical Bayes posterior means using the limma package.
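For the categorical case, a per-position F-test can be sketched in base R. This is a stand-in under stated assumptions, not the implementation of dmpFinder: the M-values are simulated, no variance shrinkage is applied, and the Benjamini-Hochberg adjustment from p.adjust is used in place of whatever q-value estimator the function itself relies on.

```r
set.seed(1)
n_pos <- 50
grp   <- factor(rep(c("A", "B", "C"), each = 4))

# Simulated M-values (positions x samples); the first 5 positions differ in group C
M <- matrix(rnorm(n_pos * length(grp)), nrow = n_pos)
M[1:5, grp == "C"] <- M[1:5, grp == "C"] + 4

# One-way F-test for a single position
f_test <- function(y, g) {
  tab <- anova(lm(y ~ g))
  c(f = tab[["F value"]][1], pval = tab[["Pr(>F)"]][1])
}
res <- as.data.frame(t(apply(M, 1, f_test, g = grp)))

# Benjamini-Hochberg adjustment as a stand-in for the FDR q-value
res$qval <- p.adjust(res$pval, method = "BH")
res <- res[order(res$pval), ]
head(res)
```

The result has one row per position, ordered by p-value, which mirrors the shape of the table described under Value.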

Value

A table with one row per CpG.

Seealso

squeezeVar and the limma package in general.

Author

Martin Aryee aryee@jhu.edu .

Examples

if (require(minfiData)) {

grp <- pData(MsetEx)$Sample_Group
MsetExSmall <- MsetEx[1:1e4,] # To speed up the example
M <- getM(MsetExSmall, type = "beta", betaThreshold = 0.001)
dmp <- dmpFinder(M, pheno=grp, type="categorical")
sum(dmp$qval < 0.05, na.rm=TRUE)
head(dmp)

}

estimateCellCounts()

Cell Proportion Estimation

Description

Estimates the relative proportion of pure cell types within a sample. For example, given peripheral blood samples, this function will return the relative proportions of lymphocytes, monocytes, B-cells, and neutrophils.

Usage

estimateCellCounts(rgSet, compositeCellType = "Blood",
                   processMethod = "auto", probeSelect = "auto",
                   cellTypes = c("CD8T","CD4T", "NK","Bcell","Mono","Gran"),
                   referencePlatform = c("IlluminaHumanMethylation450k",
                                         "IlluminaHumanMethylationEPIC",
                                         "IlluminaHumanMethylation27k"),
                   returnAll = FALSE, meanPlot = FALSE, verbose = TRUE, ...)

Arguments

Argument: Description
rgSet: The input RGChannelSet for the procedure.
compositeCellType: Which composite cell type is being deconvoluted. Should be one of "Blood", "CordBlood", or "DLPFC". See details.
processMethod: How should the user and reference data be processed together? Default input "auto" will use preprocessQuantile for Blood and DLPFC and preprocessNoob otherwise, in line with the existing literature. Set it to the name of a preprocessing function as a character if you want to override it, like "preprocessFunnorm".
probeSelect: How should probes be selected to distinguish cell types? Options include "both", which selects an equal number (50) of probes (with F-stat p-value < 1E-8) with the greatest magnitude of effect from the hyper- and hypo-methylated sides, and "any", which selects the 100 probes (with F-stat p-value < 1E-8) with the greatest magnitude of difference regardless of direction of effect. Default input "auto" will use "any" for cord blood and "both" otherwise, in line with previous versions of this function and/or our recommendations. Please see the references for more details.
cellTypes: Which cell types, from the reference object, should we use for the deconvolution? See details.
referencePlatform: The platform for the reference dataset; if the input rgSet belongs to another platform, it will be converted using convertArray.
returnAll: Should the composition table and the normalized user-supplied data be returned?
verbose: Should the function be verbose?
meanPlot: Whether to plot the average DNA methylation across the cell-type discriminating probes within the mixed and sorted samples.
...: Passed to preprocessQuantile.

Details

This is an implementation of the Houseman et al. (2012) regression calibration approach, applied to the Illumina 450k microarray, for deconvoluting heterogeneous tissue sources like blood. For example, this function will take an RGChannelSet from a DNA methylation (DNAm) study of blood, and return the relative proportions of CD4+ and CD8+ T-cells, natural killer cells, monocytes, granulocytes, and B-cells in each sample.

The function currently supports cell composition estimation for blood, cord blood, and the frontal cortex, through compositeCellType values of "Blood", "CordBlood", and "DLPFC", respectively. Packages containing the appropriate reference data should be installed before running the function for the first time ("FlowSorted.Blood.450k", "FlowSorted.DLPFC.450k", "FlowSorted.CordBlood.450k"). Each tissue supports the estimation of different cell types, delimited via the cellTypes argument. For blood, these are "Bcell", "CD4T", "CD8T", "Eos", "Gran", "Mono", "Neu", and "NK" (though the default value for cellTypes is often sufficient). For cord blood, these are "Bcell", "CD4T", "CD8T", "Gran", "Mono", "Neu", and "nRBC". For frontal cortex, these are "NeuN_neg" and "NeuN_pos". See documentation of individual reference packages for more details.

The meanPlot should be used to check for large batch effects in the data, which reduce the confidence placed in the composition estimates. This plot depicts the average DNA methylation across the cell-type discriminating probes in both the provided and sorted data. The means from the provided heterogeneous samples should be within the range of the sorted samples. If the sample means fall outside the range of the sorted means, the cell type estimates will be inflated to the closest cell type. Note that we quantile normalize the sorted data with the provided data to reduce these batch effects.

Value

Matrix of composition estimates across all samples and cell types.

If returnAll=TRUE, a list of a count matrix (see previous paragraph), a composition table and the normalized user data in the form of a GenomicMethylSet.

Seealso

preprocessQuantile and convertArray .

Author

Andrew E. Jaffe, Shan V. Andrews, E. Andres Houseman

References

EA Houseman, WP Accomando, DC Koestler, BC Christensen, CJ Marsit, HH Nelson, JK Wiencke and KT Kelsey. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics (2012) 13:86. doi: 10.1186/1471-2105-13-86.

AE Jaffe and RA Irizarry. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biology (2014) 15:R31. doi: 10.1186/gb-2014-15-2-r31.

KM Bakulski, JI Feinberg, SV Andrews, J Yang, S Brown, S McKenney, F Witter, J Walston, AP Feinberg, and MD Fallin. DNA methylation of cord blood cell types: Applications for mixed cell birth studies. Epigenetics (2016) 11:5. doi: 10.1080/15592294.2016.1161875.

Examples

if(require(FlowSorted.Blood.450k)) {
wh.WBC <- which(FlowSorted.Blood.450k$CellType == "WBC")
wh.PBMC <- which(FlowSorted.Blood.450k$CellType == "PBMC")
RGset <- FlowSorted.Blood.450k[, c(wh.WBC, wh.PBMC)]
## The following line is purely to work around an issue with repeated
## sampleNames and Biobase::combine()
sampleNames(RGset) <- paste(RGset$CellType,
c(seq(along = wh.WBC), seq(along = wh.PBMC)), sep = "_")
counts <- estimateCellCounts(RGset, meanPlot = FALSE)
round(counts, 2)
}

fixMethOutliers()

Fix methylation outliers

Description

Methylation outliers (loci with very extreme values of the Meth or Unmeth channel) are identified and fixed (see details).

Usage

fixMethOutliers(object, K = -3, verbose = FALSE)

Arguments

Argument: Description
object: An object of class [Genomic]MethylSet.
K: The number of standard deviations away from the median when defining the outlier cutoff, see details.
verbose: Should the function be verbose?

Details

This function fixes outlying methylation calls in the Meth channel and Unmeth channel separately.

Unlike other types of arrays, all loci on a methylation array ought to measure something (apart from loci on the Y chromosome in a female sample). An outlier is a locus with a very low value in one of the two methylation channels. Typically, relatively few loci ought to be outliers.

An outlier is defined in a sample and methylation channel specific way. First the (sample, methylation channel) values are log2(x+0.5) transformed and then the median and mad of these values are computed. An outlier is then defined to be any value less than the median plus K times the mad, and these outlier values are thresholded at the cutoff (on the original scale).
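The thresholding rule described above can be written out for a single (sample, channel) vector in base R. The helper threshold_low_outliers is hypothetical, the intensities are simulated, and the back-transformation to the original scale (2^cutoff - 0.5) is an assumption about how "thresholded at the cutoff" is applied.

```r
# Hypothetical helper for one (sample, methylation channel) vector of intensities
threshold_low_outliers <- function(x, K = -3) {
  lx     <- log2(x + 0.5)
  cutoff <- median(lx) + K * mad(lx)   # K is negative, so this is a lower bound
  lx[lx < cutoff] <- cutoff            # raise outlying values to the cutoff
  2^lx - 0.5                           # back to the original scale
}

set.seed(1)
# Simulated intensities plus three extremely low values
meth  <- c(round(2^rnorm(1000, mean = 11, sd = 1)), 0, 1, 2)
fixed <- threshold_low_outliers(meth)

range(meth)
range(fixed)
```

Only the low tail changes: values above the cutoff are returned as they were, up to floating-point rounding.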

Value

An object of the same class as object where outlier values in the methylation channels have been thresholded.

Seealso

minfiQC

Author

Rafael A. Irizarry and Kasper D. Hansen

Examples

if(require(minfiData)) {
MsetEx <- fixMethOutliers(MsetEx)
}

Find gap signals in 450k data

Description

This function finds probes in the Illumina 450k Array for which calculated beta values cluster into distinct groups separated by a defined threshold. For these gap signals, it identifies the number of groups, the size of these groups, and the samples in each group.

Usage

gaphunter(object, threshold=0.05, keepOutliers=FALSE,
            outCutoff=0.01, verbose=TRUE)

Arguments

Argument: Description
object: An object of class (Genomic)RatioSet, (Genomic)MethylSet, or matrix. If one of the first two, getBeta is used to calculate beta values. If a matrix, it must contain beta values.
threshold: The difference in consecutive, ordered beta values that defines the presence of a gap signal. Defaults to 5 percent.
keepOutliers: Should outlier-driven gap signals be kept in the results? Defaults to FALSE.
outCutoff: Value used to identify gap signals driven by outliers. Defined as the percentage of the total sample size; the sum of samples in all groups except the largest must exceed this number of samples in order for the probe to still be considered a gap signal. Defaults to 1 percent.
verbose: Logical value. If TRUE, it writes some messages indicating progress. If FALSE nothing should be printed.

Details

The function can calculate a beta matrix or utilize a user-supplied matrix of beta values.

The function will identify probes with a gap in a beta signal greater than or equal to the defined threshold. These probes constitute an additional, dataset-specific subset of probes that merit special consideration due to their tendency to be driven by an underlying SNP or other genetic variant. In this manner, these probes can serve as surrogates for underlying genetic signal locally and/or in a broader (i.e. haplotype) context. Please see our upcoming manuscript for a detailed description of the utility of these probes.

Outlier-driven gap signals are those in which the sum of the smaller group(s) does not exceed a certain percentage of the sample size, defined by the argument outCutoff.
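For a single probe, the gap and outlier rules described above can be sketched in base R. The helper find_gap_groups and its return fields are hypothetical and do not reproduce the output format of gaphunter; the beta values are invented.

```r
# Hypothetical helper: describe the gap structure of one probe's beta values
find_gap_groups <- function(beta, threshold = 0.05, outCutoff = 0.01) {
  o      <- order(beta)
  breaks <- diff(beta[o]) >= threshold     # gaps between consecutive ordered values
  group  <- integer(length(beta))
  group[o] <- cumsum(c(TRUE, breaks))      # group label for each sample
  sizes  <- as.vector(table(group))
  minor  <- sum(sizes) - max(sizes)        # samples outside the largest group
  list(n_groups = length(sizes),
       sizes    = sizes,
       group    = group,
       # Outlier-driven: the smaller groups together do not exceed the cutoff
       outlier_driven = length(sizes) > 1 && minor <= outCutoff * length(beta))
}

beta <- c(0.10, 0.12, 0.11, 0.52, 0.50, 0.91, 0.13, 0.49)
res  <- find_gap_groups(beta, threshold = 0.3)
res$n_groups
res$sizes
res$group
```

With a threshold of 0.3, the eight samples fall into three groups, and the probe is not flagged as outlier-driven because the two smaller groups together hold half of the samples.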

Value

A list with three values,

*

Author

Shan V. Andrews sandre17@jhu.edu .

References

SV Andrews, C Ladd-Acosta, AP Feinberg, KD Hansen, MD Fallin. "Gap hunting" to characterize clustered probe signals in Illumina methylation array data. Epigenetics & Chromatin (2016) 9:56. doi: 10.1186/s13072-016-0107-z.

Examples

if(require(minfiData)) {
gapres <- gaphunter(MsetEx.sub, threshold=0.3, keepOutliers=TRUE)
#Note: the threshold argument is increased from the default value in this small example
#dataset with 6 people to avoid the reporting of a large amount of probes as gap signals.
#In a typical EWAS setting with hundreds of samples, the default arguments should be
#sufficient.
}

getAnnotation()

Accessing annotation for Illumina methylation objects

Description

These functions access provided annotation for various Illumina methylation objects.

Usage

getAnnotation(object, what = "everything", lociNames = NULL,
              orderByLocation = FALSE, dropNonMapping = FALSE)
getLocations(object, mergeManifest = FALSE,
             orderByLocation = FALSE, lociNames = NULL)
getAnnotationObject(object)
getSnpInfo(object, snpAnno = NULL)
addSnpInfo(object, snpAnno = NULL)
dropLociWithSnps(object, snps = c("CpG", "SBE"), maf = 0, snpAnno = NULL)
getProbeType(object, withColor = FALSE)
getIslandStatus(object, islandAnno = NULL)

Arguments

Argument: Description
object: A minfi object.
what: Which annotation objects should be returned?
lociNames: Restrict the return values to these loci.
orderByLocation: Should the return object be ordered according to genomic location?
dropNonMapping: Should loci that do not have a genomic location associated with them (by being marked as unmapped or multi) be dropped from the return object?
mergeManifest: Should the manifest be merged into the return object?
snpAnno: The SNP annotation you want to use; NULL signifies picking the default.
withColor: Should the return object have the type I probe color labelled?
snps: The type of SNPs used.
maf: Minor allele frequency.
islandAnno: Like snpAnno, but for islands.

Details

getAnnotation returns requested annotation as a DataFrame, with each row corresponding to a methylation locus. If object is of class IlluminaMethylationAnnotation no specific ordering of the return object is imposed. If, on the other hand, the class of object imposes some natural order on the return object (i.e. if the object is of class [Genomic](Methyl|Ratio)Set), this order is kept in the return object. Note that RGChannelSet does not impose a specific ordering on the methylation loci.

getAnnotationObject returns the annotation object, as opposed to the annotation the object contains. This is useful for printing and examining the contents of the object.

getLocations is a convenience function which returns Locations as a GRanges and which furthermore drops unmapped loci. A user should not need to call this function, instead mapToGenome should be used to get genomic coordinates and granges to return those coordinates.

getSnpInfo is a convenience function which gets a SNP DataFrame containing information on which probes contain SNPs and where. addSnpInfo adds this information to the rowRanges or granges of the object. dropLociWithSnps is a convenience function for removing loci with SNPs based on their MAF.

To see which options are available for what , simply print the annotation object, possibly using getAnnotationObject .

Value

For getAnnotation , a DataFrame with the requested information.

For getAnnotationObject, an IlluminaMethylationAnnotation object.

For getLocations , a GRanges with the locations.

For getProbeType and getIslandStatus , a character vector with the requested information.

For getSnpInfo , a DataFrame with the requested information. For addSnpInfo , an object of the same class as object but with the SNP information added to the metadata columns of the granges of the object.

For dropLociWithSnps an object of the same kind as the input, possibly with fewer loci.

Seealso

IlluminaMethylationAnnotation for the basic class, mapToGenome for a better alternative (for users) to getLocations .

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

if(require(minfiData)) {
table(getIslandStatus(MsetEx))
getAnnotation(MsetEx, what = "Manifest")
}

getGenomicRatioSetFromGEO()

Reading Illumina methylation array data from GEO.

Description

Reading Illumina methylation array data from GEO.

Usage

getGenomicRatioSetFromGEO(GSE = NULL, path = NULL, array = "IlluminaHumanMethylation450k",
                          annotation = .default.450k.annotation, what = c("Beta", "M"),
                          mergeManifest = FALSE, i = 1)

Arguments

Argument: Description
GSE: The GSE ID of the dataset to be downloaded from GEO.
path: If the data are already downloaded, the path with soft files. Either GSE or path is required.
array: Array name.
annotation: The feature annotation to be used. This includes the location of features and thus depends on genome build.
what: Are Beta or M values being downloaded?
mergeManifest: Should the Manifest be merged to the final object?
i: If the GEO download results in more than one dataset, it picks entry i.

Details

This function downloads data from GEO using getGEO from the GEOquery package. It then returns a GenomicRatioSet object. Note that the rs probes (used for genotyping) are dropped.

Value

A GenomicRatioSet object.

Seealso

If the data is already in memory you can use makeGenomicRatioSetFromMatrix.

Author

Tim Triche Jr. and Rafael A. Irizarry rafa@jimmy.harvard.edu .

Examples

mset=getGenomicRatioSetFromGEO("GSE42752")

Estimate sample-specific quality control (QC) for methylation data

Description

Estimate sample-specific quality control (QC) for methylation data.

Usage

getQC(object)
addQC(object, qc)
plotQC(qc, badSampleCutoff = 10.5)

Arguments

Argument: Description
object: An object of class [Genomic]MethylSet.
qc: An object as produced by getQC.
badSampleCutoff: The cutoff for identifying a bad sample.

Value

For getQC , a DataFrame with two columns: mMed and uMed which are the chipwide medians of the Meth and Unmeth channels.

For addQC , essentially object supplied to the function, but with two new columns added to the pheno data slot: uMed and mMed .

Seealso

minfiQC for an all-in-one function.

Author

Rafael A. Irizarry and Kasper D. Hansen

Examples

if(require(minfiData)){
qc <- getQC(MsetEx)
MsetEx <- addQC(MsetEx, qc = qc)
## plotQC(qc)
}

Estimating sample sex based on methylation data

Description

Estimates sample sex based on methylation data.

Usage

getSex(object = NULL, cutoff = -2)
addSex(object, sex = NULL)
plotSex(object, id = NULL)

Arguments

Argument: Description
object: An object of class [Genomic]MethylSet.
cutoff: What should the difference in log2 copy number be between males and females?
sex: An optional character vector of sex (with values M and F).
id: Text used as plotting symbols in the plotSex function. Used for sample identification on the plot.

Details

Estimation of sex is based on the median values of measurements on the X and Y chromosomes respectively. If yMed - xMed is less than cutoff we predict a female, otherwise male.
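The decision rule stated above is simple enough to write out directly. The following base R sketch applies it to invented xMed and yMed values; the data frame and sample names are hypothetical, and the computation of the medians themselves is not shown.

```r
# Hypothetical chip-wide medians of measurements on chrX and chrY
qc <- data.frame(
  xMed = c(13.1, 12.4, 13.0, 12.5),
  yMed = c( 9.8, 12.6, 10.1, 12.3),
  row.names = paste0("sample", 1:4)
)
cutoff <- -2

# Female if yMed - xMed falls below the cutoff, male otherwise
qc$predictedSex <- ifelse(qc$yMed - qc$xMed < cutoff, "F", "M")
qc
```

Samples whose Y-chromosome median lies well below their X-chromosome median are called female; the cutoff controls how large that difference has to be.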

Value

For getSex , a DataFrame with columns predictedSex (a character with values M and F ), xMed and yMed , which are the chip-wide medians of measurements on the two sex chromosomes.

For addSex , an object of the same type as object but with the output of getSex(object) added to the pheno data.

For plotSex , a plot of xMed vs. yMed , which are the chip-wide medians of measurements on the two sex chromosomes, coloured by predictedSex .

Author

Rafael A. Irizarry, Kasper D. Hansen, Peter F. Hickey

Examples

if(require(minfiData)) {
GMsetEx <- mapToGenome(MsetEx)
estSex <- getSex(GMsetEx)
GMsetEx <- addSex(GMsetEx, sex = estSex)
}

logit in base 2.

Description

Utility functions for computing logit and inverse logit in base 2.

Usage

logit2(x)
ilogit2(x)

Arguments

Argument: Description
x: A numeric vector.

Value

A numeric vector.
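As a reminder of what a base-2 logit and its inverse compute, the following base R sketch defines local stand-ins. The names logit2_sketch and ilogit2_sketch are hypothetical and are used only so that the example runs without the package; they follow the usual definitions log2(x / (1 - x)) and 2^x / (1 + 2^x).

```r
# Local stand-ins (hypothetical names) for a base-2 logit and its inverse
logit2_sketch  <- function(x) log2(x) - log2(1 - x)
ilogit2_sketch <- function(x) 2^x / (1 + 2^x)

beta <- c(0.25, 0.5, 0.75)
m    <- logit2_sketch(beta)    # Beta-scale values mapped to the M-scale
back <- ilogit2_sketch(m)      # and mapped back again
m
back
```

The round trip returns the original values, and a Beta value of 0.5 maps to 0 on the logit scale.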

Author

Kasper Daniel Hansen khansen@jhsph.edu .

Examples

logit2(c(0.25, 0.5, 0.75))

makeGenomicRatioSetFromMatrix()

Make a GenomicRatioSet from a matrix

Description

Make a GenomicRatioSet from a matrix.

Usage

makeGenomicRatioSetFromMatrix(mat, rownames = NULL, pData = NULL,
                              array = "IlluminaHumanMethylation450k",
                              annotation = .default.450k.annotation,
                              mergeManifest = FALSE, what = c("Beta", "M"))

Arguments

Argument: Description
mat: The matrix that will be converted.
rownames: The feature IDs associated with the rows of mat that will be used to match to the IlluminaHumanMethylation450k feature IDs.
pData: A DataFrame or data.frame describing the samples represented by the columns of mat. If the rownames of the pData don't match the colnames of mat these colnames will be changed. If pData is not supplied, a minimal DataFrame is created.
array: Array name.
annotation: The feature annotation to be used. This includes the location of features and thus depends on genome build.
mergeManifest: Should the Manifest be merged to the final object?
what: Are the values in mat Beta or M values?

Details

Much 450K data is provided as CSV files. This function lets you convert a matrix of values into the class that is used by functions such as bumphunter and blockFinder. The rownames of mat are used to match the 450K array features. Alternatively, the rownames can be supplied directly through rownames.
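Getting from a CSV file to a matrix with probe IDs as rownames might look like the following base R sketch. The file layout (an ID column followed by one column per sample) and the values are assumptions made for illustration; the probe IDs are borrowed from the Examples section. The final call is left as a comment because it needs minfi and the 450k annotation package to be installed.

```r
# Write a tiny stand-in CSV file (layout and values are invented)
csv <- tempfile(fileext = ".csv")
writeLines(c("ID,sample1,sample2",
             "cg13869341,0.81,0.79",
             "cg14008030,0.12,0.15",
             "cg12045430,0.55,0.60"), csv)

tab <- read.csv(csv, stringsAsFactors = FALSE)
mat <- as.matrix(tab[, -1])     # keep the numeric sample columns only
rownames(mat) <- tab$ID         # probe IDs, used for matching array features
mat

# With minfi and the annotation package available, the matrix could then be
# passed on as in the Examples section:
# grset <- makeGenomicRatioSetFromMatrix(mat, what = "Beta")
```

Keeping the probe IDs as rownames means the rownames argument does not have to be supplied separately.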

Value

A GenomicRatioSet object.

Seealso

getGenomicRatioSetFromGEO is similar but reads data from GEO.

Author

Rafael A. Irizarry rafa@jimmy.harvard.edu .

Examples

mat <- matrix(10,5,2)
rownames(mat) <- c( "cg13869341", "cg14008030","cg12045430", "cg20826792","cg00381604")
grset <- makeGenomicRatioSetFromMatrix(mat)

mapToGenome_methods()

Mapping methylation data to the genome

Description

Mapping Illumina methylation array data to the genome using an annotation package. Depending on the genome, not all methylation loci may have a genomic position.

Usage

## S4 method for signature 'MethylSet'
mapToGenome(object, mergeManifest = FALSE)
## S4 method for signature 'RatioSet'
mapToGenome(object, mergeManifest = FALSE)
## S4 method for signature 'RGChannelSet'
mapToGenome(object, ...)

Arguments

Argument: Description
object: Either a MethylSet, an RGChannelSet or a RatioSet.
mergeManifest: Should the information in the associated manifest package be merged into the location GRanges?
...: Passed to the method for MethylSet.

Details

FIXME: details on the MethylSet method.

The RGChannelSet method of this function is a convenience function: the RGChannelSet is first transformed into a MethylSet using preprocessRaw . The resulting MethylSet is then mapped directly to the genome.

This function silently drops loci which cannot be mapped to a genomic position, based on the associated annotation package.

Value

An object of class GenomicMethylSet or GenomicRatioSet .

Seealso

GenomicMethylSet for the output object and MethylSet for the input object. Also, getLocations obtains the genomic locations for a given object.

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

if (require(minfiData)) {
## MsetEx.sub is a small subset of MsetEx;
## only used for computational speed.
GMsetEx.sub <- mapToGenome(MsetEx.sub)
}

Multi-dimensional scaling plots giving an overview of similarities and differences between samples.

Description

Multi-dimensional scaling (MDS) plots showing a 2-d projection of distances between samples.

Usage

mdsPlot(dat, numPositions = 1000, sampNames = NULL, sampGroups = NULL, xlim, ylim,
    pch = 1, pal = brewer.pal(8, "Dark2"), legendPos = "bottomleft",
    legendNCol, main = NULL)

Arguments

Argument: Description
dat: An RGChannelSet, a MethylSet or a matrix. We either use the getBeta function to get Beta values (for the first two) or we assume the matrix contains Beta values.
numPositions: Use the numPositions genomic positions with the most methylation variability when calculating distance between samples.
sampNames: Optional sample names. See details.
sampGroups: Optional sample group labels. See details.
xlim: x-axis limits.
ylim: y-axis limits.
pch: Point type. See par for details.
pal: Color palette.
legendPos: The legend position. See legend for details.
legendNCol: The number of columns in the legend. See legend for details.
main: Plot title.

Details

Euclidean distance is calculated between samples using the numPositions most variable CpG positions. These distances are then projected onto a 2-d plane using classical multidimensional scaling.
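The two steps described above can be sketched in base R. This is only an illustration on simulated Beta values, not the code used by mdsPlot; the matrix beta, the choice of variance as the variability measure, and all other names are assumptions made for the example.

```r
# Toy Beta-value matrix: 200 loci (rows) x 6 samples (columns).
# All names here are illustrative and not part of minfi.
set.seed(1)
beta <- matrix(runif(200 * 6), nrow = 200,
               dimnames = list(paste0("cg", 1:200), paste0("S", 1:6)))
numPositions <- 50

# Keep the numPositions loci with the largest variance across samples
# (variance as the variability measure is an assumption).
rowVars <- apply(beta, 1, var)
top <- order(rowVars, decreasing = TRUE)[seq_len(numPositions)]

# Euclidean distances between samples; dist() works on rows, hence t().
d <- dist(t(beta[top, ]))

# Classical multidimensional scaling into two dimensions.
coords <- cmdscale(d, k = 2)
print(coords)
```

The resulting coords matrix has one row per sample and can be passed to plot() to obtain a picture similar in spirit to the one mdsPlot draws.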

Value

No return value. Plots are produced as a side-effect.

Seealso

qcReport, controlStripPlot, densityPlot, densityBeanPlot, par, legend

Author

Martin Aryee aryee@jhu.edu.

References

I Borg, P Groenen. Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag (2005) pp. 207-212. ISBN 0387948457.

http://en.wikipedia.org/wiki/Multidimensional_scaling

Examples

if (require(minfiData)) {

names <- pData(MsetEx)$Sample_Name
groups <- pData(MsetEx)$Sample_Group
mdsPlot(MsetEx, sampNames=names, sampGroups=groups)

}

easy one-step QC of methylation object

Description

This function combines a number of functions into a simple-to-use, one-step QC procedure.

Usage

minfiQC(object, fixOutliers = TRUE, verbose = FALSE)

Arguments

Argument: Description
object: An object of class [Genomic]MethylSet.
fixOutliers: Should the function fix outlying observations (using fixMethOutliers) before running QC?
verbose: Should the function be verbose?

Details

A number of functions are run sequentially on the object.

First, outlier values are thresholded using fixMethOutliers. Then QC is performed using getQC, and finally sample-specific sex is estimated using getSex.

Value

A list with two values.

Seealso

getSex, getQC, fixMethOutliers

Author

Kasper D. Hansen

Examples

if(require(minfiData)) {
out <- minfiQC(MsetEx)
## plotQC(out$qc)
## plotSex(out$sex)
}

minfi_defunct()

Defunct functions in package minfi

Description

These functions are now defunct in minfi.

Details

The following functions are now defunct (not working anymore); use the replacement indicated below:

  • read.450k: Use read.metharray

  • read.450k.sheet: Use read.metharray.sheet

  • read.450k.exp: Use read.metharray.exp

Seealso

Defunct.


minfi_deprecated()

Deprecated functions in package minfi

Description

These functions are provided for compatibility with older versions of minfi only, and will be defunct at the next release.

Details

No functions are currently deprecated.

The functions read.450k, read.450k.sheet and read.450k.exp were previously deprecated in favor of read.metharray, read.metharray.sheet and read.metharray.exp; they are now defunct.


minfi_package()

Analyze Illumina's methylation arrays

Description

Tools for analyzing and visualizing Illumina methylation array data. There is special focus on the 450k array; the 27k array is not supported at the moment.

Details

The package contains a (hopefully) useful vignette; this vignette contains a lengthy description of the package content and capabilities.


plotBetasByType()

Plot the overall distribution of beta values and the distributions of the Infinium I and II probe types.

Description

Plot the overall density distribution of beta values and the density distributions of the Infinium I and II probe types.

Usage

plotBetasByType(data, probeTypes = NULL, legendPos = "top",
                colors = c("black", "red", "blue"),
                main = "", lwd = 3, cex.legend = 1)

Arguments

Argument: Description
data: A MethylSet or a matrix or a vector. We either use the getBeta function to get Beta values (in the first case) or we assume the matrix or vector contains Beta values.
probeTypes: If data is a MethylSet this argument is not needed. Otherwise, a data.frame with a column 'Name' containing probe IDs and a column 'Type' containing their corresponding assay design type.
legendPos: The x and y co-ordinates to be used to position the legend. They can be specified by keyword or in any way which is accepted by xy.coords. See legend for details.
colors: Colors to be used for the different beta value density distributions. Must be a vector of length 3.
main: Plot title.
lwd: The line width to be used for the different beta value density distributions.
cex.legend: The character expansion factor for the legend text.

Details

The density distribution of the beta values for a single sample is plotted. The density distributions of the Infinium I and II probes are then plotted individually, showing how they contribute to the overall distribution. This is useful for visualising how using preprocessSWAN affects the data.

Value

No return value. Plot is produced as a side-effect.

Seealso

densityPlot, densityBeanPlot, par, legend

Author

Jovana Maksimovic jovana.maksimovic@mcri.edu.au.

Examples

if (require(minfiData)) {
Mset.swan <- preprocessSWAN(RGsetEx, MsetEx)
par(mfrow=c(1,2))
plotBetasByType(MsetEx[,1], main="Raw")
plotBetasByType(Mset.swan[,1], main="SWAN")
}

Plot methylation values at a single genomic position

Description

Plot single-position (single CpG) methylation values as a function of a categorical or continuous phenotype

Usage

plotCpg(dat, cpg, pheno, type = c("categorical", "continuous"),
    measure = c("beta", "M"), ylim = NULL, ylab = NULL, xlab = "",
    fitLine = TRUE, mainPrefix = NULL, mainSuffix = NULL)

Arguments

Argument: Description
dat: An RGChannelSet, a MethylSet or a matrix. We either use the getBeta (or getM for measure="M") function to get Beta values (or M-values) (for the first two) or we assume the matrix contains Beta values (or M-values).
cpg: A character vector of the CpG position identifiers to be plotted.
pheno: A vector of phenotype values.
type: Is the phenotype categorical or continuous?
measure: Should Beta values or log-ratios (M) be plotted?
ylim: y-axis limits.
ylab: y-axis label.
xlab: x-axis label.
fitLine: Fit a least-squares best fit line when using a continuous phenotype.
mainPrefix: Text to prepend to the CpG name in the plot main title.
mainSuffix: Text to append to the CpG name in the plot main title.

Details

This function plots methylation values (Betas or log-ratios) at individual CpG loci as a function of a phenotype.

Value

No return value. Plots are produced as a side-effect.

Author

Martin Aryee aryee@jhu.edu.

Examples

if (require(minfiData)) {

grp <- pData(MsetEx)$Sample_Group
cpgs <- c("cg00050873", "cg00212031", "cg26684946", "cg00128718")
par(mfrow=c(2,2))
plotCpg(MsetEx, cpg=cpgs, pheno=grp, type="categorical")

}

preprocessFunnorm()

Functional normalization for Illumina 450k arrays

Description

Functional normalization (FunNorm) is a between-array normalization method for the Illumina Infinium HumanMethylation450 platform. It removes unwanted variation by regressing out variability explained by the control probes present on the array.

Usage

preprocessFunnorm(rgSet, nPCs=2, sex = NULL, bgCorr = TRUE,
                  dyeCorr = TRUE, keepCN = TRUE, ratioConvert = TRUE,
                  verbose = TRUE)

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.
nPCs: Number of principal components from the control probes PCA.
sex: An optional numeric vector containing the sex of the samples.
bgCorr: Should the NOOB background correction be done prior to functional normalization (see preprocessNoob)?
dyeCorr: Should dye normalization be done as part of the NOOB background correction (see preprocessNoob)?
keepCN: Should copy number estimates be kept around? Setting to FALSE will decrease the size of the output object significantly.
ratioConvert: Should we run ratioConvert, i.e. should the output be a GenomicRatioSet or should it be kept as a GenomicMethylSet; the latter is for experts.
verbose: Should the function be verbose?

Details

This function implements functional normalization preprocessing for Illumina methylation microarrays. Functional normalization extends the idea of quantile normalization by adjusting for known covariates measuring unwanted variation. For the 450k array, the first k principal components of the internal control probes matrix play the role of the covariates adjusting for technical variation. The number k of principal components can be set by the argument nPCs. By default nPCs is set to 2, which has been shown to perform consistently well across different datasets. This parameter should only be modified by expert users.

The normalization procedure is applied to the Meth and Unmeth intensities separately, and to type I and type II signals separately. For the probes on the X and Y chromosomes we normalize males and females separately using the gender information provided in the sex argument. For the Y chromosome, standard quantile normalization is used due to the small number of probes, which results in instability for functional normalization. If sex is unspecified (NULL), a guess is made by the getSex function using copy number information.

Note that this algorithm does not rely on any assumption and can therefore be applied in cases where global changes are expected, such as in cancer-normal comparisons or tissue differences.

Value

An object of class GenomicRatioSet, unless ratioConvert=FALSE, in which case an object of class GenomicMethylSet.

Seealso

RGChannelSet as well as IlluminaMethylationManifest for the basic classes involved in these functions. preprocessRaw and preprocessQuantile are other preprocessing functions. Background correction may be done using preprocessNoob.

Author

Jean-Philippe Fortin jfortin@jhsph.edu, Kasper D. Hansen khansen@jhsph.edu.

References

JP Fortin, A Labbe, M Lemire, BW Zanke, TJ Hudson, EJ Fertig, CMT Greenwood and KD Hansen. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biology (2014) 15:503. doi: 10.1186/s13059-014-0503-2.

Examples

if (require(minfiData)) {
## RGsetEx.sub is a small subset of RGsetEx;
## only used for computational speed.
Mset.sub.funnorm <- preprocessFunnorm(RGsetEx.sub)
}

preprocessIllumina()

Perform preprocessing as Genome Studio.

Description

These functions implement preprocessing for Illumina methylation microarrays as used in Genome Studio, the standard software provided by Illumina.

Usage

preprocessIllumina(rgSet, bg.correct = TRUE, normalize = c("controls", "no"),
    reference = 1)
bgcorrect.illumina(rgSet)
normalize.illumina.control(rgSet, reference = 1)

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.
bg.correct: logical, should background correction be performed?
normalize: Should (control) normalization be performed? Either "controls" or "no".
reference: For control normalization, which array is the reference?

Details

We have reverse engineered the preprocessing methods from Genome Studio, based on the documentation.

The current implementation of control normalization is equal to what Genome Studio provides (this statement is based on comparing Genome Studio output to the output of this function), with the following caveat: this kind of normalization requires the selection of a reference array. It is unclear how Genome Studio selects the reference array, but we allow for the manual specification of this parameter.

The current implementation of background correction is roughly equal to Genome Studio. Based on examining the output of 24 arrays, we are able to exactly recreate 18 out of the 24. The remaining 6 arrays had a max discrepancy in the Red and/or Green channel of 1-4 (this is on the unlogged intensity scale, so 4 is very small).

A script for doing this comparison may be found in the scripts directory (although it is of limited use without the data files).

Value

preprocessIllumina returns a MethylSet, while bgcorrect.illumina and normalize.illumina.control both return an RGChannelSet with corrected color channels.

Seealso

RGChannelSet and MethylSet as well as IlluminaMethylationManifest for the basic classes involved in these functions. preprocessRaw is another basic preprocessing function.

Author

Kasper Daniel Hansen khansen@jhsph.edu.

Examples

if (require(minfiData)) {

dat <- preprocessIllumina(RGsetEx, bg.correct=FALSE, normalize="controls")
slot(name="preprocessMethod", dat)[1]

}

preprocessNoob()

The Noob/ssNoob preprocessing method for Infinium methylation microarrays.

Description

Noob (normal-exponential out-of-band) is a background correction method with dye-bias normalization for Illumina Infinium methylation arrays.

Usage

preprocessNoob(rgSet, offset = 15, dyeCorr = TRUE, verbose = FALSE,
               dyeMethod=c("single", "reference"))

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.
offset: An offset for the normexp background correction.
dyeCorr: Should dye correction be done?
verbose: Should the function be verbose?
dyeMethod: How should dye bias correction be done: use a single sample approach (ssNoob), or a reference array?

Value

An object of class MethylSet.

Seealso

RGChannelSet as well as IlluminaMethylationManifest for the basic classes involved in these functions. preprocessRaw and preprocessQuantile are other preprocessing functions.

Author

Tim Triche, Jr.

References

TJ Triche, DJ Weisenberger, D Van Den Berg, PW Laird and KD Siegmund. Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res (2013) 41, e90. doi: 10.1093/nar/gkt090.

Examples

if (require(minfiData)) {
## RGsetEx.sub is a small subset of RGsetEx;
## only used for computational speed.
MsetEx.sub.noob <- preprocessNoob(RGsetEx.sub)
}
if (require(minfiData)) {
dyeMethods <- c(ssNoob="single", refNoob="reference")
GRsets <- lapply(dyeMethods,
function(m) preprocessNoob(RGsetEx, dyeMethod=m))
all.equal(getBeta(GRsets$refNoob), getBeta(GRsets$ssNoob)) # TRUE
}

preprocessQuantile()

Stratified quantile normalization for an Illumina methylation array.

Description

Stratified quantile normalization for Illumina methylation arrays.

This function implements stratified quantile normalization preprocessing for Illumina methylation microarrays. Probes are stratified by region (CpG island, shore, etc.)

Usage

preprocessQuantile(object, fixOutliers = TRUE, removeBadSamples = FALSE,
                   badSampleCutoff = 10.5, quantileNormalize = TRUE,
                   stratified = TRUE, mergeManifest = FALSE, sex = NULL,
                   verbose = TRUE)

Arguments

Argument: Description
object: An object of class RGChannelSet or [Genomic]MethylSet.
fixOutliers: Should low outlier Meth and Unmeth signals be fixed?
removeBadSamples: Should bad samples be removed?
badSampleCutoff: Samples with median Meth and Unmeth signals below this cutoff will be labelled bad.
quantileNormalize: Should quantile normalization be performed?
stratified: Should quantile normalization be performed within genomic region strata (e.g. CpG island, shore, etc.)?
mergeManifest: Should the information in the associated manifest package be merged into the output object?
sex: The sex (gender) of the samples. If NULL, it is guessed using getSex (see Details).
verbose: Should the function be verbose?

Details

This function implements stratified quantile normalization preprocessing for Illumina methylation microarrays. If removeBadSamples is TRUE we calculate the median Meth and median Unmeth signal for each sample, and remove those samples where their average falls below badSampleCutoff.

The normalization procedure is applied to the Meth and Unmeth intensities separately. The distribution of type I and type II signals is forced to be the same by first quantile normalizing the type II probes across samples and then interpolating a reference distribution to which we normalize the type I probes. Since probe types and probe regions are confounded and we know that DNAm distributions vary across regions, we stratify the probes by region before applying this interpolation. For the probes on the X and Y chromosomes we normalize males and females separately using the gender information provided in the sex argument. If gender is unspecified (NULL), a guess is made by the getSex function using copy number information. Background correction is not used, but very small intensities close to zero are thresholded using fixMethOutliers.

Note that this algorithm relies on the assumptions necessary for quantile normalization to be applicable and thus is not recommended for cases where global changes are expected, such as in cancer-normal comparisons.
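The bad-sample rule described above can be illustrated in base R. This is a sketch on simulated intensities and not the package's own code; in particular, taking the medians on the log2 scale is an assumption made here so that the default cutoff of 10.5 is meaningful, and every variable name is hypothetical.

```r
# Toy intensities: 1000 loci x 4 samples; sample "S4" is deliberately dim.
set.seed(1)
meanSignal <- c(S1 = 4000, S2 = 5000, S3 = 4500, S4 = 300)
meth   <- sapply(meanSignal, function(s) rexp(1000, rate = 1 / s))
unmeth <- sapply(meanSignal, function(s) rexp(1000, rate = 1 / s))

badSampleCutoff <- 10.5

# Per-sample medians of the Meth and Unmeth signals
# (the log2 scale is an assumption of this sketch).
mMed <- apply(log2(meth), 2, median)
uMed <- apply(log2(unmeth), 2, median)

# A sample is flagged as bad when the average of its two medians
# falls below the cutoff.
bad <- (mMed + uMed) / 2 < badSampleCutoff
keep <- names(bad)[!bad]
print(keep)
```

With removeBadSamples = TRUE only the samples in keep would be retained for normalization.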

Note that this normalization procedure is essentially similar to one previously presented (Touleimat and Tost, 2012), but has been independently re-implemented due to the present lack of a released, supported version.

Value

A GenomicRatioSet.

Seealso

getSex, minfiQC, fixMethOutliers for functions used as part of preprocessQuantile.

Note

A bug in the function was found to affect the Beta values of type I probes, when stratified=TRUE (default). This is fixed in minfi version 1.19.7 and 1.18.4 and greater.

Author

Rafael A. Irizarry

References

N Touleimat and J Tost. Complete pipeline for Infinium Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics (2012) 4:325-341.

Examples

if (require(minfiData)) {
# NOTE: RGsetEx.sub is a small subset of RGsetEx; only used for computational
#       speed
GMset.sub.quantile <- preprocessQuantile(RGsetEx.sub)
}
if(require(minfiData)) {
GMset <- preprocessQuantile(RGsetEx)
}

preprocessRaw()

Creation of a MethylSet without normalization

Description

Converts the Red/Green channel for an Illumina methylation array into methylation signal, without using any normalization.

Usage

preprocessRaw(rgSet)

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.

Details

This function takes the Red and the Green channel of an Illumina methylation array, together with its associated manifest object and converts it into a MethylSet containing the methylated and unmethylated signal.

Value

An object of class MethylSet.

Seealso

RGChannelSet and MethylSet as well as IlluminaMethylationManifest.

Author

Kasper Daniel Hansen khansen@jhsph.edu.

Examples

if (require(minfiData)) {

dat <- preprocessRaw(RGsetEx)
slot(name="preprocessMethod", dat)[1]

}

preprocessSwan()

Subset-quantile Within Array Normalisation for Illumina Infinium HumanMethylation450 BeadChips

Description

Subset-quantile Within Array Normalisation (SWAN) is a within array normalisation method for the Illumina Infinium HumanMethylation450 platform. It allows Infinium I and II type probes on a single array to be normalized together.

Usage

preprocessSWAN(rgSet, mSet = NULL, verbose = FALSE)

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.
mSet: An optional object of class MethylSet. If set to NULL, preprocessSWAN uses preprocessRaw on the rgSet argument. If mSet is supplied, make sure it is the result of preprocessing the rgSet argument.
verbose: Should the function be verbose?

Details

The SWAN method has two parts. First, an average quantile distribution is created using a subset of probes defined to be biologically similar based on the number of CpGs underlying the probe body. This is achieved by randomly selecting N Infinium I and II probes that have 1, 2 and 3 underlying CpGs, where N is the minimum number of probes in the 6 sets of Infinium I and II probes with 1, 2 or 3 probe body CpGs. If no probes have previously been filtered out (e.g. sex chromosome probes), N=11,303. This results in a pool of 3N Infinium I and 3N Infinium II probes. The subset for each probe type is then sorted by increasing intensity. The value of each of the 3N pairs of observations is subsequently assigned to be the mean intensity of the two probe types for that row or "quantile". This is the standard quantile procedure. Second, the intensities of the remaining probes are separately adjusted for each probe type using linear interpolation between the subset probes.
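The first part, the quantile step on the two probe subsets, can be sketched in base R as follows. The intensities are simulated, n stands in for 3N, and all names are hypothetical; the interpolation of the remaining probes is not shown.

```r
# Simulated intensities for the two equally sized probe subsets.
set.seed(1)
n <- 9  # stands in for 3N
typeI  <- rexp(n, rate = 1 / 1000)
typeII <- rexp(n, rate = 1 / 600)

# Mean of the two probe types at each rank ("quantile").
target <- (sort(typeI) + sort(typeII)) / 2

# Give each probe the target value that belongs to its own rank.
typeI.norm  <- target[rank(typeI,  ties.method = "first")]
typeII.norm <- target[rank(typeII, ties.method = "first")]

# Both subsets now share the same distribution of values.
print(all.equal(sort(typeI.norm), sort(typeII.norm)))
```

The ordering of probes within each type is preserved; only the values are replaced by the shared quantile means.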

Value

An object of class MethylSet.

Seealso

RGChannelSet and MethylSet as well as IlluminaMethylationManifest.

Note

SWAN uses a random subset of probes to do the within-array normalization. In order to achieve reproducible results, the seed needs to be set using set.seed.
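The effect of fixing the seed can be seen with a small base R illustration. The probe identifiers and the seed value are made up, and sample() merely stands in for the random probe selection.

```r
# Hypothetical probe identifiers.
probeIDs <- sprintf("cg%08d", 1:1000)

# Drawing the same random subset twice requires resetting the seed.
set.seed(2012)
first <- sample(probeIDs, 5)
set.seed(2012)
second <- sample(probeIDs, 5)

print(identical(first, second))
```

In practice this means calling set.seed immediately before preprocessSWAN whenever results need to be reproduced.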

Author

Jovana Maksimovic jovana.maksimovic@mcri.edu.au

References

J Maksimovic, L Gordon and A Oshlack (2012). SWAN: Subset quantile Within-Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biology 13, R44.

Examples

if (require(minfiData)) {
## RGsetEx.sub is a small subset of RGsetEx;
## only used for computational speed.
MsetEx.sub.swan <- preprocessSWAN(RGsetEx.sub)
}
if (require(minfiData)) {
dat <- preprocessRaw(RGsetEx)
preprocessMethod(dat)
datSwan <- preprocessSWAN(RGsetEx, mSet = dat)
datIlmn <- preprocessIllumina(RGsetEx)
preprocessMethod(datIlmn)
datIlmnSwan <- preprocessSWAN(RGsetEx, mSet = datIlmn)
}

QC report for Illumina Infinium Human Methylation 450k arrays

Description

Produces a PDF QC report for Illumina Infinium Human Methylation 450k arrays, useful for identifying failed samples.

Usage

qcReport(rgSet, sampNames = NULL, sampGroups = NULL, pdf = "qcReport.pdf",
    maxSamplesPerPage = 24, controls = c("BISULFITE CONVERSION I",
    "BISULFITE CONVERSION II", "EXTENSION", "HYBRIDIZATION",
    "NON-POLYMORPHIC", "SPECIFICITY I", "SPECIFICITY II", "TARGET REMOVAL"))

Arguments

Argument: Description
rgSet: An object of class RGChannelSet.
sampNames: Sample names to be used for labels.
sampGroups: Sample groups to be used for labels.
pdf: Path and name of the PDF output file.
maxSamplesPerPage: Maximum number of samples to plot per page in those sections that plot each sample separately.
controls: The control probe types to include in the report.

Details

This function produces a QC report as a PDF file. It is a useful first step after reading in a new dataset to get an overview of quality and to flag potentially problematic samples.

Value

No return value. A PDF is produced as a side-effect.

Seealso

mdsPlot, controlStripPlot, densityPlot, densityBeanPlot

Author

Martin Aryee aryee@jhu.edu.

Examples

if (require(minfiData)) {

names <- pData(RGsetEx)$Sample_Name
groups <- pData(RGsetEx)$Sample_Group

qcReport(RGsetEx, sampNames=names, sampGroups=groups, pdf="qcReport.pdf")

}

ratioConvert_methods()

Converting methylation signals to ratios (Beta or M-values)

Description

Converting methylation data from methylation and unmethylation channels, to ratios (Beta and M-values).

Usage

## S4 method for signature 'MethylSet'
ratioConvert(object, what = c("beta", "M", "both"), keepCN = TRUE, ...)
## S4 method for signature 'GenomicMethylSet'
ratioConvert(object, what = c("beta", "M", "both"), keepCN = TRUE, ...)

Arguments

Argument: Description
object: Either a MethylSet or a GenomicMethylSet.
what: Which ratios should be computed and stored?
keepCN: A logical, should copy number values be computed and stored in the object?
...: Passed to getBeta, getM methods.

Value

An object of class RatioSet or GenomicRatioSet.
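For orientation, the quantities stored in the output can be written out on a few made-up intensities. This is a sketch under stated assumptions rather than the package's code: no offset is added to the denominator of Beta, and copy number is taken to be the log2 of the total intensity.

```r
# Made-up methylated and unmethylated intensities for three loci.
meth   <- c(cg1 = 3000, cg2 = 500, cg3 = 1000)
unmeth <- c(cg1 = 1000, cg2 = 500, cg3 = 3000)

beta <- meth / (meth + unmeth)  # Beta value, between 0 and 1
M    <- log2(meth / unmeth)     # M-value (log-ratio)
CN   <- log2(meth + unmeth)     # copy number as log2 total intensity (assumption)

# The M-value is the base-2 logit of the Beta value.
print(all.equal(M, log2(beta / (1 - beta))))
```

Beta values are easier to interpret as a proportion, while M-values are unbounded log-ratios; the what argument selects which of the two is stored.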

Seealso

RatioSet or GenomicRatioSet for the output object and MethylSet or GenomicMethylSet for the input object.

Author

Kasper Daniel Hansen khansen@jhsph.edu

Examples

if (require(minfiData)) {
## MsetEx.sub is a small subset of MsetEx;
## only used for computational speed.
RsetEx.sub <- ratioConvert(MsetEx.sub, keepCN = TRUE)
}

readGEORawFile()

Read in Unmethylated and Methylated signals from a GEO raw file.

Description

Read in Unmethylated and Methylated signals from a GEO raw file.

Usage

readGEORawFile(filename, sep = ",", Uname = "Unmethylated signal",
               Mname = "Methylated signal", row.names = 1, pData = NULL,
               array = "IlluminaHumanMethylation450k",
               annotation = .default.450k.annotation, mergeManifest = FALSE,
               showProgress = TRUE, ...)

Arguments

Argument: Description
filename: The name of the file to be read from.
sep: The field separator character. Values on each line of the file are separated by this character.
Uname: A string that uniquely identifies the columns containing the unmethylated signals.
Mname: A string that uniquely identifies the columns containing the methylated signals.
row.names: The column containing the feature (CpG) IDs.
pData: A DataFrame or data.frame describing the samples represented by the columns of mat. If the rownames of the pData don't match the colnames of mat these colnames will be changed. If pData is not supplied, a minimal DataFrame is created.
array: Array name.
annotation: The feature annotation to be used. This includes the location of features and thus depends on genome build.
mergeManifest: Should the Manifest be merged to the final object?
showProgress: TRUE displays progress on the console. It is produced in fread's C code.
...: Additional arguments passed to data.table::fread.

Details

450K experiments uploaded to GEO typically include a raw data file as part of the supplementary materials. Unfortunately there does not appear to be a standard format. This function provides enough flexibility to read these files. Note that you will likely need to change the sep, Uname, and Mname arguments and make sure the first column includes the feature (CpG) IDs. You can use the readLines function to decipher how to set these arguments.

Note that the function uses the fread function in the data.table package to read the data. To install data.table type install.packages("data.table"). We use this package because the files are too large for read.table.
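As a sketch of how readLines can help choose sep, Uname and Mname, the following base R snippet inspects the header of a small stand-in file. The file content and column names are invented for the example; real GEO files differ from one submission to the next.

```r
# Write a tiny stand-in for a GEO raw file (hypothetical content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID_REF\tS1 Signal_A\tS1 Signal_B\tS2 Signal_A\tS2 Signal_B",
             "cg00000029\t1200\t3400\t900\t4100",
             "cg00000108\t5300\t700\t4800\t650"), tmp)

# Look at the first lines to see the separator and the column naming scheme.
headLines <- readLines(tmp, n = 2)
print(headLines)

# Split the header on the suspected separator and check which columns
# would be matched by candidate values for Uname and Mname.
header <- strsplit(headLines[1], "\t", fixed = TRUE)[[1]]
unmethCols <- grep("Signal_A", header, value = TRUE)
methCols   <- grep("Signal_B", header, value = TRUE)
print(unmethCols)
print(methCols)
```

If each candidate string matches exactly one column per sample, and the first column holds the CpG IDs, the corresponding values are reasonable choices for the sep, Uname and Mname arguments.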

Value

A GenomicMethylSet object.

Seealso

getGenomicRatioSetFromGEO

Author

Rafael A. Irizarry rafa@jimmy.harvard.edu.

Examples

library(GEOquery)
getGEOSuppFiles("GSE29290")
gunzip("GSE29290/GSE29290_Matrix_Signal.txt.gz")
# NOTE: This particular example file uses a comma as the decimal separator
#       (e.g., 0,00 instead of 0.00). We replace all such instances using the
#       command line tool 'sed' before reading in the modified file.
cmd <- paste0("sed s/,/./g GSE29290/GSE29290_Matrix_Signal.txt > ",
              "GSE29290/GSE29290_Matrix_Signal_mod.txt")
system(cmd)
gmset <- readGEORawFile(filename = "GSE29290/GSE29290_Matrix_Signal_mod.txt",
                        Uname = "Signal_A",
                        Mname = "Signal_B",
                        sep = "\t")

Read in tab-delimited file in the TCGA format

Description

Read in a tab-delimited file in the TCGA format.

Usage

readTCGA(filename, sep = "\t", keyName = "Composite Element REF", Betaname = "Beta_value",
         pData = NULL, array = "IlluminaHumanMethylation450k",
         annotation = .default.450k.annotation, mergeManifest = FALSE,
         showProgress = TRUE)

Arguments

Argument: Description
filename: The name of the file to be read from.
sep: The field separator character. Values on each line of the file are separated by this character.
keyName: The column name of the field containing the feature IDs.
Betaname: The character string contained in all column names of the beta value fields.
pData: A DataFrame or data.frame describing the samples represented by the columns of mat. If the rownames of the pData don't match the colnames of mat these colnames will be changed. If pData is not supplied, a minimal DataFrame is created.
array: Array name.
annotation: The feature annotation to be used. This includes the location of features and thus depends on genome build.
mergeManifest: Should the Manifest be merged to the final object?
showProgress: TRUE displays progress on the console. It is produced in fread's C code.

Details

This function is a wrapper for makeGenomicRatioSetFromMatrix. It assumes a very specific format, used by TCGA, and then uses the fread function in the data.table package to read the data. To install data.table type install.packages("data.table"). We use this package because the files are too large for read.table.

Currently, an example of a file that this function reads is here: http://gdac.broadinstitute.org/runs/stddata2014_10_17/data/UCEC/20141017/gdac.broadinstitute.org_UCEC.Merge_methylationhumanmethylation450jhu_usc_eduLevel_3within_bioassay_data_set_functiondata.Level_3.2014101700.0.0.tar.gz (note that it is an 8.1 GB archive).

Value

A GenomicRatioSet object.

Seealso

makeGenomicRatioSetFromMatrix

Author

Rafael A. Irizarry rafa@jimmy.harvard.edu.

Examples

filename <- "example.txt" ## file must be in the specific TCGA format
readTCGA(filename)

readmetharray()

Parsing IDAT files from Illumina methylation arrays.

Description

Parsing IDAT files from Illumina methylation arrays.

Usage

read.metharray(basenames, extended = FALSE, verbose = FALSE, force = FALSE)

Arguments

Argument: Description
basenames: The basenames or filenames of the IDAT files. By basenames we mean the filename without the ending _Grn.idat or _Red.idat (such that each sample occurs once). By filenames we mean filenames including _Grn.idat or _Red.idat (but only one of the colors).
extended: Should an RGChannelSet or an RGChannelSetExtended be returned?
verbose: Should the function be verbose?
force: Should reading different size IDAT files be forced? See Details.

Details

The type of methylation array is guessed by looking at the number of probes in the IDAT files.

We have seen IDAT files from the same array, but with different numbers of probes, in the wild. Specifically, this is the case for early access EPIC arrays, which have fewer probes than final release EPIC arrays. It is possible to combine IDAT files from the same inferred array, but with different numbers of probes, into the same RGChannelSet by setting force=TRUE. The output object will have the same number of probes as the smallest array being parsed, effectively removing probes which could have been analyzed.
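The consequence of force=TRUE can be pictured with a small base R example. It assumes, purely for illustration, that the smaller array's probes are a subset of the larger array's and that the combined object keeps the probes common to all files; the probe addresses are invented.

```r
# Invented probe addresses from two IDAT files of the same inferred array.
probesFinal <- c("10600313", "10600322", "10600328", "10600336")
probesEarly <- c("10600313", "10600328", "10600336")

# Probes present in every file (assumption: this is what is retained).
common <- Reduce(intersect, list(probesFinal, probesEarly))
print(common)

# Probes lost from the larger array as a result of combining.
dropped <- setdiff(probesFinal, common)
print(dropped)
```

Here the combined result has as many probes as the smaller file, and one probe that was measured on the larger array is no longer available for analysis.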

Value

An object of class RGChannelSet or RGChannelSetExtended.

Seealso

read.metharray.exp for a convenience function for reading an experiment, read.metharray.sheet for reading a sample sheet and RGChannelSet for the output class.

Author

Kasper Daniel Hansen khansen@jhsph.edu.

Examples

if(require(minfiData)) {

baseDir <- system.file("extdata", package = "minfiData")
RGset1 <- read.metharray(file.path(baseDir, "5723646052", "5723646052_R02C02"))

}

readmetharrayexp()

Reads an entire metharray experiment using a sample sheet

Description

Reads an entire methylation array experiment using a sample sheet or (optionally) a target like data.frame.

Usage

read.metharray.exp(base = NULL, targets = NULL, extended = FALSE,
    recursive = FALSE, verbose = FALSE, force = FALSE)

Arguments

Argument: Description
base: The base directory.
targets: A targets data.frame, see details.
extended: Should the output of the function be a "RGChannelSetExtended" (default is "RGChannelSet")?
recursive: Should the search be recursive (see details)?
verbose: Should the function be verbose?
force: Should reading different size IDAT files be forced? See the documentation for read.metharray.

Details

If the targets argument is NULL , the function finds all two-color IDAT files in the directory given by base . If recursive is TRUE , the function searches base and all subdirectories. A two-color IDAT files are pair of files with names ending in _Red.idat or _Grn.idat .

If the targets argument is not NULL it is assumed it has a columned named Basename , and this is assumed to be pointing to the base name of a two color IDAT file, ie. a name that can be made into a real IDAT file by appending either _Red.idat or _Grn.idat .
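A targets data.frame with a Basename column can be assembled from file names alone. The sketch below uses empty placeholder files in a temporary directory that only mimic the IDAT naming scheme; the file names are invented and the files are not valid IDAT files.

```r
# Create empty placeholder files that mimic IDAT naming (hypothetical names).
base <- tempfile("idat_demo")
dir.create(base)
invisible(file.create(file.path(base, c("1234_R01C01_Grn.idat",
                                        "1234_R01C01_Red.idat",
                                        "1234_R02C01_Grn.idat",
                                        "1234_R02C01_Red.idat"))))

# Find one color per sample and strip the suffix to obtain the base name.
grn <- list.files(base, pattern = "_Grn\\.idat$", full.names = TRUE)
basenames <- sub("_Grn\\.idat$", "", grn)

# Keep only base names whose Red partner file is also present.
complete <- file.exists(paste0(basenames, "_Red.idat"))
targets <- data.frame(Basename = basenames[complete],
                      stringsAsFactors = FALSE)
print(basename(targets$Basename))
```

With real IDAT files, a data.frame built this way has the column that the targets argument of read.metharray.exp expects; further phenotype columns can be added alongside Basename.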

The type of methylation array is guessed by looking at the number of probes in the IDAT files.

Value

An object of class "RGChannelSet" or "RGChannelSetExtended".

Seealso

read.metharray for the workhorse function, read.metharray.sheet for reading a sample sheet and RGChannelSet for the output class.

Author

Kasper Daniel Hansen khansen@jhsph.edu .

Examples

if(require(minfiData)) {

baseDir <- system.file("extdata", package = "minfiData")
RGset <- read.metharray.exp(file.path(baseDir, "5723646052"))

}

readmetharraysheet()

Reading an Illumina methylation sample sheet

Description

Reading an Illumina methylation sample sheet, containing pheno-data information for the samples in an experiment.

Usage

read.metharray.sheet(base, pattern = "csv$", ignore.case = TRUE,
    recursive = TRUE, verbose = TRUE)

Arguments

Argument     Description
base         The base directory from which the search is started.
pattern      What pattern is used to identify a sample sheet file, see list.files.
ignore.case  Should case be ignored in the file search, see list.files?
recursive    Should the file search be recursive, see list.files?
verbose      Should the function be verbose?

Details

This function searches the directory base (possibly including subdirectories, depending on the argument recursive) for sample sheet files (see below). These files are identified solely on the basis of their file name, as given by the arguments pattern and ignore.case (note the use of a dollar sign to mean end of file name).
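How the default pattern selects files can be illustrated with grepl, which here stands in for the regular expression matching that list.files performs. The file names are hypothetical.

```r
# Hypothetical file names found under the base directory.
files <- c("SampleSheet.csv", "samplesheet.CSV",
           "csv_notes.txt", "SampleSheet.csv.bak")

# Defaults of read.metharray.sheet: pattern = "csv$", ignore.case = TRUE.
# The dollar sign anchors the match at the end of the file name.
matchIgnoreCase <- grepl("csv$", files, ignore.case = TRUE)

# Case-sensitive matching misses the upper-case extension.
matchCaseSensitive <- grepl("csv$", files, ignore.case = FALSE)

print(files[matchIgnoreCase])
print(files[matchCaseSensitive])
```

Note that the default pattern matches any file name ending in csv, so unrelated CSV files under base would also be picked up and read.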

In case multiple sheet files are found, they are all read and the return object will contain the concatenation of the files.

A sample sheet file is essentially a CSV (comma-separated) file containing one line per sample, with a number of columns describing pheno-data or other important information about the sample. The file may contain a header, in which case it is assumed that all lines up to and including a line starting with [Data] should be dropped. This is modelled after a sample sheet file Illumina provides. It is also very similar to the targets file used by the popular limma package (see the extensive package vignette).
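The header handling and column renaming can be sketched in base R as follows. The sheet contents are hypothetical, and the code is an illustration of the described behavior rather than minfi's implementation.

```r
# Hypothetical sample sheet: header lines, a [Data] line, then the CSV table.
sheetLines <- c(
  "[Header]",
  "Investigator Name,Jane Doe",
  "[Data]",
  "Sample_Name,Sentrix_ID,Sentrix_Position",
  "sampleA,1234567890,R01C01",
  "sampleB,1234567890,R02C01"
)

# Drop everything up to and including the line starting with [Data].
dataLine <- grep("^\\[Data\\]", sheetLines)
skip <- if (length(dataLine) > 0) dataLine[1] else 0
sheet <- read.csv(text = paste(sheetLines, collapse = "\n"), skip = skip,
                  stringsAsFactors = FALSE)

# Mirror the renaming described under Value.
names(sheet)[names(sheet) == "Sentrix_Position"] <- "Array"
names(sheet)[names(sheet) == "Sentrix_ID"] <- "Slide"
print(sheet)
```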

An attempt is made at guessing the file path to the IDAT files represented in the sheet. This should be double-checked and might need to be changed manually.

The type of methylation array is guessed by looking at the number of probes in the IDAT files.

Value

A data.frame containing the columns of all the sample sheets. As described in details, a column named Sentrix_Position is renamed to Array and Sentrix_ID is renamed to Slide. In addition the data.frame will contain a column named Basename.

Seealso

read.metharray.exp and read.metharray for functions reading IDAT files. list.files for help on the arguments recursive and ignore.case.

Author

Kasper Daniel Hansen khansen@jhsph.edu .

Examples

if(require(minfiData)) {

baseDir <- system.file("extdata", package = "minfiData")
sheet <- read.metharray.sheet(baseDir)

}

Subset an RGChannelSet by CpG loci.

Description

Subset an RGChannelSet by CpG loci.

Usage

subsetByLoci(rgSet, includeLoci = NULL, excludeLoci = NULL,
             keepControls = TRUE, keepSnps = TRUE)

Arguments

Argument      Description
rgSet         An object of class RGChannelSet (or RGChannelSetExtended).
includeLoci   A character vector of CpG identifiers which should be kept.
excludeLoci   A character vector of CpG identifiers which should be excluded.
keepControls  Should control probes be kept?
keepSnps      Should SNP probes be kept?

Details

This task is non-trivial because an RGChannelSet is indexed by probe position on the array, not by loci name.
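A minimal base R sketch of why the lookup is indirect: loci names must first be translated into probe addresses before rows can be selected. The manifest fragment, loci names and addresses are hypothetical, and the code is not how subsetByLoci is implemented. It relies on the fact that a Type I locus is measured by two probes and a Type II locus by one.

```r
# Hypothetical manifest fragment: locus name -> probe address(es).
manifest <- data.frame(
  Name = c("cg0001", "cg0002", "cg0003"),
  Type = c("I", "II", "I"),
  AddressA = c("101", "102", "103"),
  AddressB = c("201", NA, "203"),
  stringsAsFactors = FALSE
)

# Hypothetical intensity matrix with rows named by address, not by locus.
green <- matrix(1:10, ncol = 2,
                dimnames = list(c("101", "102", "103", "201", "203"),
                                c("s1", "s2")))

# Translate the loci to keep into addresses, then subset by address.
includeLoci <- c("cg0001", "cg0002")
keep <- manifest$Name %in% includeLoci
addresses <- c(manifest$AddressA[keep], manifest$AddressB[keep])
addresses <- addresses[!is.na(addresses)]
greenSub <- green[rownames(green) %in% addresses, , drop = FALSE]
print(rownames(greenSub))
```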

Value

An object of class RGChannelSet, with some probes removed.

Examples

if(require(minfiData)) {
loci <- c("cg00050873", "cg00212031", "cg00213748", "cg00214611")
subsetByLoci(RGsetEx.sub, includeLoci = loci)
subsetByLoci(RGsetEx.sub, excludeLoci = loci)
}

Various utilities

Description

Utility functions operating on objects from the minfi package.

Usage

getMethSignal(object, what = c("Beta", "M"), ...)

Arguments

Argument  Description
object    An object from the minfi package supporting either getBeta or getM.
what      Which signal is returned.
...       Passed to the method described by argument what.

Value

A matrix.
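The choice between the two signals can be sketched with a toy stand-in. The function toySignal and its inputs are hypothetical and not part of minfi; it assumes the standard definitions Beta = Meth / (Meth + Unmeth) and M = log2(Meth / Unmeth), without any offset.

```r
# Toy stand-in for selecting a signal type; not minfi's implementation.
toySignal <- function(meth, unmeth, what = c("Beta", "M")) {
  what <- match.arg(what)
  switch(what,
         Beta = meth / (meth + unmeth),
         M = log2(meth / unmeth))
}

# Hypothetical methylated and unmethylated intensities for two loci.
meth <- matrix(c(300, 100), ncol = 1, dimnames = list(c("cgA", "cgB"), "s1"))
unmeth <- matrix(c(100, 300), ncol = 1, dimnames = list(c("cgA", "cgB"), "s1"))

beta <- toySignal(meth, unmeth, what = "Beta")
m <- toySignal(meth, unmeth, what = "M")
print(beta)
print(m)
```

Under these definitions the M values equal the base 2 logit of the Beta values, log2(beta / (1 - beta)).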

Author

Kasper Daniel Hansen khansen@jhsph.edu .

Examples

if(require(minfiData)) {
head(getMethSignal(MsetEx, what = "Beta"))
}