bioconductor v3.9.0 MSnbase
MSnbase provides infrastructure for manipulation,
Summary
Functions
Representation of chromatographic MS data
Container for multiple Chromatogram objects
Class "FeatComp"
Features of Interest
The "MIAPE" Class for Storing Proteomics Experiment Information
Class MSmap
The 'MSnExp' Class for MS Data And Meta-Data
The "MSnProcess" Class
Storing multiple related MSnSets
The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data
MSnbase options
Parse MzTab files
The OnDiskMSnExp Class for MS Data And Meta-Data
Simple processing step class
The "ReporterIons" Class
List of Spectrum objects along with annotations
The "Spectrum1" Class for MS1 Spectra
The "Spectrum2" Class for MSn Spectra
The "Spectrum" Class
TMT 6/10-plex sets
Adds Identification Data
Identify aggregation outliers
Generate an average MSnSet
Bin 'MSnExp' or 'Spectrum' instances
Calculate ions produced by fragmentation.
Extract chromatogram object(s)
Clean 'MSnExp', 'Spectrum' or 'Chromatogram' instances
Combines features in an MSnSet object
Combine Spectra
Combine signal from consecutive spectra of LCMS experiments
Keep only common feature names
Compare two MSnSets
Compare Spectra of an 'MSnExp' or 'Spectrum' instances
Combine spectra to a consensus spectrum
MSnbase Deprecated and Defunct
Estimate the m/z resolution of a spectrum
Estimate m/z scattering in consecutive scans
Noise Estimation for 'Spectrum' instances
Calculate all ratio pairs
Extracts precursor-specific spectra from an 'MSnExp' object
Expand or merge feature variables
Converts factors to strings
Calculates coefficient of variation for features
Fills up a vector
Filter out unreliable PSMs.
Format Retention Time
Return a variable name
Amino acids
Atomic mass.
Returns the matching column names of indices.
Checks if raw data files have any spectra or chromatograms
iPQF: iTRAQ (and TMT) Protein Quantification based on Features
iTRAQ 4-plex set
NA heatmap visualisation for 2 groups
Quantitative proteomics data imputation
Get mode from mzML data file
Example MSnExp and MSnSet data sets
Tests equality of list elements class
Convert to camel case by replacing dots by capital letters
Create a data set with missing values
Combine a list of spectra to a single spectrum
Documenting missing data visualisation
Coerce identification data to a data.frame
How many features in a group?
Count the number of quantified features.
Overview of missing value
Navigate an MSnExp object
Combine peptides into proteins.
Normalisation of MSnExp, MSnSet and Spectrum objects
Non-parametric coefficient of variation
Class to Contain Raw Mass-Spectrometry Assays and Experimental Metadata
Peak Detection for 'MSnExp' or 'Spectrum' instances
The 'plot2d' method for 'MSnExp' quality assessment
The 'plotDensity' method for 'MSnExp' quality assessment
The delta m/z plot
Exploring missing data in 'MSnSet' instances
Plotting a 'Spectrum' vs another 'Spectrum' object.
Plotting 'MSnExp' and 'Spectrum' object(s)
Number of precursor selection events
Performs reporter ions purity correction
Quantifies 'MSnExp' and 'Spectrum' objects
Imports mass-spectrometry raw data files as 'MSnExp' instances.
Read 'MSnSet'
Import mgf files as 'MSnExp' instances.
Import peptide-spectrum matches
Read an 'mzTab' file
Read an 'mzTab' file
Read SRM/MRM chromatographic data
Reduce a data.frame
Removes non-identified features
Removes low intensity peaks
Removes reporter ion tag peaks
Select feature variables of interest
Smooths 'MSnExp' or 'Spectrum' instances
Trims 'MSnExp' or 'Spectrum' instances
Update MSnbase objects
Write MS data to mzML or mzXML files
Write an experiment or spectrum to an mgf file
Functions
Chromatogram_class()
Representation of chromatographic MS data
Description
The Chromatogram class is designed to store chromatographic MS data, i.e. pairs of retention time and intensity values. Instances of the class can be created with the Chromatogram constructor function, but in most cases the dedicated methods for extracting chromatograms from OnDiskMSnExp and MSnExp objects (i.e. the chromatogram method) should be used instead.
Chromatogram: create an instance of the Chromatogram class.
aggregationFun, aggregationFun<-: get or set the aggregation function.
rtime: returns the retention times of the retention time - intensity pairs stored in the chromatogram.
intensity: returns the intensities of the retention time - intensity pairs stored in the chromatogram.
mz: get the mz (range) of the chromatogram. The function returns a numeric(2) with the lower and upper mz value.
precursorMz: get the mz of the precursor ion. The function returns a numeric(2) with the lower and upper mz value.
fromFile: returns the value of the fromFile slot.
length: returns the length (number of retention time - intensity pairs) of the chromatogram.
as.data.frame: returns the rtime and intensity values from the object as a data.frame.
filterRt: filters the chromatogram based on the provided retention time range.
clean: removes unused 0-intensity data points. See the clean documentation for more details and examples.
plot: plots a Chromatogram object.
msLevel: returns the MS level of the chromatogram.
isEmpty: returns TRUE for an empty chromatogram or a chromatogram whose intensities are all NA.
productMz: get the mz of the product chromatogram/ion. The function returns a numeric(2) with the lower and upper mz value.
bin: aggregates the intensity values of a chromatogram in discrete bins along the retention time axis and returns a Chromatogram object, with the retention time representing the mid-point of each bin and the intensity the binned signal.
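The binning performed by bin can be pictured with a small base-R sketch. It assumes bins built from binSize exactly as in the default breaks shown under Usage, max as the aggregation function, and mid-points as the reported retention times. The helper bin_chromatogram is hypothetical and is not the MSnbase implementation; in particular, how MSnbase reports empty bins is not claimed here.

```r
## Hypothetical helper illustrating the idea behind bin() for a
## Chromatogram; this is not MSnbase code.
bin_chromatogram <- function(rtime, intensity, binSize = 0.5, fun = max) {
  ## Default breaks as in the documented bin() signature
  breaks <- seq(floor(min(rtime)), ceiling(max(rtime)), by = binSize)
  nbins <- length(breaks) - 1L
  ## Bin index per data point; the right-most edge belongs to the last bin.
  ## Points beyond the last break get index nbins + 1 and are ignored here.
  idx <- findInterval(rtime, breaks, rightmost.closed = TRUE)
  binned <- vapply(seq_len(nbins), function(b) {
    vals <- intensity[idx == b]
    if (length(vals)) as.numeric(fun(vals)) else NA_real_  # NA for empty bins
  }, numeric(1))
  ## Report each bin by its mid-point
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  data.frame(rtime = mids, intensity = binned)
}

rts  <- c(1.1, 1.3, 1.8, 2.2, 2.9)
ints <- c(10, 40, 5, 20, 30)
res <- bin_chromatogram(rts, ints, binSize = 1)
res
```

With binSize = 1 the breaks are 1, 2 and 3, so the sketch returns two bins centred at 1.5 and 2.5 holding the maximum intensity of each bin.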
Usage
Chromatogram(rtime = numeric(), intensity = numeric(),
mz = c(NA_real_, NA_real_), filterMz = c(NA_real_, NA_real_),
precursorMz = c(NA_real_, NA_real_), productMz = c(NA_real_,
NA_real_), fromFile = integer(), aggregationFun = character(),
msLevel = 1L)
aggregationFun(object)
## S4 methods for signature 'Chromatogram'
show(object)
rtime(object)
intensity(object)
mz(object, filter = FALSE)
precursorMz(object)
fromFile(object)
length(x)
as.data.frame(x)
filterRt(object, rt)
clean(object, all = FALSE, na.rm = FALSE)
## S4 method for signature 'Chromatogram,ANY'
plot(x, col = "#00000060", lty = 1, type = "l",
  xlab = "retention time", ylab = "intensity", main = NULL, ...)
## S4 methods for signature 'Chromatogram'
msLevel(object)
isEmpty(x)
productMz(object)
bin(object, binSize = 0.5,
  breaks = seq(floor(min(rtime(object))), ceiling(max(rtime(object))), by = binSize),
  fun = max)
Arguments
Argument | Description |
---|---|
rtime | numeric with the retention times (length has to be equal to the length of intensity). |
intensity | numeric with the intensity values (length has to be equal to the length of rtime). |
mz | numeric(2) representing the mz value range (min, max) on which the chromatogram was created. This is supposed to contain the real range of mz values, in contrast to filterMz below. If not applicable, use mz = c(0, 0). |
filterMz | numeric(2) representing the mz value range (min, max) that was used to filter the original object on the mz dimension. If not applicable, use filterMz = c(0, 0). |
precursorMz | numeric(2) for SRM/MRM transitions. Represents the mz of the precursor ion. See Details for more information. |
productMz | numeric(2) for SRM/MRM transitions. Represents the mz of the product. See Details for more information. |
fromFile | integer(1) the index of the file within the OnDiskMSnExp or MSnExp from which the chromatogram was extracted. |
aggregationFun | character string specifying the function that was used to aggregate intensity values for the same retention time across the mz range. Supported are "sum" (total ion chromatogram), "max" (base peak chromatogram), "min" and "mean". |
msLevel | integer with the MS level from which the chromatogram was extracted. |
object | A Chromatogram object. |
filter | For mz: whether the mz range used to filter the original object should be returned (filter = TRUE), or the mz range calculated on the real data (filter = FALSE). |
x | For as.data.frame, length, plot and isEmpty: a Chromatogram object. |
rt | For filterRt: numeric(2) defining the lower and upper retention time for the filtering. |
all | For clean: logical(1) whether all 0 intensities should be removed (default is FALSE). See clean for more details and examples. |
na.rm | For clean: logical(1) whether all NA intensities should be removed before cleaning the Chromatogram. Defaults to FALSE. See clean for more details and examples. |
col | For plot: the color to be used for plotting. |
lty | For plot: the line type. See plot for more details. |
type | For plot: the type of plot. See plot for more details. |
xlab | For plot: the x-axis label. |
ylab | For plot: the y-axis label. |
main | For plot: the plot title. If not provided, the mz range will be used as plot title. |
... | For plot: additional arguments to be passed to the plot function. |
binSize | For bin: numeric(1) with the size of the bins (in seconds). |
breaks | For bin: numeric defining the bins. Usually not required, as the function calculates the bins automatically based on binSize. |
fun | For bin: function to be used to aggregate the intensity values falling within each bin. |
Details
The mz, filterMz, precursorMz and productMz are stored as a numeric(2) representing a range, even if the chromatogram was generated for only a single ion (i.e. a single mz value). Using ranges for mz values allows this class to be used also for e.g. total ion chromatograms or base peak chromatograms.
The slots precursorMz and productMz allow SRM (selected reaction monitoring) and MRM (multiple reaction monitoring) chromatograms to be represented. For example, a Chromatogram for an SRM transition 273 -> 153 will have @precursorMz = c(273, 273) and @productMz = c(153, 153).
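Because the mz slot stores a range, one chromatogram can summarise many ions per retention time, and aggregationFun names how they were combined. A minimal base-R sketch, assuming centroided peaks held in a plain data.frame (the peaks table and the mz range are made up for illustration; this is not how MSnbase extracts chromatograms):

```r
## Made-up centroided peaks: one row per (retention time, m/z, intensity)
peaks <- data.frame(
  rtime     = c(10, 10, 10, 11, 11, 12),
  mz        = c(150.1, 273.0, 400.2, 273.0, 500.5, 153.0),
  intensity = c(100, 300, 50, 250, 80, 120)
)

## Restrict to an m/z range, like the (min, max) range in the mz slot
mzr <- c(100, 450)
sel <- peaks[peaks$mz >= mzr[1] & peaks$mz <= mzr[2], ]

## Aggregate intensities per retention time across that m/z range
tic <- tapply(sel$intensity, sel$rtime, sum)  # "sum": total ion chromatogram
bpc <- tapply(sel$intensity, sel$rtime, max)  # "max": base peak chromatogram
tic
bpc
```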
Seealso
Chromatograms for combining Chromatogram objects in a two-dimensional matrix (rows being mz-rt ranges, columns samples).
chromatogram for the method to extract chromatogram data from an MSnExp or OnDiskMSnExp object.
clean for the method to clean a Chromatogram object.
Author
Johannes Rainer
Examples
## Create a simple Chromatogram object.
ints <- abs(rnorm(100, sd = 100))
rts <- seq_len(length(ints))
chr <- Chromatogram(rtime = rts, intensity = ints)
chr
## Extract intensities
intensity(chr)
## Extract retention times
rtime(chr)
## Extract the mz range - is NA for the present example
mz(chr)
## plot the Chromatogram
plot(chr)
## Create a simple Chromatogram object based on random values.
chr <- Chromatogram(intensity = abs(rnorm(1000, mean = 2000, sd = 200)),
rtime = sort(abs(rnorm(1000, mean = 10, sd = 5))))
chr
## Get the intensities
head(intensity(chr))
## Get the retention time
head(rtime(chr))
## What is the retention time range of the object?
range(rtime(chr))
## Filter the chromatogram to keep only values between 4 and 10 seconds
chr2 <- filterRt(chr, rt = c(4, 10))
range(rtime(chr2))
Chromatograms_class()
Container for multiple Chromatogram objects
Description
The Chromatograms class allows storing Chromatogram objects in a matrix-like two-dimensional structure.
Chromatograms: create an instance of class Chromatograms.
Chromatograms objects can, just like a matrix, be subsetted using the [ method. Single elements, rows or columns can be replaced using e.g. x[1, 1] <- value, where value has to be a Chromatogram object or a list of such objects.
plot: plots a Chromatograms object. For each row in the object one plot is created, i.e. all Chromatogram objects in the same row are added to the same plot.
phenoData: accesses the phenotypical description of the samples. Returns an AnnotatedDataFrame object.
pData: accesses the phenotypical description of the samples. Returns a data.frame.
pData<-: replace the phenotype data.
$ and $<-: get or replace individual columns of the object's pheno data.
colnames<-: replace or set the column names of the Chromatograms object. Also sets the rownames of the phenoData.
sampleNames: get the sample names.
sampleNames<-: replace or set the sample names of the Chromatograms object (i.e. the rownames of the pheno data and the colnames of the data matrix).
isEmpty: returns TRUE if the Chromatograms object or all of its Chromatogram objects are empty or contain only NA intensities.
featureNames: returns the feature names of the Chromatograms object.
featureNames<-: set the feature names.
featureData: return the feature data.
featureData<-: replace the object's feature data.
fData: return the feature data as a data.frame.
fData<-: replace the object's feature data by passing a data.frame.
fvarLabels: return the feature data variable names (i.e. column names).
rownames<-: replace the rownames (and featureNames) of the object.
precursorMz: return the precursor m/z from the chromatograms. The method returns a matrix with 2 columns ("mzmin" and "mzmax") and as many rows as there are rows in the Chromatograms object. Each row contains the precursor m/z of the chromatograms in that row. An error is thrown if the chromatograms within one row have different precursor m/z values.
productMz: return the product m/z from the chromatograms. The method returns a matrix with 2 columns ("mzmin" and "mzmax") and as many rows as there are rows in the Chromatograms object. Each row contains the product m/z of the chromatograms in that row. An error is thrown if the chromatograms within one row have different product m/z values.
mz: returns the m/z for each row of the Chromatograms object as a two-column matrix (with columns "mzmin" and "mzmax").
polarity: returns the polarity of the scans/chromatograms: 1, 0 or -1 for positive, negative or unknown polarity.
bin: aggregates intensity values of chromatograms in discrete bins along the retention time axis. By default, individual Chromatogram objects of one row are binned into the same bins. The function returns a Chromatograms object with binned chromatograms.
clean: removes 0-intensity data points. Either all of them (with all = TRUE) or all except those adjacent to non-zero intensities (all = FALSE; default). See the clean documentation for more details and examples.
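The rule described for clean with all = FALSE (the default) can be sketched in base R. The helper keep_for_clean is hypothetical: it keeps a 0-intensity point only when a direct neighbour is non-zero, as the description states, but it is not the MSnbase source and it ignores the na.rm handling.

```r
## Hypothetical helper illustrating clean(all = FALSE): drop 0-intensity
## points unless they are directly adjacent to a non-zero intensity.
keep_for_clean <- function(intensity, all = FALSE) {
  nonzero <- intensity != 0
  if (all) return(nonzero)  # all = TRUE: drop every 0 intensity
  n <- length(intensity)
  ## Is the left / right neighbour non-zero? (FALSE at the edges)
  left  <- c(FALSE, nonzero[-n])
  right <- c(nonzero[-1], FALSE)
  nonzero | left | right
}

ints <- c(0, 0, 5, 8, 0, 0, 0, 3, 0)
ints[keep_for_clean(ints)]              # zeros flanking the peaks are kept
ints[keep_for_clean(ints, all = TRUE)]  # only non-zero values remain
```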
Usage
Chromatograms(data, phenoData, featureData, ...)
## S4 method for signature 'Chromatograms'
show(object)
## S4 method for signature 'Chromatograms,ANY,ANY,ANY'
x[i, j, drop = FALSE]
## S4 replacement method for signature 'Chromatograms'
x[i, j] <- value
## S4 method for signature 'Chromatograms,ANY'
plot(x, col = "#00000060", lty = 1, type = "l",
  xlab = "retention time", ylab = "intensity", main = NULL, ...)
## S4 methods for signature 'Chromatograms'
phenoData(object)
pData(object)
## S4 replacement method for signature 'Chromatograms,data.frame'
pData(object) <- value
## S4 methods for signature 'Chromatograms'
x$name
x$name <- value
colnames(x) <- value
sampleNames(object)
## S4 replacement method for signature 'Chromatograms,ANY'
sampleNames(object) <- value
## S4 methods for signature 'Chromatograms'
isEmpty(x)
featureNames(object)
featureNames(object) <- value
featureData(object)
## S4 replacement method for signature 'Chromatograms,ANY'
featureData(object) <- value
## S4 method for signature 'Chromatograms'
fData(object)
## S4 replacement method for signature 'Chromatograms,ANY'
fData(object) <- value
## S4 methods for signature 'Chromatograms'
fvarLabels(object)
rownames(x) <- value
precursorMz(object)
productMz(object)
mz(object)
polarity(object)
bin(object, binSize = 0.5, breaks = numeric(), fun = max)
clean(object, all = FALSE, na.rm = FALSE)
Arguments
Argument | Description |
---|---|
data | A list of Chromatogram objects. |
phenoData | either a data.frame or AnnotatedDataFrame describing the phenotypical information of the samples. |
featureData | either a data.frame or AnnotatedDataFrame with additional information for each row of chromatograms. |
... | Additional parameters to be passed to the matrix constructor, such as nrow, ncol and byrow. |
object | a Chromatograms object. |
x | For all methods: a Chromatograms object. |
i | For [: numeric, logical or character defining which row(s) to extract. |
j | For [: numeric, logical or character defining which column(s) to extract. |
drop | For [: logical(1) whether to drop the dimensionality of the returned object (if possible). The default is drop = FALSE, i.e. each subsetting returns a Chromatograms object (or a Chromatogram object if a single element is extracted). |
value | For [<-: the replacement object(s). Can be a list of Chromatogram objects or, if the length of both i and j is 1, a single Chromatogram object. For pData<-: a data.frame with the number of rows matching the number of columns of object. For colnames<-: a character with the new column names. |
col | For plot: the color to be used for plotting. Either a vector of length 1 or of length equal to ncol(x). |
lty | For plot: the line type (see plot for more details). Can be either a vector of length 1 or of length equal to ncol(x). |
type | For plot: the type of plot (see plot for more details). Can be either a vector of length 1 or of length equal to ncol(x). |
xlab | For plot: the x-axis label. |
ylab | For plot: the y-axis label. |
main | For plot: the plot title. If not provided, the mz range will be used as plot title. |
name | For $: the name of the pheno data column. |
binSize | For bin: numeric(1) with the size of the bins (in seconds). |
breaks | For bin: numeric defining the bins. Usually not required, as the function calculates the bins automatically based on binSize and the retention time range of chromatograms in the same row. |
fun | For bin: function to be used to aggregate the intensity values falling within each bin. |
all | For clean: logical(1) whether all 0 intensities should be removed (all = TRUE), or whether 0 intensities adjacent to peaks should be kept (all = FALSE; default). |
na.rm | For clean: logical(1) whether all NA intensities should be removed prior to cleaning 0-intensity data points. |
Details
The Chromatograms class extends the base matrix class and can hence store Chromatogram objects in a two-dimensional array. Each row is supposed to contain Chromatogram objects for one MS data slice with a common m/z and rt range. Columns contain Chromatogram objects from the same sample.
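Since Chromatograms extends matrix, its storage layout mirrors a base R list-matrix. The sketch below uses plain numeric vectors as hypothetical stand-ins for Chromatogram objects to show how elements are arranged into rows (m/z-rt slices) and columns (samples); it only illustrates the layout, and the [ method of Chromatograms has its own rules for what a subset returns.

```r
## Stand-ins for four Chromatogram objects (plain numeric vectors here)
elems <- list(c(1, 2, 3), c(4, 5), c(6, 7, 8, 9), c(10))

## 2 x 2 list-matrix: rows = m/z-rt slices, columns = samples.
## matrix() fills column-wise, so elems[1:2] form the first column.
m <- matrix(elems, nrow = 2,
            dimnames = list(c("slice1", "slice2"), c("sampleA", "sampleB")))

m[[1, 2]]                   # a single element (the third vector)
m[["slice2", "sampleA"]]    # elements can also be addressed by name
dim(m[2, , drop = FALSE])   # the second row kept as a 1 x 2 list-matrix
```

The column-wise filling is also why Chromatograms(list(ch1, ch2, ch3, ch4), nrow = 2) in the examples places ch1 and ch2 in the first column unless byrow = TRUE is passed.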
plot: if nrow(x) > 1, the plot area is split into nrow(x) sub-plots and the chromatograms of one row are plotted in each.
Value
For [: the subset of the Chromatograms object. If a single element is extracted (e.g. if i and j are of length 1), a Chromatogram object is returned. Otherwise, with drop = FALSE (the default), a Chromatograms object is returned; with drop = TRUE, the method returns a list of Chromatogram objects.
For phenoData: an AnnotatedDataFrame representing the pheno data of the object.
For pData: a data.frame representing the pheno data of the object.
For $: the value of the corresponding column in the pheno data table of the object.
Seealso
Chromatogram for the class representing chromatogram data.
chromatogram for the method to extract a Chromatograms object from an MSnExp or OnDiskMSnExp object.
readSRMData for the function to read chromatographic data of an SRM/MRM experiment.
Note
Subsetting with [ will always return a Chromatograms object (with the exception of extracting a single element) unless drop = TRUE is specified. This is different from the default subsetting behaviour of matrix-like objects.
Author
Johannes Rainer
Examples
## Creating some chromatogram objects to put them into a Chromatograms object
ints <- abs(rnorm(25, sd = 200))
ch1 <- Chromatogram(rtime = 1:length(ints), ints)
ints <- abs(rnorm(32, sd = 90))
ch2 <- Chromatogram(rtime = 1:length(ints), ints)
ints <- abs(rnorm(19, sd = 120))
ch3 <- Chromatogram(rtime = 1:length(ints), ints)
ints <- abs(rnorm(21, sd = 40))
ch4 <- Chromatogram(rtime = 1:length(ints), ints)
## Create a Chromatograms object with 2 rows and 2 columns
chrs <- Chromatograms(list(ch1, ch2, ch3, ch4), nrow = 2)
chrs
## Extract the first element from the second column. Extracting a single
## element always returns a Chromatogram object.
chrs[1, 2]
## Extract the second row. Extracting a row or column (i.e. multiple elements)
## returns by default (drop = FALSE) a Chromatograms object.
chrs[2, ]
## Extract the second row with drop = TRUE, i.e. return a list of
## Chromatogram objects.
chrs[2, , drop = TRUE]
## Replace the first element.
chrs[1, 1] <- ch3
chrs
## Add pheno data.
pd <- data.frame(name = c("first sample", "second sample"),
idx = 1:2)
pData(chrs) <- pd
## Column names correspond to the row names of the pheno data
chrs
## Access a column within the pheno data
chrs$name
## Access the m/z ratio for each row; this will be NA for the present
## object
mz(chrs)
## Create some random Chromatogram objects
ints <- abs(rnorm(123, mean = 200, sd = 32))
ch1 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231)
ints <- abs(rnorm(122, mean = 250, sd = 43))
ch2 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 231)
ints <- abs(rnorm(125, mean = 590, sd = 120))
ch3 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542)
ints <- abs(rnorm(124, mean = 1200, sd = 509))
ch4 <- Chromatogram(rtime = seq_along(ints), intensity = ints, mz = 542)
## Combine into a 2x2 Chromatograms object
chrs <- Chromatograms(list(ch1, ch2, ch3, ch4), byrow = TRUE, ncol = 2)
## Plot the second row
plot(chrs[2, , drop = FALSE])
## Plot all chromatograms
plot(chrs, col = c("#ff000080", "#00ff0080"))
FeatComp_class()
Class "FeatComp"
Description
Comparing feature names of two comparable MSnSet
instances.
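The comparison behind FeatComp can be pictured with base R set operations on two vectors of feature names. The helper compare_fnames and its result names are hypothetical, loosely echoing the common accessor used in the examples; the real class is built by compfnames and may record more (for example the feature variable selected through fcol1).

```r
## Hypothetical helper sketching a feature-name comparison between two
## data sets; this is not the compfnames implementation.
compare_fnames <- function(fn1, fn2) {
  list(common  = intersect(fn1, fn2),  # present in both
       unique1 = setdiff(fn1, fn2),    # only in the first
       unique2 = setdiff(fn2, fn1))    # only in the second
}

res <- compare_fnames(c("P1", "P2", "P3"), c("P2", "P3", "P4"))
res$common
```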
Seealso
averageMSnSet to compute an average MSnSet.
Author
Laurent Gatto lg390@cam.ac.uk and Thomas Naake
Examples
library("pRolocdata")
data(tan2009r1)
data(tan2009r2)
x <- compfnames(tan2009r1, tan2009r2)
x[[1]]
x[2:3]
head(common(x[[1]]))
data(tan2009r3)
tanl <- list(tan2009r1, tan2009r2, tan2009r3)
xx <- compfnames(tanl, fcol1 = NULL)
length(xx)
tail(xx)
all.equal(xx[[15]],
compfnames(tan2009r2, tan2009r3, fcol1 = NULL))
str(sapply(xx, common))
FeaturesOfInterest_class()
Features of Interest
Description
The Features of Interest infrastructure allows one to define a set
of features of particular interest to be used/matched against existing
data sets contained in "
. A specific set of features is stored as a FeaturesOfInterest object, and a collection of such non-redundant instances (for example for a specific organism, project, ...) can be collected in a FoICollection.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
library("pRolocdata")
data(tan2009r1)
x <- FeaturesOfInterest(description = "A traceable test set of features of interest",
fnames = featureNames(tan2009r1)[1:10],
object = tan2009r1)
x
description(x)
foi(x)
y <- FeaturesOfInterest(description = "Non-traceable features of interest",
fnames = featureNames(tan2009r1)[111:113])
y
## an illegal FeaturesOfInterest
try(FeaturesOfInterest(description = "Won't work",
fnames = c("A", "Z", featureNames(tan2009r1)),
object = tan2009r1))
FeaturesOfInterest(description = "This work, but not traceable",
fnames = c("A", "Z", featureNames(tan2009r1)))
xx <- FoICollection()
xx
xx <- addFeaturesOfInterest(x, xx)
xx <- addFeaturesOfInterest(y, xx)
names(xx) <- LETTERS[1:2]
xx
## Sub-setting
xx[1]
xx[[1]]
xx[["A"]]
description(xx)
foi(xx)
fnamesIn(x, tan2009r1)
fnamesIn(x, tan2009r1, count = TRUE)
rmFeaturesOfInterest(xx, 1)
MIAPE_class()
The "MIAPE" Class for Storing Proteomics Experiment Information
Description
The Minimum Information About a Proteomics Experiment. The current implementation is based on the MIAPE-MS 2.4 document.
Author
Laurent Gatto lg390@cam.ac.uk
References
About MIAPE: http://www.psidev.info/index.php?q=node/91 , and references therein, especially 'Guidelines for reporting the use of mass spectrometry in proteomics', Nature Biotechnology 26, 860-861 (2008).
MSmap_class()
Class MSmap
Description
A class to store mass spectrometry data maps, i.e. intensities collected along the m/z and retention time space during a mass spectrometry acquisition.
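The idea of a map can be sketched in base R: each selected scan becomes one row, and its peaks are placed on a regular m/z grid defined by a lower and upper m/z and a fixed resolution (these appear to correspond to the 521, 523 and .005 arguments of the MSmap calls in the examples below). The helper make_map is hypothetical, summing intensities that share a bin is an assumption, and the real MSmap may construct the map differently.

```r
## Hypothetical helper (not the MSmap implementation): place the peaks of
## several scans on a regular m/z grid; rows = scans, columns = m/z bins.
make_map <- function(spectra, lowMz, highMz, resMz) {
  breaks <- seq(lowMz, highMz, by = resMz)
  nbins <- length(breaks) - 1L
  one_scan <- function(sp) {
    ## Keep only peaks inside the grid
    keep <- sp$mz >= breaks[1] & sp$mz <= breaks[length(breaks)]
    mz <- sp$mz[keep]
    int <- sp$intensity[keep]
    bin <- findInterval(mz, breaks, rightmost.closed = TRUE)
    row <- numeric(nbins)
    ## Sum intensities that fall into the same m/z bin (an assumption)
    for (k in seq_along(bin)) row[bin[k]] <- row[bin[k]] + int[k]
    row
  }
  t(vapply(spectra, one_scan, numeric(nbins)))
}

spectra <- list(
  list(mz = c(100.2, 100.7, 102.5), intensity = c(5, 3, 10)),
  list(mz = c(101.1, 105.0), intensity = c(7, 99))  # 105.0 is off the grid
)
map <- make_map(spectra, lowMz = 100, highMz = 103, resMz = 1)
map
```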
Author
Laurent Gatto lg390@cam.ac.uk
Examples
## downloads the data
library("rpx")
px1 <- PXDataset("PXD000001")
(i <- grep("TMT.+mzML", pxfiles(px1), value = TRUE))
mzf <- pxget(px1, i)
## Using an mzRpwiz object
## reads the data
ms <- openMSfile(mzf)
hd <- header(ms)
## a set of spectra of interest: MS1 spectra eluted
## between 30 and 35 minutes retention time
ms1 <- which(hd$msLevel == 1)
rtsel <- hd$retentionTime[ms1] / 60 > 30 &
hd$retentionTime[ms1] / 60 < 35
## the map
M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
plot(M, aspect = 1, allTicks = FALSE)
plot3D(M)
if (require("rgl") && interactive())
plot3D(M, rgl = TRUE)
## With some MS2 spectra
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)
plot3D(M2)
## Using an OnDiskMSnExp object and accessors
msn <- readMSData(mzf, mode = "onDisk")
## a set of spectra of interest: MS1 spectra eluted
## between 30 and 35 minutes retention time
ms1 <- which(msLevel(msn) == 1)
rtsel <- rtime(msn)[ms1] / 60 > 30 &
rtime(msn)[ms1] / 60 < 35
## the map
M3 <- MSmap(msn, ms1[rtsel], 521, 523, .005)
plot(M3, aspect = 1, allTicks = FALSE)
## With some MS2 spectra
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
M4 <- MSmap(msn, i:j, 100, 1000, 1)
plot3D(M4)
MSnExp_class()
The 'MSnExp' Class for MS Data And Meta-Data
Description
The MSnExp class encapsulates data and meta-data for mass spectrometry experiments, as described in the slots section. Several data files (mzXML, mzData or mzML) can be loaded together with the function readMSData.
This class extends the virtual "
class.
In version 1.19.12, the polarity slot was added to the
"
class (previously in
"
). Hence, "MSnExp" objects created prior to this change are no longer valid, since all MS2 spectra will be missing the polarity slot. Such objects can be updated using the updateObject method.
The feature variables in the feature data slot will depend on the file. See also the documentation in the mzR package that parses the raw data files and produces these data.
Seealso
"
and readMSData for loading mzXML, mzData or mzML files to generate an instance of MSnExp.
The "
manual page contains further
details and examples.
chromatogram to extract chromatographic data from an MSnExp or OnDiskMSnExp object.
write for the function to write the data to mzML or mzXML file(s).
Author
Laurent Gatto lg390@cam.ac.uk
References
Information about the mzXML format as well as converters from vendor-specific formats to mzXML: http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML .
Examples
mzxmlfile <- dir(system.file("extdata",package="MSnbase"),
pattern="mzXML",full.names=TRUE)
msnexp <- readMSData(mzxmlfile)
msnexp
MSnProcess_class()
The "MSnProcess" Class
Description
MSnProcess is a container for MSnExp and MSnSet processing information. It records data files, processing steps, thresholds, analysis methods and times that have been applied to MSnExp or MSnSet instances.
Seealso
See the "
and "
classes that actually use MSnProcess
as a slot.
Note
This class is likely to be updated using an AnnotatedDataFrame.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
showClass("MSnProcess")
MSnSetList_class()
Storing multiple related MSnSets
Description
A class for storing lists of MSnSet instances.
Details
There are two ways to store different sets of measurements pertaining to an experimental unit, such as replicated measures of different conditions that were recorded over more than one MS acquisition. Without focusing on any proteomics technology in particular, these multiple assays can be recorded as:
1. A single combined MSnSet (see the section Combining MSnSet instances in the MSnbase-demo vignette). In such cases, the different experimental (phenotypical) conditions are recorded as an AnnotatedDataFrame in the phenoData slot. Quantitative data for features that were missing in an assay are generally encoded as missing with NA values. Alternatively, only features observed in all assays could be selected. See the commonFeatureNames function to select only common features among two or more MSnSet instances.
2. Each set of measurements is stored in an MSnSet, and these are combined into one MSnSetList. Each MSnSet element can have identical or different samples and features. Unless compiled manually by the user, one would expect at least one of these dimensions (features/rows or samples/columns) to be conserved (i.e. all feature or sample names are identical). See split/unsplit below.
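The round trip performed by split and unsplit in the examples (checked there with compareMSnSets) can be pictured on a plain matrix. This base-R sketch assumes splitting along samples by a grouping factor; the object names are illustrative and no MSnbase functions are used.

```r
## Toy expression matrix: 2 features x 6 samples from two replicates
m <- matrix(1:12, nrow = 2,
            dimnames = list(c("f1", "f2"), paste0("s", 1:6)))
rep_group <- factor(c(1, 2, 1, 2, 1, 2))

## "split": one sub-matrix per replicate (same features, subset of samples)
parts <- lapply(split(seq_len(ncol(m)), rep_group),
                function(j) m[, j, drop = FALSE])

## "unsplit": bind the pieces back and restore the original sample order
restored <- do.call(cbind, parts)[, colnames(m)]
identical(restored, m)
```

Keeping the feature dimension identical across the pieces is what makes the reassembly unambiguous, matching the expectation above that at least one dimension is conserved.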
Seealso
The commonFeatureNames
function to select common
features among MSnSet
instances.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
library("pRolocdata")
data(tan2009r1)
data(tan2009r2)
## The MSnSetList class
## for an unnamed list, names are set to indices
msnl <- MSnSetList(list(tan2009r1, tan2009r2))
names(msnl)
## a named example
msnl <- MSnSetList(list(A = tan2009r1, B = tan2009r2))
names(msnl)
msnsets(msnl)
length(msnl)
objlog(msnl)
msnl[[1]] ## an MSnSet
msnl[1] ## an MSnSetList of length 1
## Iterating over the elements
lapply(msnl, dim) ## a list
lapply(msnl, normalise) ## an MSnSetList
fData(msnl)
fData(msnl)$X <- sapply(msnl, nrow)
fData(msnl)
## Splitting and unsplitting
## splitting along the columns/samples
data(dunkley2006)
head(pData(dunkley2006))
(splt <- split(dunkley2006, "replicate"))
lapply(splt, dim) ## the number of rows and columns of the split elements
unsplt <- unsplit(splt, dunkley2006$replicate)
stopifnot(compareMSnSets(dunkley2006, unsplt))
## splitting along the rows/features
head(fData(dunkley2006))
(splt <- split(dunkley2006, "markers"))
unsplt <- unsplit(splt, factor(fData(dunkley2006)$markers))
simplify2array(lapply(splt, dim))
stopifnot(compareMSnSets(dunkley2006, unsplt))
MSnSet_class()
The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data
Description
The MSnSet holds quantified expression data for MS proteomics data and the experimental meta-data.
The MSnSet
class is derived from the
"
class and mimics the
"
class classically used for
microarray data.
Seealso
"
, "
and quantify. MSnSet quantitation values and annotation can be exported to a file with write.exprs. See readMSnSet to create an MSnSet using data available in a spreadsheet or data.frame.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
data(msnset)
msnset <- msnset[10:15]
exprs(msnset)[1, c(1, 4)] <- NA
exprs(msnset)[2, c(1, 2)] <- NA
is.na(msnset)
featureNames(filterNA(msnset, pNA = 1/4))
featureNames(filterNA(msnset, pattern = "0110"))
M <- matrix(rnorm(12), 4)
pd <- data.frame(otherpdata = letters[1:3])
fd <- data.frame(otherfdata = letters[1:4])
x0 <- MSnSet(M, fd, pd)
sampleNames(x0)
M <- matrix(rnorm(12), 4)
colnames(M) <- LETTERS[1:3]
rownames(M) <- paste0("id", LETTERS[1:4])
pd <- data.frame(otherpdata = letters[1:3])
rownames(pd) <- colnames(M)
fd <- data.frame(otherfdata = letters[1:4])
rownames(fd) <- rownames(M)
x <- MSnSet(M, fd, pd)
sampleNames(x)
## Visualisation
library("pRolocdata")
data(dunkley2006)
image(dunkley2006)
## Changing colours
image(dunkley2006, high = "darkgreen")
image(dunkley2006, high = "darkgreen", low = "yellow")
## Forcing feature names
image(dunkley2006, fnames = TRUE)
## Facetting
image(dunkley2006, facetBy = "replicate")
p <- image(dunkley2006)
library("ggplot2") ## for facet_grid
p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
p + facet_grid(markers ~ replicate)
## Fold-changes
dd <- dunkley2006
exprs(dd) <- exprs(dd) - 0.25
image(dd)
image(dd, low = "green", high = "red")
## Feature names are displayed by default for smaller data
dunkley2006 <- dunkley2006[1:25, ]
image(dunkley2006)
image(dunkley2006, legend = "hello")
## Coercion
if (require("SummarizedExperiment")) {
data(msnset)
se <- as(msnset, "SummarizedExperiment")
metadata(se) ## only logging
se <- addMSnSetMetadata(se, msnset)
metadata(se) ## all metadata
msnset2 <- as(se, "MSnSet")
processingData(msnset2)
}
as(msnset, "ExpressionSet")
MSnbaseOptions()
MSnbase options
Description
MSnbase defines a few options globally using the standard R
options mechanism. The current values of these options can be
queried with MSnbaseOptions. The options are:
verbose: defines a session-wide verbosity flag that is used if the verbose argument of an individual function is not set.
PARALLEL_THRESH: defines the minimum number of spectra per file necessary before using parallel processing.
fastLoad: logical(1). If TRUE, performs faster data loading for all methods of OnDiskMSnExp that load data from the original files (such as spectrapply()). Users experiencing data I/O errors (observed mostly on macOS systems) should set this option to FALSE.
Usage
MSnbaseOptions()
isMSnbaseVerbose()
setMSnbaseVerbose(opt)
setMSnbaseParallelThresh(opt = 1000)
setMSnbaseFastLoad(opt = TRUE)
isMSnbaseFastLoad()
Arguments
Argument | Description |
---|---|
opt | The value of the new option |
Details
isMSnbaseVerbose is a wrapper for the verbosity flag, which is also
available through getOption("MSnbase")$verbose.
There are also setters to set options individually. When run without an argument, the verbosity setter inverts the current value of the option.
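The accessor/setter pattern described above can be sketched with base R's options() and getOption(). This is a minimal illustration, not MSnbase's code: the option name "myPkg" and the helpers isMyPkgVerbose() and setMyPkgVerbose() are hypothetical.

```r
## Hypothetical package-wide options stored as a single list, analogous
## to options("MSnbase"); all names here are illustrative only.
options(myPkg = list(verbose = FALSE, PARALLEL_THRESH = 1000L, fastLoad = TRUE))

## Accessor: read one value from the option list
isMyPkgVerbose <- function() getOption("myPkg")$verbose

## Setter: without an argument, invert the current verbosity value
setMyPkgVerbose <- function(opt) {
  o <- getOption("myPkg")
  if (missing(opt)) opt <- !o$verbose
  o$verbose <- opt
  options(myPkg = o)
  invisible(opt)
}

setMyPkgVerbose()       # toggles FALSE to TRUE
isMyPkgVerbose()        # TRUE
setMyPkgVerbose(FALSE)  # sets the value explicitly
isMyPkgVerbose()        # FALSE
```

Keeping all settings in one list means a single options() entry per package, while each accessor only has to know its field name.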
Value
MSnbaseOptions returns a list of MSnbase options; the individual
accessors return the single option values.
MzTab_class()
Parse MzTab files
Description
The MzTab class stores the output of a basic parsing of an mzTab
file. It contains the metadata (a list), comments (a character
vector), and at least one of the following data types: proteins,
peptides, PSMs and small molecules (as data.frames).
At this stage, the metadata and data are only minimally parsed. More
specific data extraction and preparation are delegated to more
specialised functions, such as as(., to = "MSnSetList") and
readMzTabData for proteomics data.
Note that no attempts are made to verify the validity of the mzTab file.
Author
Laurent Gatto, with contributions from Richard Cotton (see https://github.com/lgatto/MSnbase/issues/41 ).
References
The mzTab format is a light-weight, tab-delimited file format for proteomics data. See https://github.com/HUPO-PSI/mzTab for details and specifications.
Griss J, Jones AR, Sachsenberg T, Walzer M, Gatto L, Hartler J, Thallinger GG, Salek RM, Steinbeck C, Neuhauser N, Cox J, Neumann S, Fan J, Reisinger F, Xu QW, Del Toro N, Perez-Riverol Y, Ghali F, Bandeira N, Xenarios I, Kohlbacher O, Vizcaino JA, Hermjakob H. The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics. 2014 Oct;13(10):2765-75. doi: 10.1074/mcp.O113.036681. Epub 2014 Jun 30. PubMed PMID: 24980485; PubMed Central PMCID: PMC4189001.
Examples
## Test files from the mzTab development repository
fls <- c("Cytidine.mzTab", "MTBLS2.mztab",
"PRIDE_Exp_Complete_Ac_1643.xml-mztab.txt",
"PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt",
"SILAC_CQI.mzTab", "SILAC_SQ.mzTab",
"iTRAQ_CQI.mzTab", "iTRAQ_SQI.mzTab",
"labelfree_CQI.mzTab", "labelfree_SQI.mzTab",
"lipidomics-HFD-LD-study-PL-DG-SM.mzTab",
"lipidomics-HFD-LD-study-TG.mzTab")
baseUrl <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/"
## a list of mzTab objects
## baseUrl already ends with "/", so paste0() avoids a doubled slash
mzt <- sapply(paste0(baseUrl, fls), MzTab)
stopifnot(length(mzt) == length(fls))
mzt[[4]]
dim(proteins(mzt[[4]]))
dim(psms(mzt[[4]]))
prots4 <- proteins(mzt[[4]])
class(prots4)
prots4[1:5, 1:4]
OnDiskMSnExp_class()
The OnDiskMSnExp Class for MS Data And Meta-Data
Description
Like the MSnExp class, the OnDiskMSnExp class encapsulates data and
meta-data for mass spectrometry experiments but, in contrast to the
former, does not keep the spectrum data in memory. Instead, it fetches
the m/z and intensity values on demand from the raw files. In some
instances this reduces performance, but it has the advantage of a much
smaller memory footprint.
Details
The OnDiskMSnExp object stores much spectrum-related information in its
featureData; thus, some calls, such as rtime to retrieve the retention
time of the individual scans, do not require the raw data to be read.
Only m/z and intensity values are loaded on-the-fly from the original
files. Extraction of values for individual scans is, for mzML files,
very fast. Extraction of the full data (all spectra) is performed with a
per-file parallel processing strategy.
Data manipulations related to the spectra's m/z or intensity values
(e.g. removePeaks or clean) are, for OnDiskMSnExp objects, not applied
immediately, but are stored in the spectraProcessingQueue for later
execution. The manipulations are performed on-the-fly upon data retrieval.
Other manipulations, like the removal of individual spectra, are applied
directly, since the corresponding data is available in the object's
featureData slot.
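The lazy processing idea can be illustrated in base R: steps are recorded in a queue and executed only when data are retrieved. This is a simplified sketch, not MSnbase's implementation; removePeaksStep(), cleanStep() and retrieve() are hypothetical helpers, and cleanStep() drops every 0-intensity peak (comparable to clean(all = TRUE)).

```r
## A toy "spectrum": m/z values and intensities
spec <- list(mz = c(100, 101, 102, 103), intensity = c(5, 20000, 8000, 15000))

## Hypothetical steps: set intensities below t to 0, then drop 0 peaks
removePeaksStep <- function(x, t) {
  x$intensity[x$intensity < t] <- 0
  x
}
cleanStep <- function(x) {
  keep <- x$intensity > 0
  list(mz = x$mz[keep], intensity = x$intensity[keep])
}

## The queue only records what to do (function plus extra arguments)...
queue <- list(list(FUN = removePeaksStep, ARGS = list(t = 10000)),
              list(FUN = cleanStep, ARGS = list()))

## ...and the steps run, in order, only when the data are retrieved
retrieve <- function(x, queue) {
  for (step in queue) x <- do.call(step$FUN, c(list(x), step$ARGS))
  x
}

res <- retrieve(spec, queue)
res$mz   # 101 103
spec$mz  # the stored data are unchanged: 100 101 102 103
```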
Seealso
pSet ,
MSnExp ,
readMSData
Author
Johannes Rainer johannes.rainer@eurac.edu
Examples
## Get some example mzML files
library(msdata)
mzfiles <- c(system.file("microtofq/MM14.mzML", package="msdata"),
system.file("microtofq/MM8.mzML", package="msdata"))
## Read the data as an OnDiskMSnExp
odmse <- readMSData(mzfiles, msLevel = 1, centroided = TRUE, mode = "onDisk")
## Get the length of data, i.e. the total number of spectra.
length(odmse)
## Get the MS level
head(msLevel(odmse))
## Get the featureData, use fData to return as a data.frame
head(fData(odmse))
## Get to know from which file the spectra are
head(fromFile(odmse))
## And the file names:
fileNames(odmse)
## Scan index and acquisitionNum
head(scanIndex(odmse))
head(acquisitionNum(odmse))
## Extract the spectra; the data is retrieved from the raw files.
head(spectra(odmse))
## Extracting individual spectra or a subset is much faster.
spectra(odmse[1:50])
## Alternatively, we could also subset the whole object by spectra and/or samples:
subs <- odmse[rtime(odmse) >= 2 & rtime(odmse) <= 20, ]
fileNames(subs)
rtime(subs)
## Extract intensities and M/Z values per spectrum; the methods return a list,
## each element representing the values for one spectrum.
ints <- intensity(odmse)
mzs <- mz(odmse)
## Return a data.frame with mz and intensity pairs for each spectrum from the
## object
res <- spectrapply(odmse, FUN = as, Class = "data.frame")
## Calling removePeaks, i.e. setting intensity values below a certain threshold to 0.
## Contrary to what the name suggests, this does not actually remove peaks; such
## 0-intensity peaks are then removed by the "clean" step.
## Also, the manipulations are not applied directly, but put into the "lazy"
## processing queue.
odmse <- removePeaks(odmse, t=10000)
odmse <- clean(odmse)
## The processing steps are only applied when actual raw data is extracted.
spectra(odmse[1:2])
## Get the polarity of the spectra.
head(polarity(odmse))
## Get the retention time of all spectra
head(rtime(odmse))
## Get the intensities after removePeaks and clean
intsAfter <- intensity(odmse)
head(lengths(ints))
head(lengths(intsAfter))
## The same for the M/Z values
mzsAfter <- mz(odmse)
head(lengths(mzs))
head(lengths(mzsAfter))
## Centroided or profile mode
f <- msdata::proteomics(full.names = TRUE,
pattern = "MS3TMT11.mzML")
odmse <- readMSData(f, mode = "onDisk")
validObject(odmse)
odmse[[1]]
table(isCentroidedFromFile(odmse), msLevel(odmse))
## centroided status could be set manually
centroided(odmse, msLevel = 1) <- FALSE
centroided(odmse, msLevel = 2) <- TRUE
centroided(odmse, msLevel = 3) <- TRUE
## or when reading the data
odmse2 <- readMSData(f, centroided = c(FALSE, TRUE, TRUE),
mode = "onDisk")
table(centroided(odmse), msLevel(odmse))
## Filtering precursor scans
head(acquisitionNum(odmse))
head(msLevel(odmse))
## Extract all spectra stemming from the first MS1 spectrum
(from1 <- filterPrecursorScan(odmse, 21945))
table(msLevel(from1))
## Extract the second spectrum's parent (MS1) and children (MS3)
## spectra
(from2 <- filterPrecursorScan(odmse, 21946))
table(msLevel(from2))
ProcessingStep_class()
Simple processing step class
Description
The ProcessingStep class is a simple object to encapsulate all
relevant information of a data analysis processing step, i.e. the
function name and all arguments.
Details
Objects of this class are mainly used to record the processing steps of an OnDiskMSnExp object for later lazy execution.
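Conceptually, a processing step pairs a function name with its arguments, and executing it amounts to a do.call(). The base-R sketch below mirrors the sum example in the Examples section; the plain list used to hold the step is an assumption for illustration, not the internal structure of the ProcessingStep class.

```r
## A processing step reduced to its two ingredients: a function name and
## the list of arguments to call it with
step <- list(FUN = "sum", ARGS = list(c(1, 3, NA, 5), na.rm = TRUE))

## Executing the step: call the named function with the stored arguments
do.call(step$FUN, step$ARGS)  # 9
```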
Seealso
OnDiskMSnExp
Author
Johannes Rainer johannes.rainer@eurac.edu
Examples
## Define a simple ProcessingStep
procS <- ProcessingStep("sum", list(c(1, 3, NA, 5), na.rm = TRUE))
executeProcessingStep(procS)
ReporterIons_class()
The "ReporterIons" Class
Description
The ReporterIons class allows one to define a set of isobaric
reporter ions that are used for quantification in MSMS
mode, e.g. iTRAQ (isobaric tag for relative and absolute quantitation)
or TMT (tandem mass tags).
ReporterIons instances can then be used when quantifying "MSnExp"
data or plotting the reporter peaks in "Spectrum2" objects.
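To illustrate how reporter m/z values and a width could be used to read reporter intensities off an MS2 peak list, here is a minimal base-R sketch. The extraction rule (most intense peak within mz ± width, NA when nothing matches) is an assumption for illustration; see quantify for the methods MSnbase actually provides.

```r
## Reporter definition modelled on the iTRAQ4 example below
reporterMz <- c(114.1, 115.1, 116.1, 117.1)
width <- 0.05

## A toy MS2 peak list (m/z, intensity)
mz <- c(114.11, 115.09, 115.12, 117.30, 300.2)
intensity <- c(500, 300, 450, 900, 1200)

## For each reporter, keep the most intense peak within mz +/- width
reporterIntensity <- sapply(reporterMz, function(r) {
  inWindow <- abs(mz - r) <= width
  if (any(inWindow)) max(intensity[inWindow]) else NA_real_
})
reporterIntensity  # 500 450 NA NA
```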
Some reporter ions are provided with MSnbase and can be loaded
with the data function. These reporter ion data sets are:
iTRAQ4: ReporterIons object for the iTRAQ 4-plex set. Load with data(iTRAQ4).
iTRAQ5: ReporterIons object for the iTRAQ 4-plex set plus the isobaric tag. Load with data(iTRAQ5).
TMT6: ReporterIons object for the TMT 6-plex set. Load with data(TMT6).
TMT7: ReporterIons object for the TMT 6-plex set plus the isobaric tag. Load with data(TMT7).
Seealso
TMT6
or iTRAQ4
for readily available examples.
Author
Laurent Gatto lg390@cam.ac.uk
References
Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ. "Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents." Mol Cell Proteomics , 2004 Dec;3(12):1154-69. Epub 2004 Sep 22. PubMed PMID: 15385600.
Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C. "Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS." Anal Chem. 2003 Apr 15;75(8):1895-904. Erratum in: Anal Chem. 2006 Jun 15;78(12):4235. Mohammed, A Karim A [added] and Anal Chem. 2003 Sep 15;75(18):4942. Johnstone, R [added]. PubMed PMID: 12713048.
Examples
## Code used for the iTRAQ4 set
ri <- new("ReporterIons",
description="4-plex iTRAQ",
name="iTRAQ4",
reporterNames=c("iTRAQ4.114","iTRAQ4.115",
"iTRAQ4.116","iTRAQ4.117"),
mz=c(114.1,115.1,116.1,117.1),
col=c("red","green","blue","yellow"),
width=0.05)
ri
reporterNames(ri)
ri[1:2]
Spectra()
List of Spectrum objects along with annotations
Description
Spectra objects allow one to collect one or more Spectrum object(s)
(Spectrum1 or Spectrum2) in a list-like structure with the possibility
to add arbitrary annotations to each individual Spectrum object. These
can be accessed/set with the mcols() method.
Spectra objects can be created with the Spectra function.
Functions to access the individual spectra's attributes are available (listed below).
writeMgfData exports a Spectra object to a file in MGF format. All
metadata columns present in mcols are exported as additional fields with
the capitalized column names used as field names (see examples below).
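The export rule described above (one extra field per metadata column, named after the capitalized column name) can be sketched in base R. writeMgfSketch() is a hypothetical helper, not writeMgfData; it only writes the BEGIN IONS / END IONS blocks, the metadata fields and the peak lines, and omits the other fields a full MGF export contains.

```r
## Hypothetical helper: write peak lists as MGF text, adding one
## "NAME=value" line per metadata column, with the column name
## upper-cased as the field name.
writeMgfSketch <- function(spectra, meta, con) {
  out <- character()
  for (i in seq_along(spectra)) {
    sp <- spectra[[i]]
    fields <- vapply(meta, function(col) as.character(col[i]), character(1))
    out <- c(out,
             "BEGIN IONS",
             paste0(toupper(names(meta)), "=", fields),
             paste(sp$mz, sp$intensity),  # one "m/z intensity" line per peak
             "END IONS",
             "")
  }
  writeLines(out, con)
}

spectra <- list(list(mz = c(1, 2, 4), intensity = c(4, 5, 2)),
                list(mz = c(1, 2, 3, 4), intensity = c(5, 3, 2, 5)))
meta <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)

tmpf <- tempfile(fileext = ".mgf")
writeMgfSketch(spectra, meta, tmpf)
readLines(tmpf)
```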
Usage
Spectra(..., elementMetadata = NULL)
## S4 methods for signature 'Spectra'
mz(object)
intensity(object)
rtime(object)
precursorMz(object)
precursorCharge(object)
precScanNum(object)
precursorIntensity(object)
acquisitionNum(object)
scanIndex(object)
## S4 method for signature 'Spectra,ANY'
peaksCount(object)
## S4 methods for signature 'Spectra'
msLevel(object)
tic(object)
ionCount(object)
collisionEnergy(object)
fromFile(object)
polarity(object)
smoothed(object)
isEmpty(x)
centroided(object)
isCentroided(object)
writeMgfData(object, con = "spectra.mgf", COM = NULL, TITLE = NULL)
clean(object, all = FALSE, msLevel. = msLevel., ...)
removePeaks(object, t, msLevel., ...)
filterMz(object, mz, msLevel., ...)
pickPeaks(object, halfWindowSize = 3L, method = c("MAD", "SuperSmoother"),
  SNR = 0L, refineMz = c("none", "kNeighbors", "kNeighbours", "descendPeak"), ...)
smooth(x, method = c("SavitzkyGolay", "MovingAverage"), halfWindowSize = 2L, ...)
filterMsLevel(object, msLevel.)
Arguments
Argument | Description |
---|---|
... | For Spectra : Spectrum object(s) or a list of Spectrum objects. For all other methods optional arguments passed along. |
elementMetadata | For Spectra : DataFrame with optional information that should be added as metadata information ( mcols ) to the object. The number of rows has to match the number of Spectrum objects, each row is expected to represent additional metadata information for one spectrum. |
object | For all functions: a Spectra object. |
x | For all functions: a Spectra object. |
con | For writeMgfData : character(1) defining the file name of the MGF file. |
COM | For writeMgfData : optional character(1) providing a comment to be added to the file. |
TITLE | For writeMgfData : optional character(1) defining the title for the MGF file. |
all | For clean : if FALSE original 0-intensity values are retained around peaks. |
msLevel. | For clean , removePeaks , filterMz : optionally specify the MS level of the spectra on which the operation should be performed. For filterMsLevel : MS level(s) to which the Spectra should be reduced. |
t | For removePeaks : numeric(1) specifying the threshold below which intensities are set to 0. |
mz | For filterMz : numeric(2) defining the lower and upper m/z for the filter. See filterMz() for details. |
halfWindowSize | For pickPeaks and smooth : see pickPeaks() and smooth() for details. |
method | For pickPeaks and smooth : see pickPeaks() and smooth() for details. |
SNR | For pickPeaks : see pickPeaks() for details. |
refineMz | For pickPeaks : see pickPeaks() for details. |
Details
Spectra inherits all methods from the SimpleList class of the
S4Vectors package. This includes lapply and other data manipulation
and subsetting operations.
Author
Johannes Rainer
Examples
## Create from Spectrum objects
sp1 <- new("Spectrum1", mz = c(1, 2, 4), intensity = c(4, 5, 2))
sp2 <- new("Spectrum2", mz = c(1, 2, 3, 4), intensity = c(5, 3, 2, 5),
precursorMz = 2)
spl <- Spectra(sp1, sp2)
spl
spl[[1]]
## Add also metadata columns
mcols(spl)$id <- c("a", "b")
mcols(spl)
## Create a Spectra with metadata
spl <- Spectra(sp1, sp2, elementMetadata = DataFrame(id = c("a", "b")))
mcols(spl)
mcols(spl)$id
## Extract the mz values for the individual spectra
mz(spl)
## Extract the intensity values for the individual spectra
intensity(spl)
## Extract the retention time values for the individual spectra
rtime(spl)
## Extract the precursor m/z of each spectrum.
precursorMz(spl)
## Extract the precursor charge of each spectrum.
precursorCharge(spl)
## Extract the precursor scan number for each spectrum.
precScanNum(spl)
## Extract the precursor intensity of each spectrum.
precursorIntensity(spl)
## Extract the acquisition number of each spectrum.
acquisitionNum(spl)
## Extract the scan index of each spectrum.
scanIndex(spl)
## Get the number of peaks per spectrum.
peaksCount(spl)
## Get the MS level of each spectrum.
msLevel(spl)
## Get the total ion current for each spectrum.
tic(spl)
## Get the ion count (sum of intensities) for each spectrum.
ionCount(spl)
## Extract the collision energy for each spectrum.
collisionEnergy(spl)
## Extract the file index for each spectrum.
fromFile(spl)
## Get the polarity for each spectrum.
polarity(spl)
## Whether spectra are smoothed (i.e. processed with the `smooth`
## function).
smoothed(spl)
## Are spectra empty (i.e. contain no peak data)?
isEmpty(spl)
## Do the spectra contain centroided data?
centroided(spl)
## Do the spectra contain centroided data? Whether spectra are centroided
## is estimated from the peak data.
isCentroided(spl)
## Export the spectrum list to an MGF file. Values in metadata columns are
## exported as additional fields for each spectrum.
tmpf <- tempfile()
writeMgfData(spl, tmpf)
## Evaluate the written output. The ID of each spectrum (defined in the
## "id" metadata column) is exported as field "ID".
readLines(tmpf)
## Set mcols to NULL to avoid export of additional data fields.
mcols(spl) <- NULL
file.remove(tmpf)
writeMgfData(spl, tmpf)
readLines(tmpf)
## Filter the object by MS level
filterMsLevel(spl, msLevel. = 1)
Spectrum1_class()
The "Spectrum1" Class for MS1 Spectra
Description
Spectrum1 extends the "Spectrum" class and introduces an MS1 specific
attribute in addition to the slots in "Spectrum". Spectrum1 instances
are not created directly but are contained in the assayData slot of an
"MSnExp".
Seealso
Virtual super-class "Spectrum", "Spectrum2" for MS2 spectra and
"MSnExp" for a full experiment container.
Author
Laurent Gatto lg390@cam.ac.uk
Spectrum2_class()
The "Spectrum2" Class for MSn Spectra
Description
Spectrum2 extends the "Spectrum" class and introduces several MS2
specific attributes in addition to the slots in "Spectrum". Since
version 1.99.2, this class is used for any MS level > 1. Spectrum2
instances are not created directly but are contained in the assayData
slot of an "MSnExp".
In version 1.19.12, the polarity slot was added to the "Spectrum"
class (previously in "Spectrum1"). Hence, "Spectrum2" objects created
prior to this change are no longer valid, since they lack the polarity
slot. Such objects can be updated using the updateObject method.
Seealso
Virtual super-class "Spectrum", "Spectrum1" for MS1 spectra and
"MSnExp" for a full experiment container.
Author
Laurent Gatto lg390@cam.ac.uk
Spectrum_class()
The "Spectrum" Class
Description
Virtual container for spectrum data common to all different types of
spectra. A Spectrum object cannot be directly instantiated. Use
"Spectrum1" and "Spectrum2" instead.
In version 1.19.12, the polarity slot has been added to this
class (previously in "Spectrum1").
Seealso
Instantiable sub-classes "Spectrum1" and "Spectrum2" for MS1 and MS2 spectra.
Note
This is a virtual class and cannot be instantiated directly.
Author
Laurent Gatto lg390@cam.ac.uk
TMT6()
TMT 6/10-plex sets
Description
This instance of class "ReporterIons" corresponds to the TMT 6-plex
set, i.e. the 126, 127, 128, 129, 130 and 131 isobaric tags. In the
TMT7 data set, an unfragmented tag, i.e. reporter and attached isobaric
tag, is also included at m/z 229. A second set, TMT6b, has slightly
different values.
The TMT10 instance corresponds to the 10-plex version. There are
specific HCD (TMT10HCD, same as TMT10) and ETD (TMT10ETD) sets.
These objects are used to plot the reporter ions of interest in an
MSMS spectrum (see "Spectrum2") as well as for quantification (see
quantify).
Usage
TMT6
TMT6b
TMT7
TMT7b
TMT10
TMT10ETD
TMT10HCD
TMT11
TMT11HCD
Seealso
iTRAQ4
.
References
Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C. "Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS." Anal Chem. 2003 Apr 15;75(8):1895-904. Erratum in: Anal Chem. 2006 Jun 15;78(12):4235. Mohammed, A Karim A [added] and Anal Chem. 2003 Sep 15;75(18):4942. Johnstone, R [added]. PubMed PMID: 12713048.
Examples
TMT6
TMT6[1:2]
TMT10
newReporter <- new("ReporterIons",
description="an example",
name="my reporter ions",
reporterNames=c("myrep1","myrep2"),
mz=c(121,122),
col=c("red","blue"),
width=0.05)
newReporter
addIdentificationData_methods()
Adds Identification Data
Description
These methods add identification data to a raw MS experiment (an
"MSnExp" object) or to quantitative data (an "MSnSet" object). The
identification data needs to be available as an mzIdentML file (and
passed as file names, or directly as an identification object) or,
alternatively, can be passed as an arbitrary data.frame. See details
in the Methods section.
Details
The featureData slot of an "MSnExp" or "MSnSet" instance provides only
one row per MS2 spectrum, but the identification is not always
bijective. Prior to addition, the identification data is filtered as
documented in the filterIdentificationDataFrame function: (1) only PSMs
matching the regular (non-decoy) database are retained; (2) PSMs of
rank greater than 1 are discarded; and (3) only proteotypic peptides
are kept.
If, after filtering, more than one PSM per spectrum is still present,
these are combined (reduced, see reduce,data.frame-method) into a
single row, with the values separated by a semicolon. As a side effect,
feature variables that are reduced are converted to characters. See the
reduce manual page for examples.
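The filter-then-reduce logic can be sketched on a toy data.frame in base R. The column names (spectrumID, accession, rank, isDecoy) are hypothetical, the proteotypic-peptide filter is omitted, and aggregate() stands in for MSnbase's reduce method; this illustrates the idea, not the package's code.

```r
## Toy PSM table: several PSMs can point to the same spectrum
psms <- data.frame(
  spectrumID = c("s1", "s1", "s2", "s2", "s3"),
  accession  = c("P1", "P2", "P3", "P4", "P5"),
  rank       = c(1, 1, 1, 2, 1),
  isDecoy    = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

## Filtering: keep non-decoy, rank-1 PSMs (proteotypic filter omitted)
psms <- psms[!psms$isDecoy & psms$rank == 1, ]

## Reduction: one row per spectrum, multiple matches joined with ";"
reduced <- aggregate(accession ~ spectrumID, data = psms,
                     FUN = function(x) paste(unique(x), collapse = ";"))
reduced  # s1: "P1;P2", s2: "P3"
```

As noted above, the reduced column is a character vector even if the original values were of another type.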
See also the section about identification data in the MSnbase-demo vignette for details and additional examples.
After addition of the identification data, new feature variables are
created. The column nprot contains the number of members in the
protein group; the columns accession and description contain a
semicolon-separated list of all matches. The columns npsm.prot and
npep.prot represent the number of PSMs and peptides that were matched
to a particular protein group. The column npsm.pep indicates how many
PSMs were attributed to a peptide (as defined by its sequence pepseq).
All these values are re-calculated after filtering and reduction.
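The count columns can be illustrated with base R's table() and tapply() on a small PSM table. This is a sketch under simplifying assumptions: each accession stands for a whole protein group, and the data.frame layout is hypothetical.

```r
## Toy identification table after filtering: one row per PSM
id <- data.frame(
  pepseq    = c("PEPA", "PEPA", "PEPB", "PEPC"),
  accession = c("P1", "P1", "P1", "P2"),
  stringsAsFactors = FALSE
)

## npsm.prot: PSMs per protein; npep.prot: distinct peptides per protein
npsm.prot <- table(id$accession)
npep.prot <- tapply(id$pepseq, id$accession, function(x) length(unique(x)))
## npsm.pep: PSMs per peptide sequence
npsm.pep <- table(id$pepseq)

## Map the counts back onto each PSM row
id$npsm.prot <- as.vector(npsm.prot[id$accession])
id$npep.prot <- as.vector(npep.prot[id$accession])
id$npsm.pep  <- as.vector(npsm.pep[id$pepseq])
id
```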
Seealso
filterIdentificationDataFrame for the function that filters
identification data, readMzIdData to read the identification data as an
unfiltered data.frame and reduce,data.frame-method to reduce it to a
data.frame that contains only unique PSMs per row.
Author
Sebastian Gibb mail@sebastiangibb.de and Laurent Gatto
Examples
## find path to an mzXML file
quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.names = TRUE, pattern = "mzXML$")
## find path to an mzIdentML file
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.names = TRUE, pattern = "dummyiTRAQ.mzid")
## create basic MSnExp
msexp <- readMSData(quantFile)
## add identification information
msexp <- addIdentificationData(msexp, identFile)
## access featureData
fData(msexp)
idSummary(msexp)
aggvar()
Identify aggregation outliers
Description
This function evaluates the variability within all protein groups of
an MSnSet. If a protein group is composed of only a single feature, NA
is returned.
Usage
aggvar(object, groupBy, fun)
Arguments
Argument | Description |
---|---|
object | An object of class MSnSet . |
groupBy | A character containing the protein grouping feature variable name. |
fun | A function to summarise the distance between features within protein groups, typically max , mean or median . |
Details
This function can be used to identify protein groups with incoherent
feature (peptide or PSM) expression patterns. Using max as a function,
one can identify protein groups with single extreme outliers, such as,
for example, a mis-identified peptide that was erroneously assigned to
that protein group. Using mean identifies more systematic
inconsistencies where, for example, subsets of peptide (or PSM)
features correspond to proteins with different expression patterns.
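The max/mean contrast can be reproduced on a toy matrix in base R. This sketch assumes the within-group distance is the Euclidean distance between feature profiles (as computed by dist()); that choice and the helper aggvarSketch() are assumptions, not the documented internals of aggvar.

```r
## Toy feature-level data: 6 features (rows) x 4 samples
m <- rbind(f1 = c(1, 2, 3, 4),
           f2 = c(1, 2, 3, 4),
           f3 = c(4, 3, 2, 1),   # feature with an opposite profile in protein A
           f4 = c(2, 2, 2, 2),
           f5 = c(2, 2, 2, 2),
           f6 = c(5, 5, 5, 5))   # protein C has a single feature
protein <- c("A", "A", "A", "B", "B", "C")

## Hypothetical helper: summarise within-group Euclidean distances with fun;
## single-feature groups get NA.
aggvarSketch <- function(m, groupBy, fun) {
  res <- sapply(split(seq_len(nrow(m)), groupBy), function(i) {
    d <- if (length(i) > 1) fun(as.vector(dist(m[i, , drop = FALSE]))) else NA_real_
    c(nb_feats = length(i), agg_dist = d)
  })
  t(res)
}

aggvarSketch(m, protein, fun = max)   # A: sqrt(20), B: 0, C: NA
aggvarSketch(m, protein, fun = mean)  # A: 2 * sqrt(20) / 3
```

The single opposite feature in protein A dominates the max summary, while the mean dilutes it over the three pairwise distances.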
Value
A matrix providing the number of features per protein group (nb_feats
column) and the aggregation summarising distance (agg_dist column).
Seealso
combineFeatures to combine PSM quantitation into peptides and/or into
proteins.
Author
Laurent Gatto
Examples
library("pRolocdata")
data(hyperLOPIT2015ms3r1psm)
groupBy <- "Protein.Group.Accessions"
res1 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = max)
res2 <- aggvar(hyperLOPIT2015ms3r1psm, groupBy, fun = mean)
par(mfrow = c(1, 3))
plot(res1, log = "y", main = "Single outliers (max)")
plot(res2, log = "y", main = "Overall inconsistency (mean)")
plot(res1[, "agg_dist"], res2[, "agg_dist"],
xlab = "max", ylab = "mean")
averageMSnSet()
Generate an average MSnSet
Description
Given a list of MSnSet instances, typically representing replicated
experiments, the function returns an average MSnSet.
Usage
averageMSnSet(x, avg = function(x) mean(x, na.rm = TRUE), disp = npcv)
Arguments
Argument | Description |
---|---|
x | A list of valid MSnSet instances to be averaged. |
avg | The averaging function. Default is the mean after removing missing values, as computed by function(x) mean(x, na.rm = TRUE) . |
disp | The dispersion function. Default is a non-parametric coefficient of variation that replaces the standard deviation by the median absolute deviation as computed by mad(x)/abs(mean(x)) . See npcv for details. Note that the mad of a single value is 0 (as opposed to NA for the standard deviation, see example below). |
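The default dispersion measure and its behaviour for a single value can be checked with base R alone. npcvSketch() is a hypothetical re-statement of the formula given for disp, not MSnbase's npcv function.

```r
## Non-parametric CV as described for the disp argument: mad(x)/abs(mean(x))
npcvSketch <- function(x) mad(x) / abs(mean(x))

x <- c(10, 12, 11, 30)
npcvSketch(x)          # uses the median absolute deviation
sd(x) / abs(mean(x))   # classical CV, for comparison

## With a single measurement, mad() returns 0 whereas sd() returns NA
mad(5)  # 0
sd(5)   # NA
```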
Details
This function is aimed at facilitating the visualisation of replicated experiments and should not be used as a replacement for a statistical analysis.
The samples of the instances to be averaged must be identical but can
be in a different order (they will be reordered by default). The feature
names of the result will correspond to the union of the feature names
of the input MSnSet instances. Each average value will be computed by
the avg function and the dispersion of the replicated measurements will
be estimated by the disp function. These dispersions will be stored as
a data.frame in the feature metadata that can be accessed with
fData(.)$disp. Similarly, the numbers of missing values that were
present when the average (and dispersion) were computed are available
in fData(.)$nNA.
Currently, the feature metadata of the returned object corresponds to
the feature metadata of the first object in the list (augmented with
the missing value and dispersion values); the metadata of the features
that were missing in this first input are missing (i.e. populated with
NAs). This may change in the future.
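The averaging rules above (common sample order, union of features, per-cell average, dispersion and missing-value count) can be sketched with plain matrices in base R. The replicate matrices r1 and r2 are hypothetical stand-ins for the exprs() of MSnSet instances; this is an illustration, not the averageMSnSet code.

```r
## Two hypothetical replicates: same samples (columns, different order),
## partly different features (rows)
r1 <- matrix(c(1, 2, 3, 4), nrow = 2,
             dimnames = list(c("p1", "p2"), c("S1", "S2")))
r2 <- matrix(c(5, 3, 7, 5), nrow = 2,
             dimnames = list(c("p2", "p3"), c("S2", "S1")))
reps <- list(r1, r2)

samples <- colnames(r1)
features <- Reduce(union, lapply(reps, rownames))  # union of feature names

## Stack into features x samples x replicates, padding absent features with NA
arr <- array(NA_real_, dim = c(length(features), length(samples), length(reps)),
             dimnames = list(features, samples, NULL))
for (k in seq_along(reps)) {
  r <- reps[[k]][, samples, drop = FALSE]  # put samples in a common order
  arr[rownames(r), , k] <- r
}

avg  <- apply(arr, c(1, 2), function(x) mean(x, na.rm = TRUE))
nNA  <- apply(arr, c(1, 2), function(x) sum(is.na(x)))
disp <- apply(arr, c(1, 2), function(x) mad(x, na.rm = TRUE) / abs(mean(x, na.rm = TRUE)))
avg
```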
Value
A new average MSnSet
.
Seealso
compfnames
to compare MSnSet feature names.
Author
Laurent Gatto
Examples
library("pRolocdata")
## 3 replicates from Tan et al. 2009
data(tan2009r1)
data(tan2009r2)
data(tan2009r3)
x <- MSnSetList(list(tan2009r1, tan2009r2, tan2009r3))
avg <- averageMSnSet(x)
dim(avg)
head(exprs(avg))
head(fData(avg)$nNA)
head(fData(avg)$disp)
## using the standard deviation as measure of dispersion
avg2 <- averageMSnSet(x, disp = sd)
head(fData(avg2)$disp)
## keep only complete observations, i.e. proteins
## that had 0 missing values for all samples
sel <- apply(fData(avg)$nNA, 1, function(x) all(x == 0))
avg <- avg[sel, ]
disp <- rowMax(fData(avg)$disp)
library("pRoloc")
setStockcol(paste0(getStockcol(), "AA"))
plot2D(avg, cex = 7.7 * disp)
title(main = paste("Dispersion: non-parametric CV",
paste(round(range(disp), 3), collapse = " - ")))
bin_methods()
Bin 'MSnExp' or 'Spectrum' instances
Description
This method aggregates individual spectra (Spectrum instances) or whole
experiments (MSnExp instances) into discrete bins. All intensity values
which belong to the same bin are summed together.
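Summing intensities per bin can be sketched in base R with cut() and tapply(). The bin boundaries used here ([lower, upper) bins of width binSize starting at floor(min(mz))) are an assumption of the sketch and may not coincide with those chosen by bin(); binSketch() is a hypothetical helper.

```r
## Hypothetical helper: sum intensities in consecutive m/z bins of width
## binSize; bins are [lower, upper) and start at floor(min(mz)).
binSketch <- function(mz, intensity, binSize = 1) {
  breaks <- seq(floor(min(mz)), max(mz) + binSize, by = binSize)
  bins <- cut(mz, breaks = breaks, right = FALSE)
  list(mz = breaks[-length(breaks)] + binSize / 2,  # bin centres
       intensity = as.vector(tapply(intensity, bins, sum, default = 0)))
}

res <- binSketch(mz = 1:10, intensity = 1:10, binSize = 2)
res$mz         # 2 4 6 8 10
res$intensity  # 3 7 11 15 19
```

Empty bins receive an intensity of 0 through tapply()'s default argument, so the total intensity is preserved.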
Seealso
clean, pickPeaks, smooth, removePeaks and trimMz for other spectra
processing methods.
Author
Sebastian Gibb mail@sebastiangibb.de
Examples
s <- new("Spectrum2", mz=1:10, intensity=1:10)
intensity(s)
intensity(bin(s, binSize=2))
data(itraqdata)
sum(peaksCount(itraqdata))
itraqdata2 <- bin(itraqdata, binSize=2)
sum(peaksCount(itraqdata2))
processingData(itraqdata2)
calculateFragments_methods()
Calculate ions produced by fragmentation.
Description
These methods calculate a-, b-, c-, x-, y- and z-ions produced by fragmentation.
Arguments
Argument | Description |
---|---|
sequence | character , peptide sequence. |
object | Object of class "Spectrum2" or "missing" . |
tolerance | numeric tolerance between the theoretical and measured MZ values (only available if object is not missing ). |
method | method used for duplicated matches. Choose "highest" or "closest" to select the peak with the highest intensity or the closest MZ in the tolerance range, respectively. If "all" is given, all possible matches in the tolerance range are reported (only available if object is not missing ). |
type | character vector of target ions; possible values: c("a", "b", "c", "x", "y", "z") ; default: type=c("b", "y") . |
z | numeric desired charge state; default z=1 . |
modifications | named numeric vector of used modifications. The name must correspond to the one-letter code of the modified amino acid and the numeric value must represent the mass that should be added to the original amino acid mass; default: Carbamidomethyl modifications=c(C=57.02146) . Use Nterm or Cterm as names for modifications that should be added to the amino or carboxyl terminus, respectively. |
neutralLoss | list ; it has to have two named elements, namely water and ammonia , each containing a character vector that defines where the corresponding neutral loss should be calculated. Currently, neutral loss on the C terminal "Cterm" , at the amino acids c("D", "E", "S", "T") for "water" (shown with an _ ) and c("K", "N", "Q", "R") for "ammonia" (shown with an * ) are supported. There is a helper function defaultNeutralLoss that returns the correct list. It has two arguments, disableWaterLoss and disableAmmoniaLoss , to remove single neutral loss options. See the example section for use cases. |
verbose | logical if TRUE (default) the used modifications are printed. |
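For singly charged b- and y-ions, the fragment m/z values follow from cumulative residue masses: b-ions add a proton to the N-terminal prefix, y-ions add water plus a proton to the C-terminal suffix. The base-R sketch below is a simplified, hypothetical byIons() helper, not calculateFragments: it covers only z = 1, b and y ions, fixed residue modifications and the residues A, C and E (standard monoisotopic masses), and ignores neutral losses.

```r
## Monoisotopic residue masses (Da) for the residues used below
residueMass <- c(A = 71.03711, C = 103.00919, E = 129.04259)
proton <- 1.007276   # mass of a proton
water  <- 18.010565  # monoisotopic mass of H2O

byIons <- function(sequence, modifications = c(C = 57.02146)) {
  aa <- strsplit(sequence, "")[[1]]
  m <- residueMass[aa]
  ## add fixed modifications to the matching residues
  if (length(modifications)) {
    mod <- modifications[aa]
    m <- m + ifelse(is.na(mod), 0, mod)
  }
  n <- length(m)
  b <- cumsum(m)[-n] + proton               # b1 .. b(n-1)
  y <- cumsum(rev(m))[-n] + water + proton  # y1 .. y(n-1)
  list(b = unname(b), y = unname(y))
}

byIons("ACE")                        # b: 72.04439 232.07504; y: 148.06043 308.09108
byIons("ACE", modifications = NULL)  # unmodified cysteine
```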
Author
Sebastian Gibb mail@sebastiangibb.de
Examples
## find path to an mzXML file
file <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.names = TRUE, pattern = "mzXML$")
## create basic MSnExp
msexp <- readMSData(file, centroided = FALSE)
## centroid them
msexp <- pickPeaks(msexp)
## calculate fragments for ACE with default modification
calculateFragments("ACE", modifications=c(C=57.02146))
## calculate fragments for ACE with an additional N-terminal modification
calculateFragments("ACE", modifications=c(C=57.02146, Nterm=229.1629))
## calculate fragments for ACE without any modifications
calculateFragments("ACE", modifications=NULL)
calculateFragments("VESITARHGEVLQLRPK",
type=c("a", "b", "c", "x", "y", "z"),
z=1:2)
calculateFragments("VESITARHGEVLQLRPK", msexp[[1]])
## neutral loss
defaultNeutralLoss()
## disable water loss on the C terminal
defaultNeutralLoss(disableWaterLoss="Cterm")
## real example
calculateFragments("PQR")
calculateFragments("PQR",
neutralLoss=defaultNeutralLoss(disableWaterLoss="Cterm"))
calculateFragments("PQR",
neutralLoss=defaultNeutralLoss(disableAmmoniaLoss="Q"))
## disable neutral loss completely
calculateFragments("PQR", neutralLoss=NULL)
chromatogram_MSnExp_method()
Extract chromatogram object(s)
Description
The chromatogram method extracts chromatogram(s) from an MSnExp or OnDiskMSnExp object. Depending on the provided parameters this can be a total ion chromatogram (TIC), a base peak chromatogram (BPC) or an extracted ion chromatogram (XIC) extracted from each sample/file.
Usage
## S4 method for signature 'MSnExp'
chromatogram(object, rt, mz, aggregationFun = "sum",
missing = NA_real_, msLevel = 1L, BPPARAM = bpparam())
Arguments
Argument | Description |
---|---|
object | For chromatogram : a MSnExp or OnDiskMSnExp object from which the chromatogram should be extracted. |
rt | A numeric(2) or two-column matrix defining the lower and upper boundary for the retention time range/window(s) for the chromatogram(s). If a matrix is provided, a chromatogram is extracted for each row. If not specified, a chromatogram representing the full retention time range is extracted. See examples below for details. |
mz | A numeric(2) or two-column matrix defining the mass-to-charge (mz) range(s) for the chromatogram(s). For each spectrum/retention time, all intensity values within this mz range are aggregated to result in the intensity value for the spectrum/retention time. If not specified, the full mz range is considered. See examples below for details. |
aggregationFun | character defining the function to be used for intensity value aggregation along the mz dimension. Allowed values are "sum" (TIC), "max" (BPC), "min" and "mean" . |
missing | numeric(1) specifying the intensity value to report if, for a given retention time (spectrum), no signal was measured within the mz range. Defaults to NA_real_ . |
msLevel | integer specifying the MS level from which the chromatogram should be extracted. Defaults to msLevel = 1L . |
BPPARAM | Parallelisation backend to be used, which will depend on the architecture. Default is BiocParallel::bpparam() . |
Details
Arguments rt and mz specify the MS data slice from which the chromatogram should be extracted. The parameter aggregationFun specifies the function used to aggregate the intensities across the mz range for the same retention time. Setting aggregationFun = "sum" would, for example, calculate the total ion chromatogram (TIC), and aggregationFun = "max" the base peak chromatogram (BPC).
The length of the extracted Chromatogram object, i.e. the number of available data points, corresponds to the number of scans/spectra measured in the specified retention time range. If in a specific scan (for a given retention time) no signal was measured in the specified mz range, NA_real_ is reported as the intensity for that retention time (see Notes for more information). This can be changed using the missing parameter.
By default, or if mz and/or rt are numeric vectors, the function extracts one Chromatogram object for each file in the MSnExp or OnDiskMSnExp object. Providing a numeric matrix with argument mz or rt enables extracting multiple chromatograms per file, one for each row in the matrix. If the number of columns of mz or rt is not equal to 2, range is called on each row of the matrix.
Value
chromatogram returns a Chromatograms object with the number of columns corresponding to the number of files in object and the number of rows corresponding to the number of specified ranges (i.e. the number of rows of matrices provided with arguments mz and/or rt ). The featureData of the returned object contains columns "mzmin" and "mzmax" with the values from input argument mz (if used) and "rtmin" and "rtmax" if the input argument rt was used.
Seealso
Chromatogram and Chromatograms for the classes that represent single and multiple chromatograms.
Author
Johannes Rainer
Examples
## Read a test data file.
library(msdata)
f <- c(system.file("microtofq/MM14.mzML", package = "msdata"),
system.file("microtofq/MM8.mzML", package = "msdata"))
## Read the data as an MSnExp
msd <- readMSData(f, msLevel = 1)
## Extract the total ion chromatogram for each file:
tic <- chromatogram(msd)
tic
## Extract the TIC for the second file:
tic[1, 2]
## Plot the TIC for the first file
plot(rtime(tic[1, 1]), intensity(tic[1, 1]), type = "l",
xlab = "rtime", ylab = "intensity", main = "TIC")
## Extract chromatograms for MS data slices defined by retention time
## and mz ranges.
rtr <- rbind(c(10, 60), c(280, 300))
mzr <- rbind(c(140, 160), c(300, 320))
chrs <- chromatogram(msd, rt = rtr, mz = mzr)
## Each row of the returned Chromatograms object corresponds to one mz-rt
## range. The Chromatogram for the first range in the first file is empty,
## because the retention time range is outside of the file's rt range:
chrs[1, 1]
## The mz and/or rt ranges used are provided as featureData of the object
fData(chrs)
## The mz method can be used to extract the m/z ranges directly
mz(chrs)
## Also the Chromatogram for the second range in the second file is empty
chrs[2, 2]
## Get the extracted chromatogram for the first range in the second file
chr <- chrs[1, 2]
chr
plot(rtime(chr), intensity(chr), xlab = "rtime", ylab = "intensity")
clean_methods()
Clean 'MSnExp', 'Spectrum' or 'Chromatogram' instances
Description
This method cleans individual spectra ( Spectrum instances), chromatograms ( Chromatogram instances) or whole experiments ( MSnExp instances) of 0-intensity peaks. Unless all is set to TRUE , original 0-intensity values are retained only around peaks. If more than two 0's separate two peaks, only the first and last ones, those directly adjacent to the peak ranges, are kept. If two peaks are separated by only one 0-intensity value, it is retained. An illustrative example is shown below.
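The retention rule for 0-intensity values can be sketched in a few lines of base R. keep_clean is a hypothetical helper that returns a logical index of the values to keep; it assumes there are no NA values (the na.rm handling of clean is not covered) and is not the MSnbase implementation.

```r
## Hypothetical helper following the rule described above: keep every non-zero
## intensity and, unless all = TRUE, every 0 directly adjacent to a non-zero value.
keep_clean <- function(int, all = FALSE) {
    nz <- int != 0
    if (all) return(nz)
    n <- length(int)
    left  <- c(FALSE, nz[-n])  # previous value is non-zero
    right <- c(nz[-1], FALSE)  # next value is non-zero
    nz | left | right
}

int <- c(1, 0, 0, 0, 1, 1, 0, 1, 0, 0)
int[keep_clean(int)]              # zeros bordering peaks are kept
int[keep_clean(int, all = TRUE)]  # all zeros are dropped
```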
Seealso
removePeaks and trimMz for other spectra processing methods.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
int <- c(1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0)
sp1 <- new("Spectrum2",
intensity=int,
mz=1:length(int))
sp2 <- clean(sp1) ## default is all=FALSE
intensity(sp1)
intensity(sp2)
intensity(clean(sp1, all = TRUE))
mz(sp1)
mz(sp2)
mz(clean(sp1, all = TRUE))
data(itraqdata)
itraqdata2 <- clean(itraqdata)
sum(peaksCount(itraqdata))
sum(peaksCount(itraqdata2))
processingData(itraqdata2)
## Create a simple Chromatogram object
chr <- Chromatogram(rtime = 1:12,
intensity = c(0, 0, 20, 0, 0, 0, 123, 124343, 3432, 0, 0, 0))
## Remove 0-intensity values keeping those adjacent to peaks
chr <- clean(chr)
intensity(chr)
## Remove all 0-intensity values
chr <- clean(chr, all = TRUE)
intensity(chr)
## Clean a Chromatogram with NAs.
chr <- Chromatogram(rtime = 1:12,
intensity = c(0, 0, 20, NA, NA, 0, 123, 124343, 3432, 0, 0, 0))
chr <- clean(chr, all = FALSE, na.rm = TRUE)
intensity(chr)
combineFeatures()
Combines features in an MSnSet
object
Description
This function combines the features in an MSnSet instance by applying a summarisation function (see the method argument) to sets of features as defined by a factor (see the fcol argument). Note that the feature names are automatically updated based on the groupBy parameter.
The coefficients of variation are automatically computed and collated to the featureData slot. See the cv and cv.norm arguments for details.
If NA values are present, a message will be shown. Details on how missing values impact the data aggregation are provided below.
Usage
combineFeatures(object, groupBy, method = c("mean", "median",
"weighted.mean", "sum", "medpolish", "robust", "iPQF", "NTR"), fcol,
redundancy.handler = c("unique", "multiple"), cv = TRUE, cv.norm =
"sum", verbose = isMSnbaseVerbose(), fun, ...)
Arguments
Argument | Description |
---|---|
object | An instance of class MSnSet whose features will be summarised. |
groupBy | A factor , character , numeric or a list of the above defining how to summarise the features. The list must be of length nrow(object) . Each element of the list is a vector describing the feature mapping. If the list is named, its names must match featureNames(object) . See redundancy.handler for details about the latter. |
fun | Deprecated; use method instead. |
method | The summarising function. Currently, mean, median, weighted mean, sum, median polish, robust summarisation (using MASS::rlm ), iPQF (see iPQF for details) and NTR (see NTR for details) are implemented, but user-defined functions can also be supplied. Note that the robust method assumes that the data are already log-transformed. |
fcol | Feature meta-data label (fData column name) defining how to summarise the features. It must be present in fvarLabels(object) and, if present, will be used to define groupBy as fData(object)[, fcol] . Note that fcol is ignored if groupBy is present. |
redundancy.handler | If groupBy is a list , one of "unique" (default) or "multiple" (ignored otherwise) defining how to handle peptides that can be associated with multiple higher-level features (proteins) upon combination. Using "unique" will only consider uniquely matching features (features matching multiple proteins will be discarded). "multiple" will allow matching to multiple proteins and each feature will be repeatedly tallied for each possible matching protein. |
cv | A logical defining if feature coefficients of variation should be computed and stored as feature meta-data. Default is TRUE . |
cv.norm | A character defining how to normalise the feature intensities prior to CV calculation. Default is "sum" . Use "none" to keep intensities as is. See featureCV for more details. |
verbose | A logical indicating whether verbose output is to be printed out. |
... | Additional arguments passed to the summarisation function. |
Details
Missing values have different effects depending on the aggregation method employed, as detailed below. See also the examples below.
- When using "sum" , "mean" , "weighted.mean" or "median" , any missing value will be propagated to the higher level. If na.rm = TRUE is used, then missing values are ignored.
- Missing values will result in an error when using "medpolish" , unless na.rm = TRUE is used.
- When using robust summarisation ( "robust" ), individual missing values are excluded prior to fitting the linear model by robust regression. To remove all values of a feature containing missing values, use filterNA .
- The "iPQF" method will fail with an error if missing values are present, which will have to be handled explicitly. See below.
More generally, missing values often need dedicated handling such as filtering (see filterNA ) or imputation (see impute ).
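The effect of redundancy.handler and of missing values on "sum" aggregation can be approximated with base R's rowsum(). This sketch assumes that "sum" behaves like a per-group column sum; the toy matrix x and the peptide-to-protein mapping grpl are hypothetical, and the code does not call combineFeatures itself.

```r
## Toy feature-by-sample matrix (hypothetical values): 5 peptides x 2 samples.
x <- matrix(c(1, 2, 3, 4, 5,
              10, NA, 30, 40, 50),
            ncol = 2, dimnames = list(paste0("pep", 1:5), c("s1", "s2")))
## Peptide-to-protein mapping as a list; pep1 and pep5 map to two proteins.
grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B"))

## "unique": drop peptides mapping to more than one protein, then sum.
uniq <- lengths(grpl) == 1
sum_unique <- rowsum(x[uniq, ], unlist(grpl[uniq]))
## "multiple": repeat each peptide once per protein it maps to, then sum.
idx <- rep(seq_along(grpl), lengths(grpl))
sum_multiple <- rowsum(x[idx, ], unlist(grpl))

sum_unique                                            # NA propagates for A/s2
rowsum(x[uniq, ], unlist(grpl[uniq]), na.rm = TRUE)   # NA ignored
sum_multiple
```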
Value
A new MSnSet instance is returned whose ncol (i.e. number of samples) is unchanged, but whose nrow (i.e. the number of features) now equals the number of levels in groupBy . The feature metadata ( featureData slot) is updated accordingly and only the first occurrence of a feature in the original feature meta-data is kept.
Seealso
featureCV to calculate coefficient of variation, nFeatures to document the number of features per group in the feature data, and aggvar to explore variability within protein groups.
iPQF for iPQF summarisation.
NTR for normalisation to reference summarisation.
Author
Laurent Gatto lg390@cam.ac.uk with contributions from Martina Fischer for iPQF and Ludger Goeminne, Adriaan Sticker and Lieven Clement for robust.
References
iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov. PubMed PMID: 26589272.
Examples
data(msnset)
msnset <- msnset[11:15, ]
exprs(msnset)
## arbitrary grouping into two groups
grp <- as.factor(c(1, 1, 2, 2, 2))
msnset.comb <- combineFeatures(msnset, grp, "sum")
dim(msnset.comb)
exprs(msnset.comb)
fvarLabels(msnset.comb)
## grouping with a list
grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B"))
## optional naming
names(grpl) <- featureNames(msnset)
exprs(combineFeatures(msnset, grpl, method = "sum", redundancy.handler = "unique"))
exprs(combineFeatures(msnset, grpl, method = "sum", redundancy.handler = "multiple"))
## missing data
exprs(msnset)[4, 4] <-
exprs(msnset)[2, 2] <- NA
exprs(msnset)
## NAs propagate in the 115 and 117 channels
exprs(combineFeatures(msnset, grp, "sum"))
## NAs are removed before summing
exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE))
## using iPQF
data(msnset2)
anyNA(msnset2)
res <- combineFeatures(msnset2,
groupBy = fData(msnset2)$accession,
redundancy.handler = "unique",
method = "iPQF",
low.support.filter = FALSE,
ratio.calc = "sum",
method.combine = FALSE)
head(exprs(res))
## using robust summarisation
data(msnset) ## reset data
msnset <- log(msnset, 2) ## log2 transform
## Feature X46, in the ENO protein, has one missing value
which(is.na(msnset), arr.ind = TRUE)
exprs(msnset["X46", ])
## Only the missing value in X46 and iTRAQ4.116 will be ignored
res <- combineFeatures(msnset,
fcol = "ProteinAccession",
method = "robust")
tail(exprs(res))
msnset2 <- filterNA(msnset) ## remove features with missing value(s)
res2 <- combineFeatures(msnset2,
fcol = "ProteinAccession",
method = "robust")
## Here, the values for ENO are different because the whole feature
## X46 that contained the missing value was removed prior to fitting.
tail(exprs(res2))
combineSpectra()
Combine Spectra
Description
combineSpectra combines spectra in a MSnExp or Spectra object applying the summarization function method to sets of spectra defined by a factor ( fcol parameter). The resulting combined spectrum for each set contains metadata information (present in mcols and all spectrum information other than mz and intensity ) from the first spectrum in each set.
Usage
## S4 method for signature 'Spectra'
combineSpectra(object, fcol, method = meanMzInts, fun, ...)
Arguments
Argument | Description |
---|---|
object | A MSnExp or Spectra object. |
fcol | For Spectra objects: mcols column name to be used to define the sets of spectra to be combined. If missing, all spectra are considered to be one set. |
method | function to be used to combine the spectra by fcol . Has to be a function that takes a list of spectra as input and returns a single Spectrum . See meanMzInts() for details. |
fun | Deprecated; use method instead. |
... | additional arguments passed to method . |
Value
A Spectra or MSnExp object with combined spectra. Metadata ( mcols ) and all spectrum attributes other than mz and intensity are taken from the first Spectrum in each set.
Seealso
meanMzInts()
for a function to combine spectra.
Author
Johannes Rainer, Laurent Gatto
Examples
set.seed(123)
mzs <- seq(1, 20, 0.1)
ints1 <- abs(rnorm(length(mzs), 10))
ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak
ints2 <- abs(rnorm(length(mzs), 10))
ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23)
ints3 <- abs(rnorm(length(mzs), 10))
ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20)
## Create the spectra.
sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01),
intensity = ints1, rt = 1)
sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01),
intensity = ints2, rt = 2)
sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009),
intensity = ints3, rt = 3)
spctra <- Spectra(sp1, sp2, sp3,
elementMetadata = DataFrame(idx = 1:3, group = c("b", "a", "a")))
## Combine the spectra reporting the maximum signal
res <- combineSpectra(spctra, mzd = 0.05, intensityFun = max)
res
## All values other than m/z and intensity are kept from the first spectrum
rtime(res)
## Plot the individual and the merged spectrum
par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1))
plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red")
points(mz(sp2), intensity(sp2), type = "h", col = "green")
points(mz(sp3), intensity(sp3), type = "h", col = "blue")
plot(mz(res[[1]]), intensity(res[[1]]), type = "h",
col = "black", xlim = range(mzs[5:25]))
## Combine spectra in two sets.
res <- combineSpectra(spctra, fcol = "group", mzd = 0.05)
res
rtime(res)
## Plot the individual and the merged spectra
par(mfrow = c(3, 1), mar = c(4.3, 4, 1, 1))
plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red")
points(mz(sp2), intensity(sp2), type = "h", col = "green")
points(mz(sp3), intensity(sp3), type = "h", col = "blue")
plot(mz(res[[1]]), intensity(res[[1]]), xlim = range(mzs[5:25]), type = "h",
col = "black")
plot(mz(res[[2]]), intensity(res[[2]]), xlim = range(mzs[5:25]), type = "h",
col = "black")
combineSpectraMovingWindow()
Combine signal from consecutive spectra of LCMS experiments
Description
combineSpectraMovingWindow combines signal from consecutive spectra within a file. The resulting MSnExp has the same total number of spectra as the original object, but with each individual spectrum representing aggregated data from the original spectrum and its neighboring spectra. This is thus equivalent to a smoothing of the data in the retention time dimension.
Note that the function always returns a MSnExp object, even if x was an OnDiskMSnExp object.
Usage
combineSpectraMovingWindow(x, halfWindowSize = 1L,
intensityFun = base::mean, mzd = NULL, timeDomain = FALSE,
weighted = FALSE, BPPARAM = bpparam())
Arguments
Argument | Description |
---|---|
x | MSnExp or OnDiskMSnExp object. |
halfWindowSize | integer(1) with the half window size for the moving window. |
intensityFun | function to aggregate the intensity values per m/z group. Should be a function or the name of a function. The function is expected to return a numeric(1) . |
mzd | numeric(1) defining the maximal m/z difference below which mass peaks are considered to represent the same ion/mass peak. Intensity values for such grouped mass peaks are aggregated. If not specified this value is estimated from the distribution of differences of m/z values from the provided spectra (see details). |
timeDomain | logical(1) whether definition of the m/z values to be combined into one m/z is performed on m/z values ( timeDomain = FALSE ) or on sqrt(mz) ( timeDomain = TRUE ). Profile data from TOF MS instruments should be aggregated based on the time domain (see details). Note that a pre-defined mzd should also be estimated on the square root of m/z values if timeDomain = TRUE . |
weighted | logical(1) whether m/z values per m/z group should be aggregated with an intensity-weighted mean. The default is to report the mean m/z. |
BPPARAM | parallel processing settings. |
Details
The method assumes that the same ions are measured in consecutive scans (i.e. LCMS data) and thus combines their signal, which can increase the signal-to-noise ratio.
Intensities (and m/z values) for signals with the same m/z value in consecutive scans are aggregated using the intensityFun .
m/z values of intensities from consecutive scans will never be exactly identical, even if they represent signal from the same ion. The function thus internally determines a similarity threshold, based on differences between m/z values within and between spectra, below which m/z values are considered to derive from the same ion. For robustness reasons, this threshold is estimated on the 100 spectra with the largest number of m/z - intensity pairs (i.e. mass peaks). See meanMzInts() for details.
Parameter timeDomain : with timeDomain = TRUE , m/z-intensity pairs from consecutive scans to be aggregated are defined based on the square root of the m/z values (the default, timeDomain = FALSE , uses the m/z values directly). This is because in QTOF MS instruments data are most likely collected based on a timing circuit (with a certain variance) and m/z values are later derived from the relationship t = k * sqrt(m/z) . Differences between individual m/z values thus depend on the actual m/z value, causing both the difference between m/z values and their scattering to differ between the lower and upper m/z range. Determining the m/z values to be combined on sqrt(mz) reduces this dependency. For non-QTOF MS data, timeDomain = FALSE might be used instead.
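The rationale for the time domain can be checked numerically. Assuming flight times follow t = k * sqrt(m/z) and shift by a constant jitter between two scans (k and dt below are arbitrary, hypothetical values), the resulting m/z differences grow with m/z while the differences on the sqrt(mz) scale stay constant.

```r
## Assumed TOF relationship t = k * sqrt(m/z) with a constant timing jitter.
k <- 1                          # arbitrary instrument constant (hypothetical)
dt <- 1e-4                      # constant timing jitter between scans (hypothetical)
mz <- c(100, 400, 900, 1600)    # true m/z values of four ions
t1 <- k * sqrt(mz)              # flight times in scan 1
t2 <- t1 + dt                   # flight times in scan 2
mz2 <- (t2 / k)^2               # m/z reported in scan 2

diff_mz <- mz2 - mz                # grows roughly with 2 * sqrt(mz) * dt / k
diff_sqrt <- sqrt(mz2) - sqrt(mz)  # constant, equal to dt / k
cbind(mz, diff_mz, diff_sqrt)
```

A single mzd threshold therefore fits all m/z values only on the sqrt(mz) scale, which is why a pre-defined mzd must also be expressed on that scale when timeDomain = TRUE .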
Value
MSnExp with the same number of spectra as x .
Seealso
meanMzInts() for the function combining spectra provided in a list .
estimateMzScattering() for a function to estimate m/z value scattering in consecutive spectra.
Note
The function has to read all data into memory for the spectra combining and thus the memory requirements of this function are high, possibly preventing its usage on large experimental data. In these cases it is suggested to perform the combination on a per-file basis and save the results using the writeMSData() function afterwards.
Author
Johannes Rainer, Sigurdur Smarason
Examples
library(MSnbase)
library(msdata)
## Read a profile-mode LC-MS data file.
fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1]
od <- readMSData(fl, mode = "onDisk")
## Subset the object to the retention time range that includes the signal
## for proline. This is done for performance reasons.
rtr <- c(165, 175)
od <- filterRt(od, rtr)
## Combine signal from neighboring spectra.
od_comb <- combineSpectraMovingWindow(od)
## The combined object has the same number of spectra and the same number of
## mass peaks per spectrum, but the signal is larger in the combined object.
length(od)
length(od_comb)
peaksCount(od)
peaksCount(od_comb)
## Comparing the chromatographic signal for proline (m/z ~ 116.0706)
## before and after spectra data combination.
mzr <- c(116.065, 116.075)
chr <- chromatogram(od, rt = rtr, mz = mzr)
chr_comb <- chromatogram(od_comb, rt = rtr, mz = mzr)
par(mfrow = c(1, 2))
plot(chr)
plot(chr_comb)
## Chromatographic data is "smoother" after combining.
commonFeatureNames()
Keep only common feature names
Description
Subsets MSnSet
instances to their common feature names.
Usage
commonFeatureNames(x, y)
Arguments
Argument | Description |
---|---|
x | An instance of class MSnSet or a list or MSnSetList with at least 2 MSnSet objects. |
y | An instance of class MSnSet . Ignored if x is a list / MSnSetList . |
Value
An MSnSetList composed of the input MSnSet objects containing only common features in the same order. The names of the output are either the names of the x and y input variables or the names of x if a list is provided.
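Conceptually, the subsetting amounts to intersecting feature names and reordering every input accordingly. The sketch below uses plain matrices with row names as stand-ins for MSnSet objects; common_features is a hypothetical helper, not part of MSnbase.

```r
## Hypothetical stand-ins for MSnSets: row names play the role of feature names.
m1 <- matrix(1:4, ncol = 1, dimnames = list(c("P1", "P2", "P3", "P4"), "a"))
m2 <- matrix(5:7, ncol = 1, dimnames = list(c("P3", "P1", "P5"), "b"))
m3 <- matrix(8:10, ncol = 1, dimnames = list(c("P1", "P3", "P4"), "c"))

## Hypothetical helper: keep the features shared by all inputs, in the same order
## (the order of the first input).
common_features <- function(xs) {
    cmn <- Reduce(intersect, lapply(xs, rownames))
    lapply(xs, function(x) x[cmn, , drop = FALSE])
}

res <- common_features(list(a = m1, b = m2, c = m3))
lapply(res, rownames)
```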
Author
Laurent Gatto
Examples
library("pRolocdata")
data(tan2009r1)
data(tan2009r2)
cmn <- commonFeatureNames(tan2009r1, tan2009r2)
names(cmn)
## as a named list
names(commonFeatureNames(list(a = tan2009r1, b = tan2009r2)))
## without message
suppressMessages(cmn <- commonFeatureNames(tan2009r1, tan2009r2))
## more than 2 instances
data(tan2009r3)
cmn <- commonFeatureNames(list(tan2009r1, tan2009r2, tan2009r3))
length(cmn)
compareMSnSets()
Compare two MSnSets
Description
Compares two MSnSet instances. By default, the qual and processingData slots are not compared.
Usage
compareMSnSets(x, y, qual = FALSE, proc = FALSE)
Arguments
Argument | Description |
---|---|
x | First MSnSet |
y | Second MSnSet |
qual | Should the qual slots be compared? Default is FALSE . |
proc | Should the processingData slots be compared? Default is FALSE . |
Value
A logical
Author
Laurent Gatto
compareSpectra_methods()
Compare Spectra of an 'MSnExp' or 'Spectrum' instances
Description
This method compares spectra ( Spectrum instances) pairwise or all spectra of an experiment ( MSnExp instances). Currently the comparison is based on the number of common peaks ( fun = "common" ), the Pearson correlation ( fun = "cor" ), the dot product ( fun = "dotproduct" ) or a user-defined function.
For fun = "common" the tolerance (default 25e-6 ) can be set, and the tolerance can be defined to be relative (default relative = TRUE ) or absolute ( relative = FALSE ). To compare spectra with fun = "cor" and fun = "dotproduct" , the spectra need to be binned. The binSize argument (in Dalton) controls the binning precision. Please see bin for details.
Instead of these three predefined functions for fun , a user-defined comparison function can be supplied. This function takes two Spectrum objects as the first two arguments and ... as the third argument. The function must return a single numeric value. See the example section.
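Two of the comparison measures can be sketched on plain vectors. The matching rule in common_peaks (a peak counts as common if it lies within tolerance * m/z of any peak in the other spectrum) and the cosine normalisation in norm_dotproduct are assumptions about what fun = "common" and fun = "dotproduct" compute, not a copy of the MSnbase code; both helpers are hypothetical.

```r
mz1 <- 1:10; int1 <- 1:10
mz2 <- 1:10; int2 <- 10:1

## Hypothetical: number of peaks in the first spectrum with a match in the second
## within a relative tolerance (assumed rule: |mz1 - mz2| <= tolerance * mz2).
common_peaks <- function(mz1, mz2, tolerance = 25e-6) {
    sum(vapply(mz1, function(m) any(abs(m - mz2) <= tolerance * mz2), logical(1)))
}

## Hypothetical: normalised dot product (cosine similarity) of two intensity
## vectors assumed to be binned onto the same m/z grid.
norm_dotproduct <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

common_peaks(mz1, mz2)
norm_dotproduct(int1, int2)
cor(int1, int2)
```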
Seealso
bin , clean , pickPeaks , smooth , removePeaks and trimMz for other spectra processing methods.
Author
Sebastian Gibb mail@sebastiangibb.de
References
Stein, S. E., & Scott, D. R. (1994). Optimization and testing of mass spectral library search algorithms for compound identification. Journal of the American Society for Mass Spectrometry, 5(9), 859-866. doi: https://doi.org/10.1016/1044-0305(94)87009-8
Lam, H., Deutsch, E. W., Eddes, J. S., Eng, J. K., King, N., Stein, S. E. and Aebersold, R. (2007) Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics, 7: 655-667. doi: https://doi.org/10.1002/pmic.200600625
Examples
s1 <- new("Spectrum2", mz=1:10, intensity=1:10)
s2 <- new("Spectrum2", mz=1:10, intensity=10:1)
compareSpectra(s1, s2)
compareSpectra(s1, s2, fun="cor", binSize=2)
compareSpectra(s1, s2, fun="dotproduct")
## define our own (useless) comparison function (it is just a basic example)
equalLength <- function(x, y, ...) {
return(peaksCount(x)/(peaksCount(y)+.Machine$double.eps))
}
compareSpectra(s1, s2, fun=equalLength)
compareSpectra(s1, new("Spectrum2", mz=1:5, intensity=1:5), fun=equalLength)
compareSpectra(s1, new("Spectrum2"), fun=equalLength)
data(itraqdata)
compareSpectra(itraqdata[1:5], fun="cor")
consensusSpectrum()
Combine spectra to a consensus spectrum
Description
consensusSpectrum takes a list of spectra and combines them to a consensus spectrum containing mass peaks that are present in a user-definable proportion of spectra.
Usage
consensusSpectrum(x, mzd, minProp = 0.5, intensityFun = base::max,
ppm = 0, ...)
Arguments
Argument | Description |
---|---|
x | list of Spectrum objects (either Spectrum1 or Spectrum2 ). |
mzd | numeric(1) defining the maximal m/z difference below which mass peaks are grouped into the same final mass peak (see details for more information). If not provided this value is estimated from the distribution of differences of m/z values from the spectra (see meanMzInts() for more details). See also parameter ppm below for the definition of an m/z dependent peak grouping. |
minProp | numeric(1) defining the minimal proportion of spectra in which a mass peak has to be present in order to include it in the final consensus spectrum. Should be a number between 0 and 1 (1 meaning present in all spectra). |
intensityFun | function to be used to define the intensity of the aggregated peak. By default the maximum signal for a mass peak is reported. |
ppm | numeric(1) allowing to perform a m/z dependent grouping of mass peaks. See details for more information. |
... | additional arguments to be passed to intensityFun . |
Details
Peaks from spectra with a difference in their m/z smaller than mzd are grouped into the same final mass peak, with their intensities being aggregated with intensityFun . The m/z of the final mass peaks is calculated using an intensity-weighted mean of the m/z values from the individual mass peaks. Alternatively (or in addition) it is possible to perform an m/z dependent grouping of mass peaks with parameter ppm : mass peaks from different spectra with a difference in their m/z smaller than ppm of their m/z are grouped into the same final peak.
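The grouping can be approximated in base R by pooling all peaks, starting a new group whenever the gap to the previous m/z reaches mzd, and keeping groups observed in at least minProp of the spectra. consensus_peaks is a hypothetical helper that reuses the m/z and intensity values of the example spectra; it ignores the ppm option and is not guaranteed to match consensusSpectrum exactly.

```r
## Hypothetical helper: pool peaks of several spectra, group them by m/z gaps
## smaller than `mzd`, keep groups present in at least `minProp` of the spectra.
consensus_peaks <- function(mzs, ints, mzd, minProp = 0.5, intensityFun = max) {
    spec <- rep(seq_along(mzs), lengths(mzs))  # spectrum index of each peak
    mz <- unlist(mzs)
    int <- unlist(ints)
    o <- order(mz)
    mz <- mz[o]; int <- int[o]; spec <- spec[o]
    ## start a new group whenever the gap to the previous m/z is >= mzd
    grp <- cumsum(c(TRUE, diff(mz) >= mzd))
    n_spec <- tapply(spec, grp, function(s) length(unique(s)))
    keep <- as.vector(n_spec >= minProp * length(mzs))
    ## intensity-weighted mean m/z and aggregated intensity per group
    cons_mz <- tapply(seq_along(mz), grp, function(i) weighted.mean(mz[i], int[i]))
    cons_int <- tapply(int, grp, intensityFun)
    data.frame(mz = as.vector(cons_mz)[keep], intensity = as.vector(cons_int)[keep])
}

## m/z and intensity values of the three example spectra.
mzs <- list(c(1.2, 1.5, 1.8, 3.6, 4.9, 5.0, 7.8, 8.4),
            c(1.4, 1.81, 2.4, 4.91, 6.0, 7.2, 9),
            c(1, 1.82, 2.2, 3, 7.0, 8))
ints <- list(c(10, 3, 140, 14, 299, 12, 49, 20),
             c(3, 184, 8, 156, 12, 23, 10),
             c(8, 210, 7, 101, 17, 8))
cons <- consensus_peaks(mzs, ints, mzd = 0.02, minProp = 2/3)
cons
```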
Seealso
Other spectra combination functions: meanMzInts
Author
Johannes Rainer
Examples
library(MSnbase)
## Create 3 example spectra.
sp1 <- new("Spectrum2", rt = 1, precursorMz = 1.41,
mz = c(1.2, 1.5, 1.8, 3.6, 4.9, 5.0, 7.8, 8.4),
intensity = c(10, 3, 140, 14, 299, 12, 49, 20))
sp2 <- new("Spectrum2", rt = 1.1, precursorMz = 1.4102,
mz = c(1.4, 1.81, 2.4, 4.91, 6.0, 7.2, 9),
intensity = c(3, 184, 8, 156, 12, 23, 10))
sp3 <- new("Spectrum2", rt = 1.2, precursorMz = 1.409,
mz = c(1, 1.82, 2.2, 3, 7.0, 8),
intensity = c(8, 210, 7, 101, 17, 8))
spl <- Spectra(sp1, sp2, sp3)
## Plot the spectra, each in a different color
par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1))
plot(mz(sp1), intensity(sp1), type = "h", col = "#ff000080", lwd = 2,
xlab = "m/z", ylab = "intensity", xlim = range(mz(spl)),
ylim = range(intensity(spl)))
points(mz(sp2), intensity(sp2), type = "h", col = "#00ff0080", lwd = 2)
points(mz(sp3), intensity(sp3), type = "h", col = "#0000ff80", lwd = 2)
cons <- consensusSpectrum(spl, mzd = 0.02, minProp = 2/3)
## Peaks of the consensus spectrum
mz(cons)
intensity(cons)
## Other Spectrum data is taken from the first Spectrum in the list
rtime(cons)
precursorMz(cons)
plot(mz(cons), intensity(cons), type = "h", xlab = "m/z", ylab = "intensity",
xlim = range(mz(spl)), ylim = range(intensity(spl)), lwd = 2)
defunct()
MSnbase Deprecated and Defunct
Description
The function, class, or data object you have asked for has been deprecated or made defunct.
Deprecated:
Defunct: readMzXMLData , extractSpectra , writeMzTabData , makeMTD , makePEP , makePRT , NAnnotatedDataFrame class.
estimateMzResolution()
Estimate the m/z resolution of a spectrum
Description
estimateMzResolution estimates the m/z resolution of a profile-mode Spectrum (or of all spectra in an MSnExp or OnDiskMSnExp object). The m/z resolution is defined as the most frequent difference between a spectrum's m/z values.
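A minimal sketch of this definition, assuming the differences are rounded before counting so that floating-point noise does not split identical spacings. mz_resolution and its digits argument are hypothetical and not part of MSnbase.

```r
## Hypothetical helper: most frequent difference between consecutive m/z values
## of a profile-mode spectrum, after rounding the differences to `digits`.
mz_resolution <- function(mz, digits = 5) {
    d <- round(diff(sort(mz)), digits)
    tab <- table(d)
    as.numeric(names(tab)[which.max(tab)])
}

## Toy profile-mode m/z grid sampled every 0.005, with one gap without signal.
mz <- c(seq(200, 200.1, by = 0.005), seq(200.5, 200.6, by = 0.005))
mz_resolution(mz)
```

The single large gap does not affect the result because only the most frequent spacing is reported, which is also why the estimate becomes unreliable when many low-intensity points are not reported (see Note).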
Usage
## S4 method for signature 'MSnExp'
estimateMzResolution(object, ...)
## S4 method for signature 'Spectrum'
estimateMzResolution(object, ...)
Arguments
Argument | Description |
---|---|
object | either a Spectrum , MSnExp or OnDiskMSnExp object. |
... | currently not used. |
Value
numeric(1) with the m/z resolution. If called on a MSnExp or OnDiskMSnExp , a list of m/z resolutions is returned (one for each spectrum).
Note
This assumes the data to be in profile mode and does not return meaningful results for centroided data.
The estimated m/z resolution depends on the number of ions detected in a spectrum, as some instruments don't measure (or report) signal below a certain threshold.
Author
Johannes Rainer
Examples
## Load a profile mode example file
library(MSnbase)
library(msdata)
f <- proteomics(full.names = TRUE,
pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz")
od <- readMSData(f, mode = "onDisk")
## Estimate the m/z resolution on the 3rd spectrum.
estimateMzResolution(od[[3]])
## Estimate the m/z resolution for each spectrum
mzr <- estimateMzResolution(od)
## plot the distribution of estimated m/z resolutions. The bimodal
## distribution represents the m/z resolution of the MS1 (first peak) and
## MS2 spectra (second peak).
plot(density(unlist(mzr)))
estimateMzScattering()
Estimate m/z scattering in consecutive scans
Description
Estimate scattering of m/z values (due to technical, instrument-specific noise) for the same ion in consecutive scans of an LCMS experiment.
Usage
estimateMzScattering(x, halfWindowSize = 1L, timeDomain = FALSE)
Arguments
Argument | Description |
---|---|
x | MSnExp or OnDiskMSnExp object. |
halfWindowSize | integer(1) defining the half window size for the moving window to combine consecutive spectra. |
timeDomain | logical(1) whether m/z scattering should be estimated on mz ( timeDomain = FALSE ) or sqrt(mz) ( timeDomain = TRUE ) values. See combineSpectraMovingWindow() for details on this parameter. |
Details
The m/z values of the same ions in consecutive scans (spectra) of an LCMS run will not be identical. This random noise is expected to be smaller than the resolution of the MS instrument. The distribution of differences of m/z values from neighboring spectra is thus expected to be (at least) bimodal, with the first peak representing the random variation described above and the second (or largest) peak the m/z resolution. The m/z value of the first local minimum between these first two peaks in the distribution is returned as the m/z scattering.
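The "first local minimum" rule can be sketched with stats::density() on a vector of m/z differences. The helper first_density_minimum and the toy differences d are hypothetical; estimateMzScattering presumably derives the differences from neighboring spectra itself and may use different density settings.

```r
## Hypothetical helper: position of the first local minimum of the density of
## m/z differences (the valley between the scattering and resolution peaks).
first_density_minimum <- function(d) {
    dens <- density(d)
    s <- sign(diff(dens$y))
    minima <- which(diff(s) == 2) + 1   # a decrease followed by an increase
    dens$x[minima[1]]
}

## Toy differences (hypothetical): small scattering around 0.001 and larger
## differences around 0.004 reflecting the m/z resolution.
d <- c(seq(0.0005, 0.0015, length.out = 50), seq(0.003, 0.005, length.out = 200))
first_density_minimum(d)
```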
Seealso
estimateMzResolution() for the function to estimate a profile-mode spectrum's m/z resolution from its data.
Note
For timeDomain = TRUE the function does not return the estimated scattering of m/z values, but the scattering of sqrt(mz) values.
Author
Johannes Rainer
Examples
library(MSnbase)
library(msdata)
## Load a profile-mode LC-MS data file
f <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[1]
im <- readMSData(f, mode = "inMem", msLevel = 1L)
res <- estimateMzScattering(im)
## Plot the distribution of estimated m/z scattering
plot(density(unlist(res)))
## Compare the m/z resolution and m/z scattering of the spectrum with the
## most peaks
idx <- which.max(unlist(spectrapply(im, peaksCount)))
res[[idx]]
abline(v = res[[idx]], lty = 2)
estimateMzResolution(im[[idx]])
## As expected, the m/z scattering is much lower than the m/z resolution.
estimateNoise_method()
Noise Estimation for 'Spectrum' instances
Description
This method performs a noise estimation on individual spectra (Spectrum instances). There are currently two different noise estimators, the Median Absolute Deviation (method = "MAD") and Friedman's Super Smoother (method = "SuperSmoother"), as implemented in the MALDIquant::detectPeaks and MALDIquant::estimateNoise functions, respectively.
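For intuition, a MAD-based noise level can be computed with base R alone. The sketch below assumes, as we understand MALDIquant's MAD estimator, that the noise is a single constant level equal to `stats::mad()` of the intensities; `mad_noise_sketch()` is a hypothetical helper and not part of MSnbase or MALDIquant.

```r
## Hypothetical helper: constant MAD-based noise level for one spectrum.
mad_noise_sketch <- function(mz, intensity) {
    ## One noise value per m/z, all equal to the MAD of the intensities
    ## (stats::mad uses the consistency constant 1.4826 by default).
    cbind(mz = mz, noise = rep(stats::mad(intensity), length(mz)))
}

## Same m/z and intensity values as the Spectrum1 object in the example.
noise <- mad_noise_sketch(mz = 1:11, intensity = c(1:6, 5:1))
noise
```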
Seealso
pickPeaks, and the underlying method in MALDIquant: estimateNoise.
Author
Sebastian Gibb mail@sebastiangibb.de
References
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28: 2270-2271. http://strimmerlab.org/software/maldiquant/
Examples
sp1 <- new("Spectrum1",
intensity = c(1:6, 5:1),
mz = 1:11,
centroided = FALSE)
estimateNoise(sp1, method = "SuperSmoother")
exprsToRatios_methods()
Calculate all ratio pairs
Description
Calculates all possible ratios for the assayData columns in an MSnSet. The function getRatios(x, log = FALSE) takes a matrix x as input and is used by exprsToRatios.
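The idea behind getRatios can be sketched on a plain matrix. This is an illustration under stated assumptions, not the MSnbase source: `all_ratio_pairs()` is hypothetical, it forms one ratio per unordered pair of columns (first over second, in `combn()` order), names the results "numerator/denominator", and, when `log = TRUE`, subtracts instead of dividing, since ratios of log-transformed data are differences.

```r
## Hypothetical helper: all pairwise column ratios of a numeric matrix.
all_ratio_pairs <- function(m, log = FALSE) {
    pairs <- utils::combn(ncol(m), 2)   # one column per pair of indices
    res <- apply(pairs, 2, function(p) {
        ## On log-transformed data a ratio becomes a difference.
        if (log) m[, p[1]] - m[, p[2]] else m[, p[1]] / m[, p[2]]
    })
    res <- matrix(res, nrow = nrow(m))  # keep a matrix even for one row
    colnames(res) <- apply(pairs, 2, function(p)
        paste(colnames(m)[p[1]], colnames(m)[p[2]], sep = "/"))
    res
}

m <- cbind(A = c(2, 4), B = c(8, 1), C = c(2, 4))
all_ratio_pairs(m)
all_ratio_pairs(log2(m), log = TRUE)
```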
Examples
data(msnset)
pData(msnset)
head(exprs(msnset))
r <- exprsToRatios(msnset)
head(exprs(r))
pData(r)
extractPrecSpectra_methods()
Extracts precursor-specific spectra from an 'MSnExp' object
Description
Extracts the MSMS spectra that originate from the precursor(s) having the same MZ value as defined in the prec argument. A warning will be issued if one or several of the precursor MZ values in prec are absent from the experiment's precursor MZ values (i.e. in precursorMz(object)).
Author
Laurent Gatto lg390@cam.ac.uk
Examples
file <- dir(system.file(package = "MSnbase", dir = "extdata"),
            full.names = TRUE, pattern = "mzXML$")
aa <- readMSData(file, verbose = FALSE)
my.prec <- precursorMz(aa)[1]
my.prec
bb <- extractPrecSpectra(aa,my.prec)
precursorMz(bb)
processingData(bb)
fData_utils()
Expand or merge feature variables
Description
The expandFeatureVars and mergeFeatureVars functions respectively expand and merge groups of feature variables. Using these functions, a set of columns in the feature data can be merged into a single new data.frame-column variable, and a data.frame-column can be expanded into individual feature columns. The original feature variables are removed.
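The data.frame-column mechanism these functions rely on can be illustrated on a plain data.frame. The helpers `merge_vars_sketch()` and `expand_vars_sketch()` are hypothetical and operate on a bare data.frame rather than an MSnSet; the "prefix.column" naming of expanded columns is an assumption.

```r
## Hypothetical helper: move the columns fcol into one data.frame-column.
merge_vars_sketch <- function(fd, fcol, fcol2) {
    out <- fd[, setdiff(names(fd), fcol), drop = FALSE]
    ## Store the selected columns together as one data.frame-column.
    out[[fcol2]] <- fd[, fcol, drop = FALSE]
    out
}

## Hypothetical helper: split a data.frame-column back into single columns.
expand_vars_sketch <- function(fd, fcol, prefix = fcol) {
    inner <- fd[[fcol]]
    if (!is.null(prefix))
        names(inner) <- paste(prefix, names(inner), sep = ".")
    ## Drop the data.frame-column and append its columns individually.
    cbind(fd[, setdiff(names(fd), fcol), drop = FALSE], inner)
}

fd <- data.frame(id = c("p1", "p2"), svm.a = c(0.1, 0.9), svm.b = c(0.9, 0.1))
merged <- merge_vars_sketch(fd, c("svm.a", "svm.b"), "SVM")
str(merged)
expanded <- expand_vars_sketch(merged, "SVM")
names(expanded)
```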
Usage
expandFeatureVars(x, fcol, prefix)
mergeFeatureVars(x, fcol, fcol2)
Arguments
Argument | Description |
---|---|
x | An object of class MSnSet . |
fcol | A character() of feature variables to expand (for expandFeatureVars ) or merge (for mergeFeatureVars ). |
prefix | A character(1) to use as prefix to the new feature variables. If missing (default), then fcol is used instead. If NULL , then no prefix is used. |
fcol2 | A character(1) defining the name of the new feature variable. |
Value
An MSnSet with expanded (merged) feature variables.
Author
Laurent Gatto
Examples
library("pRolocdata")
data(hyperLOPIT2015)
fvarLabels(hyperLOPIT2015)
## Let's merge all svm prediction feature variables
(k <- grep("^svm", fvarLabels(hyperLOPIT2015), value = TRUE))
hl <- mergeFeatureVars(hyperLOPIT2015, fcol = k, fcol2 = "SVM")
fvarLabels(hl)
head(fData(hl)$SVM)
## Let's expand the new SVM into individual columns
hl2 <- expandFeatureVars(hl, "SVM")
fvarLabels(hl2)
## We can set the prefix manually
hl2 <- expandFeatureVars(hl, "SVM", prefix = "Expanded")
fvarLabels(hl2)
## If we don't want any prefix
hl2 <- expandFeatureVars(hl, "SVM", prefix = NULL)
fvarLabels(hl2)
factorsAsStrings()
Converts factors to strings
Description
This function does the opposite of the stringsAsFactors argument of the data.frame or read.table functions: it converts factor columns to character columns.
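A minimal base-R equivalent, offered as a sketch rather than the MSnbase source (`factors_as_strings_sketch()` is a hypothetical name):

```r
## Hypothetical helper: convert every factor column of a data.frame to character.
factors_as_strings_sketch <- function(x) {
    ## Assigning into x[] keeps the data.frame class and the column names;
    ## non-factor columns are returned unchanged.
    x[] <- lapply(x, function(col) if (is.factor(col)) as.character(col) else col)
    x
}

df <- data.frame(sp = factor(c("a", "b")), n = 1:2)
str(factors_as_strings_sketch(df))
```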
Usage
factorsAsStrings(x)
Arguments
Argument | Description |
---|---|
x | A data.frame |
Value
A data.frame where factors are converted to characters.
Author
Laurent Gatto
Examples
data(iris)
str(iris)
str(factorsAsStrings(iris))
featureCV()
Calculates coefficient of variation for features
Description
This function calculates the column-wise coefficient of variation (CV), i.e. the ratio between the standard deviation and the mean, for the features in an MSnSet. The CVs are calculated for the groups of features defined by groupBy. For groups defined by single features, NA is returned.
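The per-group calculation can be sketched on a plain matrix (features in rows, samples in columns). `feature_cv_sketch()` is hypothetical; it omits the norm and suffix arguments and only reproduces the "CV." column naming described under Value.

```r
## Hypothetical helper: column-wise CV (sd / mean) for groups of rows.
feature_cv_sketch <- function(m, groupBy, na.rm = TRUE) {
    groupBy <- factor(groupBy)
    cvs <- lapply(levels(groupBy), function(g) {
        sub <- m[groupBy == g, , drop = FALSE]
        ## sd() of a single value is NA, so single-feature groups give NA.
        apply(sub, 2, function(v)
            stats::sd(v, na.rm = na.rm) / mean(v, na.rm = na.rm))
    })
    res <- do.call(rbind, cvs)   # one row per group, one column per sample
    dimnames(res) <- list(levels(groupBy), paste0("CV.", colnames(m)))
    res
}

m <- cbind(S1 = c(1, 3, 10, 10), S2 = c(2, 2, 5, 15))
cv <- feature_cv_sketch(m, groupBy = c(1, 1, 2, 2))
cv
```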
Usage
featureCV(x, groupBy, na.rm = TRUE, norm = "none", suffix = NULL)
Arguments
Argument | Description |
---|---|
x | An instance of class MSnSet . |
groupBy | An object of class factor defining how to summarise the features. |
na.rm | A logical(1) defining whether missing values should be removed. |
norm | One of normalisation methods applied prior to CV calculation. See normalise() for more details. Here, the default is 'none' , i.e. no normalisation. |
suffix | A character(1) to be used to name the new CV columns. Default is NULL to ignore this. This argument should be set when CV values are already present in the MSnSet feature variables. |
Value
A matrix of dimensions length(levels(groupBy)) by ncol(x) with the respective CVs. The column names are formed by pasting CV. and the sample names of object x, possibly suffixed by .suffix.
Author
Laurent Gatto and Sebastian Gibb
Examples
data(msnset)
msnset <- msnset[1:4]
gb <- factor(rep(1:2, each = 2))
featureCV(msnset, gb)
featureCV(msnset, gb, suffix = "2")
fillUp()
Fills up a vector
Description
This function replaces all the empty characters "" and/or NAs with the value of the closest preceding non-NA/"" element. The function is used to populate data frame or matrix columns where only the cells of the first row in a set of partially identical rows are explicitly populated and the following are empty.
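The fill-forward rule can be sketched with a simple loop. `fill_up_sketch()` is a hypothetical helper, not the MSnbase code; a leading NA or "" is left as is because it has no preceding value.

```r
## Hypothetical helper: replace NA and "" by the closest preceding value.
fill_up_sketch <- function(x) {
    out <- x
    for (i in seq_along(out)[-1]) {
        ## out[i - 1] has already been filled, so runs of gaps are handled.
        if (is.na(out[i]) || identical(as.character(out[i]), ""))
            out[i] <- out[i - 1]
    }
    out
}

fill_up_sketch(c("Prot1", "", "", "Prot2", "", ""))
fill_up_sketch(c(1, NA, NA, 2, NA))
```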
Usage
fillUp(x)
Arguments
Argument | Description |
---|---|
x | a vector. |
Value
A vector as x with all empty characters "" and NA values replaced by the preceding non-NA/"" value.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
d <- data.frame(protein=c("Prot1","","","Prot2","",""),
peptide=c("pep11","","pep12","pep21","pep22",""),
score=c(1:2,NA,1:3))
d
e <- apply(d,2,fillUp)
e
data.frame(e)
fillUp(d[,1])
filterIdentificationDataFrame()
Filter out unreliable PSMs.
Description
A function to filter out PSMs matching to the decoy database, of rank greater than one and matching non-proteotypic peptides.
Usage
filterIdentificationDataFrame(x, decoy = "isDecoy", rank = "rank",
accession = "DatabaseAccess", spectrumID = "spectrumID",
verbose = isMSnbaseVerbose())
Arguments
Argument | Description |
---|---|
x | A data.frame containing PSMs. |
decoy | The column name defining whether entries match the decoy database. Default is "isDecoy". The column should be a logical and only PSMs holding a FALSE are retained. Ignored if set to NULL. |
rank | The column name holding the rank of the PSM. Default is "rank". This column should be a numeric and only PSMs having rank equal to 1 are retained. Ignored if set to NULL. |
accession | The column name holding the protein (groups) accession. Default is "DatabaseAccess". Ignored if set to NULL. |
spectrumID | The name of the spectrum identifier column. Default is "spectrumID". |
verbose | A logical verbosity flag. Default is to take isMSnbaseVerbose() . |
Details
The PSMs should be stored in a data.frame such as those produced by readMzIdData(). Note that this function should be called before calling the reduce method on a PSM data.frame.
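The three filters can be sketched on a plain data.frame using the default column names listed above. This is a hypothetical helper, not the MSnbase implementation; in particular, treating a spectrum whose remaining PSMs point to more than one distinct accession as non-proteotypic is an assumption about how that criterion is applied.

```r
## Hypothetical helper sketching decoy, rank and proteotypic filtering.
filter_psms_sketch <- function(x, decoy = "isDecoy", rank = "rank",
                               accession = "DatabaseAccess",
                               spectrumID = "spectrumID") {
    if (!is.null(decoy)) x <- x[!x[[decoy]], , drop = FALSE]     # keep targets
    if (!is.null(rank)) x <- x[x[[rank]] == 1, , drop = FALSE]   # keep rank 1
    if (!is.null(accession)) {
        ## Assumption: more than one distinct accession per spectrum means
        ## the matched peptide is not proteotypic.
        n_acc <- tapply(x[[accession]], x[[spectrumID]],
                        function(a) length(unique(a)))
        keep <- names(n_acc)[n_acc == 1]
        x <- x[x[[spectrumID]] %in% keep, , drop = FALSE]
    }
    x
}

psms <- data.frame(spectrumID = c("s1", "s2", "s3", "s3", "s4"),
                   DatabaseAccess = c("P1", "P2", "P3", "P4", "P5"),
                   isDecoy = c(FALSE, TRUE, FALSE, FALSE, FALSE),
                   rank = c(1, 1, 1, 1, 2),
                   stringsAsFactors = FALSE)
filter_psms_sketch(psms)
```

Here s2 is dropped as a decoy, s4 for its rank, and s3 because it maps to two accessions.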
Value
A new data.frame with the unreliable PSMs filtered out and with the same columns as the input x.
Author
Laurent Gatto
formatRt()
Format Retention Time
Description
Converts seconds to/from 'min:sec' format
Usage
formatRt(rt)
Arguments
Argument | Description |
---|---|
rt | Retention time in seconds (numeric) or "mm:sec" (character). |
Details
This function is used to convert retention times, from seconds to the more human-friendly format "mm:sec", or back.
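Both conversion directions amount to simple arithmetic on 60-second minutes. The two helpers below are hypothetical sketches, not formatRt itself; rounding to whole seconds and zero-padding the seconds are choices made here.

```r
## Hypothetical helpers illustrating the seconds <-> "mm:sec" conversion.
seconds_to_mmss <- function(rt) {
    total <- round(rt)   # whole seconds, so 59.6 s becomes "1:00", not "0:60"
    sprintf("%d:%02d", as.integer(total %/% 60), as.integer(total %% 60))
}
mmss_to_seconds <- function(x) {
    parts <- strsplit(x, ":", fixed = TRUE)
    vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
           numeric(1))
}

seconds_to_mmss(1524)      # "25:24"
mmss_to_seconds("25:24")   # 1524
```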
Value
A vector of the same length as rt.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
formatRt(1524)
formatRt("25:24")
getVariableName()
Return a variable name
Description
Return the name of variable varname in call match_call.
Usage
getVariableName(match_call, varname)
Arguments
Argument | Description |
---|---|
match_call | An object of class call , as returned by match.call . |
varname | A character of length 1 which is looked up in match_call. |
Value
A character with the name of the variable passed as parameter varname in the parent call match_call.
Author
Laurent Gatto
Examples
a <- 1
f <- function(x, y)
MSnbase:::getVariableName(match.call(), "x")
f(x = a)
f(y = a)
getaminoacids()
Amino acids
Description
Returns a data.frame of amino acid properties: AA, ResidueMass, Abbrev3, ImmoniumIonMass, Name, Hydrophobicity, Hydrophilicity, SideChainMass, pK1, pK2 and pI.
Usage
get.amino.acids()
Value
A data.frame
Author
Laurent Gatto
Examples
get.amino.acids()
getatomicmass()
Atomic mass.
Description
Returns a double vector of the atomic masses used.
Usage
get.atomic.mass()
Value
A named double.
Author
Sebastian Gibb
Examples
get.atomic.mass()
grepEcols()
Returns the matching column names or indices.
Description
Given a text spreadsheet f and a pattern to be matched to its header (first line in the file), the function returns the matching column names or indices of the corresponding data.frame.
Usage
grepEcols(f, pattern, ..., n = 1)
getEcols(f, ..., n = 1)
Arguments
Argument | Description |
---|---|
f | A connection object or a character string to be read in with readLines(f, n = 1) . |
pattern | A character string containing a regular expression to be matched to the file's header. |
... | Additional parameters passed to strsplit to split the file header into individual column names. |
n | An integer specifying which line in file f to grep (get). Default is 1. Note that this argument must be named. |
Details
The function starts by reading the first line of the file (or connection) f with readLines, then splits it according to the optional ... arguments (it is important to correctly specify strsplit's split character vector here) and then matches pattern to the individual column names using grep.
Similarly, getEcols can be used to explore the column names and decide on the appropriate pattern value.
These functions are useful to check the parameters to be provided to readMSnSet2.
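The read-split-grep sequence described above can be sketched with base R on a temporary tab-separated file. `grep_cols_sketch()` is hypothetical; unlike grepEcols, it takes the separator as an explicit `split` argument and exposes grep's `value` argument directly.

```r
## Hypothetical helper mirroring the readLines -> strsplit -> grep steps.
grep_cols_sketch <- function(f, pattern, split = "\t", value = FALSE) {
    header <- readLines(f, n = 1)                 # first line only
    cols <- strsplit(header, split = split)[[1]]  # individual column names
    grep(pattern, cols, value = value)
}

tmp <- tempfile(fileext = ".tsv")
writeLines(c("Protein\tPeptide\tX114\tX115\tX116\tX117",
             "P1\tPEPTIDEK\t1\t2\t3\t4"), tmp)
grep_cols_sketch(tmp, "^X11")                # column indices
grep_cols_sketch(tmp, "^X11", value = TRUE)  # column names
```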
Value
Depending on value, the matching column names or indices. In case of getEcols, a character of column names.
Author
Laurent Gatto
hasSpectraOrChromatograms()
Checks if raw data files have any spectra or chromatograms
Description
Helper functions to check whether raw files contain spectra or chromatograms.
Usage
hasSpectra(files)
hasChromatograms(files)
Arguments
Argument | Description |
---|---|
files | A character() with raw data filenames. |
Value
A logical(n) where n == length(files), with TRUE if the corresponding file contains at least one spectrum (hasSpectra) or chromatogram (hasChromatograms), FALSE otherwise.
Author
Laurent Gatto
Examples
f <- msdata::proteomics(full.names = TRUE)[1:2]
hasSpectra(f)
hasChromatograms(f)
iPQF()
iPQF: iTRAQ (and TMT) Protein Quantification based on Features
Description
The iPQF spectra-to-protein summarisation method integrates
peptide spectra characteristics and quantitative values for protein
quantitation estimation. Spectra features, such as charge state,
sequence length, identification score and others, contain valuable
information concerning quantification accuracy. The iPQF algorithm
assigns weights to spectra according to their overall feature reliability
and computes a weighted mean to estimate protein quantities.
See also combineFeatures for a more general overview of feature aggregation and examples.
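The final weighted-mean step can be sketched in base R. This is deliberately simplified and is not iPQF: the spectrum weights `w` are supplied directly instead of being derived from spectrum features, relative intensities are obtained by dividing each spectrum by its sum (in the spirit of `ratio.calc = "sum"`), and `weighted_protein_sketch()` is a hypothetical helper.

```r
## Hypothetical helper: weighted mean of relative spectrum intensities per protein.
weighted_protein_sketch <- function(m, protein, w) {
    rel <- m / rowSums(m)   # each spectrum (row) divided by its total intensity
    prots <- unique(protein)
    res <- do.call(rbind, lapply(prots, function(p) {
        i <- protein == p
        ## Multiplying by w[i] scales each selected row by its spectrum weight.
        colSums(rel[i, , drop = FALSE] * w[i]) / sum(w[i])
    }))
    dimnames(res) <- list(prots, colnames(m))
    res
}

m <- cbind(c114 = c(1, 2, 1), c115 = c(1, 1, 1), c116 = c(2, 1, 1))
weighted_protein_sketch(m, protein = c("A", "A", "B"), w = c(3, 1, 1))
```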
Usage
iPQF(object, groupBy, low.support.filter = FALSE, ratio.calc = "sum",
method.combine = FALSE, feature.weight = c(7, 6, 4, 3, 2, 1, 5)^2)
Arguments
Argument | Description |
---|---|
object | An instance of class MSnSet containing absolute ion intensities. |
groupBy | Vector defining spectra to protein matching. Generally, this is a feature variable such as fData(object)$accession . |
low.support.filter | A logical specifying if proteins being supported by only 1-2 peptide spectra should be filtered out. Default is FALSE . |
ratio.calc | Either "none" (don't calculate any ratios), "sum" (default), or a specific channel (one of sampleNames(object)) defining how to calculate relative peptide intensities. |
method.combine | A logical defining whether to further use median polish to combine features. |
feature.weight | Vector "numeric" giving weight to the different features. Default is the squared order of the features redundant-unique-distance metric, charge state, ion intensity, sequence length, identification score, modification state, and mass, based on a robustness analysis. |
Value
A matrix with estimated protein ratios.
Author
Martina Fischer
References
iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.
Examples
data(msnset2)
head(exprs(msnset2))
prot <- combineFeatures(msnset2,
groupBy = fData(msnset2)$accession,
method = "iPQF")
head(exprs(prot))
iTRAQ4()
iTRAQ 4-plex set
Description
This instance of class "ReporterIons" corresponds to the iTRAQ 4-plex set, i.e. the 114, 115, 116 and 117 isobaric tags. In the iTRAQ5 data set, an unfragmented tag, i.e. reporter and attached isobaric tag, is also included at MZ 145. These objects are used to plot the reporter ions of interest in an MSMS spectrum (see "Spectrum2") as well as for quantification (see quantify).
Usage
iTRAQ4
iTRAQ5
iTRAQ8
iTRAQ9
Seealso
TMT6.
References
Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ. "Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents." Mol Cell Proteomics , 2004 Dec;3(12):1154-69. Epub 2004 Sep 22. PubMed PMID: 15385600.
Examples
iTRAQ4
iTRAQ4[1:2]
newReporter <- new("ReporterIons",
description="an example",
name="my reporter ions",
reporterNames=c("myrep1","myrep2"),
mz=c(121,122),
col=c("red","blue"),
width=0.05)
newReporter
imageNA2()
NA heatmap visualisation for 2 groups
Description
Produces a heatmap after reordering rows and columns to highlight missing value patterns.
Usage
imageNA2(object, pcol, Rowv, Colv = TRUE, useGroupMean = FALSE,
plot = TRUE, ...)
Arguments
Argument | Description |
---|---|
object | An instance of class MSnSet |
pcol | Either the name of a phenoData variable to be used to determine the group structure or a factor or any object that can be coerced as a factor of length equal to ncol(object). The resulting factor must have 2 levels. If missing (default), image(object) is called. |
Rowv | Determines if and how the rows/features are reordered. If missing (default), rows are reordered according to order((nNA1 + 1)^2/(nNA2 + 1)), where nNA1 and nNA2 are the number of missing values in each group. Use a numeric vector or a vector of feature names to customise row order. |
Colv | A logical that determines if columns/samples are reordered. Default is TRUE . |
useGroupMean | Replace individual feature intensities by the group mean intensity. Default is FALSE. |
plot | A logical specifying whether an image should be produced. Default is TRUE. |
... | Additional arguments passed to image . |
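The default row ordering given for Rowv can be computed directly. `na_row_order_sketch()` is a hypothetical helper that takes a plain matrix and a two-level grouping of its columns; it only reproduces the ordering formula, not the plotting.

```r
## Hypothetical helper: default imageNA2-style row order from NA counts.
na_row_order_sketch <- function(m, group) {
    group <- factor(group)
    stopifnot(nlevels(group) == 2)
    lv <- levels(group)
    ## Number of missing values per feature within each sample group.
    nNA1 <- rowSums(is.na(m[, group == lv[1], drop = FALSE]))
    nNA2 <- rowSums(is.na(m[, group == lv[2], drop = FALSE]))
    order((nNA1 + 1)^2 / (nNA2 + 1))
}

m <- rbind(f1 = c(NA, NA, 1, 1),   # missing in group A: (2 + 1)^2 / 1 = 9
           f2 = c(1, 1, NA, NA),   # missing in group B: 1 / 3
           f3 = c(1, 1, 1, 1))     # complete: 1
na_row_order_sketch(m, group = c("A", "A", "B", "B"))
```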
Value
Used for its side effect of plotting. Invisibly returns Rowv and Colv.
Author
Laurent Gatto, Samuel Wieczorek and Thomas Burger
Examples
library("pRolocdata")
library("pRoloc")
data(dunkley2006)
pcol <- ifelse(dunkley2006$fraction <= 5, "A", "B")
nax <- makeNaData(dunkley2006, pNA = 0.10)
exprs(nax)[sample(nrow(nax), 30), pcol == "A"] <- NA
exprs(nax)[sample(nrow(nax), 50), pcol == "B"] <- NA
MSnbase:::imageNA2(nax, pcol)
MSnbase:::imageNA2(nax, pcol, useGroupMean = TRUE)
MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = FALSE)
MSnbase:::imageNA2(nax, pcol, Colv = FALSE, useGroupMean = TRUE)
impute_methods()
Quantitative proteomics data imputation
Description
The impute method performs data imputation on an MSnSet instance using a variety of methods (see below). The imputation and the parameters are logged into the processingData(object) slot.
Users should proceed with care when imputing data and take precautions to ensure that the imputation produces valid results, in particular with naive imputations such as replacing missing values with 0.
Details
There are two types of mechanisms resulting in missing values in LC/MSMS experiments.
- Missing values resulting from absence of detection of a feature, despite ions being present at detectable concentrations, for example in the case of ion suppression or as a result of the stochastic, data-dependent nature of the MS acquisition method. These missing values are expected to be randomly distributed in the data and are defined as missing at random (MAR) or missing completely at random (MCAR).
- Biologically relevant missing values resulting from the absence or the low abundance of ions (below the limit of detection of the instrument). These missing values are not expected to be randomly distributed in the data and are defined as missing not at random (MNAR).
MNAR features should ideally be imputed with a left-censor method, such as QRILC below. Conversely, it is recommended to use hot deck methods such as nearest neighbours, Bayesian missing value imputation or maximum likelihood methods when values are missing at random.
Currently, the following imputation methods are available:
- MLE: Maximum likelihood-based imputation method using the EM algorithm. Implemented in the norm::imp.norm function. See imp.norm for details and additional parameters. Note that here, ... are passed to the em.norm function, rather than to the actual imputation function imp.norm.
- bpca: Bayesian missing value imputation, as implemented in the pcaMethods::pca function. See pca for details and additional parameters.
- knn: Nearest neighbour averaging, as implemented in the impute::impute.knn function. See impute.knn for details and additional parameters.
- QRILC: A missing data imputation method that performs the imputation of left-censored missing data using random draws from a truncated distribution with parameters estimated using quantile regression. Implemented in the imputeLCMD::impute.QRILC function. See impute.QRILC for details and additional parameters.
- MinDet: Performs the imputation of left-censored missing data using a deterministic minimal value approach. Considering an expression data matrix with n samples and p features, for each sample, the missing entries are replaced with a minimal value observed in that sample. The minimal value observed is estimated as being the q-th quantile (default q = 0.01) of the observed values in that sample. Implemented in the imputeLCMD::impute.MinDet function. See impute.MinDet for details and additional parameters.
- MinProb: Performs the imputation of left-censored missing data by random draws from a Gaussian distribution centred on a minimal value. Considering an expression data matrix with n samples and p features, for each sample, the mean value of the Gaussian distribution is set to a minimal observed value in that sample. The minimal value observed is estimated as being the q-th quantile (default q = 0.01) of the observed values in that sample. The standard deviation is estimated as the median of the feature standard deviations. Note that when estimating the standard deviation of the Gaussian distribution, only the peptides/proteins which present more than 50% recorded values are considered. Implemented in the imputeLCMD::impute.MinProb function. See impute.MinProb for details and additional parameters.
- min: Replaces the missing values by the smallest non-missing value in the data.
- zero: Replaces the missing values by 0.
- mixed: A mixed imputation applying two methods (to be defined by the user as mar for values missing at random and mnar for values missing not at random, see example) on two M[C]AR/MNAR subsets of the data (as defined by the user by a randna logical, of length equal to nrow(object)).
- nbavg: Average neighbour imputation for fractions collected along a fractionation/separation gradient, such as sub-cellular fractions. The method assumes that the fractions are ordered along the gradient and is invalid otherwise. Continuous sets of NA values at the beginning and the end of the quantitation vectors are set to the lowest observed value in the data or to a user-defined value passed as argument k. Then, when a missing value is flanked by two non-missing neighbouring values, it is imputed by the mean of its direct neighbours. A stretch of 2 or more missing values will not be imputed. See the example below.
- none: No imputation is performed and the missing values are left untouched. Implemented in case one wants to only impute values missing at random or not at random with the mixed method.
The naset MSnSet is real quantitative data where quantitative values have been replaced by NAs. See script/naset.R for details.
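The deterministic MinDet rule described above (replace each sample's missing values by the q-th quantile of its observed values) can be sketched in base R. `min_det_sketch()` is hypothetical and uses `stats::quantile()` with its default type; imputeLCMD::impute.MinDet may differ in such details.

```r
## Hypothetical helper: MinDet-style imputation, one sample (column) at a time.
min_det_sketch <- function(m, q = 0.01) {
    for (j in seq_len(ncol(m))) {
        miss <- is.na(m[, j])
        ## Columns with no missing, or no observed, values are left unchanged.
        if (any(miss) && !all(miss))
            m[miss, j] <- stats::quantile(m[, j], probs = q, na.rm = TRUE,
                                          names = FALSE)
    }
    m
}

m <- cbind(a = c(1, 2, NA, 4), b = c(NA, 10, 20, 30))
min_det_sketch(m)
```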
Author
Laurent Gatto and Samuel Wieczorek
References
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics (2001) 17 (6): 520-525.
Oba et al., A Bayesian missing value estimation method for gene expression profile data, Bioinformatics (2003) 19 (16): 2088-2096.
Cosmin Lazar (2015). imputeLCMD: A collection of methods for left-censored missing data imputation. R package version 2.0. http://CRAN.R-project.org/package=imputeLCMD .
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res. 2016 Apr 1;15(4):1116-25. doi: 10.1021/acs.jproteome.5b00981. PubMed PMID: 26906401.
Examples
data(naset)
## table of missing values along the rows
table(fData(naset)$nNA)
## table of missing values along the columns
pData(naset)$nNA
## non-random missing values
notna <- which(!fData(naset)$randna)
length(notna)
notna
impute(naset, method = "min")
if (require("imputeLCMD")) {
impute(naset, method = "QRILC")
impute(naset, method = "MinDet")
}
if (require("norm"))
impute(naset, method = "MLE")
impute(naset, "mixed",
randna = fData(naset)$randna,
mar = "knn", mnar = "QRILC")
## neighbour averaging
x <- naset[1:4, 1:6]
exprs(x)[1, 1] <- NA ## min value
exprs(x)[2, 3] <- NA ## average
exprs(x)[3, 1:2] <- NA ## min value and average
## 4th row: no imputation
exprs(x)
exprs(impute(x, "nbavg"))
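The nbavg rule can be sketched on a plain matrix. `nbavg_sketch()` is hypothetical; following the comments in the example above (a row starting with two NAs gets the minimum and then an average), it assumes that only the first and last element of each row are set to `k`, and that rows are scanned left to right so an already imputed value can serve as a neighbour.

```r
## Hypothetical helper: neighbour-average imputation along each row.
nbavg_sketch <- function(m, k = min(m, na.rm = TRUE)) {
    force(k)                       # fix k before m is modified
    nc <- ncol(m)
    for (i in seq_len(nrow(m))) {
        ## Missing first/last fractions get the minimum (or user value k).
        if (is.na(m[i, 1])) m[i, 1] <- k
        if (is.na(m[i, nc])) m[i, nc] <- k
        if (nc > 2) for (j in 2:(nc - 1)) {
            ## Impute only when both direct neighbours are available.
            if (is.na(m[i, j]) && !is.na(m[i, j - 1]) && !is.na(m[i, j + 1]))
                m[i, j] <- mean(c(m[i, j - 1], m[i, j + 1]))
        }
    }
    m
}

m <- rbind(c(NA, 2, 4, 6),    # min value
           c(1, 3, NA, 7),    # average
           c(NA, NA, 5, 8),   # min value and average
           c(2, NA, NA, 9))   # stretch of 2: no imputation
nbavg_sketch(m)
```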
isCentroidedFromFile()
Get mode from mzML data file
Description
The function extracts the mode (profile or centroided) from the raw mass spectrometry file by parsing the mzML file directly. If the object x stems from any other type of file, NAs are returned.
Usage
isCentroidedFromFile(x)
Arguments
Argument | Description |
---|---|
x | An object of class OnDiskMSnExp . |
Details
This function is much faster than isCentroided(), which estimates mode from the data, but is limited to data stemming from mzML files which are still available in their original location (and accessed with fileNames(x)).
Value
A named logical vector of the same length as x.
Author
Laurent Gatto
Examples
library("msdata")
f <- proteomics(full.names = TRUE,
pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz")
x <- readMSData(f, mode = "onDisk")
table(isCentroidedFromFile(x), msLevel(x))
itraqdata()
Example MSnExp and MSnSet data sets
Description
itraqdata is an example data set of an iTRAQ 4-plex experiment that has been run on an Orbitrap Velos instrument. It includes identification data in the feature data slot, obtained from the Mascot search engine. It is a subset of a spike-in experiment where proteins have been spiked into an Erwinia background, as described in Karp et al. (2010), "Addressing accuracy and precision issues in iTRAQ quantitation", Mol Cell Proteomics. 2010 Sep;9(9):1885-97. Epub 2010 Apr 10. (PMID 20382981).
The spiked-in proteins in itraqdata are BSA and ENO and are present in relative abundances 1, 2.5, 5, 10 and 10, 5, 2.5, 1, respectively, in the 114, 115, 116 and 117 reporter tags.
The msnset object is produced by running the quantify method on the itraqdata experimental data, as detailed in the quantify example. This example data set is used in the
MSnbase-demo vignette, available with vignette("MSnbase-demo",
.
The msnset2 object is another example iTRAQ4 data set that is used to demonstrate features of the package, in particular the iPQF feature aggregation method, described in iPQF. It corresponds to 11 proteins with spectra measurements from the original data set described by Breitwieser et al. (2011), "General statistical modeling of data from protein relative expression isobaric tags", J. Proteome Res., 10, 2758-2766.
Usage
itraqdata
Examples
data(itraqdata)
itraqdata
## created by
## msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4)
data(msnset)
msnset
data(msnset2)
msnset2
listOf()
Tests equality of list elements' class
Description
Checks whether all members of a list inherit from the same given class.
Usage
listOf(x, class, valid = TRUE)
Arguments
Argument | Description |
---|---|
x | A list. |
class | A character defining the expected class. |
valid | A logical defining if all elements should be tested for validity. Default is TRUE . |
Value
TRUE if all elements of x inherit from class.
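The class check itself reduces to `inherits()` applied to each element. `list_of_sketch()` is a hypothetical base-R sketch that omits the validity checks performed when `valid = TRUE`.

```r
## Hypothetical helper: do all list elements inherit from the given class?
list_of_sketch <- function(x, class) {
    ## vapply returns logical(0) for an empty list, and all(logical(0)) is TRUE.
    all(vapply(x, inherits, logical(1), what = class))
}

list_of_sketch(list("a", "b"), "character")
list_of_sketch(list("a", 1), "character")
```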
Author
Laurent Gatto
Examples
listOf(list(), "foo")
listOf(list("a", "b"), "character")
listOf(list("a", 1), "character")
makeCamelCase()
Convert to camel case by replacing dots by capital letters
Description
Convert a vector of characters to camel case by replacing dots by capital letters.
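One way to express this transformation is a single Perl-style regular expression, where `\\U` upper-cases the captured character. `camel_case_sketch()` is a hypothetical helper, and prepending the prefix with a dot separator is an assumption.

```r
## Hypothetical helper: drop each dot and upper-case the character after it.
camel_case_sketch <- function(x, prefix) {
    if (!missing(prefix)) x <- paste(prefix[1], x, sep = ".")
    gsub("\\.(\\w?)", "\\U\\1", x, perl = TRUE)
}

nms <- c("aa.foo", "ab.bar")
camel_case_sketch(nms)                # "aaFoo"  "abBar"
camel_case_sketch(nms, prefix = "x")  # "xAaFoo" "xAbBar"
```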
Usage
makeCamelCase(x, prefix)
Arguments
Argument | Description |
---|---|
x | A vector to be transformed to camel case. |
prefix | An optional character of length one. Any additional elements are ignored. |
Value
A character of the same length as x.
Author
Laurent Gatto
Examples
nms <- c("aa.foo", "ab.bar")
makeCamelCase(nms)
makeCamelCase(nms, prefix = "x")
makeNaData()
Create a data set with missing values
Description
These functions take an instance of class MSnSet and set randomly selected values to NA.
Usage
makeNaData(object, nNA, pNA, exclude)
makeNaData2(object, nRows, nNAs, exclude)
whichNA(x)
Arguments
Argument | Description |
---|---|
object | An instance of class MSnSet . |
nNA | The absolute number of missing values to be assigned. |
pNA | The proportion of missing values to be assigned. |
exclude | A vector to be used to subset object, defining rows that should not be used to set NAs. |
nRows | The number of rows for each set. |
nNAs | The number of missing values for each set. |
x | A matrix or an instance of class MSnSet . |
Details
makeNaData randomly selects a number nNA (or a proportion pNA) of cells in the expression matrix to be set to NA.
makeNaData2 will select length(nRows) sets of rows from object, each with nRows[i] rows respectively. The first set will be assigned nNAs[1] missing values, the second nNAs[2], ... As opposed to makeNaData, this permits control over the number of NAs per row.
The whichNA function can be used to extract the indices of the missing values, as illustrated in the example.
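The random selection performed by makeNaData, including the exclude argument, can be sketched on a plain matrix, with `which(..., arr.ind = TRUE)` playing the role of whichNA. `make_na_sketch()` is hypothetical and, unlike makeNaData, records nothing in processingData and adds no nNA feature variable.

```r
## Hypothetical helper: set nNA randomly chosen cells to NA, sparing some rows.
make_na_sketch <- function(m, nNA, exclude = integer(0)) {
    rows <- setdiff(seq_len(nrow(m)), exclude)
    ## Linear (column-major) indices of all cells in the allowed rows.
    cand <- as.vector(outer(rows, (seq_len(ncol(m)) - 1L) * nrow(m), "+"))
    ## Index through sample.int to avoid sample()'s length-one gotcha.
    m[cand[sample.int(length(cand), nNA)]] <- NA
    m
}

set.seed(42)
m <- matrix(as.numeric(1:50), nrow = 10)
x <- make_na_sketch(m, nNA = 7, exclude = 1:3)
na_idx <- which(is.na(x), arr.ind = TRUE)   # row/col indices of the NAs
head(na_idx)
```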
Value
An instance of class MSnSet, as object, but with the appropriate number/proportion of missing values. The returned object has an additional feature meta-data column, nNA.
Author
Laurent Gatto
Examples
## Example 1
library(pRolocdata)
data(dunkley2006)
sum(is.na(dunkley2006))
dunkleyNA <- makeNaData(dunkley2006, nNA = 150)
processingData(dunkleyNA)
sum(is.na(dunkleyNA))
table(fData(dunkleyNA)$nNA)
naIdx <- whichNA(dunkleyNA)
head(naIdx)
## Example 2
dunkleyNA <- makeNaData(dunkley2006, nNA = 150, exclude = 1:10)
processingData(dunkleyNA)
table(fData(dunkleyNA)$nNA[1:10])
table(fData(dunkleyNA)$nNA)
## Example 3
nr <- rep(10, 5)
na <- 1:5
x <- makeNaData2(dunkley2006[1:100, 1:5],
nRows = nr,
nNAs = na)
processingData(x)
(res <- table(fData(x)$nNA))
stopifnot(as.numeric(names(res)[-1]) == na)
stopifnot(res[-1] == nr)
## Example 4
nr2 <- c(5, 12, 11, 8)
na2 <- c(3, 8, 1, 4)
x2 <- makeNaData2(dunkley2006[1:100, 1:10],
nRows = nr2,
nNAs = na2)
processingData(x2)
(res2 <- table(fData(x2)$nNA))
stopifnot(as.numeric(names(res2)[-1]) == sort(na2))
stopifnot(res2[-1] == nr2[order(na2)])
## Example 5
nr3 <- c(5, 12, 11, 8)
na3 <- c(3, 8, 1, 3)
x3 <- makeNaData2(dunkley2006[1:100, 1:10],
nRows = nr3,
nNAs = na3)
processingData(x3)
(res3 <- table(fData(x3)$nNA))
meanMzInts()
Combine a list of spectra to a single spectrum
Description
Combine peaks from several spectra into a single spectrum. Intensity and m/z values from the input spectra are aggregated into a single peak if the difference between their m/z values is smaller than mzd or smaller than ppm of their m/z. While mzd can be used to group mass peaks with a single fixed value, ppm allows an m/z-dependent mass peak grouping. Intensity values of grouped mass peaks are aggregated with the intensityFun, m/z values by the mean, or intensity-weighted mean if weighted = TRUE.
Usage
meanMzInts(x, ..., intensityFun = base::mean, weighted = FALSE,
main = 1L, mzd, ppm = 0, timeDomain = FALSE, unionPeaks = TRUE)
Arguments
Argument | Description |
---|---|
x | list of Spectrum objects. |
... | additional parameters that are passed to intensityFun . |
intensityFun | function to aggregate the intensity values per m/z group. Should be a function or the name of a function. The function is expected to return a numeric(1) . |
weighted | logical(1) whether m/z values per m/z group should be aggregated with an intensity-weighted mean. The default is to report the mean m/z. |
main | integer(1) defining the main spectrum, i.e. the spectrum whose m/z and intensity values get replaced and which is returned. By default the first spectrum in x is used. |
mzd | numeric(1) defining the maximal m/z difference below which mass peaks are considered to represent the same ion/mass peak. Intensity values for such grouped mass peaks are aggregated. If not specified this value is estimated from the distribution of differences of m/z values from the provided spectra (see details). |
ppm | numeric(1) allowing to perform a m/z dependent grouping of mass peaks. See details for more information. |
timeDomain | logical(1) whether definition of the m/z values to be combined into one m/z is performed on m/z values ( timeDomain = FALSE ) or on sqrt(mz) ( timeDomain = TRUE ). Profile data from TOF MS instruments should be aggregated based on the time domain (see details). Note that a pre-defined mzd should also be estimated on the square root of m/z values if timeDomain = TRUE . |
unionPeaks | logical(1) whether the union of all peaks (peak groups) from all spectra are reported or only peak groups that contain peaks that are present in the main spectrum (defined by main ). The default is to report the union of peaks from all spectra. |
Details
For general merging of spectra, mzd and/or ppm should be specified manually based on the precision of the MS instrument. Peaks from spectra with a difference in their m/z smaller than mzd or smaller than ppm of their m/z are grouped into the same final peak.
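The grouping rule above can be sketched in base R. This is a minimal illustration, not MSnbase's internal code: the helper group_mz() and the example m/z values are hypothetical, and the sketch assumes that sorted m/z values are chained into one group whenever the gap to the previous value is below the larger of mzd and ppm * m/z / 1e6. Intensities are then aggregated per group with mean (standing in for intensityFun), and m/z values with the plain or the intensity-weighted mean.

```r
## Hypothetical helper (not MSnbase code): chain sorted m/z values into groups
## whenever the gap to the previous value is below max(mzd, ppm * mz / 1e6).
group_mz <- function(mz, mzd = 0, ppm = 0) {
    o <- order(mz)
    smz <- mz[o]
    ## tolerance for each gap: the larger of the absolute and relative limits
    tol <- pmax(mzd, ppm * smz[-length(smz)] / 1e6)
    grp <- cumsum(c(TRUE, diff(smz) >= tol))
    ## return group ids in the original order of mz
    res <- integer(length(mz))
    res[o] <- grp
    res
}

## Peaks pooled from three spectra: one ion near m/z 100, one near m/z 200
mz  <- c(100.001, 99.999, 100.002, 200.010, 199.995, 200.004)
int <- c(10, 12, 8, 50, 40, 45)

g <- group_mz(mz, mzd = 0.01)
g
tapply(int, g, mean)                       # intensities, like intensityFun = mean
tapply(mz, g, mean)                        # m/z, like weighted = FALSE
sapply(split(seq_along(mz), g),            # m/z, like weighted = TRUE
       function(i) weighted.mean(mz[i], int[i]))
group_mz(mz, ppm = 50)                     # same groups with a relative tolerance
```

Because this sketch forms groups by chaining consecutive gaps, a dense run of peaks can end up in one group even if its two extremes are further apart than mzd.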
Some details on the combination of consecutive spectra of an LCMS run:
The m/z values of the same ion in consecutive scans (spectra) of an LCMS run will not be identical. Assuming that this random variation is much smaller than the resolution of the MS instrument (i.e. the difference between m/z values within each single spectrum), m/z value groups are defined across the spectra and, if unionPeaks = FALSE, only those containing m/z values of the main spectrum are retained. The maximum allowed difference between m/z values for the same ion is estimated as in estimateMzScattering(). Alternatively, this maximal m/z difference can be defined with the mzd parameter. All m/z values with a difference smaller than this value are combined into an m/z group.
Intensities and m/z values falling within each of these m/z groups are aggregated using intensityFun and the (optionally intensity-weighted) mean, respectively. It is highly likely that all QTOF profile data is collected with a timing circuit that collects data points at regular intervals of time, which are later converted into m/z values based on the relationship t = k * sqrt(m/z). The m/z scale is thus non-linear, and the m/z scattering (which is in fact caused by small variations in the timing circuit) differs between the lower and the upper m/z range. For such data, the m/z-intensity pairs from consecutive scans to be combined should therefore be defined on the square root of the m/z values (timeDomain = TRUE). With timeDomain = FALSE (the default), the actual m/z values are used.
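To see why grouping on sqrt(m/z) matters for TOF data: a constant tolerance d on the sqrt(m/z) scale corresponds to an m/z window of (sqrt(mz) + d)^2 - mz = 2 * d * sqrt(mz) + d^2, which widens with m/z. The short base-R calculation below illustrates this with an arbitrary, illustrative value of d; it does not call MSnbase.

```r
## Illustrative tolerance on the sqrt(m/z) scale (not an MSnbase default)
d  <- 0.0005
mz <- c(100, 400, 900)

## m/z window covered by sqrt(mz) + d: equals 2 * d * sqrt(mz) + d^2
mz_window <- (sqrt(mz) + d)^2 - mz
mz_window   # about 0.01, 0.02 and 0.03: the window grows with sqrt(m/z)
```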
Value
Spectrum with m/z and intensity values representing the aggregated values across the provided spectra. The returned spectrum contains the union of all peaks from all spectra (if unionPeaks = TRUE), or the same number of m/z and intensity pairs as the spectrum with index main in x (if unionPeaks = FALSE). All other spectrum data (such as retention time etc.) are taken from the main spectrum.
Seealso
estimateMzScattering() for a function to estimate m/z scattering in consecutive scans.
estimateMzResolution() for a function estimating the m/z resolution of a spectrum.
combineSpectraMovingWindow() for the function to combine consecutive spectra of an MSnExp object using a moving window approach.
Other spectra combination functions: consensusSpectrum
Note
This allows, for example, combining profile-mode spectra of consecutive scans into the values for the main spectrum. This can improve centroiding of profile-mode data by increasing the signal-to-noise ratio and is used in the combineSpectraMovingWindow() function.
Author
Johannes Rainer, Sigurdur Smarason
Examples
library(MSnbase)
## Create 3 example profile-mode spectra with a resolution of 0.1 and small
## random variations on these m/z values on consecutive scans.
set.seed(123)
mzs <- seq(1, 20, 0.1)
ints1 <- abs(rnorm(length(mzs), 10))
ints1[11:20] <- c(15, 30, 90, 200, 500, 300, 100, 70, 40, 20) # add peak
ints2 <- abs(rnorm(length(mzs), 10))
ints2[11:20] <- c(15, 30, 60, 120, 300, 200, 90, 60, 30, 23)
ints3 <- abs(rnorm(length(mzs), 10))
ints3[11:20] <- c(13, 20, 50, 100, 200, 100, 80, 40, 30, 20)
## Create the spectra.
sp1 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01),
intensity = ints1)
sp2 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.01),
intensity = ints2)
sp3 <- new("Spectrum1", mz = mzs + rnorm(length(mzs), sd = 0.009),
intensity = ints3)
## Combine the spectra
sp_agg <- meanMzInts(list(sp1, sp2, sp3))
## Plot the spectra before and after combining
par(mfrow = c(2, 1), mar = c(4.3, 4, 1, 1))
plot(mz(sp1), intensity(sp1), xlim = range(mzs[5:25]), type = "h", col = "red")
points(mz(sp2), intensity(sp2), type = "h", col = "green")
points(mz(sp3), intensity(sp3), type = "h", col = "blue")
plot(mz(sp_agg), intensity(sp_agg), xlim = range(mzs[5:25]), type = "h",
col = "black")
missing_data()
Documenting missing data visualisation
Description
There is a need for adequate handling of missing value imputation in quantitative proteomics. Before developing a framework to handle missing data imputation optimally, we propose a set of visualisation tools. This document serves as an internal notebook for current progress and ideas that will eventually materialise in exported functionality in the MSnbase package.
Details
To explore the structure of missing values, we propose to:
1. Explore missing values in the frame of the experimental design. The imageNA2 function offers such a simple visualisation. It is currently limited to 2-group designs/comparisons. In the case of time-course experiments or sub-cellular fractionation along a density gradient, we propose to split the time/gradient into 2 groups (early/late, top/bottom) as a first approximation.
2. Explore the proportion of missing values in each group.
3. Explore the total and group-wise feature intensity distributions.
The existing plotNA function illustrates the completeness/missingness of the data.
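As a starting point for the second step, the proportion of missing values per sample group can be computed directly from the NA pattern of an expression matrix. The base-R sketch below uses a toy matrix and a hypothetical two-group design; for an MSnSet, is.na() on the object yields the equivalent logical matrix.

```r
## Toy data: 10 features x 4 samples with 8 values set to NA at random
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
x[sample(length(x), 8)] <- NA
grp <- factor(c("A", "A", "B", "B"))   # hypothetical 2-group design (columns)

## Proportion of missing values in each group of samples
prop_na <- sapply(levels(grp), function(g) mean(is.na(x[, grp == g])))
prop_na

## Per-feature NA counts in each group (cf. nNA1/nNA2 in the example)
nna <- sapply(levels(grp), function(g) rowSums(is.na(x[, grp == g, drop = FALSE])))
head(nna)
```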
Seealso
Author
Laurent Gatto lg390@cam.ac.uk, Samuel Wieczorek and Thomas Burger
Examples
## Other suggestions
library("pRolocdata")
library("pRoloc")
data(dunkley2006)
set.seed(1)
nax <- makeNaData(dunkley2006, pNA = 0.10)
pcol <- factor(ifelse(dunkley2006$fraction <= 5, "A", "B"))
sel1 <- pcol == "A"
## missing values in each sample
barplot(colSums(is.na(nax)), col = pcol)
## table of missing values in proteins
par(mfrow = c(3, 1))
barplot(table(rowSums(is.na(nax))), main = "All")
barplot(table(rowSums(is.na(nax)[, sel1])), main = "Group A")
barplot(table(rowSums(is.na(nax)[, !sel1])), main = "Group B")
fData(nax)$nNA1 <- rowSums(is.na(nax)[, sel1])
fData(nax)$nNA2 <- rowSums(is.na(nax)[, !sel1])
fData(nax)$nNA <- rowSums(is.na(nax))
o <- MSnbase:::imageNA2(nax, pcol)
plot((fData(nax)$nNA1 - fData(nax)$nNA2)[o], type = "l")
grid()
plot(sort(fData(nax)$nNA1 - fData(nax)$nNA2), type = "l")
grid()
o2 <- order(fData(nax)$nNA1 - fData(nax)$nNA2)
MSnbase:::imageNA2(nax, pcol, Rowv=o2)
layout(matrix(c(rep(1, 10), rep(2, 5)), nc = 3))
MSnbase:::imageNA2(nax, pcol, Rowv=o2)
plot((fData(nax)$nNA1 - fData(nax)$nNA)[o2], type = "l", col = "red",
ylim = c(-9, 9), ylab = "")
lines((fData(nax)$nNA - fData(nax)$nNA2)[o2], col = "steelblue")
lines((fData(nax)$nNA1 - fData(nax)$nNA2)[o2], type = "l",
lwd = 2)
mzRident2dfr()
Coerce identification data to a data.frame
Description
A function to convert the identification data contained in an mzRident object to a data.frame. Each row represents a scan, which can however be repeated several times if the PSM matches multiple proteins and/or contains two or more modifications. To reduce the data.frame so that rows/scans are unique, using semicolon-separated values to combine information pertaining to a scan, use reduce.
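The kind of collapsing that reduce performs can be illustrated with base R. This sketch does not call MSnbase's reduce; the column names (spectrumID, sequence, accession) and values are hypothetical, and it simply pastes the distinct values of each column per scan, separated by semicolons.

```r
## Hypothetical PSM table: scan "s1" matches two proteins, hence two rows
psm <- data.frame(spectrumID = c("s1", "s1", "s2"),
                  sequence   = c("PEPTIDEK", "PEPTIDEK", "ELVISK"),
                  accession  = c("P1", "P2", "P3"),
                  stringsAsFactors = FALSE)

## Collapse to one row per scan, joining distinct values with ";"
collapse <- function(v) paste(unique(v), collapse = ";")
red <- aggregate(psm[, c("sequence", "accession")],
                 by = list(spectrumID = psm$spectrumID),
                 FUN = collapse)
red
```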
Arguments
Argument | Description |
---|---|
from | An object of class mzRident defined in the mzR package. |
Details
See also the Tandem MS identification data section in the MSnbase-demo vignette.
Value
A data.frame
Author
Laurent Gatto
Examples
## find path to a mzIdentML file
identFile <- dir(system.file("extdata", package = "MSnbase"),
full.names = TRUE, pattern = "dummyiTRAQ.mzid")
library("mzR")
library("MSnbase")
x <- openIDfile(identFile)
x
as(x, "data.frame")
nFeatures()
How many features in a group?
Description
This function computes the number of features in the group defined by the feature variable fcol and appends this information to the feature data of object.
Usage
nFeatures(object, fcol)
Arguments
Argument | Description |
---|---|
object | An instance of class MSnSet . |
fcol | Feature variable defining the feature grouping structure. |
Value
An updated MSnSet with a new feature variable fcol.nFeatures.
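The value stored in fcol.nFeatures is the size of each feature's group. A base-R sketch on a hypothetical feature table (column acc standing in for fcol) uses ave() to attach that size to every row:

```r
## Hypothetical feature data: PSMs annotated with a protein accession
fd <- data.frame(acc = c("P1", "P1", "P2", "P3", "P3", "P3"),
                 stringsAsFactors = FALSE)

## Size of each row's group, stored under the "<fcol>.nFeatures" naming scheme
fd$acc.nFeatures <- ave(seq_along(fd$acc), fd$acc, FUN = length)
fd
```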
Author
Laurent Gatto
Examples
library(pRolocdata)
data("hyperLOPIT2015ms3r1psm")
hyperLOPIT2015ms3r1psm <- nFeatures(hyperLOPIT2015ms3r1psm,
"Protein.Group.Accessions")
i <- c("Protein.Group.Accessions", "Protein.Group.Accessions.nFeatures")
fData(hyperLOPIT2015ms3r1psm)[1:10, i]
nQuants()
Count the number of quantified features.
Description
This function counts the number of quantified features, i.e. non-NA quantitation values, for each group of features
for all the samples in an "
object.
The groups of features are defined by the groupBy factor, typically built from a feature variable, i.e. a column of fData(x).
Usage
nQuants(x, groupBy)
Arguments
Argument | Description |
---|---|
x | An instance of class " . |
groupBy | An object of class factor defining how to summarise the features. (Note that this parameter was previously named fcol and referred to a feature variable label. This has been updated in version 1.19.12 for consistency with other functions.) |
Details
This function is typically used after topN and before combineFeatures, when the summarising function is sum, or any function that does not normalise to the number of features aggregated. In such a case, the sums of features might be the result of 0 (if no feature was quantified) to n (if all of topN's n features were quantified) features, and one might want to rescale the sums based on the number of non-NA features effectively summed.
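The rescaling described above can be sketched in base R on a toy matrix, without MSnbase: rowsum() on the non-NA indicator gives the per-group, per-sample counts (the kind of matrix nQuants reports), rowsum(..., na.rm = TRUE) gives the summed intensities, and multiplying by n / m rescales each sum to n features. The data and group labels are hypothetical.

```r
## Toy expression matrix: 4 peptides x 3 samples from proteins A and B
x <- matrix(c( 1, NA,  3,
               2,  2, NA,
               5,  5,  5,
              NA,  1,  1), ncol = 3, byrow = TRUE)
grp <- factor(c("A", "A", "B", "B"))
n <- 2                                   # as if topN(..., n = 2) had been applied

m <- rowsum((!is.na(x)) * 1L, grp)       # quantified (non-NA) features per group and sample
s <- rowsum(x, grp, na.rm = TRUE)        # summed intensities, like method = sum
s * (n / m)                              # sums rescaled to n features
```

A group with no quantified feature in a sample gives m = 0 and a summed value of 0, so its rescaled value becomes NaN and would need handling in practice.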
Value
A matrix of integers of dimensions length(levels(groupBy)) by ncol(x).
Author
Laurent Gatto lg390@cam.ac.uk, Sebastian Gibb mail@sebastiangibb.de
Examples
data(msnset)
n <- 2
msnset <- topN(msnset, groupBy = fData(msnset)$ProteinAccession, n)
m <- nQuants(msnset, groupBy = fData(msnset)$ProteinAccession)
msnset2 <- combineFeatures(msnset,
groupBy = fData(msnset)$ProteinAccession,
method = sum)
stopifnot(all(dim(m) == dim(msnset2)))
head(exprs(msnset2))
head(exprs(msnset2) * (n/m))
naplot()
Overview of missing values
Description
Visualise missing values as a heatmap and barplots along the samples and features.
Usage
naplot(object, verbose = isMSnbaseVerbose(), reorderRows = TRUE,
reorderColumns = TRUE, ...)
Arguments
Argument | Description |
---|---|
object | An object of class MSnSet . |
verbose | If verbose (default is isMSnbaseVerbose() ), print a table of missing values. |
reorderRows | If reorderRows (default is TRUE ) rows are ordered by number of NA. |
reorderColumns | If reorderColumns (default is TRUE ) columns are ordered by number of NA. |
... | Additional parameters passed to image2 . |
Value
Used for its side effect. Invisibly returns NULL.
Author
Laurent Gatto
Examples
data(naset)
naplot(naset)
normToReference()
Combine peptides into proteins.
Description
This function combines peptides into their proteins by normalising the intensity values to a reference run/sample for each protein.
Usage
normToReference(x, group, reference = .referenceFractionValues(x = x,
group = group))
Arguments
Argument | Description |
---|---|
x | matrix, exprs matrix of an MSnSet object. |
group | double or factor, grouping variable, i.e. protein accession; has to be of length equal to nrow(x). |
reference | double, vector of reference values; has to be of the same length as group and nrow(x). |
Details
This function is not intended to be used directly (that is why it is not exported via NAMESPACE). Instead, the user should use combineFeatures with method = "NTR".
The algorithm is described in Nikolovski et al. (2014); briefly, it works as follows:
1. Find the reference run (column) for each protein (grouped rows). We use the run (column) with the lowest number of NA values. If multiple candidates are available, we use the one with the highest intensity. This step is skipped if the user provides their own reference vector.
2. For each protein (grouped rows) and each run (column):
   a. Find the peptides (rows) shared by the current run (column) and the reference run (column).
   b. Sum the shared peptides for the current run and for the reference run.
   c. The ratio of the summed shared peptides of the current run and the reference run is the new intensity of the current protein in the current run.
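The steps above can be sketched for a single protein in base R. This illustrates the described algorithm and is not MSnbase's normToReference(): ntr_one() is a hypothetical helper, ties in step 1 are broken here by the highest summed intensity (an assumption about what "highest intensity" means), and a run sharing no peptide with the reference yields NaN.

```r
## Hypothetical helper, not MSnbase's normToReference():
## x holds one protein, rows = peptides, columns = runs.
ntr_one <- function(x) {
    ## step 1: reference run = fewest NAs, ties broken by highest summed intensity
    nna  <- colSums(is.na(x))
    cand <- which(nna == min(nna))
    ref  <- cand[which.max(colSums(x[, cand, drop = FALSE], na.rm = TRUE))]
    ## step 2: ratio of summed shared peptides, current run vs. reference run
    vapply(seq_len(ncol(x)), function(j) {
        shared <- !is.na(x[, j]) & !is.na(x[, ref])
        sum(x[shared, j]) / sum(x[shared, ref])
    }, numeric(1))
}

x <- rbind(c(10, 20, NA),
           c( 5, 10,  4),
           c(NA,  6,  3))
ntr_one(x)   # run 2 (no NA) is the reference: 0.5, 1 and 0.4375
```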
Value
a matrix with one row per protein.
Seealso
Author
Sebastian Gibb mail@sebastiangibb.de, Pavel Shliaha
References
Nikolovski N, Shliaha PV, Gatto L, Dupree P, Lilley KS. Label-free protein quantification for plant Golgi protein localization and abundance. Plant Physiol. 2014 Oct;166(2):1033-43. DOI: 10.1104/pp.114.245589. PubMed PMID: 25122472.
Examples
library("MSnbase")
data(msnset)
# choose the reference run automatically
combineFeatures(msnset, groupBy = fData(msnset)$ProteinAccession, method = "NTR")
# use a user-given reference
combineFeatures(msnset, groupBy = fData(msnset)$ProteinAccession, method = "NTR",
reference = rep(2, 55))
normalise_methods()
Normalisation of MSnExp, MSnSet and Spectrum objects
Description
The normalise method (also available as normalize) performs basic normalisation on spectra intensities of single spectra (" or " objects), whole experiments (" objects) or quantified expression data (" objects).
Raw spectra and experiments are normalised using max or sum only. MS/MS spectra can additionally be normalised to their precursor. Each peak intensity is divided by the highest intensity in the spectrum, the sum of intensities or the intensity of the precursor. These methods aim at facilitating the comparison of relative peak heights between different spectra.
The method parameter for " can be one of sum, max, quantiles, center.mean, center.median, diff.median, quantiles.robust or vsn. For sum and max, each feature's reporter intensity is divided by the sum or the maximum, respectively. These two methods are applied along the features (rows). center.mean and center.median translate the respective sample (column) intensities according to the column mean or median. diff.median translates all samples (columns) so that they all match the grand median. Using quantiles or quantiles.robust applies (robust) quantile normalisation, as implemented in normalize.quantiles and normalize.quantiles.robust of the preprocessCore package. vsn uses the vsn2 function from the vsn package. Note that the latter also glog-transforms the intensities. See the respective manuals for more details and function arguments.
A scale method, mimicking the base scale method, exists
for "
instances. See
?base::
for details.
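The row-wise and column-wise operations described above can be illustrated on a plain numeric matrix in base R. This is a sketch of the arithmetic only, not a call to MSnbase's normalise(); the matrix is arbitrary.

```r
x <- matrix(c(1, 2,  3,
              4, 5,  6,
              7, 8, 12), nrow = 3, byrow = TRUE)  # rows = features, columns = samples

## "sum" and "max": divide each feature (row) by its sum or maximum
## (a length-nrow vector is recycled down the columns, i.e. row-wise)
x_sum <- x / rowSums(x)
x_max <- x / apply(x, 1, max)

## "center.median": subtract each sample's (column's) median
x_cmed <- sweep(x, 2, apply(x, 2, median))

## "diff.median": shift each sample so that its median equals the grand median
x_dmed <- sweep(x, 2, apply(x, 2, median) - median(x))
apply(x_dmed, 2, median)   # all equal to median(x), here 5
```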
Arguments
Argument | Description |
---|---|
object | An object of class " , " , " or " . |
method | A character vector of length one that describes how to normalise the object. See description for details. |
... | Additional arguments passed to the normalisation function. |
Examples
## normalising a quantified experiment
data(msnset)
msnset.nrm <- normalise(msnset, "quantiles")
msnset.nrm
npcv()
Non-parametric coefficient of variation
Description
Calculates a non-parametric version of the coefficient of variation where the standard deviation is replaced by the median absolute deviation (see mad for details) and divided by the absolute value of the mean.
Usage
npcv(x, na.rm = TRUE)
Arguments
Argument | Description |
---|---|
x | A numeric . |
na.rm | A logical (default is TRUE) indicating whether NA values should be stripped before the computation of the median absolute deviation and mean. |
Details
Note that the mad of a single value is 0 (as opposed to NA for the standard deviation, see example below).
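Following the description, the statistic is mad(x) / abs(mean(x)). The base-R sketch below (npcv_sketch() is a hypothetical name, not the package function) assumes the default scaling constant of stats::mad, 1.4826.

```r
## Hypothetical re-implementation following the description (not MSnbase's npcv)
npcv_sketch <- function(x, na.rm = TRUE)
    mad(x, na.rm = na.rm) / abs(mean(x, na.rm = na.rm))

x <- c(2, 4, 4, 5, 10)
npcv_sketch(x)   # mad = 1.4826 * median(|x - 4|) = 1.4826; mean = 5
npcv_sketch(1)   # 0, since mad(1) is 0
```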
Value
A numeric
.
Author
Laurent Gatto
Examples
set.seed(1)
npcv(rnorm(10))
replicate(10, npcv(rnorm(10)))
npcv(1)
mad(1)
sd(1)
pSet_class()
Class to Contain Raw Mass-Spectrometry Assays and Experimental Metadata
Description
Container for high-throughput mass-spectrometry assays and experimental metadata. This class is based on Biobase's " virtual class, with the notable exception that the 'assayData' slot is an environment containing objects of class ".
Seealso
"
for an instantiatable application of
pSet
.
Author
Laurent Gatto lg390@cam.ac.uk
References
The "
class, on which pSet
is based.
Examples
showClass("pSet")
pickPeaks_method()
Peak Detection for 'MSnExp' or 'Spectrum' instances
Description
This method performs peak picking on individual spectra (Spectrum instances) or whole experiments (MSnExp instances) to create centroided spectra.
For noisy spectra there are currently two different noise estimators available, the Median Absolute Deviation (method = "MAD") and Friedman's Super Smoother (method = "SuperSmoother"), as implemented in the MALDIquant::detectPeaks and MALDIquant::estimateNoise functions, respectively.
The method also supports optionally refining the m/z value of the identified centroids by considering data points that (most likely) belong to the same mass peak. The m/z value is calculated as an intensity-weighted average of the m/z values within the peak region. How the peak region is defined depends on the method chosen:
refineMz = "kNeighbors": m/z values (and their respective intensities) of the 2 * k closest signals to the centroid are used in the intensity-weighted average calculation. The number of neighboring signals can be defined with the argument k.
refineMz = "descendPeak": the peak region is defined by descending from the identified centroid/peak on both sides until the measured signal increases again. Within this defined region, all measurements with an intensity of at least signalPercentage of the centroid's intensity are used to calculate the refined m/z. By default, the descent is stopped when the first signal that is equal to or larger than the last observed one is encountered. With stopAtTwo = TRUE, two consecutively increasing signals are required.
By default (refineMz = "none"), simply the m/z of the largest signal (the identified centroid) is reported. See below for examples.
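Both refinement strategies can be sketched in base R on the intensity vector used in the examples below. The helpers refine_k() and descend_peak() are hypothetical and only mirror the description: refine_k() assumes the window runs from k signals left to k signals right of the centroid (centroid included), and descend_peak() implements only the default stopping rule (stopAtTwo = FALSE). MSnbase's own implementation may differ, for instance in edge cases.

```r
## Hypothetical helpers mirroring the description (not MSnbase's implementation).
## i is the index of the centroid in the profile data.
refine_k <- function(mz, int, i, k) {
    ## window from k signals left to k signals right of the centroid
    idx <- max(1L, i - k):min(length(mz), i + k)
    sum(mz[idx] * int[idx]) / sum(int[idx])
}

descend_peak <- function(mz, int, i, signalPercentage) {
    ## descend on each side while the signal keeps decreasing
    left <- i
    while (left > 1L && int[left - 1L] < int[left]) left <- left - 1L
    right <- i
    while (right < length(int) && int[right + 1L] < int[right]) right <- right + 1L
    ## keep signals with at least signalPercentage % of the centroid's intensity
    idx <- left:right
    idx <- idx[int[idx] >= int[i] * signalPercentage / 100]
    sum(mz[idx] * int[idx]) / sum(int[idx])
}

ints <- c(5, 3, 2, 3, 1, 2, 4, 6, 8, 11, 4, 7, 5, 2, 1, 0, 1, 0, 1, 1, 1, 0)
mzs <- seq_along(ints)
i <- which.max(ints)                                  # centroid at index 10
refine_k(mzs, ints, i, k = 1L)                        # uses m/z 9, 10 and 11
descend_peak(mzs, ints, i, signalPercentage = 50)     # uses m/z 8, 9 and 10
```

Here the descent stops at index 11 on the right because the next signal (7) is larger than the current one (4), and the 50% threshold then drops the low signals at the flanks.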
Seealso
clean, removePeaks, smooth, estimateNoise and trimMz for other spectra processing methods.
Author
Sebastian Gibb mail@sebastiangibb.de with contributions from Johannes Rainer.
References
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 28: 2270-2271. http://strimmerlab.org/software/maldiquant/
Examples
sp1 <- new("Spectrum1",
intensity = c(1:6, 5:1),
mz = 1:11,
centroided = FALSE)
sp2 <- pickPeaks(sp1)
intensity(sp2)
data(itraqdata)
itraqdata2 <- pickPeaks(itraqdata)
processingData(itraqdata2)
## Examples for refineMz:
ints <- c(5, 3, 2, 3, 1, 2, 4, 6, 8, 11, 4, 7, 5, 2, 1, 0, 1, 0, 1, 1, 1, 0)
mzs <- 1:length(ints)
sp1 <- new("Spectrum1", intensity = ints, mz = mzs, centroided = FALSE)
plot(mz(sp1), intensity(sp1), type = "h")
## Without m/z refinement:
sp2 <- pickPeaks(sp1)
points(mz(sp2), intensity(sp2), col = "darkgrey")
## Using k = 1, closest signals
sp3 <- pickPeaks(sp1, refineMz = "kNeighbors", k = 1)
points(mz(sp3), intensity(sp3), col = "green", type = "h")
## Using descendPeak requiring at least 50% of the centroid's intensity
sp4 <- pickPeaks(sp1, refineMz = "descendPeak", signalPercentage = 50)
points(mz(sp4), intensity(sp4), col = "red", type = "h")
plot2d_methods()
The 'plot2d' method for 'MSnExp' quality assessment
Description
These methods plot the retention time vs. precursor MZ for the whole " experiment. Individual dots are colour-coded to describe individual spectra's peaks count, total ion count, precursor charge (MS2 only) or file of origin.
The methods make use of the ggplot2 system. An object of class 'ggplot' is returned invisibly.
Arguments
Argument | Description |
---|---|
object | An object of class " or a data.frame . In the latter case, the data frame must have numerical columns named 'retention.time' and 'precursor.mz' and one of 'tic', 'file', 'peaks.count' or 'charge', depending on the z parameter. Such a data frame is typically generated using the header method on " object. |
z | A character indicating according to what variable to colour the dots. One of, possibly abbreviated, "ionCount" (total ion count), "file" (raw data file), "peaks.count" (peaks count) or "charge" (precursor charge). |
alpha | Numeric [0,1] indicating the transparency level of points. |
plot | A logical indicating whether the plot should be printed (default is 'TRUE'). |
Seealso
The plotDensity and plotMzDelta methods for other QC plots.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
itraqdata
plot2d(itraqdata,z="ionCount")
plot2d(itraqdata,z="peaks.count")
plot2d(itraqdata,z="charge")
plotDensity_methods()
The 'plotDensity' method for 'MSnExp' quality assessment
Description
These methods plot the distribution of several parameters of interest for the different precursor charges for " experiment.
The methods make use of the ggplot2 system. An object of class 'ggplot' is returned invisibly.
Arguments
Argument | Description |
---|---|
object | An object of class " or a data.frame. In the latter case, the data frame must have numerical columns named 'charge' and one of 'precursor.mz', 'peaks.count' or 'ionCount', depending on the z parameter. Such a data frame is typically generated using the header method on " object. |
z | A character indicating which parameter's density to plot. One of, possibly abbreviated, "ionCount" (total ion count), "peaks.count" (peaks count) or "precursor.mz" (precursor MZ). |
log | Logical, whether to log transform the data (default is 'FALSE'). |
plot | A logical indicating whether the plot should be printed (default is 'TRUE'). |
Seealso
The plot2d and plotMzDelta methods for other QC plots.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
itraqdata
plotDensity(itraqdata,z="ionCount")
plotDensity(itraqdata,z="peaks.count")
plotDensity(itraqdata,z="precursor.mz")
plotMzDelta_methods()
The delta m/z plot
Description
The m/z delta plot illustrates the suitability of MS2 spectra for identification by plotting the m/z differences of the most intense peaks. The resulting histogram should ideally show prominent bars at amino acid residue masses. The plots have been described in Foster et al. 2011.
Only a certain percentage of the most intense MS2 peaks is taken into account, to use the most significant signal. The default value is 10% (see the percentage argument). The difference between peaks is then computed for all individual spectra, and their distribution is plotted as a histogram where single bars represent 1 m/z differences. Delta m/z values between 40 and 200 are plotted by default, to encompass the residue masses of all amino acids and several common contaminants, although this can be changed with the xlim argument.
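The core of this computation, for a single spectrum, might look like the base-R sketch below. It is illustrative only: mz_delta() is a hypothetical helper, reporter and precursor peak removal are omitted, and the spectrum is synthetic, with three intense peaks separated by the monoisotopic residue masses of Gly (57.02146) and Ala (71.03711) plus low-intensity noise peaks.

```r
## Hypothetical helper: m/z differences between the most intense peaks of one
## spectrum, restricted to the xlim range (reporter/precursor removal omitted).
mz_delta <- function(mz, int, percentage = 0.1, xlim = c(40, 200)) {
    n <- max(1L, ceiling(length(int) * percentage))
    top <- sort(mz[order(int, decreasing = TRUE)][seq_len(n)])
    d <- as.vector(dist(top))   # all pairwise absolute differences
    d[d >= xlim[1] & d <= xlim[2]]
}

## Synthetic spectrum: three intense peaks spaced by Gly and Ala residue masses,
## plus nine low-intensity noise peaks
mz  <- c(300, 357.02146, 428.05857, 110, 150, 210, 260, 500, 550, 620, 700, 760)
int <- c(100, 90, 80, 1:9)

d <- mz_delta(mz, int, percentage = 0.25)   # keeps the top 3 of 12 peaks
d
table(floor(d))   # counts in 1-wide bins, as in the histogram
```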
In addition to the processing described above, isobaric reporter tag peaks (see the reporters argument) and the precursor peak (see the precMz argument) can also be removed from the MS2 spectrum, to avoid interference with the fragment peaks.
Note that the figures in Foster et al. 2011 have been produced and optimised for centroided data. Application of the plot as-is to data in profile mode has not been tested thoroughly, although the example below suggests that it might work.
The methods make use of the ggplot2 system. An object of class ggplot is returned invisibly.
Most of the code for plotMzDelta has kindly been contributed by Guangchuang Yu.
Arguments
Argument | Description |
---|---|
object | An object of class MSnExp or mzRramp (from the mzR package) containing MS2 spectra. |
reporters | An object of class " that defines which reporter ion peaks to set to 0. The default value NULL leaves the spectra as they are. |
subset | A numeric between 0 and 1 to use a subset of object's MS2 spectra. |
percentage | The percentage of most intense peaks to be used for the plot. Default is 0.1. |
precMz | A numeric of length one or NULL (default). In the latter (and preferred) case, the precursor m/z values are extracted from the individual MS2 spectra using the precursorMz method. |
precMzWidth | A numeric of length 1 that specifies the width around the precursor m/z where peaks are set to 0. Default is 2. |
bw | A numeric specifying the bandwidth to be used to bin the delta m/z values to plot the histogram. Default is 1. See geom_histogram for more details. |
xlim | A numeric of length 2 specifying the range of delta m/z to plot on the histogram. Default is c(40,200). |
withLabels | A logical defining if amino acid residue labels are plotted on the figure. Default is TRUE. |
size | A numeric of length 1 specifying the font size of amino acid labels. Default is 2.5. |
plot | A logical of length 1 that defines whether the figure should be plotted on the active device. Default is TRUE. Note that the ggplot object is always returned invisibly. |
verbose | A logical of length 1 specifying whether textual output and a progress bar illustrating the progress of data processing should be printed. Default is TRUE. |
Seealso
The plotDensity and plot2d methods for other QC plots.
Author
Laurent Gatto lg390@cam.ac.uk and Guangchuang Yu
References
Foster JM, Degroeve S, Gatto L, Visser M, Wang R, Griss J, Apweiler R, Martens L. "A posteriori quality control for the curation and reuse of public proteomics data." Proteomics , 2011 Jun;11(11):2182-94. doi:10.1002/pmic.201000602. Epub 2011 May 2. PMID: 21538885
Examples
mzdplot <- plotMzDelta(itraqdata,
subset = 0.5,
reporters = iTRAQ4,
verbose = FALSE, plot = FALSE)
## let's retrieve peptide sequence information
## and get a table of amino acids
peps <- as.character(fData(itraqdata)$PeptideSequence)
aas <- unlist(strsplit(peps,""))
## table of aas
table(aas)
## mzDelta plot
print(mzdplot)
plotNA_methods()
Exploring missing data in 'MSnSet' instances
Description
These methods produce plots that illustrate missing data.
is.na returns the expression matrix of its MSnSet argument as a matrix of logicals indicating whether the corresponding cells are NA or not. It is generally used in conjunction with table and image (see example below).
The plotNA method produces plots that illustrate missing data. The completeness of the full dataset or a set of proteins (ordered by increasing NA content along the x axis) is represented.
The methods make use of the ggplot2 system. An object of class 'ggplot' is returned invisibly.
Seealso
See also the filterNA method to filter out features with a specified proportion of missing values.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
data(msnset)
exprs(msnset)[sample(prod(dim(msnset)), 120)] <- NA
head(is.na(msnset))
table(is.na(msnset))
image(msnset)
plotNA(msnset, pNA = 1/4)
plotSpectrumSpectrum_methods()
Plotting a 'Spectrum' vs another 'Spectrum' object.
Description
These methods plot mass spectra MZ values against the intensities as line plots. The first spectrum is plotted in the upper panel and the other one upside down in the lower panel. Common peaks are drawn in a slightly darker colour. If a peptide sequence is provided, the fragments are automatically calculated and labelled.
Arguments
Argument | Description |
---|---|
x | Object of class " . |
y | Object of class " . |
... | Further arguments passed to internal functions. |
Seealso
More spectrum plotting available in plot.Spectrum.
More details about fragment calculation: calculateFragments.
Author
Sebastian Gibb mail@sebastiangibb.de
Examples
## find path to a mzXML file
file <- dir(system.file("extdata", package = "MSnbase"),
full.names = TRUE, pattern = "mzXML$")
## create basic MSnExp
msexp <- readMSData(file, centroided.=FALSE)
## centroid them
msexp <- pickPeaks(msexp)
## plot the first against the second spectrum
plot(msexp[[1]], msexp[[2]])
## add sequence information
plot(msexp[[1]], msexp[[2]], sequences=c("VESITARHGEVLQLRPK",
"IDGQWVTHQWLKK"))
itraqdata2 <- pickPeaks(itraqdata)
(k <- which(fData(itraqdata2)[, "PeptideSequence"] == "TAGIQIVADDLTVTNPK"))
mzk <- precursorMz(itraqdata2)[k]
zk <- precursorCharge(itraqdata2)[k]
mzk * zk
plot(itraqdata2[[k[1]]], itraqdata2[[k[2]]])
plot_methods()
Plotting 'MSnExp' and 'Spectrum' object(s)
Description
These methods provide the functionality to plot mass spectrometry data provided as MSnExp, OnDiskMSnExp or Spectrum objects. Most functions plot mass spectra m/z values against intensities.
Full spectra (using the full parameter) or specific peaks of interest can be plotted using the reporters parameter. If reporters are specified and full is set to 'TRUE', a sub-figure of the reporter ions is inlaid inside the full spectrum.
If an " is provided as argument, all the spectra are aligned vertically. Experiments can be subset to extract spectra of interest using the [ operator or extractPrecSpectra methods.
Most methods make use of the ggplot2 system, in which case an object of class 'ggplot' is returned invisibly.
If a single " and a "character" representing a valid peptide sequence are passed as arguments, the expected fragment ions are calculated and matched/annotated on the spectrum plot.
Arguments
Argument | Description |
---|---|
x | Objects of class " , " or " to be plotted. |
y | Missing, " or "character" . |
reporters | An object of class " that defines the peaks to be plotted. If not specified, full must be set to 'TRUE'. |
full | Logical indicating whether the full spectrum (respectively spectra) or only the reporter ions of interest should be plotted. Default is 'FALSE', in which case reporters must be defined. |
centroided. | Logical indicating if spectrum or spectra are in centroided mode, in which case peaks are plotted as histograms, rather than curves. |
plot | Logical specifying whether plot should be printed to current device. Default is 'TRUE'. |
w1 | Width of sticks for full centroided spectra. Default is to use maximum MZ value divided by 500. |
w2 | Width of histogram bars for centroided reporter ions plots. Default is 0.01. |
Seealso
calculateFragments to calculate ions produced by fragmentation and plot.Spectrum.Spectrum to plot and compare 2 spectra and their shared peaks.
Chromatogram for plotting of chromatographic data.
Author
Laurent Gatto lg390@cam.ac.uk, Johannes Rainer and Sebastian Gibb
Examples
data(itraqdata)
## plotting experiments
plot(itraqdata[1:2], reporters = iTRAQ4)
plot(itraqdata[1:2], full = TRUE)
## plotting spectra
plot(itraqdata[[1]],reporters = iTRAQ4, full = TRUE)
itraqdata2 <- pickPeaks(itraqdata)
i <- 14
s <- as.character(fData(itraqdata2)[i, "PeptideSequence"])
plot(itraqdata2[[i]], s, main = s)
## Load profile-mode LC-MS files
library(msdata)
od <- readMSData(dir(system.file("sciex", package = "msdata"),
full.names = TRUE), mode = "onDisk")
## Restrict the MS data to signal for serine
serine <- filterMz(filterRt(od, rt = c(175, 190)), mz = c(106.04, 106.06))
plot(serine, type = "XIC")
## Same plot but using heat.colors, rectangles and no point border
plot(serine, type = "XIC", pch = 22, colramp = heat.colors, col = NA)
precSelection()
Number of precursor selection events
Description
precSelection computes the number of selection events each precursor ion has undergone in a tandem MS experiment. This will be a function of the amount of peptide loaded, chromatography efficiency, exclusion time, ... and is useful when optimising an experimental setup. This function returns a named integer vector of length equal to the number of unique precursor MZ values in the original experiment. See the n parameter to set the number of significant MZ decimals.
precSelectionTable is a wrapper around precSelection and returns a table with the number of single, 2-fold, ... selection events.
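Conceptually, the counts come from rounding the precursor m/z values to n decimals and tabulating them. A base-R sketch with made-up precursor m/z values (not taken from itraqdata) shows both the per-precursor counts and the table of how many precursors were selected once, twice, and so on:

```r
## Made-up precursor m/z values of six MS2 scans (not taken from itraqdata)
prec_mz <- c(500.2512, 500.2498, 612.3301, 500.2505, 612.3312, 745.1)

## Selection events per precursor, after rounding to n = 2 decimals
sel <- table(round(prec_mz, 2))
sel
## How many precursors were selected once, twice, three times, ...
table(as.vector(sel))
```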
Usage
precSelection(object,n)
Arguments
Argument | Description |
---|---|
object | An instance of class " . |
n | The number of decimal places to round the precursor MZ to. Is passed to the round function. |
Value
A named integer vector for precSelection and a table for precSelectionTable.
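The counting behind precSelection and precSelectionTable can be mimicked in base R on a plain numeric vector of precursor MZ values. This is a minimal sketch, not MSnbase's implementation: the helpers count_selection_events() and selection_table() and the MZ values are hypothetical, and it assumes a precursor is identified only by its MZ rounded to n decimal places (the n argument above).

```r
# Hypothetical precursor m/z values, one per MS2 spectrum
prec_mz <- c(520.2641, 520.2643, 602.8101, 602.8104, 602.8099, 714.35, 801.4)

# Hypothetical helper: selection events per precursor after rounding to n decimals
count_selection_events <- function(mz, n = 2) {
  tab <- table(round(mz, n))
  setNames(as.integer(tab), names(tab))
}

# Hypothetical helper: how many precursors were selected once, twice, ...
selection_table <- function(events) table(events)

events <- count_selection_events(prec_mz, n = 2)
events
selection_table(events)
```

With n = 2 the two spectra near 520.26 and the three near 602.81 collapse onto single precursors; with n = 4 all seven values would be counted as distinct precursors.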
Author
Laurent Gatto lg390@cam.ac.uk
Examples
data(itraqdata)
precSelection(itraqdata)
precSelection(itraqdata, n = 2)
precSelectionTable(itraqdata)
## only single selection events in this reduced experiment
purityCorrect_methods()
Performs reporter ions purity correction
Description
Manufacturers sometimes provide purity correction values indicating the percentages of each reporter ion that have masses differing by +/- n Da from the nominal reporter ion mass due to isotopic variants. This correction is generally applied after reporter peak quantitation.
Purity correction here is applied using solve from the base package, using the purity correction values as the coefficients of the linear system and the reporter quantities as its right-hand side. 'NA' values are ignored, and intensities that are negative after correction are also set to 'NA'.
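To make the linear-system formulation concrete, here is a base R sketch for a single feature, using the same iTRAQ 4-plex matrix as in the Examples section. It assumes that row i of the matrix gives the fractions of reporter i's signal observed in each channel j, so that the observed intensities equal t(M) %*% true and correction solves that system; MSnbase's internal orientation of the matrix may differ. The function purity_correct_one() is hypothetical.

```r
# iTRAQ 4-plex impurity matrix (rows: reporters 114-117, columns: channels)
M <- matrix(c(0.929, 0.059, 0.002, 0.000,
              0.020, 0.923, 0.056, 0.001,
              0.000, 0.030, 0.924, 0.045,
              0.000, 0.001, 0.040, 0.923),
            nrow = 4, byrow = TRUE)

# Hypothetical helper: correct one feature's reporter intensities
purity_correct_one <- function(x, M) {
  keep <- !is.na(x)                      # NA channels are ignored
  out <- rep(NA_real_, length(x))
  # assumed model: observed = t(M) %*% true, so solve for the true values
  out[keep] <- solve(t(M[keep, keep, drop = FALSE]), x[keep])
  out[!is.na(out) & out < 0] <- NA       # negative corrected values become NA
  out
}

true_int <- c(100, 200, 300, 400)
observed <- as.vector(t(M) %*% true_int)  # simulate isotopic spill-over
purity_correct_one(observed, M)           # recovers true_int up to rounding error
```

Dropping NA channels before calling solve() keeps the system square; spill-over into or out of a missing channel is then simply not modelled.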
A more elaborate purity correction method is described in Shadforth et al., i-Tracker: for quantitative proteomics using iTRAQ. BMC Genomics. 2005 Oct 20;6:145. (PMID 16242023).
Function makeImpuritiesMatrix(x, filename, edit = TRUE) helps the user to create such a matrix. The function can be used in two ways. If given an integer x, it is used as the dimension of the square matrix (i.e. the number of reporter ions). For TMT6-plex and iTRAQ4-plex, default values taken from the manufacturer's certification sheets are used as templates, but batch-specific values should be used whenever possible. Alternatively, the filename of a csv spreadsheet can be provided. The sheet should define the correction factors as illustrated below (including reporter names in the first column and header row), and the corresponding correction matrix is calculated. Examples of such csv files are available in the package's extdata directory. Use dir(system.file("extdata", package = "MSnbase"), pattern =
to locate them.
If edit = TRUE, the matrix can be edited before it is returned.
Arguments
Argument | Description |
---|---|
object | An object of class " . |
impurities | A square 'matrix' of dimension equal to ncol(object) defining the correction coefficients to be applied. The reporter ions should be ordered along the columns and the relative percentages along the rows. Worked examples for iTRAQ 4-plex, TMT 6-plex, iTRAQ 8-plex and TMT 10-plex are given after this table. |
As an example, below are the correction factors as provided in an ABI iTRAQ 4-plex certificate of analysis:
reporter | % of -2 | % of -1 | % of +1 | % of +2 |
---|---|---|---|---|
114 | 0.0 | 1.0 | 5.9 | 0.2 |
115 | 0.0 | 2.0 | 5.6 | 0.1 |
116 | 0.0 | 3.0 | 4.5 | 0.1 |
117 | 0.1 | 4.0 | 3.5 | 0.1 |
The impurity table will be
. | 114 | 115 | 116 | 117 |
---|---|---|---|---|
% reporter 114 | 0.929 | 0.059 | 0.002 | 0.000 |
% reporter 115 | 0.020 | 0.923 | 0.056 | 0.001 |
% reporter 116 | 0.000 | 0.030 | 0.924 | 0.045 |
% reporter 117 | 0.000 | 0.001 | 0.040 | 0.923 |
where the diagonal is computed as (100 - sum of the row in the original table) / 100 and the other cells are filled in directly from the certificate (divided by 100). Percentages that would fall outside the reporter range (for instance the -1 value of reporter 114) do not appear in the matrix but still count towards the diagonal.
Similarly, for TMT 6-plex tags, we observe
reporter | % of -3 | % of -2 | % of -1 | % of +1 | % of +2 | % of +3 |
---|---|---|---|---|---|---|
126 | 0 | 0 | 0 | 6.1 | 0 | 0 |
127 | 0 | 0 | 0.5 | 6.7 | 0 | 0 |
128 | 0 | 0 | 1.1 | 4.2 | 0 | 0 |
129 | 0 | 0 | 1.7 | 4.1 | 0 | 0 |
130 | 0 | 0 | 1.6 | 2.1 | 0 | 0 |
131 | 0 | 0.2 | 3.2 | 2.8 | 0 | 0 |
and obtain the following impurity correction matrix
. | 126 | 127 | 128 | 129 | 130 | 131 |
---|---|---|---|---|---|---|
% reporter 126 | 0.939 | 0.061 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 127 | 0.005 | 0.928 | 0.067 | 0.000 | 0.000 | 0.000 |
% reporter 128 | 0.000 | 0.011 | 0.947 | 0.042 | 0.000 | 0.000 |
% reporter 129 | 0.000 | 0.000 | 0.017 | 0.942 | 0.041 | 0.000 |
% reporter 130 | 0.000 | 0.000 | 0.000 | 0.016 | 0.963 | 0.021 |
% reporter 131 | 0.000 | 0.000 | 0.000 | 0.002 | 0.032 | 0.938 |
For iTRAQ 8-plex, given the following correction factors (to make such a matrix square, it suffices to add -4, -3, +3 and +4 columns filled with zeros):
TAG | -2 | -1 | +1 | +2 |
---|---|---|---|---|
113 | 0 | 2.5 | 3 | 0.1 |
114 | 0 | 1 | 5.9 | 0.2 |
115 | 0 | 2 | 5.6 | 0.1 |
116 | 0 | 3 | 4.5 | 0.1 |
117 | 0.1 | 4 | 3.5 | 0.1 |
118 | 0.1 | 2 | 3 | 0.1 |
119 | 0.1 | 2 | 4 | 0.1 |
121 | 0.1 | 2 | 3 | 0.1 |
we calculate the impurity correction matrix shown below
. | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 121 |
---|---|---|---|---|---|---|---|---|
% reporter 113 | 0.944 | 0.030 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 114 | 0.010 | 0.929 | 0.059 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 115 | 0.000 | 0.020 | 0.923 | 0.056 | 0.001 | 0.000 | 0.000 | 0.000 |
% reporter 116 | 0.000 | 0.000 | 0.030 | 0.924 | 0.045 | 0.001 | 0.000 | 0.000 |
% reporter 117 | 0.000 | 0.000 | 0.001 | 0.040 | 0.923 | 0.035 | 0.001 | 0.000 |
% reporter 118 | 0.000 | 0.000 | 0.000 | 0.001 | 0.020 | 0.948 | 0.030 | 0.001 |
% reporter 119 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.020 | 0.938 | 0.040 |
% reporter 121 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.020 | 0.948 |
Finally, for a TMT 10-plex impurity matrix (for example lot RH239932)
. | -2 | -1 | +1 | +2 |
---|---|---|---|---|
126 | 0.0 | 0.0 | 5.0 (127C) | 0.0 (128C) |
127N | 0.0 | 0.2 | 5.8 (128N) | 0.0 (129N) |
127C | 0.0 | 0.3 (126) | 4.8 (128C) | 0.0 (129C) |
128N | 0.0 | 0.4 (127N) | 4.1 (129N) | 0.0 (130N) |
128C | 0.0 (126) | 0.6 (127C) | 3.0 (129C) | 0.0 (130C) |
129N | 0.0 (127N) | 0.8 (128N) | 3.5 (130N) | 0.0 (131) |
129C | 0.0 (127C) | 1.4 (128C) | 2.4 (130C) | 0.0 |
130N | 0.1 (128N) | 1.5 (129N) | 2.4 (131) | 3.2 |
130C | 0.0 (128C) | 1.7 (129C) | 1.8 | 0.0 |
131 | 0.2 (129N) | 2.0 (130N) | 2.2 | 0.0 |
(Note that a previous example, taken from lot PB199188A, contained a typo.) The impurity correction matrix is
. | 126 | 127N | 127C | 128N | 128C | 129N | 129C | 130N | 130C | 131 |
---|---|---|---|---|---|---|---|---|---|---|
% reporter 126 | 0.950 | 0.000 | 0.050 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 127N | 0.000 | 0.940 | 0.000 | 0.058 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 127C | 0.003 | 0.000 | 0.949 | 0.000 | 0.048 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 128N | 0.000 | 0.004 | 0.000 | 0.955 | 0.000 | 0.041 | 0.000 | 0.000 | 0.000 | 0.000 |
% reporter 128C | 0.000 | 0.000 | 0.006 | 0.000 | 0.964 | 0.000 | 0.030 | 0.000 | 0.000 | 0.000 |
% reporter 129N | 0.000 | 0.000 | 0.000 | 0.008 | 0.000 | 0.957 | 0.000 | 0.035 | 0.000 | 0.000 |
% reporter 129C | 0.000 | 0.000 | 0.000 | 0.000 | 0.014 | 0.000 | 0.962 | 0.000 | 0.024 | 0.000 |
% reporter 130N | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.015 | 0.000 | 0.928 | 0.000 | 0.024 |
% reporter 130C | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.017 | 0.000 | 0.965 | 0.000 |
% reporter 131 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.020 | 0.000 | 0.956 |
These examples are provided as the default impurity correction matrices in makeImpuritiesMatrix.
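The rule for turning a certificate into a correction matrix (diagonal = (100 - row sum) / 100, other cells copied from the certificate and divided by 100, out-of-range offsets dropped) can be sketched in base R. This is not makeImpuritiesMatrix itself: the helper certificate_to_impurities() is hypothetical and assumes reporters spaced 1 Da apart in row order, which holds for iTRAQ 4-plex and TMT 6-plex but not for the 121 tag of iTRAQ 8-plex or the N/C pairs of TMT 10-plex.

```r
# Hypothetical helper: certificate percentages -> impurity matrix.
# pct: one row per reporter, one column per mass offset (in Da).
certificate_to_impurities <- function(pct, offsets = c(-2, -1, 1, 2)) {
  n <- nrow(pct)
  M <- matrix(0, n, n, dimnames = list(rownames(pct), rownames(pct)))
  for (i in seq_len(n)) {
    # fraction of reporter i remaining at its nominal mass
    M[i, i] <- (100 - sum(pct[i, ])) / 100
    for (k in seq_along(offsets)) {
      j <- i + offsets[k]
      # spill-over into neighbouring channels; offsets outside the reporter
      # range are dropped here but were still subtracted for the diagonal
      if (j >= 1 && j <= n) M[i, j] <- pct[i, k] / 100
    }
  }
  M
}

# iTRAQ 4-plex certificate values (columns: % of -2, -1, +1, +2)
cert <- rbind("114" = c(0.0, 1.0, 5.9, 0.2),
              "115" = c(0.0, 2.0, 5.6, 0.1),
              "116" = c(0.0, 3.0, 4.5, 0.1),
              "117" = c(0.1, 4.0, 3.5, 0.1))
round(certificate_to_impurities(cert), 3)
```

For TMT 6-plex, pass the six certificate columns together with offsets = c(-3, -2, -1, 1, 2, 3).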
Examples
## quantifying full experiment
data(msnset)
impurities <- matrix(c(0.929, 0.059, 0.002, 0.000,
                       0.020, 0.923, 0.056, 0.001,
                       0.000, 0.030, 0.924, 0.045,
                       0.000, 0.001, 0.040, 0.923),
                     nrow = 4, byrow = TRUE)
## or, using makeImpuritiesMatrix() (edit = FALSE returns the matrix without opening an editor)
impurities <- makeImpuritiesMatrix(4, edit = FALSE)
msnset.crct <- purityCorrect(msnset, impurities)
head(exprs(msnset))
head(exprs(msnset.crct))
processingData(msnset.crct)
## default impurity matrix for iTRAQ 8-plex
makeImpuritiesMatrix(8, edit = FALSE)
## default impurity matrix for TMT 10-plex
makeImpuritiesMatrix(10, edit = FALSE)
quantify_methods()
Quantifies 'MSnExp' and 'Spectrum' objects
Description
This method quantifies individual " objects or full " experiments. Currently, MS2-level isobaric tagging using iTRAQ and TMT (or any arbitrary peaks of interest, see " ) and MS2-level label-free quantitation (spectral counting, spectral index or spectral abundance factor) are available.
Isobaric tag peaks of single spectra or complete experiments can be quantified using appropriate methods. Label-free quantitation is available only for MSnExp experiments.
Since version 1.13.5, parallel quantitation is supported by the BiocParallel package and controlled by the BPPARAM argument.
Arguments
Argument | Description |
---|---|
object | An instance of class " (isobaric tagging only) or " . |
method | Peak quantitation method. For isobaric tags, one of (possibly abbreviated) "trapezoidation", "max" or "sum". These methods return, respectively, the area under the peak(s), the maximum of the peak(s) or the sum of all intensities of the peak(s). For label-free quantitation, one of "SI" (spectral index), "SIgi" (global intensity spectral index), "SIn" (normalised spectral index), "SAF" (spectral abundance factor) or "NSAF" (normalised spectral abundance factor). Finally, the simple "count" method counts the occurrence of the respective spectra (at this stage all 1s), which can then be used as input to combineFeatures to implement spectral counting. |
reporters | An instance of class " that defines the peak(s) to be quantified. For isobaric tagging only. |
strict | For isobaric tagging only. If strict is FALSE (default), the quantitation is performed using data points along the entire width of a peak. If strict is set to TRUE, once the apex(es) is/are identified, only data points within apex +/- width of reporter (see " ) are used for quantitation. |
BPPARAM | Support for parallel processing using the BiocParallel infrastructure. When missing (default), the default registered BiocParallelParam parameters are applied using bpparam(). Alternatively, one can pass a valid BiocParallelParam parameter instance (SnowParam, MulticoreParam, DoparParam, ...); see the BiocParallel package for details. |
parallel | Deprecated. Please see BPPARAM . |
qual | Should the qual slot be populated. Default is TRUE . |
pepseq | A character giving the peptide sequence column in the feature data. Default is "sequence" . |
verbose | Verbosity of the output (only for MSnExp objects). |
... | Further arguments passed to the quantitation functions. |
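The three isobaric-tag methods can be illustrated on a single profile-mode reporter peak. This base R sketch is a simplification: it assumes the peak is just the data points within mz +/- width of the expected reporter m/z, whereas MSnbase first locates the curve around the peak (see Details), so results can differ. The function quantify_peak() and the data are hypothetical; as with the method argument, method names may be abbreviated (handled here by match.arg()).

```r
# Hypothetical helper: quantify one reporter peak within center +/- width
quantify_peak <- function(mz, int, center, width,
                          method = c("trapezoidation", "max", "sum")) {
  method <- match.arg(method)               # allows e.g. "trap"
  in_win <- mz >= center - width & mz <= center + width
  if (!any(in_win)) return(NA_real_)        # no data in the expected region
  x <- mz[in_win]
  y <- int[in_win]
  switch(method,
         # area under the curve by the trapezoidal rule
         trapezoidation = sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2),
         max = max(y),                      # peak maximum
         sum = sum(y))                      # sum of all intensities
}

mz  <- c(114.09, 114.10, 114.11, 114.12, 114.13)
int <- c(0, 10, 20, 10, 0)
quantify_peak(mz, int, center = 114.11, width = 0.05, method = "trap")  # about 0.4
quantify_peak(mz, int, center = 114.11, width = 0.05, method = "max")   # 20
quantify_peak(mz, int, center = 114.11, width = 0.05, method = "sum")   # 40
```

Note that the trapezoidal area depends on the m/z spacing of the data points, while "max" and "sum" do not.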
Details
" define specific MZ at which peaks are expected and a window around that MZ value. A peak of interest is searched for in that window. Since version 1.1.2, warnings are no longer thrown when no data is found in that region or when the peak extends outside the window. This can be checked manually after quantitation by inspecting the quantitation data (using the exprs accessor) for NA values, or by comparing the lowerMz and upperMz columns in the " qual slot against the respective expected mz(reporters) +/- width(reporters).
Once the range of the curve is found, quantification is performed. If no data points are found in the expected region, NA is returned for the reporter peak MZ.
Note that for label-free quantitation, spectra that have not been identified (the corresponding fields in the feature data are populated with NA values) or that have not been uniquely assigned to a protein (the nprot feature variable is greater than 1) are removed prior to quantitation. The latter does not apply for method = "count" but can be applied manually with removeMultipleAssignment.
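The label-free filtering described above, followed by the spectral abundance factors, can be sketched in base R. The sketch follows the SAF and NSAF definitions of Paoletti et al. (spectral count divided by protein length, then normalised by the sum over all proteins); it is not MSnbase's code. The PSM table, its accession column and the protein lengths are hypothetical, while nprot mirrors the feature variable mentioned above.

```r
# Hypothetical PSM-level feature data: one row per MS2 spectrum
psm <- data.frame(accession = c("P1", "P1", "P1", "P2", "P2", NA, "P3"),
                  nprot     = c(1, 1, 1, 1, 1, NA, 2),
                  stringsAsFactors = FALSE)
# Hypothetical protein lengths (number of residues)
len <- c(P1 = 300, P2 = 100, P3 = 250)

# Drop unidentified spectra and spectra assigned to more than one protein
keep <- !is.na(psm$accession) & !is.na(psm$nprot) & psm$nprot == 1
spc <- table(psm$accession[keep])           # spectral counts per protein

saf  <- as.numeric(spc) / len[names(spc)]   # spectral abundance factor
nsaf <- saf / sum(saf)                      # normalised: values sum to 1
nsaf
```

P3 disappears entirely because its only spectrum has nprot = 2, illustrating how the multiple-assignment filter can remove proteins from label-free results.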
Author
Laurent Gatto lg390@cam.ac.uk and Sebastian Gibb mail@sebastiangibb.de
References
For details about the spectral index (SI), see Griffin NM, Yu J, Long F, Oh P, Shore S, Li Y, Koziol JA, Schnitzer JE. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat Biotechnol. 2010 Jan;28(1):83-9. doi: 10.1038/nbt.1592. PMID: 20010810; PubMed Central PMCID: PMC2805705.
For details about the spectral abundance factor, see Paoletti AC, Parmely TJ, Tomomori-Sato C, Sato S, Zhu D, Conaway RC, Conaway JW, Florens L, Washburn MP. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. PNAS. 2006 Dec 12;103(50):18928-33. PMID: 17138671; PubMed Central PMCID: PMC1672612.
Examples
## Quantifying a full experiment using iTRAQ4-plex tagging
data(itraqdata)
msnset <- quantify(itraqdata, method = "trap", reporters = iTRAQ4)
msnset
## specifying a custom parallel framework
## bp <- MulticoreParam(2L) # on Linux/OSX
## bp <- SnowParam(2L) # on Windows
## quantify(itraqdata[1:10], method = "trap", iTRAQ4, BPPARAM = bp)
## Checking for non-quantified peaks
sum(is.na(exprs(msnset)))
## Quantifying a single spectrum
qty <- quantify(itraqdata[[1]], method = "trap", iTRAQ4[1])
qty$peakQuant
qty$curveStats
## Label-free quantitation
## Raw (mzXML) and identification (mzid) files
quantFile <- dir(system.file("extdata", package = "MSnbase"),
                 full.names = TRUE, pattern = "mzXML$")
identFile <- dir(system.file("extdata", package = "MSnbase"),
                 full.names = TRUE, pattern = "dummyiTRAQ.mzid")
msexp <- readMSData(quantFile)
msexp <- addIdentificationData(msexp, identFile)
fData(msexp)$DatabaseAccess
si <- quantify(msexp, method = "SIn")
processingData(si)
exprs(si)
saf <- quantify(msexp, method = "NSAF")
processingData(saf)
exprs(saf)
readMSData()
Imports mass-spectrometry raw data files as 'MSnExp' instances.
Description
Reads a set of XML-based mass-spectrometry data files and generates an MSnExp object. This function uses the functionality provided by the mzR package to access data and metadata in mzData, mzXML and mzML files.
Usage
readMSData(files, pdata = NULL, msLevel. = NULL,
verbose = isMSnbaseVerbose(), centroided. = NA, smoothed. = NA,
cache. = 1L, mode = c("inMemory", "onDisk"))
Arguments
Argument | Description |
---|---|
files | A character with file names to be read and parsed. |
pdata | An object of class AnnotatedDataFrame or NULL (default). |
msLevel. | MS level spectra to be read. In inMemory mode, use 1 for MS1 spectra or any larger numeric for MSn spectra. Default is 2 for inMemory mode. onDisk mode supports multiple levels and will, by default, read all the data. |
verbose | Verbosity flag. Default is to use isMSnbaseVerbose(). |
centroided. | A logical, indicating whether spectra are centroided or not. Default is NA, in which case the information is extracted from the raw file (for mzML or mzXML files). In onDisk mode, it can also be set for different MS levels by a vector of logicals, where the first element is for MS1, the second element is for MS2, ... See OnDiskMSnExp for an example. |
smoothed. | A logical indicating whether spectra are already smoothed or not. Default is NA. |
cache. | Numeric indicating caching level. Default is 0 for MS1 and 1 for MS2 (or higher). Only relevant for inMemory mode. |
mode | One of "inMemory" (default) or "onDisk". The former loads the raw data in memory, while the latter only generates the object and the raw data is accessed on disk when needed. See the benchmarking vignette for memory and speed implications. |
Details
When using the inMemory mode, the whole MS data is read from file and kept in memory as Spectrum objects within the MSnExp's assayData slot.
To reduce the memory footprint, especially for large MS1 data sets, it is also possible to read only selected information from the MS files and fetch the actual spectrum data (i.e. the M/Z and intensity values) only on demand from the original data files. This can be achieved by setting mode = "onDisk". The function then returns an OnDiskMSnExp object instead of an MSnExp object.
Value
An MSnExp object for inMemory mode and an OnDiskMSnExp object for onDisk mode.
Seealso
readMgfData() to read mgf peak lists.
Note
readMSData uses normalizePath to replace relative with absolute file paths.
Author
Laurent Gatto
Examples
file <- dir(system.file("extdata", package = "MSnbase"),
            full.names = TRUE,
            pattern = "mzXML$")
mem <- readMSData(file, mode = "inMemory")
mem
dsk <- readMSData(file, mode = "onDisk")
dsk
readMSnSet()
Read 'MSnSet'
Description
This function reads data files to generate an MSnSet instance. It is a wrapper around Biobase's readExpressionSet function with an additional featureDataFile parameter to include feature data. See also readExpressionSet for more details.
readMSnSet2 is a simple version that takes a single text spreadsheet as input and extracts the expression data and feature meta-data to create an MSnSet.
Note that when using readMSnSet2, one should not set rownames as an additional argument to define feature names. It is ignored and used to set fnames if not provided otherwise.
Usage
readMSnSet(exprsFile,
phenoDataFile,
featureDataFile,
experimentDataFile,
notesFile,
path, annotation,
exprsArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, ...),
phenoDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...),
featureDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...),
experimentDataArgs = list(sep = sep, header = header, row.names = row.names, quote = quote, stringsAsFactors = stringsAsFactors, ...),
sep = "\t",
header = TRUE,
quote = "",
stringsAsFactors = FALSE,
row.names = 1L,
widget = getOption("BioC")$Base$use.widgets, ...)
readMSnSet2(file, ecol, fnames, ...)
Arguments
Argument | Description |
---|---|
exprsFile | (character) File or connection from which to read expression values. The file should contain a matrix with rows as features and columns as samples. read.table is called with this as its file argument and further arguments given by exprsArgs . |
phenoDataFile | (character) File or connection from which to read phenotypic data. read.AnnotatedDataFrame is called with this as its file argument and further arguments given by phenoDataArgs . |
experimentDataFile | (character) File or connection from which to read experiment data. read.MIAME is called with this as its file argument and further arguments given by experimentDataArgs . |
notesFile | (character) File or connection from which to read notes; readLines is used to input the file. |
path | (optional) directory in which to find all the above files. |
annotation | (character) A single character string indicating the annotation associated with this ExpressionSet. |
exprsArgs | A list of arguments to be used with read.table when reading in the expression matrix. |
phenoDataArgs | A list of arguments to be used (with read.AnnotatedDataFrame ) when reading the phenotypic data. |
experimentDataArgs | A list of arguments to be used (with read.MIAME ) when reading the experiment data. |
sep, header, quote, stringsAsFactors, row.names | arguments used by the read.table -like functions. |
widget | A boolean value indicating whether widgets can be used. Widgets are NOT yet implemented for read.AnnotatedDataFrame . |
... | Further arguments that can be passed on to the read.table-like functions. |
featureDataFile | (character) File or connection from which to read feature data. read.AnnotatedDataFrame is called with this as its file argument and further arguments given by featureDataArgs. |
featureDataArgs | A list of arguments to be used (with read.AnnotatedDataFrame) when reading the feature data. |
file | A character indicating the spreadsheet file or a data.frame (new in version 1.19.8). Default, when file is a character, is to read the file as comma-separated values (csv). If different, use the additional arguments, passed to read.csv, to parametrise file import. Passing a data.frame can be particularly useful if the spreadsheet is in Excel format. The appropriate sheet can first be read into R as a data.frame using, for example, readxl::read_excel, and then passed to readMSnSet2. |
ecol | A numeric indicating the indices of the columns to be used as expression values. Can also be a character indicating the names of the columns. Caution must be taken if the column names are composed of special characters like ( or - that will be converted to a ".". If ecol does not match, the error message will display the column names as seen by R. |
fnames | An optional character or numeric of length 1 indicating the column to be used as feature names. |
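The split that readMSnSet2 performs, with the ecol columns becoming expression values and the remaining columns kept as feature data, can be mimicked in base R. This hypothetical sketch (the inline csv text and its column names are made up) is not readMSnSet2's code; it also shows the column-name conversion mentioned for ecol, since read.csv() turns a header such as s-1 into s.1 by default (check.names = TRUE).

```r
# Hypothetical spreadsheet: an id column, an annotation column and two
# quantitative columns whose names contain a "-"
csv <- "id,protein,s-1,s-2
f1,P1,1.5,2.0
f2,P2,3.0,4.5"
tab <- read.csv(text = csv, stringsAsFactors = FALSE)
names(tab)                             # "s-1" and "s-2" became "s.1" and "s.2"

ecol <- 3:4                            # column indices of the expression values
e  <- as.matrix(tab[, ecol])           # expression values
fd <- tab[, -ecol, drop = FALSE]       # feature meta-data
rownames(e) <- rownames(fd) <- tab$id  # feature names, as fnames = "id" would set
e
```

When selecting ecol by name, the converted names (s.1, s.2) must be used, which is why the error message reports the column names as seen by R.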
Value
An instance of the MSnSet class.
Seealso
The grepEcols and getEcols helper functions to identify the ecol values. The MSnbase-io vignette illustrates these functions in detail. It can be accessed with vignette("MSnbase-io").
Author
Laurent Gatto lg390@cam.ac.uk
Examples
## Not run: the three file names below are placeholders
exprsFile <- "path_to_intensity_file.csv"
fdataFile <- "path_to_featuredata_file.csv"
pdataFile <- "path_to_sampledata_file.csv"
## Read ExpressionSet with appropriate parameters
res <- readMSnSet(exprsFile, phenoDataFile = pdataFile,
                  featureDataFile = fdataFile, sep = ",", header = TRUE)
library("pRolocdata")
f0 <- dir(system.file("extdata", package = "pRolocdata"),
full.names = TRUE,
pattern = "Dunkley2006")
basename(f0)
res <- readMSnSet2(f0, ecol = 5:20)
res
head(exprs(res)) ## columns 5 to 20
head(fData(res)) ## other columns
readMgfData()
Import mgf files as 'MSnExp' instances.
Description
Reads an mgf file and generates an " object.
Usage
readMgfData(filename, pdata = NULL, centroided = TRUE, smoothed = FALSE,
verbose = isMSnbaseVerbose(), cache = 1)
Arguments
Argument | Description |
---|---|
filename | character vector with file name to be read. |
pdata | an object of class " . |
smoothed | Logical indicating whether spectra are already smoothed or not. Default is 'FALSE'. Used to initialise " object in processingData slot. |
centroided | Logical indicating whether spectra are centroided or not. Default is 'TRUE'. Used to initialise " object in processingData slot. |
cache | Numeric indicating caching level. Default is 1. Under development. |
verbose | verbosity flag. |
Details
Note that when reading an mgf file, the original order of the spectra is lost. Thus, if the data was originally written to mgf from an MSnExp object using writeMgfData, the feature names will be identical but the spectra they refer to will differ as a result of the reordering. See the example below.
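For readers unfamiliar with the format, the toy parser below shows the structure readMgfData has to handle: each spectrum sits between BEGIN IONS and END IONS, header lines are KEY=value pairs, and PEPMASS holds the precursor m/z optionally followed by its intensity, which is why a precursor intensity can be missing. It is a hypothetical base R sketch, not MSnbase's parser; the mgf text and parse_mgf() are made up, and it assumes every spectrum has at least one peak.

```r
# Minimal, made-up mgf content with two spectra
mgf <- c("BEGIN IONS", "TITLE=spectrum 1", "PEPMASS=445.12 1200", "CHARGE=2+",
         "110.07 35.2", "175.12 80.1", "END IONS",
         "BEGIN IONS", "TITLE=spectrum 2", "PEPMASS=512.30", "CHARGE=3+",
         "120.08 12.0", "END IONS")

# Hypothetical toy parser: returns one list per spectrum
parse_mgf <- function(lines) {
  starts <- which(lines == "BEGIN IONS")
  ends <- which(lines == "END IONS")
  Map(function(s, e) {
    block <- lines[(s + 1):(e - 1)]
    is_kv <- grepl("=", block, fixed = TRUE)
    # KEY=value header lines (value may itself contain "=")
    meta <- setNames(sub("^[^=]*=", "", block[is_kv]),
                     sub("=.*$", "", block[is_kv]))
    # PEPMASS: precursor m/z, optionally followed by its intensity
    pep <- as.numeric(strsplit(meta[["PEPMASS"]], "[[:space:]]+")[[1]])
    # remaining lines are "m/z intensity" peak pairs
    peaks <- do.call(rbind, lapply(strsplit(block[!is_kv], "[[:space:]]+"),
                                   as.numeric))
    list(title = meta[["TITLE"]],
         precursorMz = pep[1],
         precursorIntensity = if (length(pep) > 1) pep[2] else NA_real_,
         mz = peaks[, 1], intensity = peaks[, 2])
  }, starts, ends)
}

spectra <- parse_mgf(mgf)
spectra[[2]]$precursorIntensity   # NA: PEPMASS had no intensity
```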
Value
An instance of
Seealso
writeMgfData method to write the content of " or " objects to mgf files. Raw data files can also be read with the readMSData function.
Author
Guangchuang Yu guangchuangyu@gmail.com and Laurent Gatto lg390@cam.ac.uk
Examples
data(itraqdata)
mgf <- tempfile(fileext = ".mgf")
writeMgfData(itraqdata, con = mgf, COM = "MSnbase itraqdata")
itraqdata2 <- readMgfData(mgf)
## note that the order of the spectra is altered,
## as is the precision of some values (precursorMz for instance)
match(signif(precursorMz(itraqdata2), 4), signif(precursorMz(itraqdata), 4))
## [1] 1 10 11 12 13 14 15 16 17 18 ...
## ... but all the precursors are there
all.equal(sort(precursorMz(itraqdata2)),
sort(precursorMz(itraqdata)),
check.attributes=FALSE,
tolerance=10e-5)
## is TRUE
all.equal(as.data.frame(itraqdata2[[1]]),as.data.frame(itraqdata[[1]]))
## is TRUE
all.equal(as.data.frame(itraqdata2[[3]]),as.data.frame(itraqdata[[11]]))
## is TRUE
f <- dir(system.file("extdata", package = "MSnbase"),
         full.names = TRUE,
         pattern = "test.mgf")
(x <- readMgfData(f))
x[[2]]
precursorMz(x[[2]])
precursorIntensity(x[[2]])
precursorMz(x[[1]])
precursorIntensity(x[[1]]) ## was not in test.mgf
scanIndex(x)
readMzIdData()
Import peptide-spectrum matches
Description
Reads a set of mzId files containing PSMs and generates a data.frame.
Usage
readMzIdData(files)
Arguments
Argument | Description |
---|---|
files | A character of mzid files. |
Details
This function uses the functionality provided by the mzR package to access data in the mzId files. An object of class mzRident can also be coerced to a data.frame using as(, "data.frame").
Value
A data.frame containing the PSMs stored in the mzId files.
Seealso
filterIdentificationDataFrame()
to filter out unreliable PSMs.
Author
Laurent Gatto
Examples
idf <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
f <- msdata::ident(full.names = TRUE, pattern = idf)
basename(f)
readMzIdData(f)
readMzTabData()
Read an 'mzTab' file
Description
This function can be used to create an " by reading and parsing an mzTab file. The metadata section is always used to populate the MSnSet's experimentData()@other$mzTab slot.
Usage
readMzTabData(file, what = c("PRT", "PEP", "PSM"), version = c("1.0",
"0.9"), verbose = isMSnbaseVerbose())
Arguments
Argument | Description |
---|---|
file | A character with the mzTab file to be read in. |
what | One of "PRT", "PEP" or "PSM", defining which of the protein, peptide or PSM sections should be returned as an MSnSet. |
version | A character defining the format specification version of the mzTab file. Default is "1.0". Version "0.9" is available for backwards compatibility. See readMzTabData_v0.9 for details. |
verbose | Produce verbose output. |
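In mzTab, every line starts with a three-letter prefix (MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for peptides, PSH/PSM for PSMs) followed by tab-separated fields, so the what argument selects which row type ends up in the MSnSet. The hypothetical base R sketch below extracts one section into a data.frame; it is not readMzTabData's implementation, and the tiny mzTab snippet is made up and far from a complete, valid file.

```r
# Made-up, heavily truncated mzTab content (fields are tab-separated)
mztab <- c("MTD\tmzTab-version\t1.0.0",
           "MTD\tmzTab-mode\tSummary",
           "PRH\taccession\tdescription\tprotein_abundance_assay[1]",
           "PRT\tP12345\tProtein A\t1.5",
           "PRT\tQ67890\tProtein B\t2.5")

# Hypothetical helper: return one section as a data.frame of character columns
read_mztab_section <- function(lines, what = c("PRT", "PEP", "PSM")) {
  what <- match.arg(what)
  header_prefix <- c(PRT = "PRH", PEP = "PEH", PSM = "PSH")[[what]]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  prefix <- vapply(fields, `[`, character(1), 1)      # first field of each line
  header <- fields[[which(prefix == header_prefix)[1]]][-1]
  rows <- lapply(fields[prefix == what], `[`, -1)     # drop the prefix field
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- header
  out
}

prot <- read_mztab_section(mztab, "PRT")
prot
```

All columns come back as character; a real reader would still need to convert the abundance columns to numeric to build the expression matrix.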
Value
An instance of class MSnSet
.
Seealso
See MzTab and MSnSetList for details about the inner workings of readMzTabData.
Author
Laurent Gatto
Examples
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/examples/1_0-Proteomics-Release/PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt"
prot <- readMzTabData(testfile, "PRT")
prot
head(fData(prot))
head(exprs(prot))
psms <- readMzTabData(testfile, "PSM")
psms
head(fData(psms))
readMzTabData_v09()
Read an 'mzTab' file
Description
This function can be used to create a " by reading and parsing an mzTab file. The metadata section is always used to populate the MSnSet's experimentData slot.
Usage
readMzTabData_v0.9(file, what = c("PRT", "PEP"),
verbose = isMSnbaseVerbose())
Arguments
Argument | Description |
---|---|
file | A character with the mzTab file to be read in. |
what | One of "PRT" or "PEP", defining which of the protein or peptide sections should be parsed. The metadata section, when available, is always used to populate the experimentData slot. |
verbose | Produce verbose output. |
Value
An instance of class MSnSet
.
Seealso
writeMzTabData to save an " as an mzTab file.
Author
Laurent Gatto
Examples
testfile <- "https://raw.githubusercontent.com/HUPO-PSI/mzTab/master/legacy/jmztab-1.0/examples/mztab_itraq_example.txt"
prot <- readMzTabData_v0.9(testfile, "PRT")
prot
pep <- readMzTabData_v0.9(testfile, "PEP")
pep
readSRMData()
Read SRM/MRM chromatographic data
Description
The readSRMData function reads MRM/SRM data from provided mzML files and returns the results as a Chromatograms() object.
Usage
readSRMData(files, pdata = NULL)
Arguments
Argument | Description |
---|---|
files | character with the files containing the SRM/MRM data. |
pdata | data.frame or AnnotatedDataFrame with file/sample descriptions. |
Details
readSRMData supports reading chromatogram entries from mzML files. If multiple files are provided, the same precursor and product m/z values for SRM/MRM chromatograms are expected across files. The number of columns of the resulting Chromatograms() object corresponds to the number of files. Each row in the Chromatograms() object is supposed to contain chromatograms with the same polarity, precursor and product m/z. If chromatograms with redundant polarity, precursor and product m/z values are found, they are placed into multiple consecutive rows in the Chromatograms() object.
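The row/column layout described above can be pictured with a small base R sketch that lines up chromatograms from several files by their (polarity, precursor m/z, product m/z) combination. It is hypothetical and simplified: the descriptor data.frames are made up, and redundant combinations within one file, which readSRMData places into consecutive rows, are not handled. An NA cell corresponds to the empty Chromatogram reported for a file lacking that combination.

```r
# Made-up chromatogram descriptors found in two mzML files
f1 <- data.frame(polarity = c(1, 1),
                 precursorMz = c(180.1, 200.2),
                 productMz = c(120.0, 150.3))
f2 <- data.frame(polarity = 1, precursorMz = 200.2, productMz = 150.3)
files <- list(f1, f2)

# One key per chromatogram: polarity + precursor m/z + product m/z
keys <- lapply(files, function(d)
  paste(d$polarity, d$precursorMz, d$productMz))
all_keys <- sort(unique(unlist(keys)))

# Rows: unique combinations; columns: files; cells: index of the
# chromatogram within that file, or NA if the file lacks the combination
layout <- sapply(keys, function(k) match(all_keys, k))
dimnames(layout) <- list(all_keys, c("file1", "file2"))
layout
```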
Value
A Chromatograms() object. See details above for more information.
Note
readSRMData reads only SRM/MRM chromatogram data, i.e. chromatogram data from mzML files with precursorIsolationWindowTargetMZ and productIsolationWindowTargetMZ attributes. Total ion chromatogram data is hence not extracted.
The number of features, and hence rows, of the resulting Chromatograms object depends on the total list of unique precursor and product m/z isolation windows found across all input files. In cases in which not every file has chromatographic data for the same polarity, precursor and product m/z, an empty Chromatogram() object is reported for the specific precursor and product m/z combination of the respective file (and a warning is thrown).
Author
Johannes Rainer
Examples
## Locate an example MRM/SRM data file
library(msdata)
fl <- proteomics(full.names = TRUE, pattern = "MRM")
## Read the data
mrm <- readSRMData(fl)
## The data is represented as a Chromatograms object, each column
## containing the data from one input file
mrm
## Access the polarity for each chromatogram (row)
polarity(mrm)
## Access the precursor m/z. The result is returned as a matrix with
## columns representing the minimum and maximum m/z (will be identical in
## most cases).
precursorMz(mrm)
## Access the product m/z.
productMz(mrm)
## Plot one chromatogram
plot(mrm[1, ])
reduce_dataframe_method()
Reduce a data.frame
Description
Reduce a data.frame so that the (primary) key column contains only unique entries and the other columns pertaining to each entry are collapsed into semicolon-separated values in a single row/observation.
Usage
## S4 method for signature 'data.frame'
reduce(x, key, sep = ";")
Arguments
Argument | Description |
---|---|
x | A data.frame . |
key | The column name (currently only one is supported) to be used as primary key. |
sep | The separator. Default is ";". |
Details
An important side effect of reducing a data.frame is that, as soon as one observation is transformed, all columns other than the key are converted to character when they are collapsed into semicolon-separated values (even if only one value is present).
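The collapsing rule, including the class side effect, can be illustrated with a base-R sketch. reduceSketch() is hypothetical and is not MSnbase's implementation. In particular, de-duplicating repeated values before pasting them is an assumption made here.

```r
## Hypothetical re-implementation sketch (not MSnbase's code) of reducing
## a data.frame on one key column.
reduceSketch <- function(x, key, sep = ";") {
  ## no duplicated keys: nothing is collapsed and column classes are kept
  if (!anyDuplicated(x[[key]]))
    return(x)
  groups <- split(x, factor(x[[key]], levels = unique(x[[key]])))
  rows <- lapply(groups, function(g) {
    ## collapse every non-key column into one string; assumes repeated
    ## values are de-duplicated before pasting
    others <- lapply(g[setdiff(names(g), key)], function(col)
      paste(unique(as.character(col)), collapse = sep))
    data.frame(c(setNames(list(g[[key]][1]), key), others),
               stringsAsFactors = FALSE, check.names = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out[names(x)]  # restore the original column order
}

dfr <- data.frame(A = c(1, 1, 2), B = c("x", "x", "z"), C = LETTERS[1:3],
                  stringsAsFactors = FALSE)
str(reduceSketch(dfr, key = "A"))  # A stays numeric, C becomes "A;B", "C"
str(reduceSketch(dfr, key = "B"))  # A is now character
```

The early return reproduces the behaviour shown with dfr4 in the examples: when no key is duplicated, the input is returned unchanged.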
Value
A reduced data.frame.
Author
Laurent Gatto
Examples
dfr <- data.frame(A = c(1, 1, 2),
B = c("x", "x", "z"),
C = LETTERS[1:3])
dfr
dfr2 <- reduce(dfr, key = "A")
dfr2
## column A used as key is still num
str(dfr2)
dfr3 <- reduce(dfr, key = "B")
dfr3
## A is converted to chr; B, used as key, keeps its class
str(dfr3)
dfr4 <- data.frame(A = 1:3,
B = LETTERS[1:3],
C = c(TRUE, FALSE, NA))
## No effect of reducing, column classes are maintained
str(reduce(dfr4, key = "B"))
removeNoId_methods()
Removes non-identified features
Description
The method removes non-identified features in MSnExp and MSnSet instances, using either relevant information from the featureData slot or a user-provided filtering vector of logicals.
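The two filtering modes can be sketched on a plain feature data.frame. removeNoIdSketch() is hypothetical and not MSnbase code. The column name "sequence" is borrowed from the example and is not necessarily the method's default fcol. The real method's handling of NA in keep is not described here, which is why the example sets NAs to FALSE explicitly. This sketch does that internally.

```r
## Hypothetical sketch (not MSnbase code) of the filtering rule applied to
## a feature data.frame; the default column name is an assumption.
removeNoIdSketch <- function(fdata, fcol = "sequence", keep = NULL) {
  if (is.null(keep))
    keep <- !is.na(fdata[[fcol]])   # identified = has a value in fcol
  stopifnot(is.logical(keep), length(keep) == nrow(fdata))
  keep[is.na(keep)] <- FALSE        # undecided features are dropped
  fdata[keep, , drop = FALSE]
}

fd <- data.frame(sequence = c("PEPTIDEK", NA, "LVNELTEFAK"),
                 evalue = c(80, 10, NA))
removeNoIdSketch(fd)                         # drops the feature without sequence
removeNoIdSketch(fd, keep = fd$evalue > 75)  # user-provided logical vector
```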
Seealso
MSnExp and MSnSet
.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.names = TRUE, pattern = "mzXML$")
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.names = TRUE, pattern = "dummyiTRAQ.mzid")
msexp <- readMSData(quantFile)
msexp <- addIdentificationData(msexp, identFile)
fData(msexp)$sequence
length(msexp)
## using default fcol
msexp2 <- removeNoId(msexp)
length(msexp2)
fData(msexp2)$sequence
## using keep
print(fvarLabels(msexp))
(k <- fData(msexp)$'MS.GF.EValue' > 75)
k[is.na(k)] <- FALSE
k
msexp3 <- removeNoId(msexp, keep = k)
length(msexp3)
fData(msexp3)$sequence
removePeaks_methods()
Removes low intensity peaks
Description
This method sets low intensity peaks from individual spectra (Spectrum instances) or whole experiments (MSnExp instances) to 0. The intensity threshold is set with the t parameter. The default is the character "min", in which case the threshold is set to the smallest non-zero intensity found in the spectrum. Any other numeric value is also valid. All peaks with a maximum intensity smaller than or equal to t are set to 0.
If the spectrum is in profile mode, ranges of successive non-0 peaks <= t are set to 0. If the spectrum is centroided, then individual peaks <= t are set to 0. See the example below for an illustration.
Note that the number of peaks is not changed; the peaks below the threshold are set to 0 and the object is not cleaned out (see clean).
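The profile versus centroided rule can be reproduced on a bare intensity vector with a base-R sketch. removePeaksSketch() is hypothetical and is not MSnbase's implementation. In profile mode, it treats each run of consecutive non-zero points as one peak and zeroes the run only if its maximum is <= t.

```r
## Hypothetical base-R sketch (not MSnbase's implementation) of the
## thresholding rules described above.
removePeaksSketch <- function(int, t = "min", centroided = FALSE) {
  if (identical(t, "min"))
    t <- min(int[int > 0])          # smallest non-zero intensity
  if (centroided) {
    int[int <= t] <- 0              # centroided: point by point
    return(int)
  }
  ## profile mode: a run of consecutive non-zero points is zeroed only
  ## if its maximum is <= t
  runs <- rle(int > 0)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (i in which(runs$values)) {
    idx <- starts[i]:ends[i]
    if (max(int[idx]) <= t)
      int[idx] <- 0
  }
  int
}

int <- c(2, 0, 0, 0, 1, 5, 1, 0, 0, 1, 3, 1, 0, 0, 1, 4, 2, 1)
removePeaksSketch(int)     # "min" threshold: nothing removed
removePeaksSketch(int, 3)  # runs with max 2 and max 3 are zeroed
```

With the intensities from the second example and t = 500, the whole profile vector forms a single run whose maximum (2032) exceeds 500, so nothing changes. In centroided mode, every point <= 500 is zeroed.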
Seealso
clean and trimMz for other spectra processing methods.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
int <- c(2, 0, 0, 0, 1, 5, 1, 0, 0, 1, 3, 1, 0, 0, 1, 4, 2, 1)
sp1 <- new("Spectrum2",
intensity = int,
mz = 1:length(int),
centroided = FALSE)
sp2 <- removePeaks(sp1) ## no peaks are removed here
## as min intensity is 1 and
## no peak has a max int <= 1
sp3 <- removePeaks(sp1, 3)
intensity(sp1)
intensity(sp2)
intensity(sp3)
peaksCount(sp1) == peaksCount(sp2)
peaksCount(sp3) <= peaksCount(sp1)
data(itraqdata)
itraqdata2 <- removePeaks(itraqdata, t = 2.5e5)
table(unlist(intensity(itraqdata)) == 0)
table(unlist(intensity(itraqdata2)) == 0)
processingData(itraqdata2)
## difference between centroided and profile peaks
int <- c(104, 57, 32, 33, 118, 76, 38, 39, 52, 140, 52, 88, 394, 71,
408, 94, 2032)
sp <- new("Spectrum2",
intensity = int,
centroided = FALSE,
mz = seq_len(length(int)))
## unchanged: in profile mode, only whole ranges of peaks with max <= 500 are removed
intensity(removePeaks(sp, 500))
stopifnot(identical(intensity(sp), intensity(removePeaks(sp, 500))))
centroided(sp) <- TRUE
## different!
intensity(removePeaks(sp, 500))
removeReporters_methods()
Removes reporter ion tag peaks
Description
This method sets all the reporter tag ion peaks from one MS2 spectrum, or from all the MS2 spectra of an experiment, to 0. Reporter data is specified using a ReporterIons instance. The peaks are selected around the expected reporter ion m/z value +/- the reporter width. Optionally, the spectrum/spectra can be cleaned to remove successive 0 intensity data points (see the clean function for details).
Note that this method only works for MS2 spectra or experiments that contain MS2 spectra. It will fail for MS1 spectra.
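The window selection can be sketched on plain m/z and intensity vectors. removeReportersSketch() is hypothetical and not MSnbase code. The reporter m/z values in the usage lines are made up for illustration and are not the iTRAQ4 definitions. Whether the window boundary is inclusive is also an assumption.

```r
## Hypothetical sketch (not MSnbase code): zero every peak whose m/z lies
## within +/- width of any reporter m/z (boundary treated as inclusive).
removeReportersSketch <- function(mz, int, reporterMz, width) {
  inWindow <- vapply(mz, function(m) any(abs(m - reporterMz) <= width),
                     logical(1))
  int[inWindow] <- 0
  int
}

mz <- c(113.9, 114.1, 114.2, 115.1, 200)
int <- c(5, 10, 7, 3, 20)
## illustrative reporter positions, not real iTRAQ values
removeReportersSketch(mz, int, reporterMz = c(114.1, 115.1), width = 0.05)
```

As with removeReporters, the number of data points is unchanged. Only intensities are set to 0.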
Seealso
clean and removePeaks for other spectra processing methods.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
data(itraqdata)
sp1 <- itraqdata[[1]]
sp2 <- removeReporters(sp1, reporters = iTRAQ4)
sel <- mz(sp1) > 114 & mz(sp1) < 114.2
mz(sp1)[sel]
intensity(sp1)[sel]
plot(sp1, full = TRUE, reporters = iTRAQ4)
intensity(sp2)[sel]
plot(sp2, full = TRUE, reporters = iTRAQ4)
selectFeatureData()
Select feature variables of interest
Description
Select feature variables to be retained.
requiredFvarLabels returns a character vector with the required feature data variable names (fvarLabels, i.e. the column names in the fData data.frame) for the specified class.
Usage
selectFeatureData(object, graphics = TRUE, fcol)
requiredFvarLabels(x = c("OnDiskMSnExp", "MSnExp", "MSnSet"))
Arguments
Argument | Description |
---|---|
object | An MSnSet , MSnExp or OnDiskMSnExp . |
graphics | A logical (default is TRUE) indicating whether a shiny application should be used if available. Otherwise, a text menu is used. Ignored if fcol is not missing. |
fcol | A numeric , logical or character of valid feature variables to be passed directly. |
x | character(1) specifying the class name for which the required feature data variable names should be returned. |
Value
For selectFeatureData: updated object containing only the selected feature variables.
For requiredFvarLabels: character with the required feature variable names.
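The three accepted forms of fcol (numeric, logical or character) can be illustrated on a small fData-like data.frame. selectFvarsSketch() is hypothetical. It shows only the column selection, not how MSnbase updates the object or the interactive menu.

```r
## Hypothetical sketch (not MSnbase code): select feature variables given
## numeric indices, a logical vector or column names.
selectFvarsSketch <- function(fdata, fcol) {
  if (is.numeric(fcol) || is.logical(fcol))
    fcol <- names(fdata)[fcol]      # translate to column names
  stopifnot(all(fcol %in% names(fdata)))
  fdata[, fcol, drop = FALSE]
}

fd <- data.frame(markers = "ER", svm = 0.9, pepseq = "PEPTIDEK")
names(selectFvarsSketch(fd, 1:2))
names(selectFvarsSketch(fd, c(TRUE, FALSE, TRUE)))
names(selectFvarsSketch(fd, "svm"))
```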
Author
Laurent Gatto
Examples
library("pRolocdata")
data(hyperLOPIT2015)
## 5 first feature variables
x <- selectFeatureData(hyperLOPIT2015, fcol = 1:5)
fvarLabels(x)
## select via GUI
x <- selectFeatureData(hyperLOPIT2015)
fvarLabels(x)
## Subset the feature data of an OnDiskMSnExp object to the minimal
## required columns
f <- system.file("microtofq/MM14.mzML", package = "msdata")
od <- readMSData(f, mode = "onDisk")
## what columns do we have?
fvarLabels(od)
## Reduce the feature data data.frame to the required columns only
od <- selectFeatureData(od, fcol = requiredFvarLabels(class(od)))
fvarLabels(od)
smooth_methods()
Smooths 'MSnExp' or 'Spectrum' instances
Description
This method smooths individual spectra (Spectrum instances) or whole experiments (MSnExp instances). Currently, Savitzky-Golay smoothing (method = "SavitzkyGolay") and moving-average smoothing (method = "MovingAverage") are available, as implemented in the MALDIquant::smoothIntensity function. Additional methods might be added at a later stage.
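A moving average with a given halfWindowSize averages each point with halfWindowSize neighbours on either side, using a window of 2 * halfWindowSize + 1 points. The base-R sketch below uses stats::filter for this. movingAverageSketch() is hypothetical and is not the MALDIquant implementation. It leaves the edge points, where the full window does not fit, unsmoothed. MALDIquant may treat the edges differently.

```r
## Hypothetical base-R sketch of moving-average smoothing; edge points
## are left unsmoothed, which may differ from MALDIquant::smoothIntensity.
movingAverageSketch <- function(int, halfWindowSize = 2L) {
  w <- 2L * halfWindowSize + 1L
  sm <- as.numeric(stats::filter(int, rep(1 / w, w), sides = 2))
  edge <- is.na(sm)                 # filter() returns NA near the edges
  sm[edge] <- int[edge]
  sm
}

movingAverageSketch(c(1:6, 5:1), halfWindowSize = 2)
```

On the intensities of the Spectrum1 example, the interior points become 3, 4, 4.6, 4.8, 4.6, 4 and 3, so the apex is flattened from 6 to 4.8.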
Seealso
clean, pickPeaks, removePeaks and trimMz for other spectra processing methods.
Author
Sebastian Gibb mail@sebastiangibb.de
References
A. Savitzky and M. J. Golay. 1964. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
M. U. Bromba and H. Ziegler. 1981. Application hints for Savitzky-Golay digital smoothing filters. Analytical Chemistry, 53(11), 1583-1586.
S. Gibb and K. Strimmer. 2012. MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics, 28, 2270-2271. http://strimmerlab.org/software/maldiquant/
Examples
sp1 <- new("Spectrum1",
intensity = c(1:6, 5:1),
mz = 1:11)
sp2 <- smooth(sp1, method = "MovingAverage", halfWindowSize = 2)
intensity(sp2)
data(itraqdata)
itraqdata2 <- smooth(itraqdata,
method = "MovingAverage",
halfWindowSize = 2)
processingData(itraqdata2)
trimMz_methods()
Trims 'MSnExp' or 'Spectrum' instances
Description
This method selects a range of m/z values in a single spectrum (Spectrum instances) or in all the spectra of an experiment (MSnExp instances). The regions to trim are defined by the range of the mz argument, such that m/z values <= min(mz) and m/z values >= max(mz) are trimmed away.
Seealso
removePeaks and clean for other spectra processing methods.
Author
Laurent Gatto lg390@cam.ac.uk
Examples
mz <- 1:100
sp1 <- new("Spectrum2",
mz = mz,
intensity = abs(rnorm(length(mz))))
sp2 <- trimMz(sp1, c(25, 75))
range(mz(sp1))
range(mz(sp2))
data(itraqdata)
itraqdata2 <- filterMz(itraqdata, c(113, 117))
range(mz(itraqdata))
range(mz(itraqdata2))
processingData(itraqdata2)
updateObject_methods()
Update MSnbase objects
Description
Methods for function updateObject
for objects from the MSnbase
package. See updateObject
for details.
writeMSData()
Write MS data to mzML or mzXML files
Description
The writeMSData,MSnExp and writeMSData,OnDiskMSnExp methods save the content of an MSnExp or OnDiskMSnExp object to MS file(s) in either mzML or mzXML format.
Usage
## S4 method for signature 'MSnExp,character'
writeMSData(object, file,
outformat = c("mzml", "mzxml"), merge = FALSE,
verbose = isMSnbaseVerbose(), copy = FALSE,
software_processing = NULL)
Arguments
Argument | Description |
---|---|
object | OnDiskMSnExp or MSnExp object. |
file | character with the file name(s). Its length has to match the number of samples/files of object. |
outformat | character(1) defining the format of the output files. Default output format is "mzml" . |
merge | logical(1) whether the data should be saved into a single mzML file. Default is merge = FALSE , i.e. each sample is saved to a separate file. Note : merge = TRUE is not yet implemented. |
verbose | logical(1) if progress messages should be displayed. |
copy | logical(1) if metadata (data processings, original file names etc) should be copied from the original files. See details for more information. |
software_processing | optionally provide specific data processing steps. See documentation of the software_processing parameter of mzR::writeMSData() . |
Details
The writeMSData method uses the proteowizard libraries through the mzR package to save the MS data. The data can be written to mzML or mzXML files with or without copying additional metadata information from the original files from which the data was read by the readMSData() function. This can be set using the copy parameter.
Note that copy = TRUE requires the original files to be available and is not supported for input files in formats other than mzML or mzXML. All metadata related to the run is copied, such as instrument information, data processings etc. If copy = FALSE, only the processing steps performed in R (using MSnbase) are saved to the mzML file.
Currently only spectrum data is supported, i.e. if the original mzML file also contains chromatogram data, it is not copied/saved to the new mzML file.
Note
General spectrum data such as total ion current, peak count, base peak m/z or base peak intensity are calculated from the actual spectrum data before writing the data to the files.
For MSn data, if the OnDiskMSnExp or MSnExp does not also contain the precursor scan of an MS level > 1 spectrum (e.g. due to filtering on the MS level), precursorScanNum is set to 0 in the output file to avoid potentially linking to a wrong spectrum.
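The two rules in the paragraphs above can be illustrated with a base-R sketch. The first rule is that summary values are recomputed from the peaks. The second is that dangling precursor links are set to 0. Both helpers, spectrumSummary() and fixPrecursorScanNum(), are hypothetical and are not MSnbase or mzR functions. Taking peaksCount as the number of data points is an assumption of this sketch.

```r
## Hypothetical sketch (not MSnbase/mzR code) of two rules from the Note.
## 1) Summary values recomputed from the actual peaks of one spectrum.
spectrumSummary <- function(mz, int) {
  list(totIonCurrent = sum(int),
       peaksCount = length(int),
       basePeakMZ = mz[which.max(int)],
       basePeakIntensity = max(int))
}

## 2) Keep a precursor link only if that scan is still in the object;
##    otherwise write 0 so no wrong spectrum is referenced.
fixPrecursorScanNum <- function(acquisitionNum, precursorScanNum) {
  ifelse(precursorScanNum %in% acquisitionNum, precursorScanNum, 0)
}

spectrumSummary(mz = c(100, 200, 300), int = c(10, 50, 20))
## scan 4 was filtered out, so the spectrum pointing to it gets 0
fixPrecursorScanNum(acquisitionNum = c(1, 2, 3, 5),
                    precursorScanNum = c(0, 1, 1, 4))
```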
The exported mzML file should be valid according to the mzML 1.1.2 standard. For exported mzXML files it cannot be guaranteed that they are valid and can be opened with software other than mzR/MSnbase.
Author
Johannes Rainer
writeMgfData_methods()
Write an experiment or spectrum to an mgf file
Description
Methods writeMgfData write individual Spectrum instances or whole MSnExp experiments to a file in Mascot Generic Format (mgf) (see http://www.matrixscience.com/help/data_file_help.html for more details). Function readMgfData reads spectra from an mgf file and creates an MSnExp object.
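The general shape of an mgf file can be shown with a base-R writer sketch. The file starts with an optional COM line, followed by one BEGIN IONS ... END IONS block per spectrum that contains a few KEY=value lines and one "m/z intensity" pair per line. writeMgfSketch() and its list-of-lists input are hypothetical. This is not MSnbase's writer, and the exact fields MSnbase emits may differ.

```r
## Hypothetical sketch (not MSnbase's writer) of the mgf layout: an
## optional COM line, then one BEGIN IONS ... END IONS block per spectrum.
## The list-of-lists input structure is an assumption for illustration.
writeMgfSketch <- function(spectra, con, COM = NULL) {
  out <- if (is.null(COM)) character() else paste0("COM=", COM)
  for (sp in spectra) {
    out <- c(out,
             "BEGIN IONS",
             paste0("TITLE=", sp$title),
             paste0("PEPMASS=", sp$precursorMz),
             paste0("CHARGE=", sp$charge, "+"),
             paste(sp$mz, sp$intensity),   # one "m/z intensity" pair per line
             "END IONS")
  }
  writeLines(out, con)
  invisible(out)
}

spectra <- list(
  list(title = "spectrum 1", precursorMz = 520.3, charge = 2,
       mz = c(114.1, 300.2), intensity = c(1500, 320)),
  list(title = "spectrum 2", precursorMz = 610.8, charge = 3,
       mz = c(115.1, 410.4), intensity = c(900, 150))
)
f <- tempfile(fileext = ".mgf")
writeMgfSketch(spectra, f, COM = "toy data")
readLines(f)
```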
Arguments
Argument | Description |
---|---|
object | An instance of class Spectrum or MSnExp. |
con | A valid connection or a character string with the name of the file to save the object. In case of the latter, a file connection is created. If not specified, 'spectrum.mgf' or 'experiment.mgf' are used depending on the class of object. Note that existing files are overwritten. |
COM | Optional character vector with the value for the 'COM' field. |
TITLE | Optional character vector with the value for the spectrum 'TITLE' field. Not applicable for experiments. |
Details
Note that when reading an mgf file, the original order of the spectra is lost. Thus, if the data was originally written to mgf from an MSnExp object using writeMgfData, the feature names will be identical but, as a result of the reordering, the spectra they refer to will not. See example below.
Seealso
readMgfData
function to read data from an mgf file.
Examples
data(itraqdata)
writeMgfData(itraqdata, con = "itraqdata.mgf", COM = "MSnbase itraqdata")
itraqdata2 <- readMgfData("itraqdata.mgf")
## note that the order of the spectra
## and precision of some values (precursorMz for instance)
## are altered
match(signif(precursorMz(itraqdata2),4),signif(precursorMz(itraqdata),4))
## [1] 1 10 11 12 13 14 15 16 17 18 ...
## ... but all the precursors are there
all.equal(sort(precursorMz(itraqdata2)),sort(precursorMz(itraqdata)),
check.attributes=FALSE,
tolerance=10e-5)
## is TRUE
all.equal(as.data.frame(itraqdata2[[1]]),as.data.frame(itraqdata[[1]]))
## is TRUE
all.equal(as.data.frame(itraqdata2[[3]]),as.data.frame(itraqdata[[11]]))
## is TRUE
## But, beware that
all(featureNames(itraqdata2)==featureNames(itraqdata))
## is TRUE too!