bioconductor v3.9.0 MzR
mzR provides a unified API to the common file formats and
Link to this section Summary
Functions
Write MS spectrum data to a MS file copying metadata from the originating file
Returns the ion selection isolation window
Access the metadata from an mzR object.
Class mzR and sub-classes
Create and check mzR objects from netCDF, mzXML, mzData or mzML files.
Access the raw data from an mzR object.
Get the version number of pwiz backend.
Write MS spectrum data to an MS file
Link to this section Functions
copyWriteMSData()
Write MS spectrum data to a MS file copying metadata from the originating file
Description
Copy general information from the originating MS file and write this,
along with the provided spectra data, to a new file. The expected
workflow is the following: data is first loaded from an MS file,
e.g. using peaks and header methods,
processed in R and then saved again to an MS file providing the
(eventually) manipulated spectra and header data with arguments
header and data .
Usage
copyWriteMSData(object, file, original_file, header, backend =
"pwiz", outformat = "mzml", rtime_seconds = TRUE, software_processing)
Arguments
| Argument | Description |
|---|---|
object | list containing for each spectrum one matrix with columns mz (first column) and intensity (second column). See also peaks for the method that reads such data from an MS file. |
file | character(1) defining the name of the file. |
original_file | character(1) with the name of the original file from which the spectrum data was first read. |
header | data.frame with the header data for the spectra. Has to be in the format as the data.frame returned by the header method. |
backend | character(1) defining the backend that should be used for writing. Currently only "pwiz" backend is supported. |
outformat | character(1) the format of the output file. One of "mzml" or "mzxml" . |
rtime_seconds | logical(1) whether the retention time is provided in seconds or minutes (defaults to TRUE ). |
software_processing | list of character vectors (or single character vector). Each character vector providing information about the software that was used to process the data with optional additional description of processing steps. The length of each character vector has to be >= 3: the first element being the name of the software, the second string its version and the third element the MS CV ID of the software (or "MS:-1" if not known). All additional elements are optional and represent the MS CV ID of each processing step performed with the software. |
Seealso
writeMSData for a function to save MS data to a new mzML
or mzXML file.
Note
copyWriteMSData supports at present copying data from
mzXML and mzML and exporting to mzML . Copying and
exporting to mzXML can fail for some input files.
The intention of this function is to copy data from an existing file
and save it along with eventually modified data to a new file. To
write new MS data files use the writeMSData function
instead.
Author
Johannes Rainer
Examples
## Open a MS file and read the spectrum and header information
library(msdata)
fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML",
package = "msdata")
ms_fl <- openMSfile(fl, backend = "pwiz")
## Get the spectra
pks <- spectra(ms_fl)
## Get the header
hdr <- header(ms_fl)
## Modify the spectrum data adding 100 to each intensity.
pks <- lapply(pks, function(z) {
z[, 2] <- z[, 2] + 100
z
})
## Copy metadata and additional information from the originating file
## and save it, along with the modified data, to a new mzML file.
out_file <- tempfile()
copyWriteMSData(pks, file = out_file, original_file = fl,
header = hdr)
isolationWindow_methods()
Returns the ion selection isolation window
Description
The methods return matrices of lower (column low ) and upper
(column high ) isolation window offsets. Matrices are returned
as a list of length equal to the number of input files (provided as
file names of raw mass spectrometry data objects, see below). By
default (i.e when unique. = TRUE ), only unique offsets are
returned, as they are expected to identical for all spectra per
acquisition. If this is not the case, a message is displayed.
Author
Laurent Gatto lg390@cam.ac.uk based on the functionality from the
msPurity:::get_isolation_offsets function.
Examples
library("msdata")
f <- msdata::proteomics(full.names = TRUE,
pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz")
isolationWindow(f)
rw <- openMSfile(f)
isolationWindow(rw)
str(isolationWindow(rw, unique = FALSE))
metadata()
Access the metadata from an mzR object.
Description
Accessors to the analytical setup metadata of a run.
runInfo will show a summary of the experiment as a named list,
including scanCount , lowMZ , highMZ ,
dStartTime , dEndTime and startTimeStamp . Note
that startTimeStamp can only be extracted from mzML
files using the pwiz backend or from CDF files. A
NA is reported if its value is not available.
The instrumentInfo method returns a named list including
instrument manufacturer, model, ionisation technique, analyzer and
detector. mzRpwiz will give more additional information including
information on sample, software using and original source file.
These individual pieces of information can also be directly
accessed by the specific methods.
mzidInfo is used for the mzR object created from a mzid file.
It returns basic information on this mzid file including file provider,
creation date, software, database, enzymes and spectra data format.
The mzidInfo will return the scoring results in identification.
It will return different results for different searching software used.
Usage
runInfo(object)
chromatogramsInfo(object)
analyzer(object)
detector(object)
instrumentInfo(object)
ionisation(object)
softwareInfo(object)
sampleInfo(object)
sourceInfo(object)
model(object)
mzidInfo(object)
modifications(object, ...)
psms(object, ...)
substitutions(object)
database(object, ...)
enzymes(object)
tolerance(object)
score(x, ...)
para(object)
specParams(object)
Arguments
| Argument | Description |
|---|---|
object | An instantiated mzR object. |
x | An instantiated mzR object. |
... | Additional arguments, currently ignored. |
Seealso
See for example peaks to access the data for the spectra
in a " class.
Author
Steffen Neumann, Laurent Gatto and Qiang Kou
Examples
library(msdata)
file <- system.file("microtofq/MM8.mzML", package = "msdata")
mz <- openMSfile(file)
fileName(mz)
instrumentInfo(mz)
runInfo(mz)
close(mz)
file <- system.file("cdf/ko15.CDF", package = "msdata")
mz <- openMSfile(file, backend = "netCDF")
fileName(mz)
instrumentInfo(mz)
runInfo(mz)
close(mz)
file <- system.file("mzid", "Tandem.mzid.gz", package="msdata")
mzid <- openIDfile(file)
softwareInfo(mzid)
enzymes(mzid)
mzR_class()
Class mzR and sub-classes
Description
The class mzR is the main class for the common mass spectrometry
formats. It is a virtual class and thus not supposed to be instanciated
directly.
The sub-classes implement specific APIs to access the underlying data
and metadata in the files. Currently, mzRramp and
mzRpwiz are available implementations. mzRramp uses the
ISB 'RAMP' random access C/C++ API, and mzRpwiz uses
Proteowizard to access the relevant information in mzData ,
mzXML and mzML files. mzRident is used as an
interface to mzIdentML files.
IMPORTANT: New developers that need to access and manipulate raw mass
spectrometry data are advised against using this infrastucture
directly. They are invited to use the corresponding MSnExp
(with on disk mode) from the MSnbase package instead. The
latter supports reading multiple files at once and offers access to
the spectra data (m/z and intensity) as well as all the spectra
metadata using a coherent interface. The MSnbase infrastructure itself
used the low level classes in mzR, thus offering fast and efficient
access.
Author
Steffen Neumann, Laurent Gatto, Qiang Kou
References
RAMP: http://tools.proteomecenter.org/wiki/index.php?title=Software:RAMP Proteowizard: http://proteowizard.sourceforge.net/
Examples
library(msdata)
filepath <- system.file("microtofq", package = "msdata")
file <- list.files(filepath, pattern="MM14.mzML",
full.names=TRUE, recursive = TRUE)
mzml <- openMSfile(file)
close(mzml)
## using the pwiz backend
mzml <- openMSfile(file, backend = "pwiz")
openMSfile()
Create and check mzR objects from netCDF, mzXML, mzData or mzML files.
Description
The openMSfile constructor will create a new format-specifc
mzR object, open 'filename' file and all header information is
loaded as a Rcpp module and made accessible through the ramp or pwiz
slot of the resulting object.
The openIDfile constructor will create a new format-specifc
mzR object, open 'filename' file and all information is
loaded as a Rcpp module. The mzid format is supported throught
pwiz backend. Only mzIdentML 1.1 is supported.
% If the object was created manually with e.g. new("mzR"), then % initializeRamp() will open the given file. An mzR object can be % queried, whether it is assigned to a raw file with isInitialized(), % and finally fileName(object) will return the path to the file "behind" % the mzR object.
Usage
openMSfile(filename, backend = NULL, verbose = FALSE)
initializeRamp(object)
isInitialized(object)
fileName(object, ...)
openIDfile(filename, verbose = FALSE)
Arguments
| Argument | Description |
|---|---|
filename | Path name of the netCDF, mzData, mzXML or mzML file to read/write. |
backend | A character(1) specifiying which backend API to use. Currently 'Ramp', 'netCDF' and 'pwiz' are supported. If backend = NULL (the default), the function tries to determine the backend to be used based on either the file extension of the file content. |
object | An instantiated mzR object. |
verbose | Enable verbose output. |
... | Additional arguments, currently ignored. |
Author
Steffen Neumann, Laurent Gatto, Qiang Kou
Examples
library(msdata)
filepath <- system.file("microtofq", package = "msdata")
file <- list.files(filepath, pattern="MM14.mzML",
full.names=TRUE, recursive = TRUE)
mz <- openMSfile(file)
fileName(mz)
runInfo(mz)
close(mz)
## to use another backend
mz <- openMSfile(file, backend = "pwiz")
mz
file <- system.file("mzid", "Tandem.mzid.gz", package="msdata")
mzid <- openIDfile(file)
softwareInfo(mzid)
enzymes(mzid)
peaks()
Access the raw data from an mzR object.
Description
Access the MS raw data. The peaks , spectra (can be used
interchangeably) and peaksCount functions return the (m/z,
intensity) pairs and the number peaks in the
spectrum/spectra. peaks and spectra return a single
matrix if scans is a numeric of length 1 and a list of
matrices if several scans are asked for or no scans argument is
provided (i.e all spectra in the oject are retured). peaksCount
will return a numeric of length n .
The header function returns a list containing
seqNum , acquisitionNum , msLevel ,
peaksCount , totIonCurrent , retentionTime (in
seconds), basePeakMZ , basePeakIntensity ,
collisionEnergy , ionisationEnergy , lowM ,
highMZ , precursorScanNum , precursorMZ ,
precursorCharge , precursorIntensity ,
mergedScan , mergedResultScanNum ,
mergedResultStartScanNum , mergedResultEndScanNum ,
filterString , spectrumId , centroided
( logical whether the data of the spectrum is in centroid mode
or profile mode; only for pwiz backend), injectionTime (ion
injection time, in milliseconds), ionMobilityDriftTime (in
milliseconds), isolationWindowTargetMZ ,
isolationWindowLowerOffset and
isolationWindowUpperOffset . If multiple scans are queried, a
data.frame is returned with the scans reported along the
rows.
The get3Dmap function performs a simple resampling between
lowMz and highMz with reMz resolution. A matrix
of dimensions length(scans) times
seq(lowMz,highMz,resMz) is returned.
The chromatogram ( chromatograms ) accessors return
chromatograms for the MS file handle. If a single index is provided,
as data.frame containing the retention time (1st columns) and
intensities (2nd column) is returned. The name of the former is always
time , while the latter will depend on the run parameters.
If more than 1 or no chromatogram indices are provided, then a list of
chromatograms is returned; either those passed as argument or all of
them. By default, the first (and possibly only) chromatogram is the
total ion count, which can also be accessed with the tic
method.
The nChrom function returns the number of chromatograms,
including the total ion chromatogram.
The chromatogramHeader returns (similar to the header
function for spectra) a data.frame with metadata information
for the individual chromatograms. The data.frame has the
columns: "chromatogramId" (the ID of the chromatogram as
specified in the file), "chromatogramIndex" (the index of the
chromatogram within the file), "polarity" (the polarity for the
chromatogram, 0 for negative, +1 for positive and -1 for not set),
"precursorIsolationWindowTargetMZ" (the isolation window m/z of
the precursor), "precursorIsolationWindowLowerOffset" ,
"precursorIsolationWindowUpperOffset" (lower and upper offset
for the isolation window m/z), "precursorCollisionEnergy"
(collision energy),
"productIsolationWindowTargetMZ" ,
"productIsolationWindowLowerOffset" and
"productIsolationWindowUpperOffset" (definition of the m/z
isolation window for the product).
Note that access to chromatograms is only supported in the pwiz
backend.
Usage
header(object, scans, ...)
peaksCount(object, scans, ...)
list(list("peaks"), list("mzRpwiz"))(object, scans)
list(list("peaks"), list("mzRramp"))(object, scans)
list(list("peaks"), list("mzRnetCDF"))(object, scans)
list(list("spectra"), list("mzRpwiz"))(object, scans) ## same as peaks
list(list("spectra"), list("mzRramp"))(object, scans)
list(list("spectra"), list("mzRnetCDF"))(object, scans)
get3Dmap(object, scans, lowMz, highMz, resMz, ...)
list(list("chromatogram"), list("mzRpwiz"))(object, chrom)
list(list("chromatograms"), list("mzRpwiz"))(object, chrom) ## same as chromatogram
list(list("chromatogramHeader"), list("mzRpwiz"))(object, chrom)
tic(object, ...)
nChrom(object)
Arguments
| Argument | Description |
|---|---|
object | An instantiated mzR object. |
scans | A numeric specifying which scans to return. Optional for the header , peaks , spectra and peaksCount methods. If ommited, the requested data for all peaks is returned. |
lowMz, highMz | Numeric s defining the m/z range to be returned. |
resMz | a numeric defining the m/z resolution. |
chrom | For chromatogram , chromatograms and chromatogramHeader : numeric specifying the index of the chromatograms to be extracted from the file. If omitted, data for all chromatograms is returned. |
... | Other arguments. A scan parameter can be passed to peaks . |
Details
The column acquisitionNum in the data.frame returned by
the header method contains the index during the scan in which
the signal from the spectrum was measured. The pwiz backend
extracts this number from the spectrum's ID provided in the mzML
file. In contrast, column seqNum contains the index of each
spectrum within the file and is thus consecutively numbered. Spectra
from files with multiple MS levels are linked to each other via
their acquisitionNum : column precursorScanNum of an e.g. MS
level 2 spectrum contains the acquisitionNum of the related MS
level 1 spectrum.
Seealso
instrumentInfo for metadata access and the
" class.
writeMSData and copyWriteMSData for
functions to write MS data in mzML or mzXML format.
Note
Spectrum identifiers are only specified in mzML files, thus,
for all other file types the column "spectrumId" of the result
data.frame returned by header contains "scan="
followed by the acquisition number of the spectrum. Also, only the
pwiz backend supports extraction of the spectras' IDs from
mzML files. Thus, only mzML files read with
backend = "pwiz" provide the spectrum IDs defined in the file.
The content of the spectrum identifier depends on the vendor and the
instrument acquisition settings and is reported here as a character,
in its raw form, without further parsing.
Author
Steffen Neumann and Laurent Gatto
Examples
library(msdata)
filepath <- system.file("microtofq", package = "msdata")
file <- list.files(filepath, pattern="MM14.mzML",
full.names=TRUE, recursive = TRUE)
mz <- openMSfile(file)
runInfo(mz)
colnames(header(mz))
close(mz)
## A shotgun LCMSMS experiment
f <- proteomics(full.names = TRUE,
pattern = "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz")
x <- openMSfile(f, backend = "pwiz")
x
nChrom(x)
head(tic(x))
head(chromatogram(x, 1L)) ## same as tic(x)
str(chromatogram(x)) ## as a list
p <- peaks(x) ## extract all peak information
head(peaks(x, scan=4)) ## extract just peaks from the 4th scan
## An MRM experiment
f <- proteomics(full.names = TRUE, pattern = "MRM")
x <- openMSfile(f, backend = "pwiz")
x
nChrom(x)
head(tic(x))
head(chromatogram(x, 1L)) ## same as tic(x)
str(chromatogram(x, 10:12))
## get the header information for the chromatograms
ch <- chromatogramHeader(x)
head(ch)
pwiz_version()
Get the version number of pwiz backend.
Description
Get the version number of pwiz backend.
Usage
pwiz.version()
writeMSData()
Write MS spectrum data to an MS file
Description
writeMSData exports the MS spectrum data provided with
parameters header and data to an MS file in mzML or
mzXML format.
Usage
list(list("writeMSData"), list("list,character"))(object, file, header,
backend = "pwiz", outformat = "mzml", rtime_seconds = TRUE,
software_processing)
Arguments
| Argument | Description |
|---|---|
object | list containing for each spectrum one matrix with columns mz (first column) and intensity (second column). See also peaks for the method that reads such data from an MS file. |
file | character(1) defining the name of the file. |
header | data.frame with the header data for the spectra. Has to be in the format as the data.frame returned by the header method. |
backend | character(1) defining the backend that should be used for writing. Currently only "pwiz" backend is supported. |
outformat | character(1) the format of the output file. One of "mzml" or "mzxml" . |
rtime_seconds | logical(1) whether the retention time is provided in seconds or minutes (defaults to TRUE ). |
software_processing | list of character vectors (or single character vector). Each character vector providing information about the software that was used to process the data with optional additional description of processing steps. The length of each character vector has to be >= 3: the first element being the name of the software, the second string its version and the third element the MS CV ID of the software (or "MS:-1" if not known). All additional elements are optional and represent the MS CV ID of each processing step performed with the software. |
Seealso
copyWriteMSData for a function to copy general
information from a MS data file and writing eventually modified MS
data from that originating file.
Author
Johannes Rainer
Examples
## Open a MS file and read the spectrum and header information
library(msdata)
fl <- system.file("threonine", "threonine_i2_e35_pH_tree.mzXML",
package = "msdata")
ms_fl <- openMSfile(fl, backend = "pwiz")
## Get the spectra
pks <- spectra(ms_fl)
## Get the header
hdr <- header(ms_fl)
## Modify the spectrum data adding 100 to each intensity.
pks <- lapply(pks, function(z) {
z[, 2] <- z[, 2] + 100
z
})
## Write the data to a mzML file.
out_file <- tempfile()
writeMSData(object = pks, file = out_file, header = hdr)