bioconductor v3.9.0 Affxparser

Package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory-efficient parsing of Affymetrix files using the Affymetrix Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.

Summary

Functions

  1. Dictionary

Description

This part describes non-obvious terms used in this package.

  2. Cell coordinates and cell indices

Description

This part describes how Affymetrix cells, also known as probes or features, are addressed.

  9. Advanced - Cell-index maps for reading and writing

Description

This part defines read and write maps that can be used to remap cell indices before reading and writing data from and to file, respectively.

Package affxparser

Applies a function to a list of fields of each group in a CDF structure

Applies a function over the groups in a CDF structure

Moves CEL files to subdirectories with names corresponding to the chip types

Adds the number of allele A and allele B mismatching nucleotides of the probes in a CDF structure

Adds the PLASQ types for the probes in a CDF structure

Adds probe offsets to the groups in a CDF structure

Gets a subset of group fields in a CDF structure

Gets a subset of groups in a CDF structure

Function to imitate Affymetrix' gtype_cel_to_pq software

Creates a valid CEL header from a CDF header

Function to join CDF allele A and allele B groups strand by strand

Function to join CDF groups with the same names

Function to rearrange CDF group values into quartets

Orders the fields according to the value of another field in the same CDF group

Orders the columns of fields according to the values in a certain row of another field in the same CDF group

Sets the dimension of an object

Compares the contents of two CDF files

Compares the contents of two CEL files

Converts a CDF into the same CDF but with another format

Converts a CEL into the same CEL but with another format

Copies a CEL file

Creates an empty CEL file

Searches for CDF files in multiple directories

Finds one or several files in multiple directories

Inverts a read or a write map

Checks if a file is a CEL file or not

Parses a DAT header string

Parses a Bpmap file

Reads an Affymetrix Command Console Generic (CCG) Data file

Reads the header of an Affymetrix Command Console Generic (CCG) file

Parsing a CDF file using Affymetrix Fusion SDK

Reads (one-based) cell indices of units (probesets) in an Affymetrix CDF file

Reads units (probesets) from an Affymetrix CDF file

Reads group names for a set of units (probesets) in an Affymetrix CDF file

Reads the header associated with an Affymetrix CDF file

Checks if cells in a CDF file are perfect-match probes or not

Gets the number of cells (probes) in each group of each unit in a CDF file

Reads the QC units of CDF file

Reads unit (probeset) names from an Affymetrix CDF file

Reads units (probesets) from an Affymetrix CDF file

Generates an Affymetrix cell-index write map from a CDF file

Reads an Affymetrix CEL file

Parsing the header of an Affymetrix CEL file

Reads the intensities contained in several Affymetrix CEL files

Reads a spatial subset of probe-level data from Affymetrix CEL files

Reads probe-level data ordered as units (probesets) from one or several Affymetrix CEL files

A function to read Affymetrix CHP files

Parsing a CLF file using Affymetrix Fusion SDK

Parsing a CLF file using Affymetrix Fusion SDK

Reads the header of a CLF file.

Parsing a PGF file using Affymetrix Fusion SDK

Parsing a PGF file using Affymetrix Fusion SDK

Reads the header of a PGF file into a list.

Updates a CEL file

Updates a CEL file unit by unit

Creates a binary CDF file

Writes a CDF header

Writes CDF QC units

Writes CDF units

Writes a CEL header to a connection

Writes BPMAP and TPMAP files.

Functions

  1. Dictionary

Description

This part describes non-obvious terms used in this package.

  • affxparser: The name of this package.
  • API: Application program interface, which describes the functional interface of underlying methods.
  • block: (aka group).
  • BPMAP: A file format containing information related to the design of the tiling arrays.
  • Calvin: A special binary file format.
  • CDF: A file format: chip definition file.
  • CEL: A file format: cell intensity file.
  • cell: (aka feature) A probe.
  • cell index: An integer that identifies a probe uniquely.
  • chip: An array.
  • chip type: An identifier specifying a chip design uniquely, e.g. "Mapping50K_Xba240".
  • DAT: A file format: contains pixel intensity values collected from an Affymetrix GeneArray scanner.
  • feature: A probe.
  • Fusion SDK: Open-source software development kit (SDK) provided by Affymetrix to access their data files.
  • group: (aka block) Defines a unique subset of the cells in a unit. Expression arrays typically have only one group per unit, whereas SNP arrays have either two or four groups per unit, one for each of the two alleles, possibly repeated for both strands.
  • MM: Mismatch, e.g. MM probe.
  • PGF: A file format: probe group file.
  • TPMAP: A file format storing the relationship between (PM,MM) pairs (or PM probes) and positions on a set of sequences.
  • QC: Quality control, e.g. QC probes and QC probe sets.
  • unit: A probeset.
  • XDA: A file format, aka the binary file format.


2_Cell_coordinates_and_cell_indices()

  2. Cell coordinates and cell indices

Description

This part describes how Affymetrix cells, also known as probes or features, are addressed.
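The applyCdfGroups() example in this manual converts zero-based (x, y) cell coordinates into one-based cell indices as y*ncol + x + 1, where ncol is the number of columns of the chip (readCdfHeader(cdfFile)$cols). The base-R sketch below shows that conversion and its inverse. The helper names and the chip width of 712 are hypothetical and only for illustration; they are not part of affxparser.

```r
# Conversion used in the applyCdfGroups() example: zero-based (x, y)
# coordinates to one-based cell indices on a chip with 'ncol' columns.
xyToCellIndex <- function(x, y, ncol) {
  as.integer(y * ncol + x + 1)
}

# Inverse mapping (hypothetical helper, not part of affxparser).
cellIndexToXy <- function(cell, ncol) {
  cell0 <- cell - 1L                      # back to zero-based
  list(x = as.integer(cell0 %% ncol), y = as.integer(cell0 %/% ncol))
}

ncol <- 712L                              # hypothetical chip width
cells <- xyToCellIndex(x = c(0L, 711L, 5L), y = c(0L, 0L, 2L), ncol = ncol)
cells                                     # 1 712 1430
xy <- cellIndexToXy(cells, ncol)
stopifnot(identical(xy$x, c(0L, 711L, 5L)), identical(xy$y, c(0L, 0L, 2L)))
```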

Author

Henrik Bengtsson


9_Advanced___Cell_index_maps_for_reading_and_writing()

  9. Advanced - Cell-index maps for reading and writing

Description

This part defines read and write maps that can be used to remap cell indices before reading and writing data from and to file, respectively.

This package provides methods to create read and write (cell-index) maps from Affymetrix CDF files. These can be used to store the cell data in an optimal order so that when data is read it is read in contiguous blocks, which is faster.

In addition, read maps may be used to read CEL files that have been "reshuffled" by other software. For instance, the dChip software (http://www.dchip.org/) rotates Affymetrix Exon, Tiling and Mapping 500K data. See the example below for how to read such data "unrotated".

For more details on how cell indices are defined, see 2. Cell coordinates and cell indices.
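As a minimal sketch of how a read map and a write map relate, the base-R code below assumes that a read map is a permutation of 1..N such that the value of cell i is found at file position readMap[i]; the matching write map is then the inverse permutation. The helper invertMapSketch() is hypothetical and is not the package's own map-inversion function.

```r
# Hypothetical helper (not affxparser's implementation): invert a map that
# is a permutation of 1..N. For such a map, order() returns, for each
# position k, the index i with map[i] == k, i.e. the inverse permutation.
invertMapSketch <- function(map) {
  stopifnot(!anyDuplicated(map), all(sort(map) == seq_along(map)))
  order(map)
}

set.seed(42)
n <- 10L
cellValues <- round(runif(n, min = 100, max = 1000))  # values in cell-index order
readMap <- sample.int(n)          # assumed: cell i is stored at file position readMap[i]
writeMap <- invertMapSketch(readMap)

fileData <- cellValues[writeMap]  # "write": file position j gets cell writeMap[j]
readBack <- fileData[readMap]     # "read": cell i comes from file position readMap[i]

stopifnot(identical(readBack, cellValues))
stopifnot(identical(invertMapSketch(writeMap), readMap))
```

Inverting twice returns the original map, which is why a single inversion routine can serve both directions.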

Author

Henrik Bengtsson


affxparser_package()

Package affxparser

Description

The affxparser package provides methods for fast and memory-efficient parsing of Affymetrix files [1] using the Affymetrix Fusion SDK [2,3]. Both traditional ASCII- and binary (XDA)-based files are supported, as well as the Affymetrix future binary format "Calvin". The efficiency of the parsing depends on whether a specific file is binary or ASCII.

Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.

Author

Henrik Bengtsson [aut], James Bullard [aut], Robert Gentleman [ctb], Kasper Daniel Hansen [aut, cre], Martin Morgan [ctb]

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, April, 2006. http://www.affymetrix.com/support/developer/
[2] Affymetrix Inc, Fusion Software Developers Kit (SDK), 2006. http://www.affymetrix.com/support/developer/fusion/
[3] Henrik Bengtsson, unofficial archive of Affymetrix Fusion Software Developers Kit (SDK), https://github.com/HenrikBengtsson/Affx-Fusion-SDK


applyCdfGroupFields()

Applies a function to a list of fields of each group in a CDF structure

Description

Applies a function to a list of fields of each group in a CDF structure.

Usage

applyCdfGroupFields(cdf, fcn, ...)

Arguments

Argument | Description
cdf | A CDF list structure.
fcn | A function that takes a list structure of fields and returns an updated list of fields.
... | Arguments passed to the fcn function.

Value

Returns an updated CDF list structure.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson


applyCdfGroups()

Applies a function over the groups in a CDF structure

Description

Applies a function over the groups in a CDF structure.

Usage

applyCdfGroups(cdf, fcn, ...)

Arguments

Argument | Description
cdf | A CDF list structure.
fcn | A function that takes a list structure of group elements and returns an updated list of groups.
... | Arguments passed to the fcn function.

Value

Returns an updated CDF list structure.
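To make the shape of a "CDF list structure" concrete without any data files, the sketch below builds a toy structure under the assumption (consistent with the Examples section) that it is a list of units, each holding a groups element that is a list of groups, each group being a list of fields. The helper applyGroupsSketch() is a hypothetical stand-in for applyCdfGroups(), not its actual source.

```r
# Hypothetical stand-in for applyCdfGroups(): call 'fcn' on the list of
# groups of each unit and store the result back into the unit.
applyGroupsSketch <- function(cdf, fcn, ...) {
  lapply(cdf, function(unit) {
    unit$groups <- fcn(unit$groups, ...)
    unit
  })
}

# Toy CDF structure with one unit and two groups (made-up coordinates)
cdf <- list(
  "SNP_1" = list(groups = list(
    A = list(x = c(10L, 11L), y = c(3L, 3L)),
    B = list(x = c(10L, 11L), y = c(4L, 4L))
  ))
)

# Same call pattern as applyCdfGroups(cdf, lapply, as.data.frame) in the
# Examples section: each group becomes a data frame
res <- applyGroupsSketch(cdf, lapply, as.data.frame)
str(res)
```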

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

cdfFile <- findCdf("Mapping10K_Xba131")

# Identify the unit index from the unit name
unitName <- "SNP_A-1509436"
unit <- which(readCdfUnitNames(cdfFile) == unitName)

# Read the CDF file
cdf0 <- readCdfUnits(cdfFile, units=unit, stratifyBy="pmmm", readType=FALSE, readDirection=FALSE)
cat("Default CDF structure:\n")
print(cdf0)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Tabulate the information in each group
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- readCdfUnits(cdfFile, units=unit)
cdf <- applyCdfGroups(cdf, lapply, as.data.frame)
print(cdf)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Infer the (true or the relative) offset for probe quartets.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- applyCdfGroups(cdf0, cdfAddProbeOffsets)
cat("Probe offsets:\n")
print(cdf)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Identify the number of nucleotides that mismatch the
# allele A and the allele B sequences, respectively.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- applyCdfGroups(cdf, cdfAddBaseMmCounts)
cat("Allele A & B target sequence mismatch counts:\n")
print(cdf)



# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Combine the signals from the sense and the antisense
# strands in SNP CEL files.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# First, join the strands in the CDF structure.
cdf <- applyCdfGroups(cdf, cdfMergeStrands)
cat("Joined CDF structure:\n")
print(cdf)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Rearrange values of group fields into quartets.  This
# requires that the values are already arranged as PMs and MMs.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cdf <- applyCdfGroups(cdf0, cdfMergeToQuartets)
cat("Probe quartets:\n")
print(cdf)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the x and y cell locations (note, zero-based)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x <- unlist(applyCdfGroups(cdf, cdfGetFields, "x"), use.names=FALSE)
y <- unlist(applyCdfGroups(cdf, cdfGetFields, "y"), use.names=FALSE)

# Validate
ncol <- readCdfHeader(cdfFile)$cols
cells <- as.integer(y*ncol+x+1)
cells <- sort(cells)

cells0 <- readCdfCellIndices(cdfFile, units=unit)
cells0 <- unlist(cells0, use.names=FALSE)
cells0 <- sort(cells0)

stopifnot(identical(cells0, cells))

##############################################################
}                                                     # STOP #
##############################################################

arrangeCelFilesByChipType()

Moves CEL files to subdirectories with names corresponding to the chip types

Description

Moves CEL files to subdirectories with names corresponding to the chip types according to the CEL file headers. For instance, an HG_U95Av2 CEL file with pathname "data/foo.CEL" will be moved to subdirectory celFiles/HG_U95Av2/.

Usage

arrangeCelFilesByChipType(pathnames=list.files(pattern="[.](cel|CEL)$"),
  path="celFiles/", aliases=NULL, ...)

Arguments

Argument | Description
pathnames | A character vector of CEL pathnames to be moved.
path | A character string specifying the root output directory, which in turn will contain chip-type subdirectories. All directories will be created if missing.
aliases | A named character vector with chip-type aliases. For instance, aliases=c("Focus"="HG-Focus") will treat a CEL file with chip-type label 'Focus' (early-access name) as if it was 'HG-Focus' (official name).
... | Not used.

Value

Returns (invisibly) a named character vector of the new pathnames with the chip types as the names. Files that could not be moved or were not valid CEL files are set to missing values.
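The destination layout described above can be sketched in base R. The hypothetical helper celDestinations() below only computes the new pathnames and their chip-type names; it assumes the chip types have already been read from the CEL headers and does not move any files or handle invalid files, unlike the real function.

```r
# Hypothetical helper, not the affxparser implementation: compute
# path/<chipType>/<basename> for each CEL file, named by chip type.
celDestinations <- function(pathnames, chipTypes, path = "celFiles/", aliases = NULL) {
  # Map early-access chip-type labels to official names, when an alias is given
  hasAlias <- chipTypes %in% names(aliases)
  if (any(hasAlias)) {
    chipTypes[hasAlias] <- aliases[chipTypes[hasAlias]]
  }
  path <- sub("/+$", "", path)    # drop trailing slash(es) before joining
  dest <- file.path(path, chipTypes, basename(pathnames))
  names(dest) <- chipTypes
  dest
}

dest <- celDestinations(
  pathnames = c("data/foo.CEL", "data/bar.CEL"),
  chipTypes = c("HG_U95Av2", "Focus"),
  aliases   = c("Focus" = "HG-Focus")
)
print(dest)
```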

Seealso

The chip type is inferred from the CEL file header, cf. readCelHeader().

Author

Henrik Bengtsson


cdfAddBaseMmCounts()

Adds the number of allele A and allele B mismatching nucleotides of the probes in a CDF structure

Description

Adds the number of allele A and allele B mismatching nucleotides of the probes in a CDF structure.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Identifies the number of nucleotides (bases) in probe sequences that mismatch the target sequences for allele A and allele B, as used by [1].

Usage

cdfAddBaseMmCounts(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups. Each group must contain the fields tbase, pbase, and offset (from cdfAddProbeOffsets()).
... | Not used.

Details

Note that the above counts can be inferred from the CDF structure alone, i.e. no sequence information is required. Consider a probe group interrogating allele A. First, all PM probes match the allele A target sequence perfectly, regardless of shift. Moreover, all these PM probes mismatch the allele B target sequence at exactly one position. Second, all MM probes mismatch the allele A sequence at exactly one position. This is also true for the allele B sequence, except for an MM probe with zero offset, which only mismatches at one (the middle) position. For a probe group interrogating allele B, the same rules apply with labels A and B swapped. In summary, the mismatch counts for PM probes can take values 0 and 1, and for MM probes they can take values 0, 1, and 2.

Value

Returns a list structure with the same number of groups as the groups argument. To each group, two fields are added:

*

Seealso

To add the required probe offsets, see cdfAddProbeOffsets(). See also applyCdfGroups().

Author

Henrik Bengtsson

References

[1] LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, and Meyerson M. "Allele-specific amplification in cancer revealed by SNP array analysis", PLoS Computational Biology, Nov 2005, Volume 1, Issue 6, e65.
[2] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx


cdfAddPlasqTypes()

Adds the PLASQ types for the probes in a CDF structure

Description

Adds the PLASQ types for the probes in a CDF structure.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfAddPlasqTypes(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups. Each group must contain the fields tbase, pbase, and expos.
... | Not used.

Details

This function identifies the number of nucleotides (bases) in probe sequences that mismatch the target sequences for allele A and allele B, as used by PLASQ [1], and adds an integer in [0,15] interpreted as one of 16 probe types. In PLASQ these probe types are referred to as: 0=MMoBR, 1=MMoBF, 2=MMcBR, 3=MMcBF, 4=MMoAR, 5=MMoAF, 6=MMcAR, 7=MMcAF, 8=PMoBR, 9=PMoBF, 10=PMcBR, 11=PMcBF, 12=PMoAR, 13=PMoAF, 14=PMcAR, 15=PMcAF.

Pseudo rule for computing the probe-type value:

  • PM/MM: For MMs add 0, for PMs add 8.

  • A/B: For Bs add 0, for As add 4.

  • o/c: For shifted (o) add 0, for centered (c) add 2.

  • R/F: For antisense (R) add 0, for sense (F) add 1.
    Example: (PM,A,c,R) = 8 + 4 + 2 + 0 = 14 (=PMcAR)
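The pseudo rule above is a 4-bit encoding, which a few lines of base R can express. plasqType() and plasqLabels below are hypothetical names used only for illustration; they follow the rule and label list stated in this section, not affxparser's source code.

```r
# Hypothetical helper (not part of affxparser): encode the PLASQ probe type
# from four logical flags, following the pseudo rule above.
plasqType <- function(isPM, isA, isCentered, isSense) {
  8L * isPM + 4L * isA + 2L * isCentered + 1L * isSense
}

# Labels in PLASQ order 0..15, as listed above
plasqLabels <- c("MMoBR", "MMoBF", "MMcBR", "MMcBF", "MMoAR", "MMoAF", "MMcAR", "MMcAF",
                 "PMoBR", "PMoBF", "PMcBR", "PMcBF", "PMoAR", "PMoAF", "PMcAR", "PMcAF")

type <- plasqType(isPM = TRUE, isA = TRUE, isCentered = TRUE, isSense = FALSE)
type                     # 14
plasqLabels[type + 1L]   # "PMcAR"
```

Because each flag contributes a distinct power of two, all 16 combinations map to distinct values in [0,15].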

Value

Returns a list structure with the same number of groups as the groups argument. To each group, one field is added:

*

Author

Henrik Bengtsson

References

[1] LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, and Meyerson M. "Allele-specific amplification in cancer revealed by SNP array analysis", PLoS Computational Biology, Nov 2005, Volume 1, Issue 6, e65.


cdfAddProbeOffsets()

Adds probe offsets to the groups in a CDF structure

Description

Adds probe offsets to the groups in a CDF structure.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfAddProbeOffsets(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups. Each group must contain the fields tbase and expos.
... | Not used.

Value

Returns a list structure with half as many groups as the groups argument (since allele A and allele B groups have been joined).

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

References

[1] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx

cdfGetFields()

Gets a subset of group fields in a CDF structure

Description

Gets a subset of group fields in a CDF structure.

This function is designed to be used with applyCdfGroups().

Usage

cdfGetFields(groups, fields, ...)

Arguments

Argument | Description
groups | A list of groups.
fields | A character vector of names of fields to be returned.
... | Not used.

Details

Note that no error is generated for missing fields. Instead, such a field is returned with value NA. This is done because it is much faster.
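The missing-field behavior can be illustrated with a small base-R sketch. getGroupFields() is a hypothetical re-implementation written only to show the described contract (requested fields are returned, missing ones as NA); it is not affxparser's code and makes no claim about how the package achieves its speed.

```r
# Hypothetical re-implementation for illustration; not affxparser's code.
getGroupFields <- function(groups, fields) {
  lapply(groups, function(group) {
    values <- lapply(fields, function(field) {
      value <- group[[field]]             # NULL if the field is missing
      if (is.null(value)) NA else value   # missing field -> NA, no error
    })
    names(values) <- fields
    values
  })
}

groups <- list(
  A = list(x = c(10L, 11L), y = c(5L, 5L)),
  B = list(x = c(12L, 13L), y = c(6L, 6L))
)
res <- getGroupFields(groups, c("x", "expos"))
str(res)
```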

Value

Returns a list structure of groups.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

cdfGetGroups()

Gets a subset of groups in a CDF structure

Description

Gets a subset of groups in a CDF structure.

This function is designed to be used with applyCdfGroups().

Usage

cdfGetGroups(groups, which, ...)

Arguments

Argument | Description
groups | A list of groups.
which | An integer or character vector of groups to be returned.
... | Not used.

Value

Returns a list structure of groups.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson


cdfGtypeCelToPQ()

Function to imitate Affymetrix' gtype_cel_to_pq software

Description

Function to imitate Affymetrix' gtype_cel_to_pq software.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfGtypeCelToPQ(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups.
... | Not used.

Value

Returns a list structure with a single group. The fields in this group are in turn vectors (all of equal length) where the elements are stored as subsequent quartets (PMA, MMA, PMB, MMB), with all forward-strand quartets first, followed by all reverse-strand quartets.
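The quartet layout described above can be produced in base R by stacking the four vectors as matrix rows and flattening column by column. The helper interleaveQuartets() and the intensity values are hypothetical; this only illustrates the ordering, not how cdfGtypeCelToPQ() is implemented.

```r
# Hypothetical illustration of the quartet layout; not affxparser code.
interleaveQuartets <- function(pmA, mmA, pmB, mmB) {
  # rbind() stacks the four vectors as rows; as.vector() then reads the
  # matrix column by column, i.e. quartet by quartet.
  as.vector(rbind(pmA, mmA, pmB, mmB))
}

forward <- interleaveQuartets(pmA = c(101, 102), mmA = c(201, 202),
                              pmB = c(301, 302), mmB = c(401, 402))
reverse <- interleaveQuartets(pmA = c(111, 112), mmA = c(211, 212),
                              pmB = c(311, 312), mmB = c(411, 412))
values <- c(forward, reverse)   # forward-strand quartets first
values
# values: 101 201 301 401 102 202 302 402 111 211 311 411 112 212 312 412
```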

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

References

[1] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx


cdfHeaderToCelHeader()

Creates a valid CEL header from a CDF header

Description

Creates a valid CEL header from a CDF header.

Usage

cdfHeaderToCelHeader(cdfHeader, sampleName="noname", date=Sys.time(), ..., version="4")

Arguments

Argument | Description
cdfHeader | A CDF header list structure, e.g. as returned by readCdfHeader().
sampleName | The name of the sample to be added to the CEL header.
date | The (scan) date to be added to the CEL header.
... | Not used.
version | The file-format version of the generated CEL file. Currently only version 4 is supported.

Value

Returns a CEL header list structure.

Author

Henrik Bengtsson


cdfMergeAlleles()

Function to join CDF allele A and allele B groups strand by strand

Description

Function to join CDF allele A and allele B groups strand by strand.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfMergeAlleles(groups, compReverseBases=FALSE, collapse="", ...)

Arguments

Argument | Description
groups | A list structure with groups.
compReverseBases | If TRUE, the group names, which typically are names of bases, are turned into their complementary bases for the reverse strand.
collapse | The character string used to collapse the allele A and the allele B group names.
... | Not used.

Details

Allele A and allele B are merged into a matrix where the first row holds the elements for allele A and the second row holds those for allele B.
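A base-R sketch of this two-row layout: each field of the allele A group becomes row 1 and the same field of the allele B group becomes row 2, while the group names are pasted together using collapse. mergeAlleleGroups() and the toy groups are hypothetical and only illustrate the layout described above; they do not reproduce cdfMergeAlleles() itself (e.g. strand handling and compReverseBases are omitted).

```r
# Hypothetical illustration of the merged layout; not affxparser code.
mergeAlleleGroups <- function(groupA, groupB, nameA, nameB, collapse = "") {
  fields <- intersect(names(groupA), names(groupB))
  merged <- lapply(fields, function(f) rbind(groupA[[f]], groupB[[f]]))
  names(merged) <- fields
  list(name = paste(c(nameA, nameB), collapse = collapse), fields = merged)
}

groupA <- list(x = c(10L, 11L), y = c(3L, 3L))
groupB <- list(x = c(10L, 11L), y = c(4L, 4L))
m <- mergeAlleleGroups(groupA, groupB, nameA = "C", nameB = "T")
m$name      # "CT"
m$fields$y  # 2 x 2 matrix: row 1 = allele A, row 2 = allele B
```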

Value

Returns a list structure with the two groups forward and reverse , if the latter exists.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

References

[1] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx


cdfMergeStrands()

Function to join CDF groups with the same names

Description

Function to join CDF groups with the same names.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

This can be used to join the sense and anti-sense groups of the same allele in SNP arrays.

Usage

cdfMergeStrands(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups.
... | Not used.

Details

If a unit has two strands, they are merged such that the elements for the second strand are concatenated to the end of the elements of the first strand (this is done separately for each of the two alleles).
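Joining groups that share a name amounts to grouping by name and concatenating each field in order, so the second strand's values follow the first's. The sketch below is a hypothetical base-R illustration of that idea on a toy four-group unit (two alleles times two strands); it is not affxparser's implementation.

```r
# Hypothetical illustration; not affxparser code.
mergeSameNamedGroups <- function(groups) {
  groupNames <- unique(names(groups))
  merged <- lapply(groupNames, function(name) {
    same <- groups[names(groups) == name]   # e.g. sense and antisense groups
    fields <- names(same[[1]])
    out <- lapply(fields, function(f) unlist(lapply(same, `[[`, f), use.names = FALSE))
    names(out) <- fields
    out
  })
  names(merged) <- groupNames
  merged
}

# Four groups: alleles C and T, each on two strands
groups <- list(
  C = list(x = 1:2), T = list(x = 3:4),
  C = list(x = 5:6), T = list(x = 7:8)
)
merged <- mergeSameNamedGroups(groups)
merged$C$x  # 1 2 5 6 (second strand appended to the first)
merged$T$x  # 3 4 7 8
```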

Value

Returns a list structure with only two groups.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

References

[1] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx


cdfMergeToQuartets()

Function to rearrange CDF group values into quartets

Description

Function to rearrange CDF group values into quartets.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Note that this requires that the group values have already been arranged as PMs and MMs.

Usage

cdfMergeToQuartets(groups, ...)

Arguments

Argument | Description
groups | A list structure with groups.
... | Not used.

Value

Returns a list structure with the two groups forward and reverse , if the latter exists.

Seealso

applyCdfGroups().

Author

Henrik Bengtsson

References

[1] Affymetrix, "Understanding Genotyping Probe Set Structure", 2005. http://www.affymetrix.com/support/developer/whitepapers/genotyping_probe_set_structure.affx

cdfOrderBy()

Orders the fields according to the value of another field in the same CDF group

Description

Orders the fields according to the value of another field in the same CDF group.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfOrderBy(groups, field, ...)

Arguments

Argument | Description
groups | A list of groups.
field | The field whose values are used to order the other fields.
... | Optional arguments passed to order().

Value

Returns a list structure of groups.
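The ordering idea can be sketched in base R: compute o <- order() on one field, then index every field of the group with o so all fields stay aligned. orderGroupBy() and the toy data are hypothetical; the sketch illustrates the described behavior rather than reproducing cdfOrderBy().

```r
# Hypothetical illustration; not affxparser code.
orderGroupBy <- function(groups, field, ...) {
  lapply(groups, function(group) {
    o <- order(group[[field]], ...)   # extra arguments, e.g. decreasing=TRUE
    lapply(group, function(values) values[o])
  })
}

groups <- list(A = list(x = c(30L, 10L, 20L), expos = c(3L, 1L, 2L)))
res <- orderGroupBy(groups, "expos")
res$A$x  # 10 20 30
```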

Seealso

cdfOrderColumnsBy(). applyCdfGroups().

Author

Henrik Bengtsson


cdfOrderColumnsBy()

Orders the columns of fields according to the values in a certain row of another field in the same CDF group

Description

Orders the columns of fields according to the values in a certain row of another field in the same CDF group. Note that this method requires that the group fields are matrices.

This function is designed to be used with applyCdfGroups() on an Affymetrix Mapping (SNP) CDF list structure.

Usage

cdfOrderColumnsBy(groups, field, row=1, ...)

Arguments

Argument | Description
groups | A list of groups.
field | The field whose values in row row are used to order the other fields.
row | The row of the above field to be used to find the order.
... | Optional arguments passed to order().

Value

Returns a list structure of groups.
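For matrix-valued fields, the same idea applies to columns: order on one row of the key field and reorder the columns of every field. orderGroupColumnsBy() and the toy matrices are hypothetical, written only to illustrate the described behavior.

```r
# Hypothetical illustration; not affxparser code.
orderGroupColumnsBy <- function(groups, field, row = 1, ...) {
  lapply(groups, function(group) {
    o <- order(group[[field]][row, ], ...)
    lapply(group, function(m) m[, o, drop = FALSE])  # keep matrix shape
  })
}

groups <- list(AB = list(
  x     = rbind(c(7L, 5L, 6L), c(17L, 15L, 16L)),
  expos = rbind(c(3L, 1L, 2L), c(3L, 1L, 2L))
))
res <- orderGroupColumnsBy(groups, "expos", row = 1)
res$AB$x
#      [,1] [,2] [,3]
# [1,]    5    6    7
# [2,]   15   16   17
```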

Seealso

cdfOrderBy(). applyCdfGroups().

Author

Henrik Bengtsson


cdfSetDimension()

Sets the dimension of an object

Description

Sets the dimension of an object.

This function is designed to be used with applyCdfGroupFields().

Usage

cdfSetDimension(field, dim, ...)

Arguments

Argument | Description
field | The object (typically a group field) whose dimension is set.
dim | The new dimension.
... | Not used.

Value

Returns the object with its dimension set to dim.

Seealso

applyCdfGroupFields().

Author

Henrik Bengtsson

compareCdfs()

Compares the contents of two CDF files

Description

Compares the contents of two CDF files.

Usage

compareCdfs(pathname, other, quick=FALSE, verbose=0, ...)

Arguments

Argument | Description
pathname | The pathname of the first CDF file.
other | The pathname of the second CDF file.
quick | If TRUE, only a subset of the units is compared; otherwise all units are compared.
verbose | An integer. The larger the value, the more details are printed.
... | Not used.

Details

The comparison is done with an upper-limit memory usage, regardless of the size of the CDFs.

Value

Returns TRUE if the two CDFs are equal, otherwise FALSE. If FALSE, the attribute reason contains a string explaining what difference was detected, and the attributes value1 and value2 contain the two objects/values that differ.
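To show how a caller might inspect such a FALSE result, the sketch below uses a hypothetical helper compareValues() that mimics the return convention described above (FALSE with reason, value1 and value2 attributes), so it runs without any CDF files. The unit counts are made up.

```r
# Hypothetical helper mimicking the documented return convention; not affxparser code.
compareValues <- function(what, value1, value2) {
  if (identical(value1, value2)) return(TRUE)
  res <- FALSE
  attr(res, "reason") <- sprintf("The %s differ.", what)
  attr(res, "value1") <- value1
  attr(res, "value2") <- value2
  res
}

res <- compareValues("number of units", 1190L, 1191L)
if (!isTRUE(res)) {
  cat("Difference detected:", attr(res, "reason"), "\n")
  str(list(value1 = attr(res, "value1"), value2 = attr(res, "value2")))
}
```

Testing with isTRUE() rather than comparing with == keeps the check robust to the attributes attached to a FALSE result.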

Seealso

convertCdf().

Author

Henrik Bengtsson

compareCels()

Compares the contents of two CEL files

Description

Compares the contents of two CEL files.

Usage

compareCels(pathname, other, readMap=NULL, otherReadMap=NULL, verbose=0, ...)

Arguments

Argument | Description
pathname | The pathname of the first CEL file.
other | The pathname of the second CEL file.
readMap | An optional read map for the first CEL file.
otherReadMap | An optional read map for the second CEL file.
verbose | An integer. The larger the value, the more details are printed.
... | Not used.

Value

Returns TRUE if the two CELs are equal, otherwise FALSE. If FALSE, the attribute reason contains a string explaining what difference was detected, and the attributes value1 and value2 contain the two objects/values that differ.

Seealso

convertCel().

Author

Henrik Bengtsson

convertCdf()

Converts a CDF into the same CDF but with another format

Description

Converts a CDF into the same CDF but with another format. Currently only CDF files in version 4 (binary/XDA) can be written. However, any input format is recognized.

Usage

convertCdf(filename, outFilename, version="4", force=FALSE, ..., .validate=TRUE,
  verbose=FALSE)

Arguments

Argument | Description
filename | The pathname of the original CDF file.
outFilename | The pathname of the destination CDF file. If it is the same as the source file, an exception is thrown.
version | The version of the output file format.
force | If FALSE and the version of the original CDF is the same as the output version, no new CDF is generated; otherwise one is.
... | Not used.
.validate | If TRUE, a consistency test between the generated and the original CDF is performed. Note that the memory overhead for this can be quite large, because two complete CDF structures are kept in memory at the same time.
verbose | If TRUE, extra details are written while processing.

Value

Returns (invisibly) TRUE if a new CDF was generated, otherwise FALSE.

Seealso

See compareCdfs() to compare two CDF files. writeCdf().

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################


chipType <- "Test3"
cdfFiles <- findCdf(chipType, firstOnly=FALSE)
cdfFiles <- list(
  ASCII=grep("ASCII", cdfFiles, value=TRUE),
  XDA=grep("XDA", cdfFiles, value=TRUE)
)

outFile <- file.path(tempdir(), sprintf("%s.cdf", chipType))
convertCdf(cdfFiles$ASCII, outFile, verbose=TRUE)

##############################################################
}                                                     # STOP #
##############################################################

convertCel()

Converts a CEL into the same CEL but with another format

Description

Converts a CEL into the same CEL but with another format. Currently only CEL files in version 4 (binary/XDA) can be written. However, any input format is recognized.

Usage

convertCel(filename, outFilename, readMap=NULL, writeMap=NULL, version="4",
  newChipType=NULL, ..., .validate=FALSE, verbose=FALSE)

Arguments

Argument | Description
filename | The pathname of the original CEL file.
outFilename | The pathname of the destination CEL file. If it is the same as the source file, an exception is thrown.
readMap | An optional read map for the input CEL file.
writeMap | An optional write map for the output CEL file.
version | The version of the output file format.
newChipType | (Only for advanced users who fully understand the Affymetrix CEL file format!) An optional string for overriding the chip type (label) in the CEL file header.
... | Not used.
.validate | If TRUE, a consistency test between the generated and the original CEL is performed.
verbose | If TRUE, extra details are written while processing.

Value

Returns (invisibly) TRUE if a new CEL was generated, otherwise FALSE.

Seealso

createCel().

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Search for some available Calvin CEL files
path <- system.file("rawData", package="AffymetrixDataTestFiles")
files <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE, firstOnly=FALSE)
files <- grep("FusionSDK_Test3", files, value=TRUE)
files <- grep("Calvin", files, value=TRUE)
file <- files[1]


outFile <- file.path(tempdir(), gsub("[.]CEL$", ",XBA.CEL", basename(file)))
if (file.exists(outFile))
  file.remove(outFile)
convertCel(file, outFile, .validate=TRUE)


##############################################################
}                                                     # STOP #
##############################################################

copyCel()

Copies a CEL file

Description

Copies a CEL file.

The file must be a valid CEL file; if not, an exception is thrown.

Usage

copyCel(from, to, overwrite=FALSE, ...)

Arguments

Argument | Description
from | The filename of the CEL file to be copied.
to | The filename of the destination file.
overwrite | If FALSE and the destination file already exists, an exception is thrown; otherwise not.
... | Not used.

Value

Returns TRUE if the file was successfully copied, otherwise FALSE.

Seealso

isCelFile().

Author

Henrik Bengtsson

Creates an empty CEL file

Description

Creates an empty CEL file.

Usage

createCel(filename, header, nsubgrids=0, overwrite=FALSE, ..., cdf=NULL, verbose=FALSE)

Arguments

filename: The filename of the CEL file to be created.
header: A list structure describing the CEL header, similar to the structure returned by readCelHeader(). This header can be of any CEL header version.
overwrite: If FALSE and the file already exists, an exception is thrown, otherwise the file is created.
nsubgrids: The number of subgrids.
...: Not used.
cdf: (optional) The pathname of a CDF file for the CEL file to be created. If given, the CEL header (argument header) is validated against the CDF header, otherwise not. If TRUE, a CDF file is located automatically using findCdf(header$chiptype).
verbose: An integer specifying how much verbose detail is output.

Details

Currently only binary (v4) CEL files are supported. The current version of the method does not use the Fusion SDK; instead, it uses its own code to create the CEL file.

Value

Returns (invisibly) the pathname of the file created.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Search for first available ASCII CEL file
path <- system.file("rawData", package="AffymetrixDataTestFiles")
files <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE, firstOnly=FALSE)
files <- grep("ASCII", files, value=TRUE)
file <- files[1]


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Read the CEL header
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
hdr <- readCelHeader(file)

# Assert that we found an ASCII CEL file, but any will do
stopifnot(hdr$version == 3)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Create a CEL v4 file of the same chip type
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
outFile <- file.path(tempdir(), "zzz.CEL")
if (file.exists(outFile))
  file.remove(outFile)
createCel(outFile, hdr, overwrite=TRUE)
str(readCelHeader(outFile))

# Verify correctness by updating and re-reading a few cells
intensities <- as.double(1:100)
indices <- seq(along=intensities)
updateCel(outFile, indices=indices, intensities=intensities)
value <- readCel(outFile, indices=indices)$intensities
stopifnot(identical(intensities, value))


##############################################################
}                                                     # STOP #
##############################################################

Search for CDF files in multiple directories

Description

Search for CDF files in multiple directories.

Usage

findCdf(chipType=NULL, paths=NULL, recursive=TRUE, pattern="[.](c|C)(d|D)(f|F)$", ...)

Arguments

chipType: A character string of the chip type to search for.
paths: A character vector of paths to be searched. The current directory is always searched at the beginning. If NULL, default paths are searched. For more details, see below.
recursive: If TRUE, directories are searched recursively.
pattern: A regular expression file name pattern to match.
...: Additional arguments passed to findFiles().

Details

Note, the current directory is always searched first, but never recursively (unless it is added to the search path explicitly). This provides an easy way to override other files in the search path.

If paths is NULL, then a set of default paths is searched. The default search path consists of:

  • getOption("AFFX_CDF_PATH")

  • Sys.getenv("AFFX_CDF_PATH")

One of the easiest ways to set system variables for R is to set them in an .Renviron file, e.g.

  # affxparser: Set default CDF path
  AFFX_CDF_PATH=${AFFX_CDF_PATH};M:/Affymetrix_2004-100k_trios/cdf
  AFFX_CDF_PATH=${AFFX_CDF_PATH};M:/Affymetrix_2005-500k_data/cdf

See help(Startup) for more details.
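To see what the two .Renviron lines produce, the sketch below simulates the resulting value of AFFX_CDF_PATH (starting from an empty variable, which leaves a leading ";") and splits it into individual directories. It assumes ";" is the separator, as in the .Renviron example; it does not call findCdf() and is not a claim about how findCdf() parses the variable internally.

```r
# Simulate AFFX_CDF_PATH after the two .Renviron lines, starting from an empty value
Sys.setenv(AFFX_CDF_PATH = ";M:/Affymetrix_2004-100k_trios/cdf;M:/Affymetrix_2005-500k_data/cdf")

# Assumption: entries are separated by ";" (as in the .Renviron example)
envValue <- Sys.getenv("AFFX_CDF_PATH")
paths <- strsplit(envValue, split = ";", fixed = TRUE)[[1]]
paths <- paths[nzchar(paths)]   # drop the empty entry left by the leading ";"
print(paths)
```

The nzchar() filter matters because ${AFFX_CDF_PATH} expands to an empty string the first time it is referenced.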

Value

Returns a vector of the full pathnames of the files found.

Seealso

This method is used internally by readCelUnits() if the CDF file is not specified.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Find a specific CDF file
cdfFile <- findCdf("Mapping10K_Xba131")
print(cdfFile)

# Find the first CDF file (no matter what it is)
cdfFile <- findCdf()
print(cdfFile)

# Find all CDF files in search path and display their headers
cdfFiles <- findCdf(firstOnly=FALSE)
for (cdfFile in cdfFiles) {
  cat("=======================================\n")
  hdr <- readCdfHeader(cdfFile)
  str(hdr)
}

##############################################################
}                                                     # STOP #
##############################################################

Finds one or several files in multiple directories

Description

Finds one or several files in multiple directories.

Usage

findFiles(pattern=NULL, paths=NULL, recursive=FALSE, firstOnly=TRUE, allFiles=TRUE, ...)

Arguments

pattern: A regular expression file name pattern to match.
paths: A character vector of paths to be searched.
recursive: If TRUE, the directory structure is searched breadth-first, in lexicographic order.
firstOnly: If TRUE, the method returns as soon as a matching file is found, otherwise not.
allFiles: If FALSE, files and directories starting with a period will be skipped, otherwise not.
...: Arguments passed to list.files().
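The recursive=TRUE behaviour described above (breadth-first, lexicographic order, optionally stopping at the first match) can be sketched in base R. findFilesBfs() is a hypothetical helper, not part of affxparser; it ignores allFiles and the ... arguments, and uses sort() and dir.exists() as stand-ins for whatever findFiles() does internally.

```r
# Hypothetical sketch: breadth-first file search, sorted within each level
findFilesBfs <- function(pattern, paths, firstOnly = TRUE) {
  queue <- sort(paths)
  found <- character(0)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    entries <- sort(list.files(current, full.names = TRUE))
    isDir <- dir.exists(entries)
    hits <- entries[!isDir & grepl(pattern, basename(entries))]
    if (length(hits) > 0) {
      if (firstOnly) return(hits[1])   # stop at the first match
      found <- c(found, hits)
    }
    # Subdirectories are visited only after the current level is done
    queue <- c(queue, entries[isDir])
  }
  if (firstOnly) NULL else found
}

# Demo on a throw-away directory tree
root <- tempfile("bfs")
dir.create(file.path(root, "sub"), recursive = TRUE)
invisible(file.create(file.path(root, "sub", "x.CDF"), file.path(root, "y.CDF")))
res <- findFilesBfs("[.]CDF$", root, firstOnly = FALSE)
basename(res)
# [1] "y.CDF" "x.CDF"
```

Because the search is breadth-first, a file at the top level (y.CDF) is reported before one in a subdirectory (x.CDF).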

Value

Returns a vector of the full pathnames of the files found.

Author

Henrik Bengtsson

Inverts a read or a write map

Description

Inverts a read or a write map.

Usage

invertMap(map, ...)

Arguments

map: An integer vector.
...: Not used.

Details

A map is defined to be a vector of length $n$ with unique finite values in $[1,n]$. Finding the inverse of a map is the same as finding, for each value, the position at which it occurs in the map, cf. order(). However, this method is much faster, because it utilizes the fact that all values are unique and in $[1,n]$. Moreover, for any map, taking the inverse twice returns the original map.
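Because every value is unique and lies in $[1,n]$, the inverse can be built with a single indexed assignment in linear time instead of a sort. The sketch below shows that idea with a hypothetical invertMapSketch(); whether affxparser's invertMap() uses exactly this approach is an assumption.

```r
# Hypothetical O(n) inverse of a map (a permutation of 1..n)
invertMapSketch <- function(map) {
  n <- length(map)
  stopifnot(is.numeric(map), !anyNA(map), !anyDuplicated(map),
            all(map >= 1 & map <= n))
  inverse <- integer(n)
  inverse[map] <- seq_len(n)   # if map[i] == j, then inverse[j] == i
  inverse
}

map <- c(3L, 1L, 2L)
invertMapSketch(map)
# [1] 2 3 1
identical(invertMapSketch(map), order(map))
# [1] TRUE
```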

Value

Returns an integer vector.

Seealso

To generate an optimized write map for a CDF file, see readCdfUnitsWriteMap().

Author

Henrik Bengtsson

Examples

set.seed(1)

# Simulate a read map for a chip with 1.2 million cells
nbrOfCells <- 1200000
readMap <- sample(nbrOfCells)

# Get the corresponding write map
writeMap <- invertMap(readMap)

# A map inverted twice should equal itself
stopifnot(identical(invertMap(writeMap), readMap))

# Another example illustrating that the write map is the
# inverse of the read map
idx <- sample(nbrOfCells, size=1000)
stopifnot(identical(writeMap[readMap[idx]], idx))

# invertMap() is much faster than order()
t1 <- system.time(invertMap(readMap))[3]
cat(sprintf("invertMap()  : %5.2fs [ 1.00x]\n", t1))

t2 <- system.time(writeMap2 <- sort.list(readMap, na.last=NA, method="quick"))[3]
cat(sprintf("'quick sort' : %5.2fs [%5.2fx]\n", t2, t2/t1))
stopifnot(identical(writeMap, writeMap2))

t3 <- system.time(writeMap2 <- order(readMap))[3]
cat(sprintf("order()      : %5.2fs [%5.2fx]\n", t3, t3/t1))
stopifnot(identical(writeMap, writeMap2))

# Clean up
rm(nbrOfCells, idx, readMap, writeMap, writeMap2)

Checks if a file is a CEL file or not

Description

Checks if a file is a CEL file or not.

Usage

isCelFile(filename, ...)

Arguments

filename: A filename.
...: Not used.

Value

Returns TRUE if the file is a CEL file, otherwise FALSE. ASCII (v3), binary (v4; XDA), and binary (CCG v1; Calvin) CEL files are recognized. If the file does not exist, an exception is thrown.

Seealso

readCel(), readCelHeader(), readCelUnits().

Author

Henrik Bengtsson


parseDatHeaderString()

Parses a DAT header string

Description

Parses a DAT header string.

Usage

parseDatHeaderString(header, timeFormat="%m/%d/%y %H:%M:%S", ...)

Arguments

header: A character string.
timeFormat: The format string used to parse the timestamp. For more details, see strptime(). If NULL, no parsing is done.
...: Not used.
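The default timeFormat follows base R strptime() conventions. The sketch below only shows what that format string does to a timestamp of the expected shape; the timestamp is made up, and the sketch does not reproduce how parseDatHeaderString() locates the timestamp inside the header.

```r
# The default timeFormat of parseDatHeaderString()
timeFormat <- "%m/%d/%y %H:%M:%S"

stamp <- "06/14/05 12:34:56"   # hypothetical DAT-style timestamp
parsed <- strptime(stamp, format = timeFormat, tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2005-06-14 12:34:56"
```

Note that %y is a two-digit year; base R maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999.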

Value

Returns a named list structure.

Seealso

readCelHeader().

Author

Henrik Bengtsson

Parses a Bpmap file

Description

Parses (parts of) a Bpmap (binary probe mapping) file from Affymetrix.

Usage

readBpmap(filename, seqIndices = NULL, readProbeSeq = TRUE, readSeqInfo
= TRUE, readPMXY = TRUE, readMMXY = TRUE, readStartPos = TRUE,
readCenterPos = FALSE, readStrand = TRUE, readMatchScore = FALSE,
readProbeLength = FALSE, verbose = 0)
readBpmapHeader(filename)
readBpmapSeqinfo(filename, seqIndices = NULL, verbose = 0)

Arguments

filename: The filename as a character string.
seqIndices: A vector of integers specifying the indices of the sequences to be read. If NULL, the entire file is read.
readProbeSeq: If TRUE, the probe sequences are read.
readSeqInfo: If TRUE, the sequence information (a list containing information such as sequence name, number of hits, etc.) is read.
readPMXY: If TRUE, the (x,y) coordinates of the PM probes are read.
readMMXY: If TRUE, the (x,y) coordinates of the MM probes are read (only relevant if the file has MM information).
readStartPos: If TRUE, the start positions of the probes are read.
readCenterPos: If TRUE, the center positions of the probes are returned.
readStrand: If TRUE, the strands of the hits are returned.
readMatchScore: If TRUE, the match scores are returned.
readProbeLength: If TRUE, the probe lengths are returned.
verbose: How verbose the output should be.

Details

readBpmap reads a BPMAP file, which is a binary file containing information about a given probe's location in a sequence. Here sequence means some kind of reference sequence, typically a chromosome or a scaffold. readBpmapHeader reads the header of the BPMAP file, and readBpmapSeqinfo reads the sequence info of the sequences (so this function is merely a convenience function).

Value

For readBpmap: a list of lists, one list for every sequence read. The components of the sequence lists depend on the arguments of the function call. For readBpmapHeader: a list with two components, version and numSequences. For readBpmapSeqinfo: a list of lists containing the sequence info.

Seealso

tpmap2bpmap for information on how to write Bpmap files.

Author

Kasper Daniel Hansen

Reads an Affymetrix Command Console Generic (CCG) Data file

Description

Reads an Affymetrix Command Console Generic (CCG) Data file. The CCG data file format is also known as the Calvin file format.

Usage

readCcg(pathname, verbose=0, .filter=NULL, ...)

Arguments

pathname: The pathname of the CCG file.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.
.filter: A list.
...: Not used.

Details

Note that the current implementation of this method does not utilize the Affymetrix Fusion SDK library. Instead, it is implemented in R from the file format definition [1].

Value

A named list structure consisting of ...

Seealso

readCcgHeader(), readCdfUnits().

Author

Henrik Bengtsson

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, April, 2006. http://www.affymetrix.com/support/developer/


readCcgHeader()

Reads the header of an Affymetrix Command Console Generic (CCG) file

Description

Reads the header of an Affymetrix Command Console Generic (CCG) file.

Usage

readCcgHeader(pathname, verbose=0, .filter=list(fileHeader = TRUE, dataHeader = TRUE),
  ...)

Arguments

pathname: The pathname of the CCG file.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.
.filter: A list.
...: Not used.

Details

Note that the current implementation of this method does not utilize the Affymetrix Fusion SDK library. Instead, it is implemented in R from the file format definition [1].

Value

A named list structure consisting of ...

Seealso

readCcg().

Author

Henrik Bengtsson

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, April, 2006. http://www.affymetrix.com/support/developer/

Parsing a CDF file using Affymetrix Fusion SDK

Description

This function parses a CDF file using the Affymetrix Fusion SDK. It will most likely be replaced by the more general readCdfUnits() function.

Usage

readCdf(filename, units=NULL,
         readXY=TRUE, readBases=TRUE,
         readIndexpos=TRUE, readAtoms=TRUE,
         readUnitType=TRUE, readUnitDirection=TRUE,
         readUnitNumber=TRUE, readUnitAtomNumbers=TRUE,
         readGroupAtomNumbers=TRUE, readGroupDirection=TRUE,
         readIndices=FALSE, readIsPm=FALSE,
         stratifyBy=c("nothing", "pmmm", "pm", "mm"),
         verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
readXY: If TRUE, cell row and column (x,y) coordinates are retrieved, otherwise not.
readBases: If TRUE, cell P and T bases are retrieved, otherwise not.
readIndexpos: If TRUE, cell indexpos values are retrieved, otherwise not.
readExpos: If TRUE, cell "expos" values are retrieved, otherwise not.
readUnitType: If TRUE, unit types are retrieved, otherwise not.
readUnitDirection: If TRUE, unit directions are retrieved, otherwise not.
readUnitNumber: If TRUE, unit numbers are retrieved, otherwise not.
readUnitAtomNumbers: If TRUE, unit atom numbers are retrieved, otherwise not.
readGroupAtomNumbers: If TRUE, group atom numbers are retrieved, otherwise not.
readGroupDirection: If TRUE, group directions are retrieved, otherwise not.
readIndices: If TRUE, cell indices calculated from the row and column (x,y) coordinates are retrieved, otherwise not. Note that these indices are one-based.
readIsPm: If TRUE, cell flags indicating whether the cell is a perfect-match (PM) probe or not are retrieved, otherwise not.
stratifyBy: A character string specifying which and how elements in group fields are returned. If "nothing", elements are returned as is, i.e. as vectors. If "pm"/"mm", only elements corresponding to perfect-match (PM)/mismatch (MM) probes are returned (as vectors). If "pmmm", elements are returned as a matrix where the first row holds elements corresponding to PM probes and the second row those corresponding to MM probes. Note that in this case, it is assumed that there is an equal number of PMs and MMs; if not, an error is generated. Moreover, the PMs and MMs may not even be paired, i.e. there is no guarantee that the two elements in a column correspond to a PM-MM pair.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.
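The stratifyBy semantics can be illustrated on plain vectors. The sketch below uses a hypothetical helper stratify() with made-up values x and PM flags isPm; it is not how readCdf() is implemented, only a base-R illustration of "nothing", "pm", "mm" and the two-row "pmmm" matrix, including the error on unequal PM and MM counts. As noted above, PM and MM elements are matched by position only.

```r
# Hypothetical helper mimicking the documented stratifyBy behaviour on plain vectors
stratify <- function(x, isPm, by = c("nothing", "pmmm", "pm", "mm")) {
  by <- match.arg(by)
  switch(by,
    nothing = x,
    pm      = x[isPm],
    mm      = x[!isPm],
    pmmm    = {
      pm <- x[isPm]
      mm <- x[!isPm]
      if (length(pm) != length(mm)) {
        stop("Number of PM and MM cells differ: ", length(pm), " != ", length(mm))
      }
      # Row 1 holds PM elements, row 2 MM elements, matched by position only
      rbind(pm = pm, mm = mm)
    }
  )
}

x <- c(101, 102, 201, 202)            # made-up cell values
isPm <- c(TRUE, FALSE, TRUE, FALSE)   # made-up PM flags
stratify(x, isPm, by = "pmmm")
#    [,1] [,2]
# pm  101  201
# mm  102  202
```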

Value

A list with one component for each unit. Every component is again a list with three components

*

Seealso

It is recommended to use readCdfUnits() instead of this method. See readCdfHeader() for getting the header of a CDF file.

Note

This version of the function does not return information on the QC probes. This will be added in a (near) future release. In addition we expect the header to be part of the returned object.

Expect changes to the structure of the returned value in the next release. Please contact the developers for details.

Author

James Bullard and Kasper Daniel Hansen.

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, June 14, 2005. http://www.affymetrix.com/support/developer/


readCdfCellIndices()

Reads (one-based) cell indices of units (probesets) in an Affymetrix CDF file

Description

Reads (one-based) cell indices of units (probesets) in an Affymetrix CDF file.

Usage

readCdfCellIndices(filename, units=NULL, stratifyBy=c("nothing", "pmmm", "pm", "mm"),
  verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
stratifyBy: A character string specifying which and how elements in group fields are returned. If "nothing", elements are returned as is, i.e. as vectors. If "pm"/"mm", only elements corresponding to perfect-match (PM)/mismatch (MM) probes are returned (as vectors). If "pmmm", elements are returned as a matrix where the first row holds elements corresponding to PM probes and the second row those corresponding to MM probes. Note that in this case, it is assumed that there is an equal number of PMs and MMs; if not, an error is generated. Moreover, the PMs and MMs may not even be paired, i.e. there is no guarantee that the two elements in a column correspond to a PM-MM pair.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

A named list where the names correspond to the names of the units read. Each unit element of the list is in turn a list structure with one element groups, which in turn is a list. Each group element in groups is a list with a single field named indices. Thus, the structure is

  cdf
   +- unit #1
   |   +- "groups"
   |       +- group #1
   |       |   +- "indices"
   |       +- group #2
   |       |   +- "indices"
   |       .
   |       +- group #K
   |           +- "indices"
   +- unit #2
   .
   +- unit #J

This structure is compatible with what readCdfUnits() returns.

Note that these indices are one-based.
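The unit, groups, indices nesting can be traversed with plain lapply()/unlist(). The sketch below uses a small hand-built mock object with made-up unit and group names (not read from a real CDF) to count cells per unit and to collect all one-based cell indices into a single integer vector.

```r
# Hand-built mock of the unit -> "groups" -> group -> "indices" structure
cdf <- list(
  unitA = list(groups = list(
    unitAC = list(indices = 1:4),
    unitAT = list(indices = 5:8))),
  unitB = list(groups = list(
    unitB1 = list(indices = c(20L, 21L))))
)

# Number of cells per unit: sum of the lengths of each group's indices
cellsPerUnit <- sapply(cdf, function(unit)
  sum(lengths(lapply(unit$groups, `[[`, "indices"))))

# All (one-based) cell indices, in unit and group order
allCells <- unlist(lapply(cdf, function(unit)
  lapply(unit$groups, `[[`, "indices")), use.names = FALSE)

print(cellsPerUnit)
print(allCells)
```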

Seealso

readCdfUnits().

Author

Henrik Bengtsson


readCdfDataFrame()

Reads units (probesets) from an Affymetrix CDF file

Description

Reads units (probesets) from an Affymetrix CDF file. Gets all or a subset of units (probesets).

Usage

readCdfDataFrame(filename, units=NULL, groups=NULL, cells=NULL, fields=NULL, drop=TRUE,
  verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all are read.
groups: An integer vector of group indices specifying which groups to read. If NULL, all are read.
cells: An integer vector of cell indices specifying which cells to read. If NULL, all are read.
fields: A character vector specifying which fields to read. If NULL, all unit, group and cell fields are returned.
drop: If TRUE and only one field is read, then a vector (rather than a single-column data.frame) is returned.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

An NxK data.frame or a vector of length N.

Seealso

For retrieving the CDF as a list structure, see readCdfUnits().

Author

Henrik Bengtsson

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, June 14, 2005. http://www.affymetrix.com/support/developer/

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Find any CDF file
cdfFile <- findCdf()

units <- 101:120
fields <- c("unit", "unitName", "group", "groupName", "cell")
df <- readCdfDataFrame(cdfFile, units=units, fields=fields)
stopifnot(identical(sort(unique(df$unit)), units))

fields <- c("unit", "unitName", "unitType")
fields <- c(fields, "group", "groupName")
fields <- c(fields, "x", "y", "cell", "pbase", "tbase")
df <- readCdfDataFrame(cdfFile, units=units, fields=fields)
stopifnot(identical(sort(unique(df$unit)), units))


##############################################################
}                                                     # STOP #
##############################################################

readCdfGroupNames()

Reads group names for a set of units (probesets) in an Affymetrix CDF file

Description

Reads group names for a set of units (probesets) in an Affymetrix CDF file.

This is for instance useful for SNP arrays where the nucleotides used for the A and B alleles are the same as the group names.

Usage

readCdfGroupNames(filename, units=NULL, truncateGroupNames=TRUE, verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
truncateGroupNames: A logical variable indicating whether unit names should be stripped from the beginning of group names.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.
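What truncateGroupNames=TRUE means can be illustrated with the SNP group names that appear elsewhere in this manual, where each group name is the unit name followed by an allele nucleotide. truncateGroupName() below is a hypothetical base-R helper, not affxparser's implementation; it strips the unit name only when it is actually a prefix.

```r
# Hypothetical helper: strip the unit name from the beginning of group names
truncateGroupName <- function(groupNames, unitName) {
  hasPrefix <- startsWith(groupNames, unitName)
  groupNames[hasPrefix] <- substring(groupNames[hasPrefix], nchar(unitName) + 1L)
  groupNames
}

truncateGroupName(c("SNP_A-1516438C", "SNP_A-1516438T"), "SNP_A-1516438")
# [1] "C" "T"
```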

Value

A named list structure where the names of the elements are the names of the units read. Each element is a character vector with group names for the corresponding unit.

Seealso

readCdfUnits().

Author

Henrik Bengtsson


readCdfHeader()

Reads the header associated with an Affymetrix CDF file

Description

Reads the header of an Affymetrix CDF file using the Fusion SDK.

Usage

readCdfHeader(filename)

Arguments

filename: The name of the CDF file.

Value

A named list with the following components:

*

Seealso

readCdfUnits().

Author

James Bullard and Kasper Daniel Hansen

Examples

for (zzz in 0) {

# Find any CDF file
cdfFile <- findCdf()
if (is.null(cdfFile))
  break

header <- readCdfHeader(cdfFile)
print(header)

} # for (zzz in 0)

Checks if cells in a CDF file are perfect-match probes or not

Description

Checks if cells in a CDF file are perfect-match probes or not.

Usage

readCdfIsPm(filename, units=NULL, verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

A named list of named logical vectors. The names of the list elements are unit names, and the names of the logical vectors are group names.

Author

Henrik Bengtsson


readCdfNbrOfCellsPerUnitGroup()

Gets the number of cells (probes) in each group of each unit in a CDF file

Description

Gets the number of cells (probes) in each group of each unit in a CDF file.

Usage

readCdfNbrOfCellsPerUnitGroup(filename, units=NULL, verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

A named list of named integer vectors. The names of the list elements are unit names, and the names of the integer vectors are group names.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

cdfFile <- findCdf("Mapping10K_Xba131")

groups <- readCdfNbrOfCellsPerUnitGroup(cdfFile)

# Number of units read
print(length(groups))
##   11564

# Details on two units
print(groups[56:57])
## $`SNP_A-1516438`
## SNP_A-1516438C SNP_A-1516438T SNP_A-1516438C SNP_A-1516438T
##             10             10             10             10
##
## $`SNP_A-1508602`
## SNP_A-1508602A SNP_A-1508602G SNP_A-1508602A SNP_A-1508602G
##             10             10             10             10


# Number of groups with different number of cells
print(table(unlist(groups)))
##    10    60
## 46240     4


# Number of cells per unit
nbrOfCellsPerUnit <- unlist(lapply(groups, FUN=sum))
print(table(nbrOfCellsPerUnit))
## nbrOfCellsPerUnit
##    40    60
## 11560     4


# Number of groups per unit
nbrOfGroupsPerUnit <- unlist(lapply(groups, FUN=length))

# Details on a few units
print(nbrOfGroupsPerUnit[20:30])
## SNP_A-1512666 SNP_A-1512740 SNP_A-1512132 SNP_A-1516082 SNP_A-1511962
##             4             4             4             4             4
## SNP_A-1515637 SNP_A-1515878 SNP_A-1518789 SNP_A-1518296 SNP_A-1519701
##             4             4             4             4             4
## SNP_A-1511743
##             4

# Number of units for each unique number of groups
print(table(nbrOfGroupsPerUnit))
## nbrOfGroupsPerUnit
##     1     4
##     4 11560

x <- list()
for (size in unique(nbrOfGroupsPerUnit)) {
  subset <- groups[nbrOfGroupsPerUnit==size]
  t <- matrix(unlist(subset), nrow=size)
  colnames(t) <- names(subset)
  x[[as.character(size)]] <- t
  rm(subset, t)
}

# Check whether there are any quartet units where Groups 1 & 2,
# or Groups 3 & 4, do not have the same number of cells.
# Group 1 & 2
print(sum(x[["4"]][1,]-x[["4"]][2,] != 0))
# 0

# Group 3 & 4
print(sum(x[["4"]][3,]-x[["4"]][4,] != 0))
# 0

##############################################################
}                                                     # STOP #
##############################################################

Reads the QC units of a CDF file

Description

Reads the QC units of a CDF file.

Usage

readCdfQc(filename, units = NULL, verbose = 0)

Arguments

filename: The name of the CDF file.
units: The QC unit indices as a vector of integers. NULL indicates that all units should be read.
verbose: How verbose the output should be. 0 means no output, with higher numbers being more verbose.

Value

A list with one component for each QC unit.

Seealso

readCdf().

Author

Kasper Daniel Hansen


readCdfUnitNames()

Reads unit (probeset) names from an Affymetrix CDF file

Description

Gets the names of all or a subset of units (probesets) in an Affymetrix CDF file. This can be used to get a map between unit names and the internal unit indices used by the CDF file.
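Once the unit names are available, base R match() turns names into the one-based unit indices expected by the units argument of the readers. In the sketch below the unit names are made up; in practice they would come from readCdfUnitNames(filename). Names that are not found map to NA, so it is worth checking with anyNA() before passing the result on.

```r
# Hypothetical unit names, in the order they appear in a CDF file
unitNames <- c("AFFX-BioB-5_at", "SNP_A-1516438", "SNP_A-1508602")

# Map wanted names to (one-based) unit indices; unknown names give NA
wanted <- c("SNP_A-1508602", "AFFX-BioB-5_at", "not-a-unit")
units <- match(wanted, unitNames)
units
# [1]  3  1 NA
anyNA(units)
# [1] TRUE
```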

Usage

readCdfUnitNames(filename, units=NULL, verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

A character vector of unit names.

Seealso

readCdfUnits().

Author

Henrik Bengtsson (http://www.braju.com/R/)

Examples

See help(readCdfUnits) for an example.

Reads units (probesets) from an Affymetrix CDF file

Description

Reads units (probesets) from an Affymetrix CDF file. Gets all or a subset of units (probesets).

Usage

readCdfUnits(filename, units=NULL, readXY=TRUE, readBases=TRUE, readExpos=TRUE,
  readType=TRUE, readDirection=TRUE, stratifyBy=c("nothing", "pmmm", "pm", "mm"),
  readIndices=FALSE, verbose=0)

Arguments

filename: The filename of the CDF file.
units: An integer vector of unit indices specifying which units to read. If NULL, all units are read.
readXY: If TRUE, cell row and column (x,y) coordinates are retrieved, otherwise not.
readBases: If TRUE, cell P and T bases are retrieved, otherwise not.
readExpos: If TRUE, cell "expos" values are retrieved, otherwise not.
readType: If TRUE, unit types are retrieved, otherwise not.
readDirection: If TRUE, unit and group directions are retrieved, otherwise not.
stratifyBy: A character string specifying which and how elements in group fields are returned. If "nothing", elements are returned as is, i.e. as vectors. If "pm"/"mm", only elements corresponding to perfect-match (PM)/mismatch (MM) probes are returned (as vectors). If "pmmm", elements are returned as a matrix where the first row holds elements corresponding to PM probes and the second row those corresponding to MM probes. Note that in this case, it is assumed that there is an equal number of PMs and MMs; if not, an error is generated. Moreover, the PMs and MMs may not even be paired, i.e. there is no guarantee that the two elements in a column correspond to a PM-MM pair.
readIndices: If TRUE, cell indices calculated from the row and column (x,y) coordinates are retrieved, otherwise not. Note that these indices are one-based.
verbose: An integer specifying the verbose level. If 0, the file is parsed quietly. Higher numbers give more details.

Value

A named list where the names correspond to the names of the units read. Each element of the list is in turn a list structure with three components:

*

Seealso

readCdfCellIndices().

Author

James Bullard and Kasper Daniel Hansen. Modified by Henrik Bengtsson (http://www.braju.com/R/) to read any subset of units and/or subset of parameters, to stratify by PM/MM, and to return cell indices.

References

[1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats, June 14, 2005. http://www.affymetrix.com/support/developer/

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Find any CDF file
cdfFile <- findCdf()

# Read all units in a CDF file [~20s => 0.34ms/unit]
cdf0 <- readCdfUnits(cdfFile, readXY=FALSE, readExpos=FALSE)

# Read a subset of units in a CDF file [~6ms => 0.06ms/unit]
units1 <- c(5, 100:109, 34)
cdf1 <- readCdfUnits(cdfFile, units=units1, readXY=FALSE, readExpos=FALSE)
stopifnot(identical(cdf1, cdf0[units1]))
rm(cdf0)

# Create a unit name to index map
names <- readCdfUnitNames(cdfFile)
units2 <- match(names(cdf1), names)
stopifnot(all.equal(units1, units2))
cdf2 <- readCdfUnits(cdfFile, units=units2, readXY=FALSE, readExpos=FALSE)

stopifnot(identical(cdf1, cdf2))

##############################################################
}                                                     # STOP #
##############################################################

readCdfUnitsWriteMap()

Generates an Affymetrix cell-index write map from a CDF file

Description

Generates an Affymetrix cell-index write map from a CDF file.

The purpose of this method is to provide a re-ordering of cell elements such that cells in units (probesets) can be stored in contiguous blocks. When cell elements are then read unit by unit, minimal file re-positioning is required, resulting in faster reading.

Note: At the moment, this package does not provide methods to write/reorder CEL files. In the meantime, you have to write and re-read the data using your own file format. That is not too hard using writeBin() and readBin().
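A minimal sketch of such a home-made format, using only base R writeBin()/readBin() and toy data. It assumes that a write map sends cell i to file position writeMap[i], and uses a random permutation in place of a map from readCdfUnitsWriteMap(); the file layout (a bare block of 8-byte doubles) is an arbitrary choice for illustration.

```r
# Toy data (hypothetical): 10 cell intensities and a write map
set.seed(42)
n <- 10L
intensities <- round(runif(n, min = 0, max = 1000), 1)
writeMap <- sample(n)   # assumption: cell i is stored at position writeMap[i]

pathname <- tempfile(fileext = ".bin")

# Write: reorder the cells according to the write map, then dump as doubles
onDisk <- numeric(n)
onDisk[writeMap] <- intensities
con <- file(pathname, open = "wb")
writeBin(onDisk, con)   # 8 bytes per value (double)
close(con)

# Read: load the block and map each cell back to its original index
con <- file(pathname, open = "rb")
stored <- readBin(con, what = "double", n = n)
close(con)
restored <- stored[writeMap]

stopifnot(identical(restored, intensities))
```

Indexing stored by writeMap undoes the reordering; equivalently, onDisk could have been built as intensities[invertMap(writeMap)].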

Usage

readCdfUnitsWriteMap(filename, units=NULL, ..., verbose=FALSE)

Arguments

filename: The pathname of the CDF file.
units: An integer vector of unit indices specifying which units to list first. All other units are added in order at the end. If NULL, units are in order.
...: Additional arguments passed to readCdfUnits().
verbose: Either a logical, a numeric, or a Verbose object specifying how much verbose/debug information is written to standard output. If a Verbose object, the threshold level of the object specifies how detailed the information is. If a numeric, the value is used to set the threshold of a new Verbose object. If TRUE, the threshold is set to -1 (minimal). If FALSE, no output is written (and the R.utils package is not required either).

Value

An integer vector which is a write map.

Seealso

To invert maps, see invertMap(). See also readCel() and readCelUnits().

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Find any CDF file
cdfFile <- findCdf()

# Create a cell-index map (for writing)
writeMap <- readCdfUnitsWriteMap(cdfFile)

# Inverse map to be used to read cell elements such that, when
# read unit by unit, they are read much faster.
readMap <- invertMap(writeMap)

# Validate the two maps
stopifnot(identical(readMap[writeMap], 1:length(readMap)))


cat("Summary of the \"randomness\" of the cell indices:\n")
moves <- diff(readMap) - 1
cat(sprintf("Number of unnecessary file re-positionings: %d (%.1f%%)\n",
            sum(moves != 0), 100*sum(moves != 0)/length(moves)))
cat(sprintf("Extra positioning: %.1fGb\n", sum(abs(moves))/1024^3))

smallMoves <- moves[abs(moves) <= 25];
largeMoves <- moves[abs(moves)  > 25];
layout(matrix(1:2))
main <- "Non-signed file moves required in unordered file"
hist(smallMoves, nclass=51, main=main, xlab="moves <=25 bytes")
hist(largeMoves, nclass=101, main="", xlab="moves >25 bytes")

# Clean up
layout(1)
rm(cdfFile, readMap, writeMap, moves, smallMoves, largeMoves, main)

##############################################################
}                                                     # STOP #
##############################################################



##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Function to read Affymetrix probeset annotations
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
readAffymetrixProbesetAnnotation <- function(pathname, ...) {
  # Get headers
  header <- scan(pathname, what="character", sep=",", quote="\"",
                 quiet=TRUE, nlines=1);

  # Read only a subset of columns (unique to this example)
  cols <- c("Probe Set ID"="probeSet",
            "Chromosome"="chromosome",
            "Physical Position"="physicalPosition",
            "dbSNP RS ID"="dbSnpId");

  colClasses <- rep("NULL", length(header));
  colClasses[header %in% names(cols)] <- "character";

  # Read the data (this is what takes time)
  df <- read.table(pathname, colClasses=colClasses, header=TRUE, sep=",",
                   quote="\"", na.strings="---", strip.white=TRUE, check.names=FALSE,
                   blank.lines.skip=FALSE, fill=FALSE, comment.char="", ...);

  # Re-order columns
  df <- df[,match(names(cols),colnames(df))];
  colnames(df) <- cols;

  # Use "Probe Set ID" as rownames. Note that if we use 'row.names=1'
  # or similar something goes wrong. /HB 2006-03-06
  rownames(df) <- df[[1]];
  df <- df[,-1];

  # Change types of columns
  df[[1]] <- factor(df[[1]], levels=c(1:22,"X","Y",NA), ordered=TRUE);
  df[[2]] <- as.integer(df[[2]]);

  df;
} # readAffymetrixProbesetAnnotation()



# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Main
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
for (zz in 1) {
  # Chip to be remapped
  chipType <- "Mapping50K_Xba240"

  annoFile <- paste(chipType, "_annot.csv", sep="")
  cdfFile <- findCdf(chipType)
  if (is.null(cdfFile) || !file.exists(annoFile))
    break;

  # Read SNP location details
  snpInfo <- readAffymetrixProbesetAnnotation(annoFile)

  # Order by chromosome and then physical position
  o <- order(snpInfo[[1]], snpInfo[[2]])
  snpInfo <- snpInfo[o,]
  rm(o)

  # Read unit names in CDF file
  unitNames <- readCdfUnitNames(cdfFile)

  # The CDF unit indices sorted by chromosomal position
  units <- match(rownames(snpInfo), unitNames)

  # ...and cell indices in the same order
  writeMap <- readCdfUnitsWriteMap(cdfFile, units=units)

  # Inverse map to be used to write cell elements such that, if they are
  # later read unit by unit, they are read in contiguous blocks.
  readMap <- invertMap(writeMap)

  # Clean up
  rm(chipType, annoFile, cdfFile, snpInfo, unitNames, units, readMap, writeMap)

} # for (zz in 1)
##############################################################
}                                                     # STOP #
##############################################################

Reads an Affymetrix CEL file

Description

This function reads all or a subset of the data in an Affymetrix CEL file.

Usage

readCel(filename, 
        indices = NULL, 
        readHeader = TRUE, 
        readXY = FALSE, readIntensities = TRUE,
        readStdvs = FALSE, readPixels = FALSE,
        readOutliers = TRUE, readMasked = TRUE, 
        readMap = NULL,
        verbose = 0,
        .checkArgs = TRUE)

Arguments

| Argument | Description |
| --- | --- |
| filename | The name of the CEL file. |
| indices | A vector of indices indicating which features to read. If NULL, all features are returned. |
| readHeader | A logical: should the header of the file be returned? |
| readXY | A logical: should the (x,y) coordinates be returned? |
| readIntensities | A logical: should the intensities be returned? |
| readStdvs | A logical: should the standard deviations be returned? |
| readPixels | A logical: should the number of pixels be returned? |
| readOutliers | A logical: should the outliers be returned? |
| readMasked | A logical: should the masked features be returned? |
| readMap | A vector remapping cell indices to file indices. If NULL, no mapping is used. |
| verbose | How verbose the output should be: 0 means no verbosity, and higher numbers mean more verbose output. At the moment, the values 0, 1 and 2 are supported. |
| .checkArgs | If TRUE, the arguments are validated, otherwise not. Warning: Setting this to FALSE should only be done if the arguments have been validated elsewhere! |

Value

A CEL file consists of a header, a set of cell values, and information about outliers and masked cells.

The cell values, which are values extracted for each cell (aka feature or probe), are the (x,y) coordinates, the intensity and standard deviation estimates, and the number of pixels in the cell. If indices=NULL, cell values for all cells are returned; otherwise only the cell values for the cells specified by argument indices are returned.

This function returns a named list whose elements hold the header, the cell values, and the outlier and masked-cell information.

The elements of the cell values are ordered according to argument indices. The length of each cell-value element equals the number of cells read.

Which of the above elements are returned is controlled by the readNnn arguments. If a readNnn argument is FALSE, the corresponding element is NULL; e.g. if readStdvs=FALSE then stdvs is NULL.

Seealso

See readCelHeader() for a description of the header output. Often a user only wants to read the intensities; readCelIntensities() is a function specialized for that use.

Author

James Bullard and Kasper Daniel Hansen

Examples

for (zzz in 0) {  # Only so that 'break' can be used

  # Scan current directory for CEL files
  celFiles <- list.files(pattern="[.](c|C)(e|E)(l|L)$")
  if (length(celFiles) == 0)
    break;

  celFile <- celFiles[1]

  # Read a subset of cells
  idxs <- c(1:5, 1250:1500, 450:440)
  cel <- readCel(celFile, indices=idxs, readOutliers=TRUE)
  str(cel)

  # Clean up
  rm(celFiles, celFile, idxs, cel)

} # for (zzz in 0)

readCelHeader()

Parsing the header of an Affymetrix CEL file

Description

Reads in the header of an Affymetrix CEL file using the Fusion SDK.

Usage

readCelHeader(filename)

Arguments

| Argument | Description |
| --- | --- |
| filename | The name of the CEL file. |

Details

This function returns the header of a CEL file. Affymetrix operates with different versions of this file format. Depending on what version is being read, different information is accessible.

Value

A named list whose entries are obtained from the Fusion SDK interface functions. We try to obtain all relevant information from the file.

Seealso

See readCel() for reading the entire CEL file; that function also returns the header. See affxparserInfo for general comments on the package and the Fusion SDK.

Note

Memory usage: the Fusion SDK allocates memory for the entire CEL file when the file is accessed. The memory footprint of this function may therefore seem (rather) large.

Speed: CEL files of version 2 (standard text files) need to be read completely in order to report the number of outliers and masked features.

Author

James Bullard and Kasper Daniel Hansen

Examples

# Scan current directory for CEL files
files <- list.files(pattern="[.](c|C)(e|E)(l|L)$")
if (length(files) > 0) {
  header <- readCelHeader(files[1])
  print(header)
  rm(header)
}

# Clean up
rm(files)

readCelIntensities()

Reads the intensities contained in several Affymetrix CEL files

Description

Reads the intensities of several Affymetrix CEL files (as opposed to readCel(), which only reads a single file).

Usage

readCelIntensities(filenames, indices = NULL, ..., verbose = 0)

Arguments

| Argument | Description |
| --- | --- |
| filenames | The names of the CEL files as a character vector. |
| indices | A vector of the indices to be read. If NULL, all features are returned. |
| ... | Additional arguments passed to readCel(). |
| verbose | An integer specifying how verbose the output should be; higher values mean more verbose output. |

Details

The function will initially allocate a matrix with the same memory footprint as the final object.
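The allocate-once strategy, together with the Note below that this function calls readCel() once per file, can be sketched as follows. readOne() is a hypothetical stand-in for a per-file reader such as readCel(filename, indices=indices)$intensities; it returns made-up numbers so that the sketch runs without any CEL files, and readIntensitiesSketch() is likewise hypothetical.

```r
# Hypothetical stand-in for readCel(filename, indices=indices)$intensities;
# returns toy values so the sketch is self-contained.
readOne <- function(filename, indices) {
  as.double(indices) + nchar(filename)
}

readIntensitiesSketch <- function(filenames, indices) {
  # Allocate the full result up front: one row per cell, one column per file
  x <- matrix(NA_real_, nrow=length(indices), ncol=length(filenames),
              dimnames=list(NULL, filenames))
  # Fill column by column, reading one file at a time
  for (kk in seq_along(filenames)) {
    x[, kk] <- readOne(filenames[kk], indices=indices)
  }
  x
}

x <- readIntensitiesSketch(c("a.CEL", "bb.CEL"), indices=c(5L, 1L, 9L))
print(x)
```

Because the matrix is allocated before any file is read, peak memory stays close to the size of the final object, and rows follow the order of indices.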

Value

A matrix with a number of rows equal to the length of the indices argument (or the number of features on the entire chip), and a number of columns equal to the number of files. The columns are ordered according to the filenames argument.

Seealso

See readCel() for a more versatile function, in particular for details on the indices argument.

Note

Currently this function builds on readCel() and simply calls it multiple times. If testing yields sufficient reasons for doing so, it may be re-implemented in C++.

Author

James Bullard and Kasper Daniel Hansen

Examples

# Scan current directory for CEL files
files <- list.files(pattern="[.](c|C)(e|E)(l|L)$")
if (length(files) >= 2) {
  cel <- readCelIntensities(files[1:2])
  str(cel)
  rm(cel)
}

# Clean up
rm(files)

readCelRectangle()

Reads a spatial subset of probe-level data from Affymetrix CEL files

Description

Reads a spatial subset of probe-level data from Affymetrix CEL files.

Usage

readCelRectangle(filename, xrange=c(0, Inf), yrange=c(0, Inf), ..., asMatrix=TRUE)

Arguments

| Argument | Description |
| --- | --- |
| filename | The pathname of the CEL file. |
| xrange | A numeric vector of length two giving the left and right coordinates of the cells to be returned. |
| yrange | A numeric vector of length two giving the top and bottom coordinates of the cells to be returned. |
| ... | Additional arguments passed to readCel(). |
| asMatrix | If TRUE, the CEL data fields are returned as matrices with element (1,1) corresponding to cell (xrange[1],yrange[1]). |
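How a rectangle maps to cell indices can be sketched in base R. The sketch assumes, following the updateCel() example further down, that the cell at zero-based coordinates (x, y) has cell index 1 + x + ncols*y, where ncols is the number of columns in the CEL header (hdr$cols in that example). rectIndices() is a hypothetical helper, and the toy chip and its values are made up. In this sketch, rows follow x and columns follow y.

```r
# Hypothetical helper: cell indices of all cells in a rectangle, assuming
# zero-based (x, y) and index = 1 + x + ncols * y.
rectIndices <- function(xrange, yrange, ncols) {
  xs <- xrange[1]:xrange[2]
  ys <- yrange[1]:yrange[2]
  # outer() gives a length(xs)-by-length(ys) matrix of cell indices
  outer(xs, ys, function(x, y) 1 + x + ncols * y)
}

ncols <- 5                     # toy chip: 5 columns by 4 rows = 20 cells
values <- seq_len(ncols * 4)   # toy "intensities", equal to the cell index
idx <- rectIndices(xrange=c(1, 3), yrange=c(2, 3), ncols=ncols)

# Element [i, j] holds the value of cell (xrange[1]+i-1, yrange[1]+j-1),
# so element [1, 1] corresponds to cell (xrange[1], yrange[1])
z <- matrix(values[idx], nrow=nrow(idx))
print(z)
```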

Value

A named list CEL structure similar to what readCel() returns. In addition, if asMatrix is TRUE, the CEL data fields are returned as matrices, otherwise not.

Seealso

The readCel () method is used internally.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

rotate270 <- function(x, ...) {
  x <- t(x)
  nc <- ncol(x)
  if (nc < 2) return(x)
  x[,nc:1,drop=FALSE]
}


# Search for some available CEL files
path <- system.file("rawData", package="AffymetrixDataTestFiles")
file <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE)


# Read CEL intensities in the upper left corner
cel <- readCelRectangle(file, xrange=c(0,250), yrange=c(0,250))
z <- rotate270(cel$intensities)
sub <- paste("Chip type:", cel$header$chiptype)
image(z, col=gray.colors(256), axes=FALSE, main=basename(file), sub=sub)
text(x=0, y=1, labels="(0,0)", adj=c(0,-0.7), cex=0.8, xpd=TRUE)
text(x=1, y=0, labels="(250,250)", adj=c(1,1.2), cex=0.8, xpd=TRUE)

# Clean up
rm(rotate270, path, file, cel, z, sub)


##############################################################
}                                                     # STOP #
##############################################################

Reads probe-level data ordered as units (probesets) from one or several Affymetrix CEL files

Description

Reads probe-level data ordered as units (probesets) from one or several Affymetrix CEL files by using the unit and group definitions in the corresponding Affymetrix CDF file.

Usage

readCelUnits(filenames, units=NULL, stratifyBy=c("nothing", "pmmm", "pm", "mm"),
  cdf=NULL, ..., addDimnames=FALSE, dropArrayDim=TRUE, transforms=NULL, readMap=NULL,
  verbose=FALSE)

Arguments

| Argument | Description |
| --- | --- |
| filenames | The filenames of the CEL files. |
| units | An integer vector of unit indices specifying which units to read. If NULL, all units are read. |
| stratifyBy | Argument passed to the low-level method readCdfCellIndices(). |
| cdf | A character filename of a CDF file, or a CDF list structure. If NULL, the CDF file is searched for by findCdf(), first starting from the current directory and then from the directory where the first CEL file is. |
| ... | Arguments passed to the low-level method readCel(), e.g. readXY and readStdvs. |
| addDimnames | If TRUE, dimension names are added to arrays, otherwise not. With dimension names, the size of the returned CEL structure in bytes increases by 30-40%. |
| dropArrayDim | If TRUE and only one array is read, the elements of the group fields do not have an array dimension. |
| transforms | A list of exactly length(filenames) functions. If NULL, no transformation is performed. Intensities read are passed through the corresponding transform function before being returned. |
| readMap | A vector remapping cell indices to file indices. If NULL, no mapping is used. |
| verbose | Either a logical, a numeric, or a Verbose object specifying how much verbose/debug information is written to standard output. If a Verbose object, the threshold level of the object specifies how detailed the information is. If a numeric, the value is used to set the threshold of a new Verbose object. If TRUE, the threshold is set to -1 (minimal). If FALSE, no output is written (and the R.utils package is not required). |
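The transforms argument pairs one function with each CEL file. What applying such a list amounts to can be sketched on a toy intensity matrix with one column per array; that layout, the values and applyTransforms() are assumptions for illustration, not affxparser code.

```r
# Hypothetical illustration: apply the k-th transform to the k-th array's values
applyTransforms <- function(x, transforms) {
  stopifnot(is.list(transforms), length(transforms) == ncol(x))
  for (kk in seq_len(ncol(x))) {
    x[, kk] <- transforms[[kk]](x[, kk])
  }
  x
}

# Toy intensities: 3 cells (rows) on 2 arrays (columns)
x <- matrix(c(16, 64, 256, 100, 400, 900), nrow=3)

# E.g. log2 for the first array and square root for the second
y <- applyTransforms(x, transforms=list(log2, sqrt))
print(y)
```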

Value

A named list with one element for each unit read. The names correspond to the names of the units read. Each unit element is in turn a list structure with groups (aka blocks). Each group contains the requested fields, e.g. intensities, stdvs, and pixels. If more than one CEL file is read, an extra dimension is added to each of the fields, which can be used to subset by CEL file.
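Subsetting such a structure by CEL file can be sketched with a toy nested list. The shape below (unit, then group, then field, with each field a cells-by-arrays matrix) is an assumption made for illustration: it presumes the array dimension is the last one and that more than one file was read, so dropArrayDim does not apply.

```r
# Toy structure with an assumed shape: unit -> group -> field, where each
# field is a cells-by-arrays matrix (two CEL files in this example)
units <- list(
  unitA = list(
    groupA = list(intensities = matrix(c(1, 2, 3, 10, 20, 30), nrow=3))
  ),
  unitB = list(
    groupA = list(intensities = matrix(c(4, 5, 40, 50), nrow=2)),
    groupB = list(intensities = matrix(c(6, 60), nrow=1))
  )
)

# Extract the intensities of the second array for every group of every unit
arr2 <- lapply(units, function(unit) {
  lapply(unit, function(group) group$intensities[, 2])
})
str(arr2)
```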

Note that neither CEL headers nor information about outliers and masked cells are returned. To access these, use readCelHeader() and readCel().

Seealso

Internally, readCelHeader(), readCdfUnits() and readCel() are used.

Author

Henrik Bengtsson


Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Search for some available CEL files
path <- system.file("rawData", package="AffymetrixDataTestFiles")
files <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE, firstOnly=FALSE)
files <- grep("FusionSDK_Test3", files, value=TRUE)
files <- grep("Calvin", files, value=TRUE)

# Fake more CEL files if not enough
files <- rep(files, length.out=5)
print(files);
rm(files);


##############################################################
}                                                     # STOP #
##############################################################

A function to read Affymetrix CHP files

Description

This function parses any type of CHP file and returns the results in a list. The contents of the list depend on the type of CHP file that is parsed; readers are referred to the Affymetrix documentation for what should be there and how to interpret it.

Usage

readChp(filename, withQuant = TRUE)

Arguments

| Argument | Description |
| --- | --- |
| filename | The name of the CHP file to read. |
| withQuant | A logical value, currently largely unused. |

Details

This is an interface to the Affymetrix Fusion SDK. The Affymetrix documentation should be consulted for explicit details.

Value

A list is returned. The contents of the list depend on the type of CHP file that was read. Users may want to translate the different outputs into specific containers.

Seealso

readCel

Author

R. Gentleman

Examples

if (require("AffymetrixDataTestFiles")) {
  path <- system.file("rawData", package="AffymetrixDataTestFiles")
  files <- findFiles(pattern="[.](chp|CHP)$", path=path,
                     recursive=TRUE, firstOnly=FALSE)

  s1 <- readChp(files[1])
  length(s1)
  names(s1)
  names(s1[[7]])
}

Parsing a CLF file using Affymetrix Fusion SDK

Description

This function parses a CLF file using the Affymetrix Fusion SDK. CLF (chip layout) files contain information associating probe ids with chip x- and y-coordinates.

Usage

readClf(file)

Arguments

| Argument | Description |
| --- | --- |
| file | character(1) providing a path to the CLF file to be input. |

Value

A list. The header element is always present.

Seealso

https://www.affymetrix.com/support/developer/fusion/File_Format_CLF_aptv161.pdf describes CLF file content.

Author

Martin Morgan

Parsing a CLF file using Affymetrix Fusion SDK

Description

This function parses a CLF file using the Affymetrix Fusion SDK. CLF (chip layout) files contain information associating probe ids with chip x- and y-coordinates.

Usage

readClfEnv(file, readBody = TRUE)

Arguments

| Argument | Description |
| --- | --- |
| file | character(1) providing a path to the CLF file to be input. |
| readBody | logical(1) indicating whether the entire file should be parsed (TRUE) or only the file header information describing the chips to which the file is relevant. |

Value

An environment. The header element is always present; the remainder are present when readBody=TRUE .


Seealso

https://www.affymetrix.com/support/developer/fusion/File_Format_CLF_aptv161.pdf describes CLF file content.

Author

Martin Morgan


readClfHeader()

Read the header of a CLF file.

Description

Reads the header of a CLF file. The information stored in the header is described in the readClfEnv() documentation; that function reads the header in addition to the body.

Usage

readClfHeader(file)

Arguments

| Argument | Description |
| --- | --- |
| file | A CLF file. |

Value

A list of header elements.

Parsing a PGF file using Affymetrix Fusion SDK

Description

This function parses a PGF file using the Affymetrix Fusion SDK. PGF (probe group) files describe probes present within probe sets, including the type (e.g., pm, mm) of the probe and probeset.

Usage

readPgf(file, indices = NULL)

Arguments

| Argument | Description |
| --- | --- |
| file | character(1) providing a path to the PGF file to be input. |
| indices | integer(n) a vector of indices of the probesets to be read. |

Value

A list. The header element is always present.

The remaining elements describe probe sets, atoms, and probes. Elements within probe sets, for instance, are coordinated such that the i-th element of one vector (e.g., probesetId) corresponds to the i-th element of a second vector (e.g., probesetType). The atoms contained within probeset i are in positions probesetStartAtom[i]:(probesetStartAtom[i+1]-1) of the atom vectors. A similar map applies to probes within atoms, using atomStartProbe as the index.
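The index arithmetic for probesetStartAtom and atomStartProbe can be sketched in base R on toy vectors. The two vector names come from the text; their values, nAtoms, nProbes and the helpers atomsOf() and probesOf() are hypothetical. Because the formula has no probesetStartAtom[i+1] for the last probe set, the sketch assumes that the last probe set runs to the final atom (and likewise for the probes of the last atom). Within the package, the internal .pgfProbeIndexFromProbesetIndex provides this kind of mapping.

```r
# Toy PGF-like vectors (illustrative values): 3 probe sets, 5 atoms, 8 probes
probesetStartAtom <- c(1L, 3L, 4L)          # first atom of each probe set
atomStartProbe    <- c(1L, 2L, 4L, 5L, 7L)  # first probe of each atom
nAtoms  <- length(atomStartProbe)
nProbes <- 8L

# Atoms of probe set i: probesetStartAtom[i]:(probesetStartAtom[i+1]-1);
# assumption: the last probe set ends at the last atom
atomsOf <- function(i) {
  end <- if (i < length(probesetStartAtom)) probesetStartAtom[i + 1L] - 1L else nAtoms
  probesetStartAtom[i]:end
}

# Same rule one level down, for the probes within atom j
probesOf <- function(j) {
  end <- if (j < nAtoms) atomStartProbe[j + 1L] - 1L else nProbes
  atomStartProbe[j]:end
}

# All probes of each probe set, obtained via its atoms
for (i in seq_along(probesetStartAtom)) {
  cat("probe set", i, ": probes", unlist(lapply(atomsOf(i), probesOf)), "\n")
}
```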

The PGF file format includes optional elements; these elements are always present in the list, but with appropriate default values.


Seealso

https://www.affymetrix.com/support/developer/fusion/File_Format_PGF_aptv161.pdf describes PGF file content.

The internal function .pgfProbeIndexFromProbesetIndex provides a map between the indices of probe set entries and the indices of the probes contained in the probe set.

Author

Martin Morgan

Parsing a PGF file using Affymetrix Fusion SDK

Description

This function parses a PGF file using the Affymetrix Fusion SDK. PGF (probe group) files describe probes present within probe sets, including the type (e.g., pm, mm) of the probe and probeset.

Usage

readPgfEnv(file, readBody = TRUE, indices = NULL)

Arguments

| Argument | Description |
| --- | --- |
| file | character(1) providing a path to the PGF file to be input. |
| readBody | logical(1) indicating whether the entire file should be parsed (TRUE) or only the file header information describing the chips to which the file is relevant. |
| indices | integer(n) vector of positive integers indicating which probesets to read. These integers must be sorted (increasing) and unique. |

Value

An environment. The header element is always present; the remainder are present when readBody=TRUE .

The elements present when readBody=TRUE describe probe sets, atoms, and probes. Elements within probe sets, for instance, are coordinated such that the i-th element of one vector (e.g., probesetId) corresponds to the i-th element of a second vector (e.g., probesetType). The atoms contained within probeset i are in positions probesetStartAtom[i]:(probesetStartAtom[i+1]-1) of the atom vectors. A similar map applies to probes within atoms, using atomStartProbe as the index.

The PGF file format includes optional elements; these elements are always present in the environment, but with appropriate default values.


Seealso

https://www.affymetrix.com/support/developer/fusion/File_Format_PGF_aptv161.pdf describes PGF file content.

The internal function .pgfProbeIndexFromProbesetIndex provides a map between the indices of probe set entries and the indices of the probes contained in the probe set.

Author

Martin Morgan


readPgfHeader()

Read the header of a PGF file into a list.

Description

This function reads the header of a PGF file into a list. More details on the exact fields can be found in the Details section.

Usage

readPgfHeader(file)

Arguments

| Argument | Description |
| --- | --- |
| file | A file in PGF format. |

Details

The PGF file format, including its header fields, is described in https://www.affymetrix.com/support/developer/fusion/File_Format_PGF_aptv161.pdf

Value

A list corresponding to the elements in the header.

Updates a CEL file

Description

Updates a CEL file.

Usage

updateCel(filename, indices=NULL, intensities=NULL, stdvs=NULL, pixels=NULL,
  writeMap=NULL, ..., verbose=0)

Arguments

| Argument | Description |
| --- | --- |
| filename | The filename of the CEL file. |
| indices | A numeric vector of cell (probe) indices specifying which cells to update. If NULL, all indices are considered. |
| intensities | A numeric vector of intensity values to be stored. Alternatively, it can be a named data.frame or matrix (or list) where the named columns (elements) are the fields to be updated. |
| stdvs | An optional numeric vector. |
| pixels | An optional numeric vector. |
| writeMap | An optional write map. |
| ... | Not used. |
| verbose | An integer specifying how much verbose detail is output. |

Details

Currently only binary (v4) CEL files are supported. The current version of the method does not make use of the Fusion SDK, but its own code to navigate and update the CEL file.

Value

Returns (invisibly) the pathname of the file updated.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Search for some available Calvin CEL files
path <- system.file("rawData", package="AffymetrixDataTestFiles")
files <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE, firstOnly=FALSE)
files <- grep("FusionSDK_HG-U133A", files, value=TRUE)
files <- grep("Calvin", files, value=TRUE)
file <- files[1]

# Convert to an XDA CEL file
filename <- file.path(tempdir(), basename(file))
if (file.exists(filename))
  file.remove(filename)
convertCel(file, filename)


fields <- c("intensities", "stdvs", "pixels")

# Cells to be updated
idxs <- 1:2

# Get CEL header
hdr <- readCelHeader(filename)

# Get the original data
cel <- readCel(filename, indices=idxs, readStdvs=TRUE, readPixels=TRUE)
print(cel[fields])
cel0 <- cel

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Square-root the intensities
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
updateCel(filename, indices=idxs, intensities=sqrt(cel$intensities))
cel <- readCel(filename, indices=idxs, readStdvs=TRUE, readPixels=TRUE)
print(cel[fields])


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Update a few cell values by a data frame
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- data.frame(
  intensities=cel0$intensities,
  stdvs=c(201.1, 3086.1)+0.5,
  pixels=c(9,9+1)
)
updateCel(filename, indices=idxs, data)

# Assert correctness of update
cel <- readCel(filename, indices=idxs, readStdvs=TRUE, readPixels=TRUE)
print(cel[fields])
for (ff in fields) {
  stopifnot(all.equal(cel[[ff]], data[[ff]], tolerance=.Machine$double.eps^0.25))
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Update a region of the CEL file
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load pre-defined data
side <- 306
pathname <- system.file("extras/easternEgg.gz", package="affxparser")
con <- gzfile(pathname, open="rb")
z <- readBin(con=con, what="integer", size=1, signed=FALSE, n=side^2)
close(con)
z <- matrix(z, nrow=side)
side <- min(hdr$cols - 2*22, side)
z <- as.double(z[1:side,1:side])
x <- matrix(22+0:(side-1), nrow=side, ncol=side, byrow=TRUE)
idxs <- as.vector((1 + x) + hdr$cols*t(x))
# Load current data in the same region
z0 <- readCel(filename, indices=idxs)$intensities
# Mix the two data sets
z <- (0.3*z^2 + 0.7*z0)
# Update the CEL file
updateCel(filename, indices=idxs, intensities=z)

# Helper to rotate a matrix (used only for displaying the image below)
rotate270 <- function(x, ...) {
  x <- t(x)
  nc <- ncol(x)
  if (nc < 2) return(x)
  x[,nc:1,drop=FALSE]
}

# Display a spatial image of the updated CEL file
cel <- readCelRectangle(filename, xrange=c(0,350), yrange=c(0,350))
z <- rotate270(cel$intensities)
sub <- paste("Chip type:", cel$header$chiptype)
image(z, col=gray.colors(256), axes=FALSE, main=basename(filename), sub=sub)
text(x=0, y=1, labels="(0,0)", adj=c(0,-0.7), cex=0.8, xpd=TRUE)
text(x=1, y=0, labels="(350,350)", adj=c(1,1.2), cex=0.8, xpd=TRUE)


# Clean up
file.remove(filename)
rm(files, cel, cel0, idxs, data, ff, fields, rotate270)


##############################################################
}                                                     # STOP #
##############################################################

updateCelUnits()

Updates a CEL file unit by unit

Description

Updates a CEL file unit by unit.

Please note that, contrary to readCelUnits(), this method can only update a single CEL file at a time.

Usage

updateCelUnits(filename, cdf=NULL, data, ..., verbose=0)

Arguments

| Argument | Description |
| --- | --- |
| filename | The filename of the CEL file. |
| cdf | An optional CDF list structure with either field indices or fields x and y. If NULL, the unit names (and from there the cell indices) are inferred from the names of the elements in data. |
| data | A list structure in a format similar to what is returned by readCelUnits() for a single CEL file only. |
| ... | Optional arguments passed to readCdfCellIndices(), which is called if cdf is not given. |
| verbose | An integer specifying how much verbose detail is output. |
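Conceptually, a unit-by-unit update comes down to flattening the per-group cell indices (from the CDF) and the per-group values (from data) in the same unit/group order, and then performing one cell-level update. A base-R sketch of that flattening follows; the toy cdf and unitData lists only mimic the shapes described above, and the final step writes into an in-memory vector rather than calling updateCel(), which the Seealso section says is used internally. This is an illustration, not the package's implementation.

```r
# Toy stand-ins (assumed shapes): a CDF-like list with the cell indices of
# each group, and data in the same unit/group layout with new intensities
cdf <- list(
  unitA = list(groupA = list(indices = c(3L, 7L))),
  unitB = list(groupA = list(indices = c(1L, 9L)),
               groupB = list(indices = 2L))
)
unitData <- list(
  unitA = list(groupA = list(intensities = c(30, 70))),
  unitB = list(groupA = list(intensities = c(10, 90)),
               groupB = list(intensities = 20))
)

# Flatten both structures in the same unit/group order
idxs <- unlist(lapply(cdf, function(u) lapply(u, `[[`, "indices")), use.names=FALSE)
vals <- unlist(lapply(unitData, function(u) lapply(u, `[[`, "intensities")), use.names=FALSE)

# One vectorized update of a toy "file" with 10 cells
cells <- rep(NA_real_, 10)
cells[idxs] <- vals
print(cells)
```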

Value

Returns what updateCel() returns.

Seealso

Internally, updateCel() is used.

Author

Henrik Bengtsson

Examples

##############################################################
if (require("AffymetrixDataTestFiles")) {            # START #
##############################################################

# Search for some available Calvin CEL files
path <- system.file("rawData", package="AffymetrixDataTestFiles")
files <- findFiles(pattern="[.](cel|CEL)$", path=path, recursive=TRUE, firstOnly=FALSE)
files <- grep("FusionSDK_Test3", files, value=TRUE)
files <- grep("Calvin", files, value=TRUE)
file <- files[1]

# Convert to an XDA CEL file
pathname <- file.path(tempdir(), basename(file))
if (file.exists(pathname))
  file.remove(pathname)
convertCel(file, pathname)




# Check for the CDF file
hdr <- readCelHeader(pathname)
cdfFile <- findCdf(hdr$chiptype)

hdr <- readCdfHeader(cdfFile)
nbrOfUnits <- hdr$nunits
print(nbrOfUnits);

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example: Read and re-write the same data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
units <- c(101, 51)
data1 <- readCelUnits(pathname, units=units, readStdvs=TRUE)
cat("Original data:\n")
str(data1)
updateCelUnits(pathname, data=data1)
data2 <- readCelUnits(pathname, units=units, readStdvs=TRUE)
cat("Updated data:\n")
str(data2)
stopifnot(identical(data1, data2))


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example: Random read and re-write "stress test"
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
for (kk in 1:10) {
  nunits <- sample(min(1000,nbrOfUnits), size=1)
  units <- sample(nbrOfUnits, size=nunits)
  cat(sprintf("%02d. Selected %d random units: reading", kk, nunits))
  t <- system.time({
    data1 <- readCelUnits(pathname, units=units, readStdvs=TRUE)
  }, gcFirst=TRUE)[3]
  cat(sprintf(" [%.2fs=%.2fs/unit], updating", t, t/nunits))
  t <- system.time({
    updateCelUnits(pathname, data=data1)
  }, gcFirst=TRUE)[3]
  cat(sprintf(" [%.2fs=%.2fs/unit], validating", t, t/nunits))
  data2 <- readCelUnits(pathname, units=units, readStdvs=TRUE)
  stopifnot(identical(data1, data2))
  cat(". done\n")
}

##############################################################
}                                                     # STOP #
##############################################################

Creates a binary CDF file

Description

This function creates a binary CDF file given a valid CDF structure containing all necessary elements.

Warning: The API for this function is likely to change in future versions.

Usage

writeCdf(fname, cdfheader, cdf, cdfqc, overwrite=FALSE, verbose=0)

Arguments

| Argument | Description |
| --- | --- |
| fname | The name of the CDF file. |
| cdfheader | A list with a structure equal to the output of readCdfHeader(). |
| cdf | A list with a structure equal to the output of readCdf(). |
| cdfqc | A list with a structure equal to the output of readCdfQc(). |
| overwrite | A logical: should an existing file be overwritten? |
| verbose | How verbose the output should be: 0 means no output, with higher numbers being more verbose. |

Details

This function has been validated mainly by reading in various ASCII or binary CDF files, writing them back as new CDF files, and comparing these element by element with the original files.

Value

This function is used for its side effect: creating a CDF file.

Seealso

To read the CDF "regular" and QC units with all necessary fields and values for writing a CDF file, see readCdf(), readCdfQc() and readCdfHeader(). To compare two CDF files, see compareCdfs().

Author

Kasper Daniel Hansen


writeCdfHeader()

Writes a CDF header

Description

Writes a CDF header. This method is not intended to be used explicitly. To write a CDF, use writeCdf() instead.

Usage

writeCdfHeader(con, cdfHeader, unitNames, qcUnitLengths, unitLengths, verbose=0)

Arguments

| Argument | Description |
| --- | --- |
| con | An open connection to which nothing has been written. |
| cdfHeader | A CDF header list structure. |
| unitNames | A character vector of all unit names. |
| qcUnitLengths | An integer vector of the number of bytes in each of the QC units. |
| unitLengths | An integer vector of the number of bytes in each of the (ordinary) units. |
| verbose | An integer specifying how much verbose detail is output. |

Value

Returns nothing.

Seealso

This method is called by writeCdf(). See also writeCdfQcUnits() and writeCdfUnits().

Author

Henrik Bengtsson


writeCdfQcUnits()

Writes CDF QC units

Description

Writes CDF QC units. This method is not intended to be used explicitly. To write a CDF, use writeCdf() instead.

Usage

writeCdfQcUnits(con, cdfQcUnits, verbose=0)

Arguments

| Argument | Description |
| --- | --- |
| con | An open connection to which a CDF header has already been written by writeCdfHeader(). |
| cdfQcUnits | A list structure of CDF QC units as returned by readCdf() (not readCdfUnits()). |
| verbose | An integer specifying how much verbose detail is output. |

Value

Returns nothing.

Seealso

This method is called by writeCdf(). See also writeCdfHeader() and writeCdfUnits().

Author

Henrik Bengtsson

writeCdfUnits()

Writes CDF units

Description

Writes CDF units. This method is not intended to be used explicitly. To write a CDF, use writeCdf() instead.

Usage

writeCdfUnits(con, cdfUnits, verbose=0)

Arguments

Argument | Description
con | An open connection to which a CDF header and QC units have already been written by writeCdfHeader() and writeCdfQcUnits(), respectively.
cdfUnits | A list structure of CDF units as returned by readCdf() (not readCdfUnits()).
verbose | An integer specifying how much verbose detail is output.

Value

Returns nothing.

Seealso

This method is called by writeCdf(). See also writeCdfHeader() and writeCdfQcUnits().

Author

Henrik Bengtsson

writeCelHeader()

Writes a CEL header to a connection

Description

Writes a CEL header to a connection.

Usage

writeCelHeader(con, header, outputVersion=c("4"), ...)

Arguments

Argument | Description
con | A connection.
header | A list structure describing the CEL header, similar to the structure returned by readCelHeader().
outputVersion | A character string specifying the output format. Currently only CEL version 4 (binary; XDA) is supported.
... | Not used.

Details

Currently only CEL version 4 (binary;XDA) headers can be written.
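As a rough illustration, the header read from an existing CEL file can be written out as a version 4 header. This is a sketch under stated assumptions, not a documented recipe: the helper name copyCelHeaderV4() is hypothetical, and it assumes that the list returned by readCelHeader() is acceptable as-is and that the connection must be opened for binary writing.

```r
# Hypothetical helper (not part of affxparser): write the header of an
# existing CEL file to a new file as a CEL v4 (binary; XDA) header.
copyCelHeaderV4 <- function(celFile, outFile) {
  if (!requireNamespace("affxparser", quietly = TRUE)) {
    stop("Package 'affxparser' is required")
  }
  hdr <- affxparser::readCelHeader(celFile)
  # Assumption: a binary-mode connection is required for the XDA format
  con <- file(outFile, open = "wb")
  on.exit(close(con), add = TRUE)
  affxparser::writeCelHeader(con, hdr, outputVersion = "4")
  invisible(outFile)
}
```

Only the header is written; no cell intensities follow it, so the resulting file is not a complete CEL file.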

Value

Returns (invisibly) the pathname of the file created.

Author

Henrik Bengtsson

Writes BPMAP and TPMAP files.

Description

Writes BPMAP and TPMAP files.

Usage

writeTpmap(filename, bpmaplist, verbose = 0)
tpmap2bpmap(tpmapname, bpmapname, verbose = 0)

Arguments

Argument | Description
filename | The filename.
bpmaplist | A list structure similar to the result of readBpmap().
tpmapname | Filename of the TPMAP file.
bpmapname | Filename of the BPMAP file.
verbose | How verbose the output should be.

Details

writeTpmap writes a text probe map (TPMAP) file, while tpmap2bpmap converts such a file to a binary probe map (BPMAP) file. Affymetrix uses different names for the same structure depending on whether the file is binary or text, and I have seen many TPMAP files referred to as BPMAP files.
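The two functions can be chained to convert a binary map to text and back again, which is a convenient way to check the conversion. This is a minimal sketch, not part of the package: the helper name bpmapRoundTrip() and the output file names are hypothetical, and requesting match scores from readBpmap() is an assumption made so that writeTpmap() has every column it may need.

```r
# Hypothetical helper (not part of affxparser): BPMAP -> TPMAP (text) -> BPMAP (binary)
bpmapRoundTrip <- function(bpmapFile, outDir = tempdir(), verbose = 0) {
  if (!requireNamespace("affxparser", quietly = TRUE)) {
    stop("Package 'affxparser' is required")
  }
  # Assumption: match scores are requested so no TPMAP column is missing
  bpmap <- affxparser::readBpmap(bpmapFile, readMatchScore = TRUE)
  tpmapFile <- file.path(outDir, "copy.tpmap")
  newBpmapFile <- file.path(outDir, "copy.bpmap")
  # Write the text probe map
  affxparser::writeTpmap(tpmapFile, bpmap, verbose = verbose)
  # Convert the text probe map into a binary probe map
  affxparser::tpmap2bpmap(tpmapFile, newBpmapFile, verbose = verbose)
  # Read the new binary file back for comparison with 'bpmap'
  affxparser::readBpmap(newBpmapFile)
}
```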

Value

These functions are called for their side effects (creating files).

Seealso

readBpmap

Author

Kasper Daniel Hansen