bioconductor v3.9.0 Scater
A collection of tools for doing various analyses of
Summary
Functions
The "Single Cell Expression Set" (SCESet) class
Additional accessors for the typical elements of a SingleCellExperiment object.
Accessor and replacement for bootstrap results in a SingleCellExperiment object
Calculate average counts, adjusting for size factors or library size
Calculate counts per million (CPM)
Calculate fragments per kilobase of exon per million reads mapped (FPKM)
Calculate QC metrics
Calculate transcripts-per-million (TPM)
Centre size factors at unity
Get feature annotation information from Biomart
Estimate the percentage of variance explained for each PC.
Estimate the percentage of variance explained for each gene.
Identify outlier values
Compute library size factors
Multiple plot function for ggplot2 plots
Count the number of non-zero counts per cell or feature
Normalize a SingleCellExperiment object using pre-computed size factors
Divide columns of a count matrix by the size factors
Plot column metadata
Plot the explanatory PCs for each variable
Plot explanatory variables ordered by percentage of variance explained
Plot expression values for all cells
Plot frequency against mean for each feature
Plot expression against transcript length
Plot heatmap of gene expression values
Plot the highest expressing features
Plot cells in plate positions
Plot a relative log expression (RLE) plot
Plot reduced dimensions
Plot row metadata
Plot an overview of expression for each cell
Plot specific reduced dimensions
Read sparse count matrix from file
Create a diffusion map from cell-level data
Perform MDS on cell-level data
Perform PCA on cell-level data
Perform t-SNE on cell-level data
Perform UMAP on cell-level data
Cell information for the small example single-cell counts dataset to demonstrate capabilities of scater
A small example of single-cell counts dataset to demonstrate capabilities of scater
Single-cell analysis toolkit for expression in R
General visualization parameters
Variable selection for visualization
Sum counts across a set of cells
Sum counts across a feature set
Convert an SCESet object to a SingleCellExperiment object
Make feature names unique
Functions
SCESet()
The "Single Cell Expression Set" (SCESet) class
Description
S4 class and the main class used by scater to hold single cell expression data. SCESet extends the basic Bioconductor ExpressionSet class.
Details
This class is initialized from a matrix of expression values.
Methods that operate on SCESet objects constitute the basic scater workflow.
References
Thanks to the Monocle package (github.com/cole-trapnell-lab/monocle-release/) for their CellDataSet class, which provided the inspiration and template for SCESet.
accessors()
Additional accessors for the typical elements of a SingleCellExperiment object.
Description
Convenience functions to access commonly-used assays of the SingleCellExperiment object.
Usage
norm_exprs(object)
norm_exprs(object) <- value
stand_exprs(object)
stand_exprs(object) <- value
fpkm(object)
fpkm(object) <- value
Arguments
Argument | Description |
---|---|
object | SingleCellExperiment class object from which to access or to which to assign assay values, namely "exprs", "norm_exprs", "stand_exprs" and "fpkm". The following accessors are imported from SingleCellExperiment : "counts", "normcounts", "logcounts", "cpm", "tpm". |
value | a numeric matrix (e.g. for exprs ) |
Value
a matrix of normalised expression data
a matrix of standardised expression data
a matrix of FPKM values
A matrix of numeric, integer or logical values.
Author
Davis McCarthy
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts), colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
head(logcounts(example_sce)[,1:10])
head(exprs(example_sce)[,1:10]) # identical to logcounts()
example_sce <- SingleCellExperiment(
assays = list(norm_counts = sc_example_counts), colData = sc_example_cell_info)
counts(example_sce) <- sc_example_counts
norm_exprs(example_sce) <- log2(calculateCPM(example_sce, use_size_factors = FALSE) + 1)
stand_exprs(example_sce) <- log2(calculateCPM(example_sce, use_size_factors = FALSE) + 1)
tpm(example_sce) <- calculateTPM(example_sce, effective_length = 5e4)
cpm(example_sce) <- calculateCPM(example_sce, use_size_factors = FALSE)
fpkm(example_sce)
bootstraps()
Accessor and replacement for bootstrap results in a SingleCellExperiment object
Description
SingleCellExperiment objects can contain bootstrap expression values (for example, as generated by the kallisto software for quantifying feature abundance). These functions conveniently access and replace the 'bootstrap' elements in the assays slot with the value supplied, which must be an array of the correct size, namely the same number of rows and columns as the SingleCellExperiment object as a whole.
Usage
bootstraps(object)
bootstraps(object) <- value
## S4 method for signature 'SingleCellExperiment'
bootstraps(object)
## S4 replacement method for signature 'SingleCellExperiment,array'
bootstraps(object) <- value
Arguments
Argument | Description |
---|---|
object | a SingleCellExperiment object. |
value | an array of class "numeric" containing bootstrap expression values |
Value
If accessing the bootstraps slot of a SingleCellExperiment, then an array with the bootstrap values; otherwise a SingleCellExperiment object containing new bootstrap values.
Author
Davis McCarthy
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts), colData = sc_example_cell_info)
bootstraps(example_sce)
calculateAverage()
Calculate average counts, adjusting for size factors or library size
Description
Calculate average counts per feature, adjusting them to account for normalization due to size factors or library sizes.
Usage
calculateAverage(object, exprs_values = "counts",
use_size_factors = TRUE, subset_row = NULL,
BPPARAM = SerialParam())
calcAverage(object, exprs_values = "counts", use_size_factors = TRUE,
subset_row = NULL, BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or count matrix. |
exprs_values | A string specifying the assay of object containing the count matrix, if object is a SingleCellExperiment. |
use_size_factors | a logical scalar specifying whether the size factors in object should be used to construct effective library sizes. |
subset_row | A vector specifying the subset of rows of object for which to return a result. |
BPPARAM | A BiocParallelParam object specifying whether the calculations should be parallelized. |
Details
The size-adjusted average count is defined by dividing each count by the size factor and taking the average across cells. All size factors are scaled so that the mean is 1 across all cells, to ensure that the averages are interpretable on the scale of the raw counts.
Assuming that object is a SingleCellExperiment:
If use_size_factors=TRUE, size factors are automatically extracted from the object. Note that different size factors may be used for features marked as spike-in controls. This is due to the presence of control-specific size factors in object; see normalizeSCE for more details.
If use_size_factors=FALSE, all size factors in object are ignored. Size factors are instead computed from the library sizes, using librarySizeFactors.
If use_size_factors is a numeric vector, it will override any size factors for non-spike-in features in object. The spike-in size factors will still be used for the spike-in transcripts. If no size factors are available, they will be computed from the library sizes using librarySizeFactors.
If object is a matrix or matrix-like object, size factors can be supplied by setting use_size_factors to a numeric vector. Otherwise, the sum of counts for each cell is used as the size factor through librarySizeFactors.
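The size-adjusted average described above can be sketched in base R. This is a minimal illustration rather than scater's own implementation; the toy matrix `counts` and the vector `size_factors` are made up for the example.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical size factors, one per cell.
size_factors <- c(0.5, 1, 2, 1.5)

# Scale the size factors so that their mean across cells is 1.
centred_sf <- size_factors / mean(size_factors)

# Divide each column (cell) by its size factor, then average across cells.
normalized <- sweep(counts, 2, centred_sf, "/")
ave_counts <- rowMeans(normalized)
print(ave_counts)
```

Because the size factors have a mean of 1, the averages stay on the same scale as the raw counts.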
Value
Vector of average count values with same length as number of features, or the number of features in subset_row if supplied.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
list(counts = sc_example_counts),
colData = sc_example_cell_info)
## calculate average counts
ave_counts <- calculateAverage(example_sce)
calculateCPM()
Calculate counts per million (CPM)
Description
Calculate count-per-million (CPM) values from the count data.
Usage
calculateCPM(object, exprs_values = "counts", use_size_factors = TRUE,
subset_row = NULL)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or count matrix. |
exprs_values | A string specifying the assay of object containing the count matrix, if object is a SingleCellExperiment. |
use_size_factors | A logical scalar indicating whether size factors in object should be used to compute effective library sizes. If not, all size factors are ignored and library size-based factors are used instead (see librarySizeFactors ). Alternatively, a numeric vector containing a size factor for each cell, which is used in place of sizeFactor(object) . |
subset_row | A vector specifying the subset of rows of object for which to return a result. |
Details
If requested, size factors are used to define the effective library sizes. This is done by scaling all size factors such that the mean scaled size factor is equal to the mean sum of counts across all features. The effective library sizes are then used in the denominator of the CPM calculation.
Assuming that object is a SingleCellExperiment:
If use_size_factors=TRUE, size factors are automatically extracted from the object. Note that effective library sizes may be computed differently for features marked as spike-in controls. This is due to the presence of control-specific size factors in object; see normalizeSCE for more details.
If use_size_factors=FALSE, all size factors in object are ignored. The total count for each cell will be used as the library size for all features (endogenous genes and spike-in controls).
If use_size_factors is a numeric vector, it will override any size factors for non-spike-in features in object. The spike-in size factors will still be used for the spike-in transcripts. If no size factors are available, the library sizes will be used.
If object is a matrix or matrix-like object, size factors will only be used if use_size_factors is a numeric vector. Otherwise, the sum of counts for each cell is directly used as the library size.
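As a rough base R sketch of the two cases above (not scater's actual code), the following computes CPM values first from raw library sizes and then from effective library sizes derived from size factors. The matrix `counts` and the vector `size_factors` are invented for illustration.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
lib_sizes <- colSums(counts)

# Case 1: no size factors, so the library size is the denominator.
cpm_plain <- sweep(counts, 2, lib_sizes / 1e6, "/")

# Case 2: hypothetical size factors are rescaled so that their mean
# equals the mean library size, giving "effective" library sizes.
size_factors <- c(0.5, 1, 2, 1.5)
eff_lib_sizes <- size_factors / mean(size_factors) * mean(lib_sizes)
cpm_sf <- sweep(counts, 2, eff_lib_sizes / 1e6, "/")

print(colSums(cpm_plain))
print(round(cpm_sf, 1))
```

In the first case every column sums to one million; in the second the column sums differ because the effective library sizes no longer equal the raw totals.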
Value
Numeric matrix of CPM values.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
list(counts = sc_example_counts),
colData = sc_example_cell_info)
cpm(example_sce) <- calculateCPM(example_sce, use_size_factors = FALSE)
calculateFPKM()
Calculate fragments per kilobase of exon per million reads mapped (FPKM)
Description
Calculate fragments per kilobase of exon per million reads mapped (FPKM) values for expression from counts for a set of features.
Usage
calculateFPKM(object, effective_length, ..., subset_row = NULL)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or a numeric matrix of counts. |
effective_length | Numeric vector providing the effective length for each feature in object . |
... | Further arguments to pass to calculateCPM . |
subset_row | A vector specifying the subset of rows of object for which to return a result. |
Value
A numeric matrix of FPKM values.
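For orientation, FPKM is conventionally the CPM value divided by the effective feature length in kilobases. The base R sketch below assumes that definition and library size-based CPMs; the matrix `counts` and the lengths in `eff_len` are hypothetical, and scater's internals may differ in detail.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical effective lengths in bases, one per feature (row).
eff_len <- c(gene1 = 1000, gene2 = 500, gene3 = 2000)

# CPM from raw library sizes.
cpm <- sweep(counts, 2, colSums(counts) / 1e6, "/")

# Divide each row by its length in kilobases; the vector is recycled
# down the columns, so element i applies to row i.
fpkm <- cpm / (eff_len / 1e3)
print(round(fpkm, 1))
```

A feature with an effective length of exactly 1 kb has identical CPM and FPKM values.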
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
list(counts = sc_example_counts),
colData = sc_example_cell_info)
eff_len <- runif(nrow(example_sce), 500, 2000)
fout <- calculateFPKM(example_sce, eff_len, use_size_factors = FALSE)
calculateQCMetrics()
Calculate QC metrics
Description
Compute quality control (QC) metrics for each feature and cell in a SingleCellExperiment object, accounting for specified control sets.
Usage
calculateQCMetrics(object, exprs_values = "counts",
feature_controls = NULL, cell_controls = NULL, percent_top = c(50,
100, 200, 500), detection_limit = 0, use_spikes = TRUE,
compact = FALSE, BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values, usually counts. |
exprs_values | A string indicating which assays in the object should be used to define expression. |
feature_controls | A named list containing one or more vectors (a character vector of feature names, a logical vector, or a numeric vector of indices), used to identify feature controls such as ERCC spike-in sets or mitochondrial genes. |
cell_controls | A named list containing one or more vectors (a character vector of cell (sample) names, a logical vector, or a numeric vector of indices), used to identify cell controls, e.g., blank wells or bulk controls. |
percent_top | An integer vector. Each element is treated as a number of top genes to compute the percentage of library size occupied by the most highly expressed genes in each cell. See pct_X_top_Y_features below for more details. |
detection_limit | A numeric scalar to be passed to nexprs , specifying the lower detection limit for expression. |
use_spikes | A logical scalar indicating whether existing spike-in sets in object should be automatically added to feature_controls , see ? . |
compact | A logical scalar indicating whether the metrics should be returned in a compact format as a nested DataFrame. |
BPPARAM | A BiocParallelParam object specifying whether the QC calculations should be parallelized. |
Details
This function calculates useful quality control metrics to help with pre-processing of data and identification of potentially problematic features and cells.
Underscores in assayNames(object) and in feature_controls or cell_controls can theoretically cause ambiguities in the names of the output metrics. While problems are highly unlikely, users are advised to avoid underscores when naming their controls/assays.
If the expression values are double-precision, the per-row means may not be exactly identical for different choices of BPPARAM. This is due to differences in rounding error when summation is performed across different numbers of cores. If it is important to obtain numerically identical results (e.g., when using the per-row means for sensitive procedures like t-SNE) across various parallelization schemes, we suggest manually calculating those statistics using rowMeans.
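To make the percent_top argument more concrete, the base R sketch below computes the percentage of each cell's total count taken up by its N most highly expressed features. The helper `pct_top` and the toy matrix are hypothetical and only illustrate the idea; they are not part of scater.

```r
# Toy count matrix: 5 features (rows) x 2 cells (columns), filled column-wise.
counts <- matrix(c(50, 30, 10, 5, 5,
                   1, 1, 1, 1, 6),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), c("cell1", "cell2")))

# Hypothetical helper: percentage of each cell's counts in its top n features.
pct_top <- function(counts, n) {
  apply(counts, 2, function(x) {
    top <- sort(x, decreasing = TRUE)[seq_len(min(n, length(x)))]
    100 * sum(top) / sum(x)
  })
}

print(pct_top(counts, 2))
```

A cell whose library is dominated by a handful of features yields a high percentage, which is often a sign of low complexity.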
Value
A SingleCellExperiment object containing QC metrics in the row and column metadata.
Author
Davis McCarthy, with (many!) modifications by Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce)
## with a set of feature controls defined
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(set1 = 1:40))
## with a named set of feature controls defined
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(ERCC = 1:40))
calculateTPM()
Calculate transcripts-per-million (TPM)
Description
Calculate transcripts-per-million (TPM) values for expression from counts for a set of features.
Usage
calculateTPM(object, effective_length = NULL, exprs_values = "counts",
subset_row = NULL)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or a count matrix. |
effective_length | Numeric vector containing the effective length for each feature in object . If NULL , it is assumed that exprs_values has already been adjusted for transcript length. |
exprs_values | String or integer specifying the assay containing the counts in object , if it is a SingleCellExperiment. |
subset_row | A vector specifying the subset of rows of object for which to return a result. |
Details
For read count data, this function assumes uniform coverage along the (effective) length of the transcript. Thus, the number of transcripts for a gene is proportional to the read count divided by the transcript length.
For UMI count data, this function should be run with effective_length=NULL, i.e., no division by the effective length. This is because the number of UMIs is a direct (albeit probably biased) estimate of the number of transcripts.
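The reasoning above corresponds to the usual TPM calculation, sketched here in base R under the assumption that each count is divided by the effective length and the result is rescaled to sum to one million per cell. The matrix `counts` and the lengths in `eff_len` are made up for the example.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical effective lengths in bases, one per feature (row).
eff_len <- c(gene1 = 1000, gene2 = 500, gene3 = 2000)

# Read counts: a per-base rate is proportional to the number of transcripts.
rate <- counts / eff_len
tpm_reads <- sweep(rate, 2, colSums(rate) / 1e6, "/")

# UMI counts: no length adjustment, only rescaling to one million per cell.
tpm_umi <- sweep(counts, 2, colSums(counts) / 1e6, "/")

print(round(tpm_reads))
```

In both variants every column sums to one million, so values are comparable across cells as relative abundances.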
Value
A numeric matrix of TPM values.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
eff_len <- runif(nrow(example_sce), 500, 2000)
tout <- calculateTPM(example_sce, effective_length = eff_len)
centreSizeFactors()
Centre size factors at unity
Description
Scales all size factors so that the average size factor across cells is equal to 1.
Usage
centreSizeFactors(object, centre = 1)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing any number (or zero) sets of size factors. |
centre | A numeric scalar, the value around which all sets of size factors should be centred. |
Details
Centering of size factors at unity ensures that division by size factors yields values on the same scale as the raw counts. This is important for the interpretation of the normalized values, as well as comparisons between features normalized with different size factors (e.g., spike-ins).
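The centring operation itself is simple to sketch in base R. The vectors `sf_genes` and `sf_ercc` below are hypothetical stand-ins for two sets of size factors; this is an illustration, not the package's code.

```r
# Hypothetical size factors for endogenous genes and for a spike-in set.
sf_genes <- c(0.2, 0.9, 1.4, 3.5)
sf_ercc  <- c(10, 30, 20, 60)

# Rescale a set of size factors so that its mean equals `centre`.
centre_sf <- function(sf, centre = 1) {
  sf / mean(sf) * centre
}

sf_genes_centred <- centre_sf(sf_genes)
sf_ercc_centred  <- centre_sf(sf_ercc)

print(mean(sf_genes_centred))
print(mean(sf_ercc_centred))
```

After centring, both sets have the same mean, so features normalized with either set end up on a comparable scale.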
Value
A SingleCellExperiment with modified size factors that are centred at unity.
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
sizeFactors(example_sce) <- runif(ncol(example_sce))
sizeFactors(example_sce, "ERCC") <- runif(ncol(example_sce))
example_sce <- centreSizeFactors(example_sce)
mean(sizeFactors(example_sce))
mean(sizeFactors(example_sce, "ERCC"))
getBMFeatureAnnos()
Get feature annotation information from Biomart
Description
Use the biomaRt package to add feature annotation information to a SingleCellExperiment.
Usage
getBMFeatureAnnos(object, ids = rownames(object),
filters = "ensembl_gene_id", attributes = c(filters, "mgi_symbol",
"chromosome_name", "gene_biotype", "start_position", "end_position"),
biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl",
host = "www.ensembl.org")
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
ids | A character vector containing the identifiers for all rows of object , of the same type specified by filters . |
filters | Character vector defining the filters to pass to the getBM function. |
attributes | Character vector defining the attributes to pass to getBM . |
biomart | String defining the biomaRt to be used, to be passed to useMart . Default is "ENSEMBL_MART_ENSEMBL" . |
dataset | String defining the dataset to use, to be passed to useMart . Default is "mmusculus_gene_ensembl" , which should be changed if the organism is not mouse. |
host | Character string argument which can be used to select a particular "host" to pass to useMart . Useful for accessing archived versions of biomaRt data. Default is "www.ensembl.org" , in which case the current version of the biomaRt (now hosted by Ensembl) is used. |
Value
A SingleCellExperiment object containing feature annotation.
The input feature_symbol appears as the feature_symbol field in the rowData of the output object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
mock_id <- paste0("ENSMUSG", sprintf("%011d", seq_len(nrow(example_sce))))
example_sce <- getBMFeatureAnnos(example_sce, ids=mock_id)
getExplanatoryPCs()
Estimate the percentage of variance explained for each PC.
Description
Estimate the percentage of variance explained for each PC.
Usage
getExplanatoryPCs(object, use_dimred = "PCA", ncomponents = 10,
rerun = FALSE, run_args = list(), ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and per-cell experimental information. |
use_dimred | String specifying the field in reducedDims(object) that contains the PCA results. |
ncomponents | Integer scalar specifying the number of the top principal components to use. |
rerun | Logical scalar indicating whether the PCA should be repeated, even if pre-computed results are already present. |
run_args | A named list of arguments to pass to runPCA . |
... | Additional arguments passed to getVarianceExplained . |
Details
This function computes the percentage of variance in PC scores that is explained by variables in the sample-level metadata. It allows identification of important PCs that are driven by known experimental conditions, e.g., treatment, disease. PCs correlated with technical factors (e.g., batch effects, library size) can also be detected and removed prior to further analysis.
By default, the function will attempt to use pre-computed PCA results in object. This is done by taking the top ncomponents PCs from the matrix identified by use_dimred. If these are not available or if rerun=TRUE, the function will rerun the PCA using runPCA.
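The idea of relating PC scores to a metadata variable can be sketched in base R by regressing each PC on the variable and taking the R-squared. This is only an illustration on simulated data, assuming a linear model per PC; the objects `expr` and `treatment` are invented and the package's own computation may differ.

```r
set.seed(42)
n_cells <- 20

# Hypothetical per-cell variable with two levels.
treatment <- factor(rep(c("ctrl", "treated"), each = n_cells / 2))

# Simulated expression matrix: 50 features x 20 cells, with the first
# 10 features shifted upwards in treated cells.
expr <- matrix(rnorm(50 * n_cells), nrow = 50)
treated <- treatment == "treated"
expr[1:10, treated] <- expr[1:10, treated] + 3

# PCA on cells (prcomp expects observations in rows).
pca <- prcomp(t(expr))
ncomponents <- 5
pcs <- pca$x[, seq_len(ncomponents)]

# Percentage of variance in each PC explained by the variable.
pct_var <- apply(pcs, 2, function(pc) {
  100 * summary(lm(pc ~ treatment))$r.squared
})
print(round(pct_var, 1))
```

A PC with a high percentage is largely driven by the chosen variable, which is the pattern this function is meant to reveal.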
Value
A matrix containing the percentage of variance explained by each factor (column) and for each PC (row).
Seealso
plotExplanatoryPCs, getVarianceExplained
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
r2mat <- getExplanatoryPCs(example_sce)
getVarianceExplained()
Estimate the percentage of variance explained for each gene.
Description
Estimate the percentage of variance explained for each gene.
Usage
getVarianceExplained(object, exprs_values = "logcounts",
variables = NULL, chunk = 1000)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and per-cell experimental information. |
exprs_values | String specifying the expression values for which to compute the variance. |
variables | Character vector specifying the explanatory factors in colData(object) to use. Default is NULL , in which case all variables in colData(object) are considered. |
chunk | Integer scalar specifying the chunk size for chunk-wise processing. Only affects the speed/memory usage trade-off. |
Details
This function computes the percentage of variance in gene expression that is explained by variables in the sample-level metadata. It allows problematic factors to be quickly identified, as well as the genes that are most affected.
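A per-gene version of this calculation can be sketched in base R by fitting one linear model per gene and per variable and recording the R-squared. The toy objects `expr` and `col_data` are hypothetical, and the loop below ignores the chunk-wise processing that the real function uses for efficiency.

```r
# Hypothetical per-cell metadata for 6 cells.
col_data <- data.frame(
  treatment = rep(c("a", "b"), each = 3),
  batch     = rep(c("x", "y"), times = 3)
)

# Toy expression values: gene1 tracks treatment, gene2 does not.
expr <- rbind(
  gene1 = c(1, 2, 3, 11, 12, 13),
  gene2 = c(1, 2, 3, 1, 2, 3)
)

# One column per variable, one row per gene, values in percent.
pct_var <- sapply(names(col_data), function(v) {
  apply(expr, 1, function(y) {
    100 * summary(lm(y ~ col_data[[v]]))$r.squared
  })
})
print(round(pct_var, 1))
```

The resulting matrix has the same orientation as the documented return value: factors in columns and genes in rows.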
Value
A matrix containing the percentage of variance explained by each factor (column) and for each gene (row).
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
r2mat <- getVarianceExplained(example_sce)
isOutlier()
Identify outlier values
Description
Convenience function to determine which values in a numeric vector are outliers based on the median absolute deviation (MAD).
Usage
isOutlier(metric, nmads = 5, type = c("both", "lower", "higher"),
log = FALSE, subset = NULL, batch = NULL, min_diff = NA)
Arguments
Argument | Description |
---|---|
metric | Numeric vector of values. |
nmads | A numeric scalar, specifying the minimum number of MADs away from median required for a value to be called an outlier. |
type | String indicating whether outliers should be looked for at both tails ( "both" ), only at the lower tail ( "lower" ) or the upper tail ( "higher" ). |
log | Logical scalar, should the values of the metric be transformed to the log10 scale before computing MADs? |
subset | Logical or integer vector, which subset of values should be used to calculate the median/MAD? If NULL , all values are used. Missing values will trigger a warning and will be automatically ignored. |
batch | Factor of length equal to metric , specifying the batch to which each observation belongs. A median/MAD is calculated for each batch, and outliers are then identified within each batch. |
min_diff | A numeric scalar indicating the minimum difference from the median to consider as an outlier. The outlier threshold is defined from the larger of nmads MADs and min_diff , to avoid calling many outliers when the MAD is very small. If NA , it is ignored. |
Details
Lower and upper thresholds are stored in the "threshold" attribute of the returned vector. This is a numeric vector of length 2 when batch=NULL for the threshold on each side. Otherwise, it is a matrix with one named column per level of batch and two rows (one per threshold).
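The MAD-based rule can be sketched in base R as follows. The helper `is_outlier_sketch` is hypothetical, omits the subset and batch arguments, and assumes the scaled MAD returned by stats::mad; it is an illustration rather than the package's implementation.

```r
# Hypothetical helper implementing a median +/- nmads * MAD rule.
is_outlier_sketch <- function(metric, nmads = 5,
                              type = c("both", "lower", "higher"),
                              log = FALSE, min_diff = NA) {
  type <- match.arg(type)
  if (log) {
    metric <- log10(metric)
  }
  med <- median(metric, na.rm = TRUE)
  dev <- nmads * mad(metric, center = med, na.rm = TRUE)
  # Use the larger of nmads MADs and min_diff, if min_diff is given.
  if (!is.na(min_diff)) {
    dev <- max(dev, min_diff)
  }
  lower <- if (type == "higher") -Inf else med - dev
  upper <- if (type == "lower") Inf else med + dev
  out <- metric < lower | metric > upper
  attr(out, "threshold") <- c(lower = lower, higher = upper)
  out
}

x <- c(10, 11, 9, 10, 12, 10, 100)
res <- is_outlier_sketch(x, nmads = 3)
print(res)
```

Here the median is 10 and the scaled MAD is about 1.48, so with nmads = 3 only the value 100 falls outside the thresholds.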
Value
A logical vector of the same length as the metric argument, specifying the observations that are considered as outliers.
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce)
## with a set of feature controls defined
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(set1 = 1:40))
isOutlier(example_sce$total_counts, nmads = 3)
librarySizeFactors()
Compute library size factors
Description
Define size factors from the library sizes after centering. This ensures that the library size adjustment yields values comparable to those generated after normalization with other sets of size factors.
Usage
librarySizeFactors(object, exprs_values = "counts", subset_row = NULL)
Arguments
Argument | Description |
---|---|
object | A count matrix or SingleCellExperiment object containing counts. |
exprs_values | A string indicating the assay of object containing the counts, if object is a SingleCellExperiment. |
subset_row | A vector specifying whether the rows of object should be (effectively) subsetted before calculating library sizes. |
Value
A numeric vector of size factors.
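In base R terms, the description above amounts to dividing each cell's library size by the mean library size. The toy matrix below is invented for illustration, and the sketch ignores the subset_row argument.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Library size per cell, centred so that the mean size factor is 1.
lib_sizes <- colSums(counts)
lib_sf <- lib_sizes / mean(lib_sizes)
print(round(lib_sf, 3))
```

Cells with larger libraries receive size factors above 1, and the factors average to 1 across cells.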
Examples
data("sc_example_counts")
summary(librarySizeFactors(sc_example_counts))
multiplot()
Multiple plot function for ggplot2 plots
Description
Place multiple ggplot plots on one page.
Usage
multiplot(..., plotlist = NULL, cols = 1, layout = NULL)
Arguments
Argument | Description |
---|---|
... | One or more ggplot objects. |
plotlist | A list of ggplot objects, as an alternative to ... . |
cols | A numeric scalar giving the number of columns in the layout. |
layout | A matrix specifying the layout. If present, cols is ignored. |
Details
If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE), then:
plot 1 will go in the upper left;
plot 2 will go in the upper right;
and plot 3 will go all the way across the bottom.
There is no way to tweak the relative heights or widths of the plots with this simple function. It was adapted from http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
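The layout matrix mentioned above can be inspected in base R to see which grid cells each plot would occupy. This sketch only looks up positions and does not draw anything; the list `positions` is a name chosen for the example.

```r
# Layout with plots 1 and 2 on the top row and plot 3 spanning the bottom row.
layout <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)
print(layout)

# For each plot index, find the (row, col) cells it occupies.
positions <- lapply(1:3, function(i) which(layout == i, arr.ind = TRUE))

for (i in seq_along(positions)) {
  pos <- positions[[i]]
  cat(sprintf("plot %d: row(s) %s, column(s) %s\n",
              i,
              paste(unique(pos[, "row"]), collapse = ","),
              paste(unique(pos[, "col"]), collapse = ",")))
}
```

Plot 3 appears in both columns of the second row, which is why it stretches across the bottom of the page.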
Value
A ggplot object.
Examples
library(ggplot2)
## This example uses the ChickWeight dataset, which comes with R (datasets package)
## First plot
p1 <- ggplot(ChickWeight, aes(x = Time, y = weight, colour = Diet, group = Chick)) +
geom_line() +
ggtitle("Growth curve for individual chicks")
## Second plot
p2 <- ggplot(ChickWeight, aes(x = Time, y = weight, colour = Diet)) +
geom_point(alpha = .3) +
geom_smooth(alpha = .2, size = 1) +
ggtitle("Fitted growth curve per diet")
## Third plot
p3 <- ggplot(subset(ChickWeight, Time == 21), aes(x = weight, colour = Diet)) +
geom_density() +
ggtitle("Final weight, by diet")
## Fourth plot
p4 <- ggplot(subset(ChickWeight, Time == 21), aes(x = weight, fill = Diet)) +
geom_histogram(colour = "black", binwidth = 50) +
facet_grid(Diet ~ .) +
ggtitle("Final weight, by diet") +
theme(legend.position = "none") # No legend (redundant in this graph)
## Combine plots and display
multiplot(p1, p2, p3, p4, cols = 2)
nexprs()
Count the number of non-zero counts per cell or feature
Description
An efficient internal function that counts the number of non-zero counts in each row (per feature) or column (per cell). This avoids the need to construct an intermediate logical matrix.
Usage
nexprs(object, detection_limit = 0, exprs_values = "counts",
byrow = FALSE, subset_row = NULL, subset_col = NULL,
BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or a numeric matrix of expression values. |
detection_limit | Numeric scalar providing the value above which observations are deemed to be expressed. |
exprs_values | String or integer specifying the assay of object to obtain the count matrix from, if object is a SingleCellExperiment. |
byrow | Logical scalar indicating whether to count the number of detected cells per feature. If FALSE , the function will count the number of detected features per cell. |
subset_row | Logical, integer or character vector indicating which rows (i.e. features) to use. |
subset_col | Logical, integer or character vector indicating which columns (i.e., cells) to use. |
BPPARAM | A BiocParallelParam object specifying whether the calculations should be parallelized. |
Details
Setting subset_row or subset_col is equivalent to subsetting object before calling nexprs, but more efficient as a new copy of the matrix is not constructed.
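For comparison, the naive base R equivalent is shown below. Unlike nexprs, it does build the intermediate logical matrix, so it is only meant to illustrate what is being counted; the toy matrix is invented.

```r
# Toy count matrix: 3 features (rows) x 4 cells (columns), filled column-wise.
counts <- matrix(c(10, 0, 5,
                   20, 2, 9,
                   40, 4, 21,
                   30, 2, 5),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
detection_limit <- 0

# Number of detected features per cell (the byrow = FALSE case).
detected_per_cell <- colSums(counts > detection_limit)

# Number of cells in which each feature is detected (the byrow = TRUE case).
detected_per_feature <- rowSums(counts > detection_limit)

print(detected_per_cell)
print(detected_per_feature)
```

Raising detection_limit makes the definition of "expressed" stricter and can only lower these counts.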
Value
An integer vector containing counts per gene or cell, depending on the provided arguments.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
nexprs(example_sce)[1:10]
nexprs(example_sce, byrow = TRUE)[1:10]
normalize()
Normalize a SingleCellExperiment object using pre-computed size factors
Description
Compute normalized expression values from count data in a SingleCellExperiment object, using the size factors stored in the object.
Usage
normalizeSCE(object, exprs_values = "counts", return_log = TRUE,
log_exprs_offset = NULL, centre_size_factors = TRUE,
preserve_zeroes = FALSE)
## S4 method for signature 'SingleCellExperiment'
normalize(object, exprs_values = "counts", return_log = TRUE,
log_exprs_offset = NULL, centre_size_factors = TRUE,
preserve_zeroes = FALSE)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
exprs_values | String indicating which assay contains the count data that should be used to compute log-transformed expression values. |
return_log | Logical scalar, should normalized values be returned on the log2 scale? If TRUE , output is stored as "logcounts" in the returned object; if FALSE output is stored as "normcounts" . |
log_exprs_offset | Numeric scalar specifying the pseudo-count to add when log-transforming expression values. If NULL , the value is taken from metadata(object)$log.exprs.offset if defined, otherwise it is set to 1. |
centre_size_factors | Logical scalar indicating whether size factors should be centred. |
preserve_zeroes | Logical scalar indicating whether zeroes should be preserved when dealing with non-unity offsets. |
Details
Normalized expression values are computed by dividing the counts for each cell by the size factor for that cell.
This aims to remove cell-specific scaling biases, e.g., due to differences in sequencing coverage or capture efficiency.
If log=TRUE
, log-normalized values are calculated by adding log_exprs_offset
to the normalized count and performing a log2 transformation.
Features marked as spike-in controls will be normalized with control-specific size factors, if these are available. This reflects the fact that spike-in controls are subject to different biases than those that are removed by gene-specific size factors (namely, total RNA content). If size factors for a particular spike-in set are not available, a warning will be raised.
If centre_size_factors=TRUE, all sets of size factors will be centred to have the same mean prior to calculation of normalized expression values.
This ensures that abundances are roughly comparable between features normalized with different sets of size factors.
By default, the centre mean is unity, which means that the computed exprs can be interpreted as being on the same scale as log-counts.
It also means that the added log_exprs_offset can be interpreted as a pseudo-count (i.e., on the same scale as the counts).
If preserve_zeroes=TRUE and the pseudo-count is not unity, size factors are instead centred at the specified value of log_exprs_offset.
The log-transformation is then performed on the normalized expression values with a pseudo-count of 1, which ensures that zeroes remain so in the output matrix.
This yields the same results as preserve_zeroes=FALSE minus a matrix-wide constant of log2(log_exprs_offset).
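The relationship between the two settings of preserve_zeroes can be checked with plain matrix arithmetic. The following is a minimal base R sketch that does not call scater; the toy count matrix, the library-size-based size factors and the offset of 0.5 are assumptions chosen purely for illustration.

```r
# Toy counts: 3 genes (rows) by 2 cells (columns). Values are made up.
counts <- matrix(c(0, 4, 10,
                   2, 0, 6),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

# Assumed size factors: library sizes centred to a mean of 1.
sf <- colSums(counts)
sf <- sf / mean(sf)

offset <- 0.5  # a non-unity pseudo-count

# preserve_zeroes = FALSE style: divide by size factors, add the offset, log2.
log_default <- log2(sweep(counts, 2, sf, "/") + offset)

# preserve_zeroes = TRUE style: centre size factors at the offset instead,
# then use a pseudo-count of 1.
sf_pz <- sf * offset
log_pz <- log2(sweep(counts, 2, sf_pz, "/") + 1)

# The two results differ by the matrix-wide constant log2(offset).
all.equal(log_pz, log_default - log2(offset))
```

In this sketch, zero counts map to exactly zero in log_pz, whereas log_default maps them to log2(offset).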
In some cases, the function will return a DelayedMatrix with delayed division and log-transformation operations.
This requires that the assay specified by exprs_values contains a DelayedMatrix, and only one set of size factors is used for all features.
This avoids the need to explicitly calculate normalized expression values across a very large (possibly file-backed) matrix.
Value
A SingleCellExperiment object containing normalized expression values in "normcounts" if return_log=FALSE, and log-normalized expression values in "logcounts" if return_log=TRUE.
All size factors will also be centred in the output object if centre_size_factors=TRUE.
Author
Davis McCarthy and Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
normalizeCounts()
Divide columns of a count matrix by the size factors
Description
Compute (log-)normalized expression values by dividing counts for each cell by the corresponding size factor.
Usage
normalizeCounts(x, size_factors, return_log = TRUE,
log_exprs_offset = 1, centre_size_factors = FALSE,
subset_row = NULL)
Arguments
Argument | Description |
---|---|
x | A count matrix, with cells in the columns and genes in the rows. |
size_factors | A numeric vector of size factors for all cells. |
return_log | Logical scalar, should normalized values be returned on the log2 scale? |
log_exprs_offset | Numeric scalar specifying the offset to add when log-transforming expression values. |
centre_size_factors | Logical scalar indicating whether size factors should be centred. |
subset_row | A vector specifying the subset of rows of x for which to return a result. |
Details
This function will compute log-normalized expression values from x.
It will endeavour to return an object of the same class as x, with particular focus on DelayedMatrix inputs/outputs.
Note that the default centre_size_factors differs from that in normalizeSCE.
Users of this function are assumed to know what they're doing with respect to normalization.
Value
A matrix-like object of (log-)normalized expression values.
Author
Aaron Lun
Examples
data("sc_example_counts")
normed <- normalizeCounts(sc_example_counts,
librarySizeFactors(sc_example_counts))
plotColData()
Plot column metadata
Description
Plot column-level (i.e., cell) metadata in a SingleCellExperiment object.
Usage
plotColData(object, y, x = NULL, colour_by = NULL, shape_by = NULL,
size_by = NULL, by_exprs_values = "logcounts",
by_show_single = FALSE, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and experimental information. |
y | Specification of the column-level metadata to show on the y-axis, see ?" for possible values. Note that only metadata fields will be searched, assays will not be used. |
x | Specification of the column-level metadata to show on the x-axis, see ?" for possible values. Again, only metadata fields will be searched, assays will not be used. |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
shape_by | Specification of a column metadata field or a feature to shape by, see ?" for possible values. |
size_by | Specification of a column metadata field or a feature to size by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
... | Additional arguments for visualization, see ?" for details. |
Details
If y is continuous and x=NULL, a violin plot is generated.
If x is categorical, a grouped violin plot will be generated, with one violin for each level of x.
If x is continuous, a scatter plot will be generated.
If y is categorical and x is continuous, horizontal violin plots will be generated.
If x is missing or categorical, rectangle plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.
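The rules above amount to a small decision table on the types of y and x. The sketch below restates them as a base R function; describe_plot is a hypothetical helper written for illustration (it is not part of scater), and it uses is.numeric() as a stand-in for "continuous".

```r
# Hypothetical helper: returns the kind of plot the rules above describe.
# 'y' and 'x' are vectors of metadata values; x = NULL means "not supplied".
describe_plot <- function(y, x = NULL) {
  y_continuous <- is.numeric(y)
  if (is.null(x)) {
    if (y_continuous) {
      return("violin")
    } else {
      return("rectangle")
    }
  }
  x_continuous <- is.numeric(x)
  if (y_continuous && x_continuous) {
    "scatter"
  } else if (y_continuous) {
    "grouped violin"
  } else if (x_continuous) {
    "horizontal violin"
  } else {
    "rectangle"
  }
}

describe_plot(y = c(1.2, 3.4), x = factor(c("a", "b")))
```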
Note that plotPhenoData and plotCellData are synonyms for plotColData.
These are artifacts of the transition from the old SCESet class, and will be deprecated in future releases.
Value
A ggplot object.
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce)
example_sce <- normalize(example_sce)
plotColData(example_sce, y = "total_features_by_counts",
x = "log10_total_counts", colour_by = "Mutation_Status")
plotColData(example_sce, y = "total_features_by_counts",
x = "log10_total_counts", colour_by = "Mutation_Status",
size_by = "Gene_0001", shape_by = "Treatment")
plotColData(example_sce, y = "Treatment",
x = "log10_total_counts", colour_by = "Mutation_Status")
plotColData(example_sce, y = "total_features_by_counts",
x = "Cell_Cycle", colour_by = "Mutation_Status")
plotExplanatoryPCs()
Plot the explanatory PCs for each variable
Description
Plot the explanatory PCs for each variable
Usage
plotExplanatoryPCs(object, nvars_to_plot = 10, npcs_to_plot = 50,
theme_size = 10, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and experimental information. Alternatively, a matrix containing the output of getExplanatoryPCs . |
nvars_to_plot | Integer scalar specifying the number of variables with the greatest explanatory power to plot. This can be set to Inf to show all variables. |
npcs_to_plot | Integer scalar specifying the number of PCs to plot. |
theme_size | Numeric scalar providing the base font size for the ggplot theme. |
... | Parameters to be passed to getExplanatoryPCs . |
Details
A density plot is created for each variable, showing the R-squared for each successive PC (up to npcs_to_plot PCs).
Only the nvars_to_plot variables with the largest maximum R-squared across PCs are shown.
If object is a SingleCellExperiment object, getExplanatoryPCs will be called to compute the variance in each PC explained by each variable.
Users may prefer to run getExplanatoryPCs manually and pass the resulting matrix as object, in which case the R-squared values are used directly.
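To make "R-squared for each PC" concrete, the following base R sketch regresses each of the first few principal components on a single variable. It is only an illustration under assumptions: the data are simulated, the variable name batch is made up, and the use of prcomp() and lm() is a generic way to obtain such values rather than a description of how getExplanatoryPCs is implemented.

```r
set.seed(1)

# Simulated data: 40 cells (rows) by 100 features (columns).
n_cells <- 40
batch <- factor(rep(c("a", "b"), each = n_cells / 2))
expr <- matrix(rnorm(n_cells * 100), nrow = n_cells)

# Add a batch effect to the first 20 features.
expr[batch == "b", 1:20] <- expr[batch == "b", 1:20] + 2

# Scores of the first 5 PCs, one row per cell.
pcs <- prcomp(expr)$x[, 1:5]

# R-squared from regressing each PC on the variable of interest.
r2 <- apply(pcs, 2, function(pc) summary(lm(pc ~ batch))$r.squared)
round(r2, 3)
```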
Value
A ggplot object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
plotExplanatoryPCs(example_sce)
plotExplanatoryVariables()
Plot explanatory variables ordered by percentage of variance explained
Description
Plot explanatory variables ordered by percentage of variance explained
Usage
plotExplanatoryVariables(object, nvars_to_plot = 10,
min_marginal_r2 = 0, theme_size = 10, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and experimental information. Alternatively, a matrix containing the output of getVarianceExplained . |
nvars_to_plot | Integer scalar specifying the number of variables with the greatest explanatory power to plot. This can be set to Inf to show all variables. |
min_marginal_r2 | Numeric scalar specifying the minimal value required for median marginal R-squared for a variable to be plotted. Only variables with a median marginal R-squared strictly larger than this value will be plotted. |
theme_size | Numeric scalar specifying the font size to use for the plotting theme. |
... | Parameters to be passed to getVarianceExplained . |
Details
A density plot is created for each variable, showing the distribution of R-squared across all genes.
Only the nvars_to_plot variables with the largest median R-squared across genes are shown.
Variables are also only shown if they have median R-squared values above min_marginal_r2.
If object is a SingleCellExperiment object, getVarianceExplained will be called to compute the variance in expression explained by each variable in each gene.
Users may prefer to run getVarianceExplained manually and pass the resulting matrix as object, in which case the R-squared values are used directly.
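The ranking and filtering described above can be sketched in base R as a per-gene regression followed by a median across genes. This is an illustration under assumptions, not the getVarianceExplained implementation: the data are simulated, the variable names batch and noise are made up, and the threshold of 0.01 is arbitrary.

```r
set.seed(42)

# Simulated data: 50 genes (rows) by 30 cells (columns).
n_cells <- 30
vars <- data.frame(
  batch = factor(rep(c("a", "b", "c"), each = 10)),
  noise = rnorm(n_cells)
)
expr <- matrix(rnorm(50 * n_cells), nrow = 50)

# Half of the genes respond to batch "c".
expr[1:25, vars$batch == "c"] <- expr[1:25, vars$batch == "c"] + 3

# Genes-by-variables matrix of marginal R-squared values.
r2 <- sapply(vars, function(v) {
  apply(expr, 1, function(g) summary(lm(g ~ v))$r.squared)
})

# Rank variables by median R-squared and apply a strict threshold.
min_marginal_r2 <- 0.01
median_r2 <- apply(r2, 2, median)
shown <- names(sort(median_r2[median_r2 > min_marginal_r2], decreasing = TRUE))
shown
```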
Value
A ggplot object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
plotExplanatoryVariables(example_sce)
plotExpression()
Plot expression values for all cells
Description
Plot expression values for a set of features (e.g. genes or transcripts) in a SingleCellExperiment object, against a continuous or categorical covariate for all cells.
Usage
plotExpression(object, features, x = NULL, exprs_values = "logcounts",
log2_values = FALSE, colour_by = NULL, shape_by = NULL,
size_by = NULL, by_exprs_values = exprs_values,
by_show_single = FALSE, xlab = NULL, feature_colours = TRUE,
one_facet = TRUE, ncol = 2, scales = "fixed", ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and other metadata. |
features | A character vector (of feature names), a logical vector or numeric vector (of indices) specifying the features to plot. |
x | Specification of a column metadata field or a feature to show on the x-axis, see ?" for possible values. |
exprs_values | A string or integer scalar specifying which assay in assays(object) to obtain expression values from. |
log2_values | Logical scalar, specifying whether the expression values should be transformed to the log2-scale for plotting (with an offset of 1 to avoid logging zeroes). |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
shape_by | Specification of a column metadata field or a feature to shape by, see ?" for possible values. |
size_by | Specification of a column metadata field or a feature to size by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
xlab | String specifying the label for x-axis. If NULL (default), x will be used as the x-axis label. |
feature_colours | Logical scalar indicating whether violins should be coloured by feature when x and colour_by are not specified and one_facet=TRUE . |
one_facet | Logical scalar indicating whether grouped violin plots for multiple features should be put onto one facet. Only relevant when x=NULL . |
ncol | Integer scalar, specifying the number of columns to be used for the panels of a multi-facet plot. |
scales | String indicating whether multi-facet scales should be fixed ( "fixed" ), free ( "free" ), or free in one dimension ( "free_x" , "free_y" ). Passed to the scales argument of facet_wrap when multiple facets are generated. |
... | Additional arguments for visualization, see ?" for details. |
Details
This function plots expression values for one or more features.
If x is not specified, a violin plot will be generated of expression values.
If x is categorical, a grouped violin plot will be generated, with one violin for each level of x.
If x is continuous, a scatter plot will be generated.
If multiple features are requested and x is not specified and one_facet=TRUE, a grouped violin plot will be generated with one violin per feature.
This will be coloured by feature if colour_by=NULL and feature_colours=TRUE, to yield a more aesthetically pleasing plot.
Otherwise, if x is specified or one_facet=FALSE, a multi-panel plot will be generated where each panel corresponds to a feature.
Each panel will be a scatter plot or (grouped) violin plot, depending on the nature of x.
Note that this assumes that the expression values are numeric.
If not, and x is continuous, horizontal violin plots will be generated.
If x is missing or categorical, rectangle plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.
Value
A ggplot object.
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
## prepare data
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce)
sizeFactors(example_sce) <- colSums(counts(example_sce))
example_sce <- normalize(example_sce)
## default plot
plotExpression(example_sce, 1:15)
## plot expression against an x-axis value
plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Mutation_Status")
plotExpression(example_sce, c("Gene_0001", "Gene_0004"), x="Gene_0002")
## add visual options
plotExpression(example_sce, 1:6, colour_by = "Mutation_Status")
plotExpression(example_sce, 1:6, colour_by = "Mutation_Status",
shape_by = "Treatment", size_by = "Gene_0010")
## plot expression against expression values for Gene_0004
plotExpression(example_sce, 1:4, "Gene_0004", show_smooth = TRUE)
plotExprsFreqVsMean()
Plot frequency against mean for each feature
Description
Plot the frequency of expression (i.e., percentage of expressing cells) against the mean expression level for each feature in a SingleCellExperiment object.
Usage
plotExprsFreqVsMean(object, freq_exprs, mean_exprs, controls,
exprs_values = "counts", by_show_single = FALSE,
show_smooth = TRUE, show_se = TRUE, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
freq_exprs | Specification of the row-level metadata field containing the number of expressing cells per feature, see ?" for possible values. Note that only metadata fields will be searched, assays will not be used. If not supplied or NULL , this defaults to "n_cells_by_counts" or equivalent for compacted data. |
mean_exprs | Specification of the row-level metadata field containing the mean expression of each feature, see ?" for possible values. Again, only metadata fields will be searched, assays will not be used. If not supplied or NULL , this defaults to "mean_counts" or equivalent for compacted data. |
controls | Specification of the row-level metadata column indicating whether a feature is a control, see ?" for possible values. Only metadata fields will be searched, assays will not be used. If not supplied, this defaults to "is_feature_control" or equivalent for compacted data. |
exprs_values | String specifying the assay used for the default freq_exprs and mean_exprs . This can be set to, e.g., "logcounts" so that freq_exprs defaults to "n_cells_by_logcounts" . |
by_show_single | Logical scalar specifying whether a single-level factor for controls should be used for colouring, see ?" for details. |
show_smooth | Logical scalar, should a smoothed fit (through feature controls if available; all features otherwise) be shown on the plot? See geom_smooth for details. |
show_se | Logical scalar, should the standard error be shown for a smoothed fit? |
... | Further arguments passed to plotRowData . |
Details
This function plots gene expression frequency versus mean expression level, which can be useful to assess the effects of technical dropout in the dataset. We fit a non-linear least squares curve for the relationship between expression frequency and mean expression. We use this curve to define the number of genes above high technical dropout and the numbers of genes that are expressed in at least 50% and at least 25% of cells.
The plot will attempt to colour the points based on whether the corresponding features are labelled as feature controls in object.
This can be turned off by setting controls=NULL.
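The two quantities on the axes can be computed directly from a count matrix. The base R sketch below derives the percentage of expressing cells and the mean count per feature from a made-up matrix; the definition of "expressing" as a count above zero is an assumption of this sketch, and no curve fitting is attempted.

```r
# Toy counts: 3 genes (rows) by 4 cells (columns). Values are made up.
counts <- matrix(c(0, 0, 3,
                   1, 0, 5,
                   0, 2, 4,
                   0, 0, 6),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Number and percentage of cells with a non-zero count for each feature.
n_cells_expressing <- rowSums(counts > 0)
pct_cells_expressing <- 100 * n_cells_expressing / ncol(counts)

# Mean count for each feature across cells.
mean_counts <- rowMeans(counts)

data.frame(mean_counts, pct_cells_expressing)
```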
Value
A ggplot object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(set1 = 1:500))
plotExprsFreqVsMean(example_sce)
plotExprsFreqVsMean(example_sce, size_by = "is_feature_control")
plotExprsVsTxLength()
Plot expression against transcript length
Description
Plot mean expression values for all features in a SingleCellExperiment object against transcript length values.
Usage
plotExprsVsTxLength(object, tx_length = "median_feat_eff_len",
length_is_assay = FALSE, exprs_values = "logcounts",
log2_values = FALSE, colour_by = NULL, shape_by = NULL,
size_by = NULL, by_exprs_values = exprs_values,
by_show_single = FALSE, xlab = "Median transcript length",
show_exprs_sd = FALSE, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
tx_length | Transcript lengths for all features, to plot on the x-axis. If length_is_assay=FALSE , this can take any of the values described in ?" for feature-level metadata; data in assays(object) will not be searched. Otherwise, if length_is_assay=TRUE , tx_length should be the name or index of an assay in object . |
length_is_assay | Logical scalar indicating whether tx_length refers to an assay of object containing transcript lengths for all features in all cells. |
exprs_values | A string or integer scalar specifying which assay in assays(object) to obtain expression values from. |
log2_values | Logical scalar, specifying whether the expression values should be transformed to the log2-scale for plotting (with an offset of 1 to avoid logging zeroes). |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
shape_by | Specification of a column metadata field or a feature to shape by, see ?" for possible values. |
size_by | Specification of a column metadata field or a feature to size by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
xlab | String specifying the label for x-axis. |
show_exprs_sd | Logical scalar indicating whether the standard deviation of expression values for each feature should be plotted. |
... | Additional arguments for visualization, see ?" for details. |
Details
If length_is_assay=TRUE, the median transcript length of each feature across all cells is used.
This may be necessary if the effective transcript length differs across cells, e.g., as observed in the results from pseudo-aligners.
Value
A ggplot object.
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
rd <- DataFrame(gene_id = rownames(sc_example_counts),
feature_id = paste("feature", rep(1:500, each = 4), sep = "_"),
median_tx_length = rnorm(2000, mean = 5000, sd = 500),
other = sample(LETTERS, 2000, replace = TRUE)
)
rownames(rd) <- rownames(sc_example_counts)
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info, rowData = rd
)
example_sce <- normalize(example_sce)
plotExprsVsTxLength(example_sce, "median_tx_length")
plotExprsVsTxLength(example_sce, "median_tx_length", show_smooth = TRUE)
plotExprsVsTxLength(example_sce, "median_tx_length", show_smooth = TRUE,
colour_by = "other", show_exprs_sd = TRUE)
## using matrix of tx length values in assays(object)
mat <- matrix(rnorm(ncol(example_sce) * nrow(example_sce), mean = 5000,
sd = 500), nrow = nrow(example_sce))
dimnames(mat) <- dimnames(example_sce)
assay(example_sce, "tx_len") <- mat
plotExprsVsTxLength(example_sce, "tx_len", show_smooth = TRUE,
length_is_assay = TRUE, show_exprs_sd = TRUE)
## using a data frame of tx length values
plotExprsVsTxLength(example_sce,
data.frame(rnorm(2000, mean = 5000, sd = 500)))
plotHeatmap()
Plot heatmap of gene expression values
Description
Create a heatmap of expression values for each cell and specified features in a SingleCellExperiment object.
Usage
plotHeatmap(object, features, columns = NULL,
exprs_values = "logcounts", center = FALSE, zlim = NULL,
symmetric = FALSE, color = NULL, colour_columns_by = NULL,
by_exprs_values = exprs_values, by_show_single = FALSE,
show_colnames = TRUE, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
features | A character vector of row names, a logical vector or integer vector of indices specifying rows of object to show in the heatmap. |
columns | A vector specifying the subset of columns in object to show as columns in the heatmap. By default, all columns are used in their original order. |
exprs_values | A string or integer scalar indicating which assay of object should be used as expression values for colouring in the heatmap. |
center | A logical scalar indicating whether each row should have its mean expression centered at zero prior to plotting. |
zlim | A numeric vector of length 2, specifying the lower and upper bounds for the expression values. This winsorizes the expression matrix prior to plotting (but after centering, if center=TRUE ). If NULL , it defaults to the range of the expression matrix. |
symmetric | A logical scalar specifying whether the default zlim should be symmetric around zero. If TRUE , the maximum absolute value of zlim will be computed and multiplied by c(-1, 1) to redefine zlim . |
color | A vector of colours specifying the palette to use for mapping expression values to colours. This defaults to the default setting in pheatmap . |
colour_columns_by | A list of values specifying how the columns should be annotated with colours. Each entry of the list can be of the form described by ?" . A character vector can also be supplied and will be treated as a list of strings. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for colouring of column-level data - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for column-level colouring, see ?" for details. |
show_colnames | Logical scalar specifying whether column names should be shown, if available in object . |
... | Additional arguments to pass to pheatmap . |
Details
Setting center=TRUE is useful for examining log-fold changes of each cell's expression profile from the average across all cells.
This avoids issues with the entire row appearing a certain colour because the gene is highly/lowly expressed across all cells.
Setting zlim preserves the dynamic range of colours in the presence of outliers.
Otherwise, the plot may be dominated by a few genes, which will flatten the observed colours for the rest of the heatmap.
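The effect of center, symmetric and zlim on the matrix that gets coloured can be sketched with base R arithmetic. The example below is an illustration under assumptions: the toy matrix and the chosen limits are made up, and the code mirrors the argument descriptions rather than the plotHeatmap source.

```r
# Toy log-expression matrix: 3 genes (rows) by 4 cells (columns).
expr <- matrix(c(1, 2, 3, 10,
                 5, 5, 5, 5,
                 0, 1, 0, -20),
               nrow = 3, byrow = TRUE)

# center = TRUE: subtract each row's mean so rows are centred at zero.
centred <- expr - rowMeans(expr)

# symmetric = TRUE with zlim = NULL: limits symmetric around zero,
# based on the largest absolute value of the default range.
zlim_symmetric <- c(-1, 1) * max(abs(range(centred)))

# A user-supplied zlim winsorizes the centred values to those bounds.
zlim_user <- c(-2, 2)
winsorized <- pmin(pmax(centred, zlim_user[1]), zlim_user[2])

zlim_symmetric
range(winsorized)
```

With the outlying value of -20 clamped, the remaining entries in this sketch occupy the full span of the colour scale instead of a narrow band.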
Value
A heatmap is produced on the current graphics device.
The output of pheatmap is invisibly returned.
Author
Aaron Lun
Examples
example(normalizeSCE) # borrowing the example objects in here.
plotHeatmap(example_sce, features=rownames(example_sce)[1:10])
plotHeatmap(example_sce, features=rownames(example_sce)[1:10],
center=TRUE, symmetric=TRUE)
plotHeatmap(example_sce, features=rownames(example_sce)[1:10],
colour_columns_by=c("Mutation_Status", "Cell_Cycle"))
plotHighestExprs()
Plot the highest expressing features
Description
Plot the features with the highest average expression across all cells, along with their expression in each individual cell.
Usage
plotHighestExprs(object, n = 50, controls, colour_cells_by,
drop_features = NULL, exprs_values = "counts",
by_exprs_values = exprs_values, by_show_single = TRUE,
feature_names_to_plot = NULL, as_percentage = TRUE)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
n | A numeric scalar specifying the number of the most expressed features to show. |
controls | Specification of the row-level metadata column indicating whether a feature is a control, see ?" for possible values. Only metadata fields will be searched, assays will not be used. If not supplied, this defaults to "is_feature_control" or equivalent for compacted data. |
colour_cells_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. If not supplied, this defaults to "total_features_by_counts" or equivalent for compacted data. |
drop_features | A character, logical or numeric vector indicating which features (e.g. genes, transcripts) to drop when producing the plot. For example, spike-in transcripts might be dropped to examine the contribution from endogenous genes. |
exprs_values | An integer scalar or string specifying the assay to obtain expression values from. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in colouring - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for colouring, see ?" for details. |
feature_names_to_plot | Specification of which row-level metadata column contains the feature names, see ?" for possible values. Default is NULL, in which case rownames(object) are used. |
as_percentage | Logical scalar indicating whether percentages should be plotted. If FALSE , the raw exprs_values are shown instead. |
Details
This function will plot the percentage of counts accounted for by the top n most highly expressed features across the dataset.
Each feature corresponds to a row on the plot, sorted by average expression (denoted by the point).
The plot will attempt to colour the points based on whether the corresponding feature is labelled as a control in object.
This can be turned off by setting controls=NULL.
The distribution of expression across all cells is shown as tick marks for each feature.
These ticks can be coloured according to cell-level metadata, as specified by colour_cells_by.
Setting colour_cells_by=NULL will disable all tick colouring.
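As a rough illustration of the quantities being plotted, the base R sketch below converts a toy count matrix to per-cell percentages and picks the top n features by average percentage. The matrix is made up, and ranking by the mean of the per-cell percentages is an assumption of this sketch rather than a statement about the plotHighestExprs internals.

```r
# Toy counts: 4 genes (rows) by 3 cells (columns). Values are made up.
counts <- matrix(c(50, 30, 20, 0,
                   10, 10, 80, 0,
                    0, 60, 40, 0),
                 nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("cell", 1:3)))

# Percentage of each cell's total counts taken up by each feature
# (these per-cell values correspond to the tick marks).
pct <- 100 * sweep(counts, 2, colSums(counts), "/")

# Average across cells (the point), then keep the top n features.
n <- 2
avg_pct <- rowMeans(pct)
top <- avg_pct[order(avg_pct, decreasing = TRUE)[seq_len(n)]]
top
```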
Value
A ggplot object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(set1 = 1:500)
)
plotHighestExprs(example_sce, colour_cells_by ="total_features_by_counts")
plotHighestExprs(example_sce, controls = NULL)
plotHighestExprs(example_sce, colour_cells_by="Mutation_Status")
plotPlatePosition()
Plot cells in plate positions
Description
Plots cells in their position on a plate, coloured by metadata variables or feature expression values from a SingleCellExperiment object.
Usage
plotPlatePosition(object, plate_position = NULL, colour_by = NULL,
size_by = NULL, shape_by = NULL, by_exprs_values = "logcounts",
by_show_single = FALSE, add_legend = TRUE, theme_size = 24,
point_alpha = 0.6, point_size = 24)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
plate_position | A character vector specifying the plate position for each cell (e.g., A01, B12, and so on, where letter indicates row and number indicates column). If NULL , the function will attempt to extract this from object$plate_position . Alternatively, a list of two factors ( "row" and "column" ) can be supplied, specifying the row (capital letters) and column (integer) for each cell in object . |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
size_by | Specification of a column metadata field or a feature to size by, see ?" for possible values. |
shape_by | Specification of a column metadata field or a feature to shape by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
add_legend | Logical scalar specifying whether a legend should be shown. |
theme_size | Numeric scalar, see ?" for details. |
point_alpha | Numeric scalar specifying the transparency of the points, see ?" for details. |
point_size | Numeric scalar specifying the size of the points, see ?" for details. |
Details
This function expects plate positions to be given in a character format where a letter indicates the row on the plate and a numeric value indicates the column.
Each cell has a plate position such as "A01", "B12", "K24" and so on.
From these plate positions, the row is extracted as the letter, and the column as the numeric part.
Alternatively, the row and column identities can be directly supplied by setting plate_position as a list of two factors.
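Splitting a position string into its row and column parts can be done with a pair of regular expressions. The following base R sketch shows one way to do this and to build the alternative list form; it is not taken from the plotPlatePosition source, and the choice of gsub() patterns is an assumption.

```r
plate_position <- c("A01", "B12", "K24")

# Row: the letter part. Column: the numeric part, as an integer.
plate_row <- gsub("[0-9]+", "", plate_position)
plate_col <- as.integer(gsub("[^0-9]+", "", plate_position))

# The equivalent list of two factors, named "row" and "column".
position_list <- list(row = factor(plate_row), column = factor(plate_col))
str(position_list)
```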
Value
A ggplot object.
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
## prepare data
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- calculateQCMetrics(example_sce)
## define plate positions
example_sce$plate_position <- paste0(
rep(LETTERS[1:5], each = 8),
rep(formatC(1:8, width = 2, flag = "0"), 5)
)
## plot plate positions
plotPlatePosition(example_sce, colour_by = "Mutation_Status")
plotPlatePosition(example_sce, shape_by = "Treatment", colour_by = "Gene_0004")
plotPlatePosition(example_sce, shape_by = "Treatment", size_by = "Gene_0001",
colour_by = "Cell_Cycle")
plotRLE()
Plot a relative log expression (RLE) plot
Description
Produce a relative log expression (RLE) plot of one or more transformations of cell expression values.
Usage
plotRLE(object, exprs_values = "logcounts", exprs_logged = TRUE,
style = "minimal", legend = TRUE, ordering = NULL,
colour_by = NULL, by_exprs_values = exprs_values, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
exprs_values | A string or integer scalar specifying the expression matrix in object to use. |
exprs_logged | A logical scalar indicating whether the expression matrix is already log-transformed. If not, a log2-transformation (+1) will be performed prior to plotting. |
style | String defining the boxplot style to use, either "minimal" (default) or "full" ; see Details. |
legend | Logical scalar specifying whether a legend should be shown. |
ordering | A vector specifying the ordering of cells in the RLE plot. This can be useful for arranging cells by experimental conditions or batches. |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
... | Further arguments passed to geom_boxplot when style="full" . |
Details
Relative log expression (RLE) plots are a powerful tool for visualising unwanted variation in high dimensional data. These plots were originally devised for gene expression data from microarrays but can also be used on single-cell expression data. RLE plots are particularly useful for assessing whether a procedure aimed at removing unwanted variation (e.g., scaling normalisation) has been successful.
If style is full , the usual ggplot2 boxplot is created for each cell. Here, the box shows the inter-quartile range and whiskers extend no more than 1.5 times the IQR from the hinge (the 25th or 75th percentile). Data beyond the whiskers are called outliers and are plotted individually. The median (50th percentile) is shown with a white bar. This approach is detailed and flexible, but can take a long time to plot for large datasets.
If style is minimal , a Tufte-style boxplot is created for each cell. Here, the median is shown with a circle, the IQR in a grey line, and whiskers (as defined above) for the plots are shown with coloured lines. No outliers are shown for this plot style. This approach is more succinct and faster for large numbers of cells.
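As a rough illustration of the quantity being plotted, the sketch below computes RLE values in base R following the definition in Gandolfo and Speed: each feature's median across cells is subtracted from its log-expression values, and the per-cell distribution of the resulting deviations is then summarised. The toy matrix and all variable names are made up for illustration, and this is not necessarily how plotRLE computes its statistics internally.

```r
## Toy log-expression matrix: 4 genes (rows) x 3 cells (columns); values are made up.
logexprs <- matrix(c(1, 2, 3,
                     4, 4, 7,
                     0, 1, 2,
                     5, 5, 5),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

## Median of each gene across all cells.
gene_medians <- apply(logexprs, 1, median)

## Relative log expression: deviation of each value from its gene's median.
## The vector is recycled down the columns, so row i is shifted by gene_medians[i].
rle_values <- logexprs - gene_medians

## Per-cell quartiles, i.e. the box and median that would be drawn for each cell.
rle_summary <- apply(rle_values, 2, quantile, probs = c(0.25, 0.5, 0.75))
print(rle_summary)
```

Cells whose median sits far from zero, or whose inter-quartile range is much wider than the rest, are the ones an RLE plot would draw attention to.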
Value
A ggplot object
Author
Davis McCarthy, with modifications by Aaron Lun
References
Gandolfo LC, Speed TP. RLE Plots: Visualising Unwanted Variation in High Dimensional Data. arXiv [stat.ME]. 2017. Available: http://arxiv.org/abs/1704.03590
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
plotRLE(example_sce, colour_by = "Mutation_Status", style = "minimal")
plotRLE(example_sce, colour_by = "Mutation_Status", style = "full",
outlier.alpha = 0.1, outlier.shape = 3, outlier.size = 0)
plotReducedDim()
Plot reduced dimensions
Description
Plot cell-level reduced dimension results stored in a SingleCellExperiment object.
Usage
plotReducedDim(object, use_dimred, ncomponents = 2, percentVar = NULL,
colour_by = NULL, shape_by = NULL, size_by = NULL,
by_exprs_values = "logcounts", by_show_single = FALSE,
text_by = NULL, text_size = 5, text_colour = "black", ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
use_dimred | A string or integer scalar indicating the reduced dimension result in reducedDims(object) to plot. |
ncomponents | A numeric scalar indicating the number of dimensions to plot, starting from the first dimension. Alternatively, a numeric vector specifying the dimensions to be plotted. |
percentVar | A numeric vector giving the proportion of variance in expression explained by each reduced dimension. Only expected to be used in PCA settings, e.g., in the plotPCA function. |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. |
shape_by | Specification of a column metadata field or a feature to shape by, see ?" for possible values. |
size_by | Specification of a column metadata field or a feature to size by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
text_by | Specification of a column metadata field for which to add text - see ?" for possible values. This must refer to a categorical field, i.e., coercible into a factor. |
text_size | Numeric scalar specifying the size of added text. |
text_colour | String specifying the colour of the added text. |
... | Additional arguments for visualization, see ?" for details. |
Details
If ncomponents is a scalar equal to 2, a scatterplot of the first two dimensions is produced.
If ncomponents is a scalar greater than 2, a pairs plot of the top dimensions is produced.
Alternatively, if ncomponents is a vector of length 2, a scatterplot of the two specified dimensions is produced.
If it is of length greater than 2, a pairs plot is produced containing all pairwise plots between the specified dimensions.
The text_by option will add factor levels as labels onto the plot, placed at the median coordinate across all points in that level.
This is useful for annotating position-related metadata (e.g., clusters) when there are too many levels to distinguish by colour.
It is only available for scatterplots.
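The label placement described for text_by can be sketched in base R as a per-level median of the plotted coordinates. The coordinates and cluster labels below are invented for illustration; the sketch only mirrors the rule stated above and is not taken from the scater source.

```r
## Made-up 2D coordinates for six cells and a categorical annotation.
coords <- data.frame(x = c(1, 2, 3, 10, 12, 14),
                     y = c(0, 1, 5, 8, 9, 7))
cluster <- factor(c("a", "a", "a", "b", "b", "b"))

## One label position per factor level: the median x and median y of its points.
label_pos <- aggregate(coords, by = list(level = cluster), FUN = median)
print(label_pos)
```

Using the median rather than the mean keeps a label near the bulk of its cluster even when a few cells lie far away.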
Value
A ggplot object
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runPCA(example_sce, ncomponents=5)
plotReducedDim(example_sce, "PCA")
plotReducedDim(example_sce, "PCA", colour_by="Cell_Cycle")
plotReducedDim(example_sce, "PCA", colour_by="Gene_0001")
plotReducedDim(example_sce, "PCA", ncomponents=5)
plotReducedDim(example_sce, "PCA", ncomponents=5, colour_by="Cell_Cycle",
shape_by="Treatment")
plotRowData()
Plot row metadata
Description
Plot row-level (i.e., gene) metadata from a SingleCellExperiment object.
Usage
plotRowData(object, y, x = NULL, colour_by = NULL, shape_by = NULL,
size_by = NULL, by_exprs_values = "logcounts",
by_show_single = FALSE, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object containing expression values and experimental information. |
y | Specification of the row-level metadata to show on the y-axis, see ?" for possible values. Note that only metadata fields will be searched, assays will not be used. |
x | Specification of the row-level metadata to show on the x-axis, see ?" for possible values. Again, only metadata fields will be searched, assays will not be used. |
colour_by | Specification of a row metadata field or a cell to colour by, see ?" for possible values. |
shape_by | Specification of a row metadata field or a cell to shape by, see ?" for possible values. |
size_by | Specification of a row metadata field or a cell to size by, see ?" for possible values. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in point aesthetics - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for point aesthetics, see ?" for details. |
... | Additional arguments for visualization, see ?" for details. |
Details
If y is continuous and x=NULL, a violin plot is generated.
If x is categorical, a grouped violin plot will be generated, with one violin for each level of x.
If x is continuous, a scatter plot will be generated.
If y is categorical and x is continuous, horizontal violin plots will be generated.
If x is missing or categorical, rectangle plots will be generated where the area of a rectangle is proportional to the number of points for a combination of factors.
Note that plotFeatureData is a synonym for plotRowData.
This is an artifact of the transition from the old SCESet class, and will be deprecated in future releases.
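The rules for choosing a plot type can be summarised as a small decision function. The function name choose_plot_type is hypothetical and exists only for this sketch; it treats numeric vectors as continuous and everything else as categorical, which is an assumption rather than a description of how plotRowData classifies its inputs.

```r
## Hypothetical helper: returns the plot type implied by the types of y and x.
choose_plot_type <- function(y, x = NULL) {
    y_continuous <- is.numeric(y)
    if (is.null(x)) {
        ## No x supplied: plain violin for continuous y, rectangles otherwise.
        return(if (y_continuous) "violin" else "rectangle")
    }
    x_continuous <- is.numeric(x)
    if (y_continuous && x_continuous) {
        "scatter"
    } else if (y_continuous) {
        "grouped violin"
    } else if (x_continuous) {
        "horizontal violin"
    } else {
        "rectangle"
    }
}

choose_plot_type(y = c(1.5, 2.0, 0.3))
choose_plot_type(y = c(1.5, 2.0, 0.3), x = factor(c("a", "b", "a")))
choose_plot_type(y = factor(c("a", "b", "a")), x = c(1.5, 2.0, 0.3))
```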
Value
A ggplot object.
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(ERCC=1:40))
example_sce <- normalize(example_sce)
plotRowData(example_sce, y="n_cells_by_counts", x="log10_total_counts")
plotRowData(example_sce, y="n_cells_by_counts",
size_by ="log10_total_counts",
colour_by = "is_feature_control")
plotScater()
Plot an overview of expression for each cell
Description
Plot the relative proportion of the library size that is accounted for by the most highly expressed features for each cell in a SingleCellExperiment object.
Usage
plotScater(x, nfeatures = 500, exprs_values = "counts",
colour_by = NULL, by_exprs_values = exprs_values,
by_show_single = FALSE, block1 = NULL, block2 = NULL, ncol = 3,
line_width = 1.5, theme_size = 10)
Arguments
Argument | Description |
---|---|
x | A SingleCellExperiment object. |
nfeatures | Numeric scalar indicating the number of top-expressed features to show in the plot. |
exprs_values | String or integer scalar indicating which assay of x should be used to obtain the expression values for this plot. |
colour_by | Specification of a column metadata field or a feature to colour by, see ?" for possible values. The curve for each cell will be coloured according to this specification. |
by_exprs_values | A string or integer scalar specifying which assay to obtain expression values from, for use in line colouring - see ?" for details. |
by_show_single | Logical scalar specifying whether single-level factors should be used for line colouring, see ?" for details. |
block1 | Specification of a factor by which to separate the cells into blocks (separate panels) in the plot. This can be any type of value described in ?" for column-level metadata. Default is NULL , in which case there is no blocking. |
block2 | Same as block1 , providing another level of blocking. |
ncol | Number of columns to use for facet_wrap if only one block is defined. |
line_width | Numeric scalar specifying the line width. |
theme_size | Numeric scalar specifying the font size to use for the plotting theme. |
Details
For each cell, the features are ordered from most-expressed to least-expressed.
The cumulative proportion of the total expression for the cell is computed across the top nfeatures features.
These plots can flag cells with a very high proportion of the library coming from a small number of features; such cells are likely to be problematic for downstream analyses.
Using the colour and blocking arguments can flag overall differences in cells under different experimental conditions or affected by different batch and other variables.
If only one of block1 and block2 is specified, each panel corresponds to a separate level of the specified blocking factor.
If both are specified, each panel corresponds to a combination of levels.
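The curve drawn for a single cell can be reproduced by hand in base R. The counts below are invented, and the variable names are chosen for this sketch only; it follows the description above rather than the plotScater source.

```r
## Made-up counts for one cell across five features.
counts <- c(50, 0, 30, 15, 5)
nfeatures <- 3

## Order features from most to least expressed.
ordered_counts <- sort(counts, decreasing = TRUE)

## Cumulative proportion of the cell's total counts taken up by the top features.
cum_prop <- cumsum(ordered_counts)[seq_len(nfeatures)] / sum(counts)
print(cum_prop)
```

A curve that climbs close to 1 within the first few features indicates a cell whose library is dominated by a handful of features.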
Value
A ggplot object.
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
plotScater(example_sce)
plotScater(example_sce, exprs_values = "counts", colour_by = "Cell_Cycle")
plotScater(example_sce, block1 = "Treatment", colour_by = "Cell_Cycle")
cpm(example_sce) <- calculateCPM(example_sce, use_size_factors = FALSE)
plotScater(example_sce, exprs_values = "cpm", block1 = "Treatment",
block2 = "Mutation_Status", colour_by = "Cell_Cycle")
plot_reddim()
Plot specific reduced dimensions
Description
Wrapper functions to create plots for specific types of reduced dimension results in a SingleCellExperiment object, or, if they are not already present, to calculate those results and then plot them.
Usage
plotPCASCE(object, ..., rerun = FALSE, ncomponents = 2,
run_args = list())
plotTSNE(object, ..., rerun = FALSE, ncomponents = 2,
run_args = list())
plotUMAP(object, ..., rerun = FALSE, ncomponents = 2,
run_args = list())
plotDiffusionMap(object, ..., rerun = FALSE, ncomponents = 2,
run_args = list())
plotMDS(object, ..., rerun = FALSE, ncomponents = 2,
run_args = list())
## S4 method for signature 'SingleCellExperiment'
plotPCA(object, ..., rerun = FALSE,
ncomponents = 2, run_args = list())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
... | Additional arguments to pass to plotReducedDim . |
rerun | Logical, should the reduced dimensions be recomputed even if object contains an appropriately named set of results in the reducedDims slot? |
ncomponents | Numeric scalar indicating the number of dimensions to (calculate and) plot. This can also be a numeric vector, see Details. |
run_args | Arguments to pass to runPCA , runTSNE , etc. |
Details
Each function will search the reducedDims slot for an appropriately named set of results and pass those coordinates onto plotReducedDim.
If the results are not present or rerun=TRUE, they will be computed using the relevant run* function.
The result name and run* function for each plot* function are:
"PCA" and runPCA for plotPCA
"TSNE" and runTSNE for plotTSNE
"UMAP" and runUMAP for plotUMAP
"DiffusionMap" and runDiffusionMap for plotDiffusionMap
"MDS" and runMDS for plotMDS
Users can specify arguments to the run* functions via run_args.
If ncomponents is a numeric vector, the maximum value will be used to determine the required number of dimensions to compute in the run* functions.
However, only the specified dimensions in ncomponents will be plotted.
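The handling of a vector-valued ncomponents can be illustrated with base R's prcomp standing in for the run* step. This is a sketch under that substitution: the random matrix and the variable names are made up, and the wrappers themselves call runPCA, runTSNE, etc. rather than prcomp directly.

```r
set.seed(42)
## Made-up data: 20 cells (rows) x 10 features (columns).
expr <- matrix(rnorm(20 * 10), nrow = 20)

## Request dimensions 1 and 3 only.
ncomponents <- c(1, 3)

## The largest requested index decides how many dimensions must be computed.
n_required <- max(ncomponents)
pcs <- prcomp(expr, rank. = n_required)$x

## Only the requested dimensions are kept for plotting.
to_plot <- pcs[, ncomponents, drop = FALSE]
colnames(to_plot)
```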
Value
A ggplot object.
Seealso
runPCA, runDiffusionMap, runTSNE, runMDS, plotReducedDim
Author
Davis McCarthy, with modifications by Aaron Lun
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
## Examples plotting PC1 and PC2
plotPCA(example_sce)
plotPCA(example_sce, colour_by = "Cell_Cycle")
plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment")
plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment",
size_by = "Mutation_Status")
## Force legend to appear for shape:
example_subset <- example_sce[, example_sce$Treatment == "treat1"]
plotPCA(example_subset, colour_by = "Cell_Cycle", shape_by = "Treatment",
by_show_single = TRUE)
## Examples plotting more than 2 PCs
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
shape_by = "Mutation_Status")
## Same for TSNE:
plotTSNE(example_sce, run_args=list(perplexity = 10))
## Same for DiffusionMaps:
plotDiffusionMap(example_sce)
## Same for MDS plots:
plotMDS(example_sce)
readSparseCounts()
Read sparse count matrix from file
Description
Reads a sparse count matrix from file containing a dense tabular format.
Usage
readSparseCounts(file, sep = "\t", quote = NULL, comment.char = "",
row.names = TRUE, col.names = TRUE, ignore.row = 0L,
skip.row = 0L, ignore.col = 0L, skip.col = 0L, chunk = 1000L)
Arguments
Argument | Description |
---|---|
file | A string containing a file path to a count table, or a connection object opened in read-only text mode. |
sep | A string specifying the delimiter between fields in file . |
quote | A string specifying the quote character, e.g., in column or row names. |
comment.char | A string specifying the comment character after which values are ignored. |
row.names | A logical scalar specifying whether row names are present. |
col.names | A logical scalar specifying whether column names are present. |
ignore.row | An integer scalar specifying the number of rows to ignore at the start of the file, before the column names. |
skip.row | An integer scalar specifying the number of rows to ignore at the start of the file, after the column names. |
ignore.col | An integer scalar specifying the number of columns to ignore at the start of the file, before the column names. |
skip.col | An integer scalar specifying the number of columns to ignore at the start of the file, after the column names. |
chunk | An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time. |
Details
This function provides a convenient method for reading dense arrays from flat files into a sparse matrix in memory.
Memory usage can be further improved by setting chunk to a smaller positive value.
The ignore.* and skip.* parameters allow irrelevant rows or columns to be skipped.
Note that the distinction between the two parameters is only relevant when row.names=FALSE (for skipping/ignoring columns) or col.names=FALSE (for rows).
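To make the idea of chunked reading concrete, the sketch below reads a small tab-delimited file a few rows at a time, converts each chunk to a sparse matrix with the Matrix package, and stacks the chunks. It is a simplified stand-in written for this page, not the readSparseCounts implementation, and it assumes a header line plus row names in the first column. Only one dense chunk is held in memory at a time, which is why a smaller chunk size lowers peak memory usage.

```r
library(Matrix)

## Write a small, mostly-zero table to a temporary tab-delimited file.
outfile <- tempfile()
dense <- data.frame(A = c(1, 0, 0, 2, 0), B = 0, C = c(0, 0, 3, 0, 4),
                    row.names = letters[1:5])
write.table(dense, file = outfile, col.names = NA, sep = "\t", quote = FALSE)

con <- file(outfile, open = "r")
## First line holds the column names; drop the empty field above the row names.
header <- strsplit(readLines(con, n = 1), "\t")[[1]][-1]

chunk <- 2L          # number of rows to read at any one time
pieces <- list()
repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    fields <- strsplit(lines, "\t")
    ## One row per line: the first field is the row name, the rest are counts.
    vals <- t(vapply(fields, function(f) as.numeric(f[-1]),
                     numeric(length(header))))
    rownames(vals) <- vapply(fields, `[`, character(1), 1)
    ## Convert the dense chunk to a sparse matrix before reading the next one.
    pieces[[length(pieces) + 1L]] <- Matrix(vals, sparse = TRUE)
}
close(con)

sparse <- do.call(rbind, pieces)
colnames(sparse) <- header
print(sparse)
```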
Value
A dgCMatrix containing double-precision values (usually counts) for each row (gene) and column (cell).
Author
Aaron Lun
Examples
outfile <- tempfile()
write.table(data.frame(A=1:5, B=0, C=0:4, row.names=letters[1:5]),
file=outfile, col.names=NA, sep="\t", quote=FALSE)
readSparseCounts(outfile)
runDiffusionMap()
Create a diffusion map from cell-level data
Description
Produce a diffusion map for the cells, based on the data in a SingleCellExperiment object.
Usage
runDiffusionMap(object, ncomponents = 2, ntop = 500,
feature_set = NULL, exprs_values = "logcounts",
scale_features = TRUE, use_dimred = NULL, n_dimred = NULL, ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object |
ncomponents | Numeric scalar indicating the number of diffusion components to obtain. |
ntop | Numeric scalar specifying the number of most variable features to use for constructing the diffusion map. |
feature_set | Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use to construct the diffusion map. This will override any ntop argument if specified. |
exprs_values | Integer scalar or string indicating which assay of object should be used to obtain the expression values for the calculations. |
scale_features | Logical scalar, should the expression values be standardised so that each feature has unit variance? |
use_dimred | String or integer scalar specifying the entry of reducedDims(object) to use as input to DiffusionMap . Default is to not use existing reduced dimension results. |
n_dimred | Integer scalar, number of dimensions of the reduced dimension slot to use when use_dimred is supplied. Defaults to all available dimensions. |
... | Additional arguments to pass to DiffusionMap . |
Details
The function DiffusionMap is used internally to compute the diffusion map.
Setting use_dimred allows users to easily construct a diffusion map from low-rank approximations of the original expression matrix (e.g., after PCA).
In such cases, arguments such as ntop, feature_set, exprs_values and scale_features will be ignored.
The behaviour of DiffusionMap seems to be non-deterministic, in a manner that is not responsive to any set.seed call.
The reason for this is unknown.
Value
A SingleCellExperiment object containing the coordinates of the first ncomponents diffusion map components for each cell.
This is stored in the "DiffusionMap" entry of the reducedDims slot.
Author
Aaron Lun, based on code by Davis McCarthy
References
Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. 2015; doi:10.1093/bioinformatics/btv325
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runDiffusionMap(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))
runMDS()
Perform MDS on cell-level data
Description
Perform multi-dimensional scaling (MDS) on cells, based on the data in a SingleCellExperiment object.
Usage
runMDS(object, ncomponents = 2, ntop = 500, feature_set = NULL,
exprs_values = "logcounts", scale_features = TRUE,
use_dimred = NULL, n_dimred = NULL, method = "euclidean")
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
ncomponents | Numeric scalar indicating the number of MDS dimensions to obtain. |
ntop | Numeric scalar specifying the number of most variable features to use for MDS. |
feature_set | Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for MDS. This will override any ntop argument if specified. |
exprs_values | Integer scalar or string indicating which assay of object should be used to obtain the expression values for the calculations. |
scale_features | Logical scalar, should the expression values be standardised so that each feature has unit variance? |
use_dimred | String or integer scalar specifying the entry of reducedDims(object) to use as input to cmdscale . Default is to not use existing reduced dimension results. |
n_dimred | Integer scalar, number of dimensions of the reduced dimension slot to use when use_dimred is supplied. Defaults to all available dimensions. |
method | String specifying the type of distance to be computed between cells. |
Details
The function cmdscale is used internally to compute the multidimensional scaling components to plot.
Setting use_dimred allows users to easily perform MDS on low-rank approximations of the original expression matrix (e.g., after PCA).
In such cases, arguments such as ntop, feature_set, exprs_values and scale_features will be ignored.
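The underlying computation can be sketched in base R with dist and cmdscale. The random matrix is made up, feature selection and scaling are skipped, and the variable names are chosen for this sketch only, so this shows the general approach rather than the exact steps runMDS performs.

```r
set.seed(1)
## Made-up data: 10 cells (rows) x 5 features (columns).
expr <- matrix(rnorm(10 * 5), nrow = 10,
               dimnames = list(paste0("cell", 1:10), NULL))

## Pairwise distances between cells; "euclidean" matches the default method.
cell_dist <- dist(expr, method = "euclidean")

## Classical MDS down to two dimensions (the default ncomponents).
mds_coords <- cmdscale(cell_dist, k = 2)
head(mds_coords)
```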
Value
A SingleCellExperiment object containing the coordinates of the first ncomponents MDS dimensions for each cell.
This is stored in the "MDS" entry of the reducedDims slot.
Author
Aaron Lun, based on code by Davis McCarthy
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runMDS(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))
runPCA()
Perform PCA on cell-level data
Description
Perform a principal components analysis (PCA) on cells, based on the data in a SingleCellExperiment object.
Usage
## S4 method for signature 'SingleCellExperiment'
runPCA(x, ncomponents = 2,
method = NULL, ntop = 500, exprs_values = "logcounts",
feature_set = NULL, scale_features = TRUE, use_coldata = FALSE,
selected_variables = NULL, detect_outliers = FALSE,
BSPARAM = ExactParam(), BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
x | A SingleCellExperiment object. |
ncomponents | Numeric scalar indicating the number of principal components to obtain. |
method | Deprecated, string specifying how the PCA should be performed. |
ntop | Numeric scalar specifying the number of most variable features to use for PCA. |
exprs_values | Integer scalar or string indicating which assay of x should be used to obtain the expression values for the calculations. |
feature_set | Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for PCA. This will override any ntop argument if specified. |
scale_features | Logical scalar, should the expression values be standardised so that each feature has unit variance? This will also remove features with standard deviations below 1e-8. |
use_coldata | Logical scalar specifying whether the column data should be used instead of expression values to perform PCA. |
selected_variables | List of strings or a character vector indicating which variables in colData(x) to use for PCA when use_coldata=TRUE . If a list, each entry can take the form described in ?" . |
detect_outliers | Logical scalar, should outliers be detected based on PCA coordinates generated from column-level metadata? |
BSPARAM | A BiocSingularParam object specifying which algorithm should be used to perform the PCA. |
BPPARAM | A BiocParallelParam object specifying whether the PCA should be parallelized. |
Details
The function prcomp is used internally to do the PCA when method="prcomp".
Alternatively, the irlba package can be used, which performs a fast approximation of PCA through the prcomp_irlba function.
This is especially useful for large, sparse matrices.
Note that prcomp_irlba involves a random initialization, after which it converges towards the exact PCs.
This means that the result will change slightly across different runs.
For full reproducibility, users should call set.seed prior to running runPCA with method="irlba".
If use_coldata=TRUE, PCA will be performed on column-level metadata instead of the gene expression matrix.
The selected_variables defaults to a vector containing:
"pct_counts_top_100_features"
"total_features_by_counts"
"pct_counts_feature_control"
"total_features_feature_control"
"log10_total_counts_endogenous"
"log10_total_counts_feature_control"
This can be useful for identifying outlier cells based on QC metrics, especially when combined with detect_outliers=TRUE.
If outlier identification is enabled, the outlier field of the output colData will contain the identified outliers.
Value
A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell.
If use_coldata=FALSE, this is stored in the "PCA" entry of the reducedDims slot.
Otherwise, it is stored in the "PCA_coldata" entry.
The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute of the reduced dimension matrix.
Note that this will only be of length equal to ncomponents when method is not "prcomp".
This is because approximate PCA methods do not compute singular values for all components.
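A base R sketch of how such a result could be assembled with prcomp is given below: near-constant features are dropped, the remaining features are scaled to unit variance, and the proportion of variance is attached to the coordinate matrix as an attribute. The data and variable names are invented for illustration, and the sketch is not the runPCA implementation. Because prcomp computes every component, the attribute here covers all PCs rather than just the two that are kept.

```r
set.seed(100)
## Made-up data: 40 cells (rows) x 6 features (columns).
expr <- matrix(rnorm(40 * 6), nrow = 40)

## Drop features with standard deviation below 1e-8 before scaling.
keep <- apply(expr, 2, sd) > 1e-8
pca <- prcomp(expr[, keep, drop = FALSE], scale. = TRUE)

## Keep the first two PCs as the reduced-dimension coordinates.
pc_coords <- pca$x[, 1:2]

## Proportion of variance explained by each PC, stored alongside the coordinates.
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
attr(pc_coords, "percentVar") <- percent_var

attr(pc_coords, "percentVar")
```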
Author
Aaron Lun, based on code by Davis McCarthy
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))
runTSNE()
Perform t-SNE on cell-level data
Description
Perform t-distributed stochastic neighbour embedding (t-SNE) for the cells, based on the data in a SingleCellExperiment object.
Usage
runTSNE(object, ncomponents = 2, ntop = 500, feature_set = NULL,
exprs_values = "logcounts", scale_features = TRUE,
use_dimred = NULL, n_dimred = NULL, perplexity = min(50,
floor(ncol(object)/5)), pca = TRUE, initial_dims = 50,
normalize = TRUE, theta = 0.5, external_neighbors = FALSE,
BNPARAM = KmknnParam(), BPPARAM = SerialParam(), ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
ncomponents | Numeric scalar indicating the number of t-SNE dimensions to obtain. |
ntop | Numeric scalar specifying the number of most variable features to use for t-SNE. |
feature_set | Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for t-SNE. This will override any ntop argument if specified. |
exprs_values | Integer scalar or string indicating which assay of object should be used to obtain the expression values for the calculations. |
scale_features | Logical scalar, should the expression values be standardised so that each feature has unit variance? |
use_dimred | String or integer scalar specifying the entry of reducedDims(object) to use as input to Rtsne . Default is to not use existing reduced dimension results. |
n_dimred | Integer scalar, number of dimensions of the reduced dimension slot to use when use_dimred is supplied. Defaults to all available dimensions. |
perplexity | Numeric scalar defining the perplexity parameter, see ? for more details. |
pca | Logical scalar passed to Rtsne , indicating whether an initial PCA step should be performed. This is ignored if use_dimred is specified. |
initial_dims | Integer scalar passed to Rtsne , specifying the number of principal components to be retained if pca=TRUE . |
normalize | Logical scalar indicating if input values should be scaled for numerical precision, see normalize_input . |
theta | Numeric scalar specifying the approximation accuracy of the Barnes-Hut algorithm, see Rtsne for details. |
external_neighbors | Logical scalar indicating whether a nearest neighbors search should be computed externally with findKNN . |
BNPARAM | A BiocNeighborParam object specifying the neighbor search algorithm to use when external_neighbors=TRUE . |
BPPARAM | A BiocParallelParam object specifying how the neighbor search should be parallelized when external_neighbors=TRUE . |
... | Additional arguments to pass to Rtsne . |
Details
The function Rtsne is used internally to compute the t-SNE.
Note that the algorithm is not deterministic, so different runs of the function will produce differing results.
Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.
The value of the perplexity parameter can have a large effect on the results.
By default, the function will try to provide a reasonable setting, by scaling the perplexity with the number of cells until it reaches a maximum of 50.
However, it is often worthwhile to manually try multiple values to ensure that the conclusions are robust.
Setting use_dimred allows users to easily perform t-SNE on low-rank approximations of the original expression matrix (e.g., after PCA).
In such cases, arguments such as ntop, feature_set, exprs_values and scale_features will be ignored.
If external_neighbors=TRUE, the nearest neighbor search step is conducted using a different algorithm to that in the Rtsne function.
This can be parallelized or approximate to achieve greater speed for large data sets.
The neighbor search results are then used for t-SNE via the Rtsne_neighbors function.
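The default perplexity shown in the Usage section, min(50, floor(ncol(object)/5)), can be evaluated for a few dataset sizes to see where the cap takes effect. The helper name default_perplexity is hypothetical and used only in this sketch.

```r
## Hypothetical helper reproducing the default expression from the Usage section.
default_perplexity <- function(ncells) min(50, floor(ncells / 5))

ncells <- c(40, 100, 250, 1000)
perplexities <- vapply(ncells, default_perplexity, numeric(1))
data.frame(ncells = ncells, perplexity = perplexities)
```

For the 40-cell example dataset the default is 8, and the cap of 50 is reached once there are 250 or more cells.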
Value
A SingleCellExperiment object containing the coordinates of the first ncomponents t-SNE dimensions for each cell.
This is stored in the "TSNE" entry of the reducedDims slot.
Author
Aaron Lun, based on code by Davis McCarthy
References
L.J.P. van der Maaten. Barnes-Hut-SNE. In Proceedings of the International Conference on Learning Representations, 2013.
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
example_sce <- runTSNE(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))
runUMAP()
Perform UMAP on cell-level data
Description
Perform uniform manifold approximation and projection (UMAP) for the cells, based on the data in a SingleCellExperiment object.
Usage
runUMAP(object, ncomponents = 2, ntop = 500, feature_set = NULL,
exprs_values = "logcounts", scale_features = TRUE,
use_dimred = NULL, n_dimred = NULL, pca = 50, n_neighbors = 15,
external_neighbors = FALSE, BNPARAM = KmknnParam(),
BPPARAM = SerialParam(), ...)
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object. |
ncomponents | Numeric scalar indicating the number of UMAP dimensions to obtain. |
ntop | Numeric scalar specifying the number of most variable features to use for UMAP. |
feature_set | Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for UMAP. This will override any ntop argument if specified. |
exprs_values | Integer scalar or string indicating which assay of object should be used to obtain the expression values for the calculations. |
scale_features | Logical scalar, should the expression values be standardised so that each feature has unit variance? |
use_dimred | String or integer scalar specifying the entry of reducedDims(object) to use as input to umap . Default is to not use existing reduced dimension results. |
n_dimred | Integer scalar, number of dimensions of the reduced dimension slot to use when use_dimred is supplied. Defaults to all available dimensions. |
pca | Integer scalar specifying how many PCs should be used as input into UMAP, if the PCA is to be recomputed on the subsetted expression matrix. Only used when use_dimred=NULL , and if pca=NULL , no PCA is performed at all. |
n_neighbors | Integer scalar, number of nearest neighbors to identify when constructing the initial graph. |
external_neighbors | Logical scalar indicating whether a nearest neighbors search should be computed externally with findKNN. |
BNPARAM | A BiocNeighborParam object specifying the neighbor search algorithm to use when external_neighbors=TRUE. |
BPPARAM | A BiocParallelParam object specifying how the neighbor search should be parallelized when external_neighbors=TRUE. |
... | Additional arguments to pass to umap. |
Details
The function umap is used internally to compute the UMAP. Note that the algorithm is not deterministic, so different runs of the function will produce differing results. Users are advised to test multiple random seeds, and then use set.seed to set a random seed for replicable results.
Setting use_dimred allows users to easily perform UMAP on low-rank approximations of the original expression matrix (e.g., after PCA). In such cases, arguments such as ntop, feature_set, exprs_values and scale_features will be ignored.
If external_neighbors=TRUE, the nearest neighbor search step is conducted using a different algorithm to that in the umap function. This search can be parallelized or made approximate to achieve greater speed for large data sets. The neighbor search results are then used directly to create the UMAP embedding.
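As a hedged illustration of the two points above (seeding and use_dimred), the following sketch assumes the scater version documented on this page, together with the uwot package that supplies umap. The object name sce, the seed value 100 and the choice of 10 PCs are arbitrary and not part of the original documentation.

```r
library(scater)

# Build the small example object shipped with this version of scater.
data("sc_example_counts")
data("sc_example_cell_info")
sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
sce <- normalize(sce)

# Fix the seed immediately before each call so repeated runs can be compared.
set.seed(100)
run1 <- reducedDim(runUMAP(sce), "UMAP")
set.seed(100)
run2 <- reducedDim(runUMAP(sce), "UMAP")
all.equal(run1, run2)

# Reuse an existing low-rank representation instead of the expression matrix.
# ntop, feature_set, exprs_values and scale_features are ignored in this mode.
sce <- runPCA(sce, ncomponents = 10)
set.seed(100)
sce <- runUMAP(sce, use_dimred = "PCA", n_dimred = 5)
dim(reducedDim(sce, "UMAP"))
```

Comparing embeddings from several different seeds before settling on one is a reasonable way to check that conclusions do not hinge on a single run.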
Value
A SingleCellExperiment object containing the coordinates of the first ncomponents UMAP dimensions for each cell.
This is stored in the "UMAP" entry of the reducedDims slot.
Author
Aaron Lun
References
McInnes L, Healy J (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv.
Examples
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)
## Set a seed first, as UMAP is not deterministic
set.seed(100)
example_sce <- runUMAP(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce, "UMAP"))
sc_example_cell_info()
Cell information for the small example single-cell counts dataset to demonstrate capabilities of scater
Description
This data.frame contains cell metadata information for the 40 cells in the example counts dataset included in the package.
Format
a data.frame instance, 1 row per cell.
Usage
sc_example_cell_info
Value
NULL, but makes available a data frame with cell metadata
Author
Davis McCarthy, 2015-03-05
sc_example_counts()
A small example of single-cell counts dataset to demonstrate capabilities of scater
Description
This data set contains counts for 2000 genes for 40 cells. They are from a real experiment, but details have been anonymised.
Format
a matrix instance, 1 row per gene.
Usage
sc_example_counts
Value
NULL, but makes available a matrix of count data
Author
Davis McCarthy, 2015-03-05
scater_package()
Single-cell analysis toolkit for expression in R
Description
scater provides a class and numerous functions for the quality control, normalisation and visualisation of single-cell RNA-seq expression data.
Details
In particular, scater provides easy generation of quality control metrics and simple functions to visualise quality control metrics and their relationships.
scater_plot_args()
General visualization parameters
Description
scater functions that plot points share a number of visualization parameters, which are described on this page.
Seealso
plotColData, plotRowData, plotReducedDim, plotExpression, plotPlatePosition, and most other plotting functions.
scater_vis_var()
Variable selection for visualization
Description
A number of scater functions accept a SingleCellExperiment object and extract (meta)data from it for use in a plot.
These values are then used on the x- or y-axes (e.g., plotColData) or for tuning visual parameters, e.g., colour_by, shape_by, size_by.
This page describes how the selection of these values can be controlled by the user, by passing appropriate values to the arguments of the desired plotting function.
Seealso
plotColData, plotRowData, plotReducedDim, plotExpression, plotPlatePosition, and most other plotting functions.
sumCountsAcrossCells()
Sum counts across a set of cells
Description
Create a count matrix where counts for all cells in a set are summed together.
Usage
sumCountsAcrossCells(object, ids, exprs_values = "counts",
BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or a count matrix. |
ids | A factor specifying the set to which each cell in object belongs. |
exprs_values | A string or integer scalar specifying the assay of object containing counts, if object is a SingleCellExperiment. |
BPPARAM | A BiocParallelParam object specifying how summation should be parallelized. |
Details
This function provides a convenient method for aggregating counts across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain pseudo-bulk samples for further analysis.
Any NA values in ids are implicitly ignored and will not be considered or reported.
This may be useful, e.g., to remove undesirable cells by setting their entries in ids to NA.
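The aggregation and the NA rule described above can be mimicked on a plain matrix with base R's rowsum. This is only a sketch of the documented behaviour, not the package's implementation; the toy counts matrix and the ids vector are made up for illustration.

```r
# Toy count matrix: 3 genes (rows) by 4 cells (columns).
counts <- matrix(c(1, 0, 2, 3,
                   4, 1, 0, 2,
                   0, 5, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# One set label per cell; the last cell is excluded by marking it NA.
ids <- c("A", "B", "A", NA)

# Drop cells with NA labels, then sum the remaining columns within each set.
# rowsum() groups rows, so the matrix is transposed before and after.
keep <- !is.na(ids)
summed <- t(rowsum(t(counts[, keep, drop = FALSE]), group = ids[keep]))
summed
```

The result has one column per set ("A" and "B" here) and one row per gene, which is the pseudo-bulk layout described above.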
Value
A count matrix where counts for all cells in the same set are summed together for each feature.
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)
out <- sumCountsAcrossCells(example_sce, ids)
dimnames(out)
sumCountsAcrossFeatures()
Sum counts across a feature set
Description
Create a count matrix where counts for all features in a set are summed together.
Usage
sumCountsAcrossFeatures(object, ids, exprs_values = "counts",
BPPARAM = SerialParam())
Arguments
Argument | Description |
---|---|
object | A SingleCellExperiment object or a count matrix. |
ids | A factor specifying the set to which each feature in object belongs. |
exprs_values | A string or integer scalar specifying the assay of object containing counts, if object is a SingleCellExperiment. |
BPPARAM | A BiocParallelParam object specifying whether summation should be parallelized. |
Details
This function provides a convenient method for aggregating counts across multiple rows for each cell.
For example, genes with multiple mapping locations in the reference will often manifest as multiple rows with distinct Ensembl/Entrez IDs.
These counts can be aggregated into a single feature by setting the shared identifier (usually the gene symbol) as ids.
It is theoretically possible to aggregate transcript-level counts to gene-level counts with this function. However, it is often better to do so with dedicated functions (e.g., from the tximport or tximeta packages) that account for differences in length across isoforms.
Any NA values in ids are implicitly ignored and will not be considered or reported.
This may be useful, e.g., to remove undesirable feature sets by setting their entries in ids to NA.
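The gene-symbol use case can be sketched on a plain matrix with base R's rowsum. This illustrates the documented behaviour under stated assumptions and is not the package's implementation; the row identifiers and symbols below are invented.

```r
# Toy count matrix: 4 features (rows) by 2 cells (columns).
counts <- matrix(c(5, 0,
                   1, 2,
                   3, 3,
                   7, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("ENSG_a", "ENSG_b", "ENSG_c", "ENSG_d"),
                                 c("cell1", "cell2")))

# Shared identifier per row; the first two rows map to the same symbol,
# and the last row is excluded by marking it NA.
symbols <- c("GeneX", "GeneX", "GeneY", NA)

# Drop rows with NA identifiers, then sum the remaining rows within each set.
keep <- !is.na(symbols)
summed <- rowsum(counts[keep, , drop = FALSE], group = symbols[keep])
summed
```

Each cell keeps its own column, while rows sharing a symbol collapse into a single row.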
Value
A count matrix where counts for all features in the same set are summed together within each cell.
Author
Aaron Lun
Examples
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts),
colData = sc_example_cell_info)
ids <- sample(LETTERS, nrow(example_sce), replace=TRUE)
out <- sumCountsAcrossFeatures(example_sce, ids)
dimnames(out)
toSingleCellExperiment()
Convert an SCESet object to a SingleCellExperiment object
Description
Convert an SCESet object produced with an older version of the package to a SingleCellExperiment object compatible with the current version.
Usage
updateSCESet(object)
toSingleCellExperiment(object)
Arguments
Argument | Description |
---|---|
object | an SCESet object to be updated |
Value
A SingleCellExperiment object.
Examples
updateSCESet(example_sceset)
toSingleCellExperiment(example_sceset)
uniquifyFeatureNames()
Make feature names unique
Description
Combine a user-interpretable feature name (e.g., gene symbol) with a standard identifier that is guaranteed to be unique and valid (e.g., Ensembl) for use as row names.
Usage
uniquifyFeatureNames(ID, names)
Arguments
Argument | Description |
---|---|
ID | A character vector of unique identifiers. |
names | A character vector of feature names. |
Details
This function will attempt to use names if it is unique.
If not, it will append _ID (an underscore followed by the corresponding entry of ID) to any non-unique value of names.
Missing names will be replaced entirely by ID.
The output is guaranteed to be unique, assuming that ID is also unique.
This can be directly used as the row names of a SingleCellExperiment object.
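The rules above can be written out in a few lines of base R. The helper name uniquify_sketch is hypothetical, and the code is a sketch of the described behaviour rather than the package's own source.

```r
# Hypothetical helper mirroring the documented rules.
uniquify_sketch <- function(ID, names) {
    stopifnot(length(ID) == length(names))

    # Missing names are replaced entirely by the identifier.
    missing <- is.na(names)
    names[missing] <- ID[missing]

    # Every occurrence of a non-unique name gets "_" plus its identifier.
    dup <- names %in% names[duplicated(names)]
    names[dup] <- paste0(names[dup], "_", ID[dup])

    names
}

ID <- paste0("ENSG0000000", 1:5)
out <- uniquify_sketch(ID, c("A", NA, "B", "C", "A"))
out
# "A_ENSG00000001" "ENSG00000002" "B" "C" "A_ENSG00000005"
```

Unique names such as "B" and "C" pass through unchanged, so the row names stay readable wherever no disambiguation is needed.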
Value
A character vector of unique-ified feature names.
Author
Aaron Lun
Examples
uniquifyFeatureNames(
ID=paste0("ENSG0000000", 1:5),
names=c("A", NA, "B", "C", "A")
)