bioconductor v3.9.0 Monocle

Monocle performs differential expression and time-series analysis

Link to this section Summary

Functions

Branched expression analysis modeling (BEAM).

The CellDataSet class

Methods for the CellDataSet class

The CellType class

The CellTypeHierarchy class

Add a new cell type

Test for branch-dependent expression

Build a CellDataSet that splits cells among two branches

Compute the area between curves (ABC) for branch-dependent genes

Calculate the Instantaneous Log Ratio between two branches

Calibrate_per_cell_total_proposal

Get the matrix of pairwise distances between cells

Sets the matrix containing distances between each pair of cells used by Monocle during cell ordering. Not intended to be called directly.

Cluster cells into a specified number of groups based on .

Clusters genes by pseudotime trend.

Compare model fits

Calculate divergence times for branch-dependent genes

Detects genes above minimum threshold.

Helper function for parallel differential expression testing

Test genes for differential expression

Retrieve a table of values specifying the mean-variance relationship

Helper function to estimate dispersions

Function to calculate the size factor for the single-cell RNA-seq data

Find the most commonly occuring relative expression value in each cell

Export a monocle CellDataSet object to other popular single cell analysis toolkit.

Extract a linear ordering of cells from a PQ tree

Fits a model for each gene in a CellDataSet object.

Helper function for parallel VGAM fitting

Fit smooth spline curves and return the residuals matrix

Fit smooth spline curves and return the response matrix

Return the names of classic muscle genes

Import a seurat or scatter/scran CellDataSet object and convert it to a monocle cds.

Build a CellDataSet from the HSMMSingleCell package

Return a CellDataSet of classic muscle genes.

Build a CellDataSet from the data stored in inst/extdata directory.

Test genes for cell type-dependent expression

Multicore apply-like function for CellDataSet

Retrieves the minimum spanning tree generated by Monocle during cell ordering.

Set the minimum spanning tree generated by Monocle during cell ordering.

Creates a new CellDateSet object.

Classify cells according to a set of markers

Orders cells according to pseudotime.

Return an ordering for a P node in the PQ tree

Plots the minimum spanning tree on cells. This function is deprecated.

Plots clusters of cells .

Plots the minimum spanning tree on cells.

Plots kinetic clusters of genes.

Not sure we're ready to release this one quite yet: Plot the branch genes in pseduotime with separate branch curves

Plots the minimum spanning tree on cells.

Create a heatmap to demonstrate the bifurcation of gene expression along two branchs

Plot the branch genes in pseduotime with separate branch curves.

Plots expression for one or more genes as a function of pseudotime

Plots expression for one or more genes as a jittered, grouped points

Plots the number of cells expressing one or more genes as a barplot

Plots expression for one or more genes as a violin plot

Create a heatmap to demonstrate the bifurcation of gene expression along multiple branches

Create a kinetic curves to demonstrate the bifurcation of gene expression along multiple branches

Plots genes by mean vs. dispersion, highlighting those selected for ordering

Plots the percentage of variance explained by the each component based on PCA from the normalized expression data using the same procedure used in reduceDimension function.

Plots a pseudotime-ordered, row-centered heatmap

Plots the decision map of density clusters .

Recursively builds and returns a PQ tree for the MST

Compute a projection of a CellDataSet object into a lower dimensional space

Get the weights needed to lift cells back to high dimensional expression space.

Get the weights needed to lift cells back to high dimensional expression space.

Retrieves the the whitening matrix during independent component analysis.

Sets the the whitening matrix during independent component analysis.

Retrieves the coordinates of each cell in the reduced-dimensionality space generated by calls to reduceDimension.

Set embedding coordinates of each cell in a CellDataSet.

Get the whitened expression values for a CellDataSet.

Sets the whitened expression values for each cell prior to independent component analysis. Not intended to be called directly.

Transform relative expression values into absolute transcript counts.

Response values

Calculates response values.

Select the most cell type specific markers

Marks genes for clustering

Spike-in transcripts data.

Return a variance-stabilized matrix of expression values

Link to this section Functions

Branched expression analysis modeling (BEAM).

Description

Identify genes with branch-dependent expression. Branches in single-cell trajectories are generated by cell fate decisions in development and also arise when analyzing genetic, chemical, or environmental perturbations. Branch expression analysis modeling is a statistical approach for finding genes that are regulated in a manner that depends on the branch. Consider a progenitor cell that generates two distinct cell types. A single-cell trajectory that includes progenitor cells and both differentiated cell types will capture the "decision" as a branch point, with progenitors upstream of the branch and the differentiated cells positioned along distinct branches. These branches will be characterized by distinct gene expression programs. BEAM aims to find all genes that differ between the branches. Such "branch-dependent" genes can help identify the mechanism by which the fate decision is made. BEAM() Takes a CellDataSet and either a specified branch point, or a pair of trajectory outcomes (as States). If a branch point is provided, the function returns a dataframe of test results for dependence on that branch. If a pair of outcomes is provided, it returns test results for the branch that unifies those outcomes into a common path to the trajectory's root state. BEAM() compares two models with a likelihood ratio test for branch-dependent expression. The full model is the product of smooth Pseudotime and the Branch a cell is assigned to. The reduced model just includes Pseudotime. You can modify these to include arbitrary additional effects in the full or both models.

Usage

BEAM(cds, fullModelFormulaStr = "~sm.ns(Pseudotime, df = 3)*Branch",
  reducedModelFormulaStr = "~sm.ns(Pseudotime, df = 3)",
  branch_states = NULL, branch_point = 1, relative_expr = TRUE,
  branch_labels = NULL, verbose = FALSE, cores = 1, ...)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
fullModelFormulaStra formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
reducedModelFormulaStra formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
branch_statesids for the immediate branch branch which obtained from branch construction based on MST
branch_pointThe ID of the branch point to analyze. Can only be used when reduceDimension is called with method = "DDRTree".
relative_expra logic flag to determine whether or not the relative gene expression should be used
branch_labelsthe name for each branch, for example, "AT1" or "AT2"
verboseWhether to generate verbose output
coresthe number of cores to be used while testing each gene for differential expression
...additional arguments to be passed to differentialGeneTest

Value

a data frame containing the p values and q-values from the BEAM test, with one row per gene.

The CellDataSet class

Description

The main class used by Monocle to hold single cell expression data. CellDataSet extends the basic Bioconductor ExpressionSet class.

Details

This class is initialized from a matrix of expression values Methods that operate on CellDataSet objects constitute the basic Monocle workflow.

Link to this function

CellDataSet_methods()

Methods for the CellDataSet class

Description

Methods for the CellDataSet class

Usage

list(list("sizeFactors"), list("CellDataSet"))(object)
list(list("sizeFactors"), list("CellDataSet,numeric"))(object) <- value
list(list("estimateSizeFactors"), list("CellDataSet"))(object, locfunc = median, ...)
list(list("estimateDispersions"), list("CellDataSet"))(object, modelFormulaStr = "~ 1",
  relative_expr = TRUE, min_cells_detected = 1, remove_outliers = TRUE,
  cores = 1, ...)

Arguments

ArgumentDescription
objectThe CellDataSet object
valueA vector of size factors, with length equal to the cells in object
locfuncA function applied to the geometric-mean-scaled expression values to derive the size factor.
...Additional arguments to be passed to estimateSizeFactorsForMatrix
modelFormulaStrA model formula, passed as a string, specifying how to group the cells prior to estimated dispersion. The default groups all cells together.
relative_exprWhether to transform expression into relative values
min_cells_detectedOnly include genes detected above lowerDetectionLimit in at least this many cells in the dispersion calculation
remove_outliersWhether to remove outliers (using Cook's distance) when estimating dispersions
coresThe number of cores to use for computing dispersions

The CellType class

Description

Classifies cells using a criterion function.

Details

Classifies cells via a user-defined gating function. The gating function accepts as input the entire matrix of expression data from a CellDataSet , and return TRUE or FALSE for each cell in it, depending on whether each meets the criteria in the gating function

Link to this function

CellTypeHierarchy()

The CellTypeHierarchy class

Description

Classifies cells according to a hierarchy of types.

Details

Classifies cells according to a hierarchy of types via user-defined gating functions.

Add a new cell type

Description

adds a cell type to a pre-existing CellTypeHierarchy and produces a function that accepts expression data from a CellDataSet. When the function is called on a CellDataSet a boolean vector is returned that indicates whether each cell is or is not the cell type that was added by addCellType.

Usage

addCellType(cth, cell_type_name, classify_func,
  parent_cell_type_name = "root")

Arguments

ArgumentDescription
cthThe CellTypeHierarchy object
cell_type_nameThe name of the new cell type. Can't already exist in cth
classify_funcA function that returns true when a cell is of the new type
parent_cell_type_nameIf this cell type is a subtype of another, provide its name here

Test for branch-dependent expression

Description

Testing for branch-dependent expression with BEAM first involves constructing a CellDataSet that assigns each cell to a branch, and then performing a likelihood ratio test to see if the branch assignments significantly improves the fit over a null model that does not split the cells. branchTest() implements these two steps.

Usage

branchTest(cds, fullModelFormulaStr = "~sm.ns(Pseudotime, df = 3)*Branch",
  reducedModelFormulaStr = "~sm.ns(Pseudotime, df = 3)",
  branch_states = NULL, branch_point = 1, relative_expr = TRUE,
  cores = 1, branch_labels = NULL, verbose = FALSE, ...)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
fullModelFormulaStra formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
reducedModelFormulaStra formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
branch_statesstates corresponding to two branches
branch_pointThe ID of the branch point to analyze. Can only be used when reduceDimension is called with method = "DDRTree".
relative_expra logic flag to determine whether or not the relative gene expression should be used
coresthe number of cores to be used while testing each gene for differential expression
branch_labelsthe name for each branch, for example, AT1 or AT2
verboseWhether to show VGAM errors and warnings. Only valid for cores = 1.
...Additional arguments passed to differentialGeneTest

Value

a data frame containing the p values and q-values from the likelihood ratio tests on the parallel arrays of models.

Link to this function

buildBranchCellDataSet()

Build a CellDataSet that splits cells among two branches

Description

Analyzing branches with BEAM requires fitting two models to the expression data for each gene. The full model assigns each cell to one of the two outcomes of the branch, and the reduced model excludes this assignment. buildBranchBranchCellDataSet() takes a CellDataSet object and returns a version where the cells are assigned to one of two branches. The branch for each cell is encoded in a new column, "Branch", in the pData table in the returned CellDataSet.

Usage

buildBranchCellDataSet(cds, progenitor_method = c("sequential_split",
  "duplicate"), branch_states = NULL, branch_point = 1,
  branch_labels = NULL, stretch = TRUE)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
progenitor_methodThe method to use for dealing with the cells prior to the branch
branch_statesThe states for two branching branches
branch_pointThe ID of the branch point to analyze. Can only be used when reduceDimension is called with reduction_method .
branch_labelsThe names for each branching branch
stretchA logical flag to determine whether or not the pseudotime trajectory for each branch should be stretched to the same range or not

Value

a CellDataSet with the duplicated cells and stretched branches

Compute the area between curves (ABC) for branch-dependent genes

Description

This function is used to calculate the ABC score based on the the nature spline curves fitted for each branch. ABC score is used to quantify the total magnitude of divergence between two branchs. By default, the ABC score is the area between two fitted spline curves. The ABC score can be used to rank gene divergence. When coupled with p-val calculated from the branchTest, it can be used to identify potential major regulators for branch bifurcation.

Usage

calABCs(cds, trend_formula = "~sm.ns(Pseudotime, df = 3)*Branch",
  branch_point = 1, trajectory_states = NULL, relative_expr = TRUE,
  stretch = TRUE, cores = 1, verbose = F, min_expr = 0.5,
  integer_expression = FALSE, num = 5000, branch_labels = NULL, ...)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
trend_formulaa formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
branch_pointthe point where two branches diverge
trajectory_statesStates corresponding to two branches
relative_expra logic flag to determine whether or not the relative gene expression should be used
stretcha logic flag to determine whether or not each branch should be stretched
coresthe number of cores to be used while testing each gene for differential expression
verbosea logic flag to determine whether or not we should output detailed running information
min_exprthe lower limit for the expressed gene
integer_expressionthe logic flag to determine whether or not the integer numbers are used for calculating the ABCs. Default is False.
numnumber of points on the fitted branch trajectories used for calculating the ABCs. Default is 5000.
branch_labelsthe name for each branch, for example, AT1 or AT2
...Additional arguments passed to buildBranchCellDataSet

Value

a data frame containing the ABCs (Area under curves) score as the first column and other meta information from fData

Calculate the Instantaneous Log Ratio between two branches

Description

This function is used to calculate the Instant Log Ratio between two branches which can be used to prepare the heatmap demonstrating the branch gene expression divergence hirearchy. If "stretch" is specifified, each branch will be firstly stretched into maturation level from 0-100. Since the results when we use "stretching" are always better and IRLs for non-stretched spline curves are often mismatched, we may only turn down "non-stretch" functionality in future versions. Then, we fit two separate nature spline curves for each individual linages. The log-ratios of the value on each spline curve corresponding to each branch are calculated, which can be used as a measure for the magnitude of divergence between two branching branchs.

Usage

calILRs(cds, trend_formula = "~sm.ns(Pseudotime, df = 3)*Branch",
  branch_point = 1, trajectory_states = NULL, relative_expr = TRUE,
  stretch = TRUE, cores = 1, ILRs_limit = 3, label_by_short_name = TRUE,
  useVST = FALSE, round_exprs = FALSE, output_type = "all",
  branch_labels = NULL, file = NULL, return_all = F, verbose = FALSE,
  ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
trend_formulatrend_formula a formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
branch_pointthe point where two branches diverge
trajectory_statesstates corresponding to two branches
relative_exprA logic flag to determine whether or not the relative expressed should be used when we fitting the spline curves
stretcha logic flag to determine whether or not each branch should be stretched
coresNumber of cores when fitting the spline curves
ILRs_limitthe minimum Instant Log Ratio used to make the heatmap plot
label_by_short_namelabel the rows of the returned matrix by gene_short_name (TRUE) or feature id (FALSE)
useVSTA logic flag to determine whether or not the Variance Stablization Transformation should be used to stablize the gene expression. When VST is used, the difference between two branchs are used instead of the log-ratio.
round_exprsA logic flag to determine whether or not the expression value should be rounded into integer
output_typeA character either of "all" or "after_bifurcation". If "after_bifurcation" is used, only the time points after the bifurcation point will be selected
branch_labelsthe name for each branch, for example, AT1 or AT2
filethe name for storing the data. Since the calculation of the Instant Log Ratio is very time consuming, so by default the result will be stored
return_allA logic flag to determine whether or not all the results from the analysis should be returned, this includes a dataframe for the log fold change, normalized log fold change, raw divergence, normalized divergence, fitting curves for each branch
verboseWhether or not detailed running information should be returned
...Additional arguments passed to buildBranchCellDataSet

Value

a ggplot2 plot object

Link to this function

calibrate_per_cell_total_proposal()

Calibrate_per_cell_total_proposal

Description

Calibrate_per_cell_total_proposal

Usage

calibrate_per_cell_total_proposal(relative_exprs_matrix, t_estimate,
  expected_capture_rate, method = c("num_genes", "tpm_fraction"))

Arguments

ArgumentDescription
relative_exprs_matrixThe matrix of relative TPM expression values
t_estimatethe TPM value that corresponds to 1 cDNA copy per cell
expected_capture_rateThe fraction of mRNAs captured as cDNAs
methodthe formula to estimate the total mRNAs (num_genes corresponds to the second formula while tpm_fraction corresponds to the first formula, see the anouncement on Trapnell lab website for the Census paper)
Link to this function

cellPairwiseDistances()

Get the matrix of pairwise distances between cells

Description

Retrieves a matrix capturing distances between each cell used during cell ordering.

Usage

cellPairwiseDistances(cds)

Arguments

ArgumentDescription
cdsexpression data matrix for an experiment

Value

A square, symmetric matrix containing the distances between each cell in the reduced-dimensionality space.

Examples

D <- cellPairwiseDistances(HSMM)
Link to this function

cellPairwiseDistances_set()

Sets the matrix containing distances between each pair of cells used by Monocle during cell ordering. Not intended to be called directly.

Description

Sets the matrix containing distances between each pair of cells used by Monocle during cell ordering. Not intended to be called directly.

Usage

cellPairwiseDistances(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valuea square, symmetric matrix containing pairwise distances between cells.

Value

An updated CellDataSet object

Examples

cds <- cellPairwiseDistances(D)

Cluster cells into a specified number of groups based on .

Description

Unsupervised clustering of cells is a common step in many single-cell expression workflows. In an experiment containing a mixture of cell types, each cluster might correspond to a different cell type. This method takes a CellDataSet as input along with a requested number of clusters, clusters them with an unsupervised algorithm (by default, density peak clustering), and then returns the CellDataSet with the cluster assignments stored in the pData table. When number of clusters is set to NULL (num_clusters = NULL), the decision plot as introduced in the reference will be plotted and the users are required to check the decision plot to select the rho and delta to determine the number of clusters to cluster. When the dataset is big, for example > 50 k, we recommend the user to use the Louvain clustering algorithm which is inspired from phenograph paper. Note Louvain doesn't support the num_cluster argument but the k (number of k-nearest neighbors) is relevant to the final clustering number. The implementation of Louvain clustering is based on the Rphenograph package but updated based on our requirement (for example, changed the jaccard_coeff function as well as adding louvain_iter argument, etc.)

Usage

clusterCells(cds, skip_rho_sigma = F, num_clusters = NULL,
  inspect_rho_sigma = F, rho_threshold = NULL, delta_threshold = NULL,
  peaks = NULL, gaussian = T, cell_type_hierarchy = NULL,
  frequency_thresh = NULL, enrichment_thresh = NULL,
  clustering_genes = NULL, k = 50, louvain_iter = 1, weight = FALSE,
  method = c("densityPeak", "louvain", "DDRTree"), verbose = F, ...)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
skip_rho_sigmaA logic flag to determine whether or not you want to skip the calculation of rho / sigma
num_clustersNumber of clusters. The algorithm use 0.5 of the rho as the threshold of rho and the delta corresponding to the number_clusters sample with the highest delta as the density peaks and for assigning clusters
inspect_rho_sigmaA logical flag to determine whether or not you want to interactively select the rho and sigma for assigning up clusters
rho_thresholdThe threshold of local density (rho) used to select the density peaks
delta_thresholdThe threshold of local distance (delta) used to select the density peaks
peaksA numeric vector indicates the index of density peaks used for clustering. This vector should be retrieved from the decision plot with caution. No checking involved. will automatically calculated based on the top num_cluster product of rho and sigma.
gaussianA logic flag passed to densityClust function in desnityClust package to determine whether or not Gaussian kernel will be used for calculating the local density
cell_type_hierarchyA data structure used for organizing functions that can be used for organizing cells
frequency_threshWhen a CellTypeHierarchy is provided, cluster cells will impute cell types in clusters that are composed of at least this much of exactly one cell type.
enrichment_threshfraction to be multipled by each cell type percentage. Only used if frequency_thresh is NULL, both cannot be NULL
clustering_genesa vector of feature ids (from the CellDataSet's featureData) used for ordering cells
knumber of kNN used in creating the k nearest neighbor graph for Louvain clustering. The number of kNN is related to the resolution of the clustering result, bigger number of kNN gives low resolution and vice versa. Default to be 50
louvain_iternumber of iterations used for Louvain clustering. The clustering result gives the largest modularity score will be used as the final clustering result. Default to be 1.
weightA logic argument to determine whether or not we will use Jaccard coefficent for two nearest neighbors (based on the overlapping of their kNN) as the weight used for Louvain clustering. Default to be FALSE.
methodmethod for clustering cells. Three methods are available, including densityPeak, louvian and DDRTree. By default, we use density peak clustering algorithm for clustering. For big datasets (like data with 50 k cells or so), we recommend using the louvain clustering algorithm.
verboseVerbose A logic flag to determine whether or not we should print the running details.
...Additional arguments passed to densityClust

Value

an updated CellDataSet object, in which phenoData contains values for Cluster for each cell

References

Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. doi:10.1126/science.1242072

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of communities in large networks. J. Stat. Mech. (2008) P10008

Jacob H. Levine and et.al. Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell, 2015.

Clusters genes by pseudotime trend.

Description

This function takes a matrix of expression values and performs k-means clustering on the genes.

Usage

clusterGenes(expr_matrix, k, method = function(x) {     as.dist((1 -
  cor(Matrix::t(x)))/2) }, ...)

Arguments

ArgumentDescription
expr_matrixA matrix of expression values to cluster together. Rows are genes, columns are cells.
kHow many clusters to create
methodThe distance function to use during clustering
...Extra parameters to pass to pam() during clustering

Value

a pam cluster object

Examples

full_model_fits <- fitModel(HSMM[sample(nrow(fData(HSMM_filtered)), 100),],
modelFormulaStr="~sm.ns(Pseudotime)")
expression_curve_matrix <- responseMatrix(full_model_fits)
clusters <- clusterGenes(expression_curve_matrix, k=4)
plot_clusters(HSMM_filtered[ordering_genes,], clusters)
Link to this function

compareModels()

Compare model fits

Description

Performs likelihood ratio tests on nested vector generalized additive models

Usage

compareModels(full_models, reduced_models)

Arguments

ArgumentDescription
full_modelsa list of models, e.g. as returned by fitModels(), forming the numerators of the L.R.Ts.
reduced_modelsa list of models, e.g. as returned by fitModels(), forming the denominators of the L.R.Ts.

Value

a data frame containing the p values and q-values from the likelihood ratio tests on the parallel arrays of models.

Link to this function

detectBifurcationPoint()

Calculate divergence times for branch-dependent genes

Description

Branch-dependent genes may diverge at different points in pseudotime. detectBifurcationPoint() calculates these times. Although the branch times will be shaped by and distributed around the branch point in the trajectory, upstream regulators tend to branch earlier in pseudotime than their targets.

Usage

detectBifurcationPoint(str_log_df = NULL, ILRs_threshold = 0.1,
  detect_all = T, cds = cds, Branch = "Branch", branch_point = NULL,
  branch_states = c(2, 3), stretch = T, cores = 1,
  trend_formula = "~sm.ns(Pseudotime, df = 3)", ILRs_limit = 3,
  relative_expr = TRUE, label_by_short_name = TRUE, useVST = FALSE,
  round_exprs = FALSE, output_type = "all", return_cross_point = T,
  file = "bifurcation_heatmap", verbose = FALSE, ...)

Arguments

ArgumentDescription
str_log_dfthe ILRs dataframe calculated from calILRs function. If this data.frame is provided, all the following parameters are ignored. Note that we need to only use the ILRs after the bifurcation point if we duplicated the progenitor cell state.
ILRs_thresholdthe ILR value used to determine the earliest divergence time point
detect_alla logic flag to determine whether or not genes without ILRs pass the threshold will still report a bifurcation point
cdsCellDataSet for the experiment
BranchThe column in pData used for calculating the ILRs (If not equal to "Branch", a warning will report)
branch_pointThe ID of the branch point to analyze. Can only be used when reduceDimension is called with method = "DDRTree".
branch_statesThe states for two branching branchs
stretcha logic flag to determine whether or not each branch should be stretched
coresNumber of cores when fitting the spline curves
trend_formulathe model formula to be used for fitting the expression trend over pseudotime
ILRs_limitthe minimum Instant Log Ratio used to make the heatmap plot
relative_exprA logic flag to determine whether or not the relative expressed should be used when we fitting the spline curves
label_by_short_namelabel the rows of the returned matrix by gene_short_name (TRUE) or feature id (FALSE)
useVSTA logic flag to determine whether or not the Variance Stablization Transformation should be used to stablize the gene expression. When VST is used, the difference between two branchs are used instead of the log-ratio.
round_exprsA logic flag to determine whether or not the expression value should be rounded into integer
output_typeA character either of "all" or "after_bifurcation". If "after_bifurcation" is used, only the time points after the bifurcation point will be selected. Note that, if Branch is set to "Branch", we will only use "after_bifurcation" since we duplicated the progenitor cells and the bifurcation should only happen after the largest mature level from the progenitor cells
return_cross_pointA logic flag to determine whether or not only return the cross point
filethe name for storing the data. Since the calculation of the Instant Log Ratio is very time consuming, so by default the result will be stored
verboseWhether to report verbose output
...Additional arguments passed to calILRs

Value

a vector containing the time for the bifurcation point with gene names for each value

Detects genes above minimum threshold.

Description

Sets the global expression detection threshold to be used with this CellDataSet. Counts how many cells each feature in a CellDataSet object that are detectably expressed above a minimum threshold. Also counts the number of genes above this threshold are detectable in each cell.

Usage

detectGenes(cds, min_expr = NULL)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
min_exprthe expression threshold

Value

an updated CellDataSet object

Examples

HSMM <- detectGenes(HSMM, min_expr=0.1)
Link to this function

diff_test_helper()

Helper function for parallel differential expression testing

Description

test

Usage

diff_test_helper(x, fullModelFormulaStr, reducedModelFormulaStr,
  expressionFamily, relative_expr, weights, disp_func = NULL,
  verbose = FALSE)

Arguments

ArgumentDescription
xtest
fullModelFormulaStra formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
reducedModelFormulaStra formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
expressionFamilyspecifies the VGAM family function used for expression responses
relative_exprWhether to transform expression into relative values
weightstest
disp_functest
verboseWhether to show VGAM errors and warnings. Only valid for cores = 1.
Link to this function

differentialGeneTest()

Test genes for differential expression

Description

Tests each gene for differential expression as a function of pseudotime or according to other covariates as specified. differentialGeneTest is Monocle's main differential analysis routine. It accepts a CellDataSet and two model formulae as input, which specify generalized lineage models as implemented by the VGAM package.

Usage

differentialGeneTest(cds, fullModelFormulaStr = "~sm.ns(Pseudotime, df=3)",
  reducedModelFormulaStr = "~1", relative_expr = TRUE, cores = 1,
  verbose = FALSE)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
fullModelFormulaStra formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
reducedModelFormulaStra formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.
relative_exprWhether to transform expression into relative values.
coresthe number of cores to be used while testing each gene for differential expression.
verboseWhether to show VGAM errors and warnings. Only valid for cores = 1.

Value

a data frame containing the p values and q-values from the likelihood ratio tests on the parallel arrays of models.

Seealso

vglm

Link to this function

dispersionTable()

Retrieve a table of values specifying the mean-variance relationship

Description

Calling estimateDispersions computes a smooth function describing how variance in each gene's expression across cells varies according to the mean. This function only works for CellDataSet objects containing count-based expression data, either transcripts or reads.

Usage

dispersionTable(cds)

Arguments

ArgumentDescription
cdsThe CellDataSet from which to extract a dispersion table.

Value

A data frame containing the empirical mean expression, empirical dispersion, and the value estimated by the dispersion model.

Link to this function

estimateDispersionsForCellDataSet()

Helper function to estimate dispersions

Description

Helper function to estimate dispersions

Usage

estimateDispersionsForCellDataSet(cds, modelFormulaStr, relative_expr,
  min_cells_detected, removeOutliers, verbose = FALSE)

Arguments

ArgumentDescription
cdsa CellDataSet that contains all cells user wants evaluated
modelFormulaStra formula string specifying the model to fit for the genes.
relative_exprWhether to transform expression into relative values
min_cells_detectedOnly include genes detected above lowerDetectionLimit in at least this many cells in the dispersion calculation
removeOutliersa boolean it determines whether or not outliers from the data should be removed
verboseWhether to show detailed running information.
Link to this function

estimateSizeFactorsForMatrix()

Function to calculate the size factor for the single-cell RNA-seq data

@importFrom stats median

Description

Function to calculate the size factor for the single-cell RNA-seq data

@importFrom stats median

Usage

estimateSizeFactorsForMatrix(counts, locfunc = median, round_exprs = TRUE,
  method = "mean-geometric-mean-total")

Arguments

ArgumentDescription
countsThe matrix for the gene expression data, either read counts or FPKM values or transcript counts
locfuncThe location function used to find the representive value
round_exprsA logic flag to determine whether or not the expression value should be rounded
methodA character to specify the size factor calculation appraoches. It can be either "mean-geometric-mean-total" (default), "weighted-median", "median-geometric-mean", "median", "mode", "geometric-mean-total".

Find the most commonly occuring relative expression value in each cell

Description

Converting relative expression values to mRNA copies per cell requires knowing the most commonly occuring relative expression value in each cell This value typically corresponds to an RPC value of 1. This function finds the most commonly occuring (log-transformed) relative expression value for each column in the provided expression matrix.

Usage

estimate_t(relative_expr_matrix, relative_expr_thresh = 0.1)

Arguments

ArgumentDescription
relative_expr_matrixa matrix of relative expression values for values with each row and column representing genes/isoforms and cells, respectively. Row and column names should be included. Expression values should not be log-transformed.
relative_expr_threshRelative expression values below this threshold are considered zero.

Details

This function estimates the most abundant relative expression value (t^) using a gaussian kernel density function. It can also optionally output the t^ based on a two gaussian mixture model based on the smsn.mixture from mixsmsn package

Value

an vector of most abundant relative_expr value corresponding to the RPC 1.

Examples

HSMM_fpkm_matrix <- exprs(HSMM)
t_estimate = estimate_t(HSMM_fpkm_matrix)

Export a monocle CellDataSet object to other popular single cell analysis toolkit.

Description

This function takes a monocle CellDataSet and converts it to another type of object used in another popular single cell analysis toolkit. It currently supports Scran and Seurat packages.

Usage

exportCDS(monocle_cds, export_to = c("Seurat", "Scater"),
  export_all = FALSE)

Arguments

ArgumentDescription
monocle_cdsthe Monocle CellDataSet you would like to export into a type used in another package
export_tothe object type you would like to export to, either Seurat or Scater
export_allWhether or not to export all the slots in Monocle and keep in another object type. Default is FALSE (or only keep minimal dataset). If export_all is setted to be true, the original monocle cds will be keeped in the other cds object too. This argument is also only applicable when export_to is Seurat.

Value

a new object in the format of another package, as described in the export_to argument.

Examples

lung <- load_lung()
seurat_lung <- exportCDS(lung)
seurat_lung_all <- exportCDS(lung, export_all = T)
scater_lung <- exportCDS(lung, export_to = 'Scater')
scater_lung_all <- exportCDS(lung, export_to = 'Scater', export_all = T)
Link to this function

extract_good_branched_ordering()

Extract a linear ordering of cells from a PQ tree

Description

Extract a linear ordering of cells from a PQ tree

Usage

extract_good_branched_ordering(orig_pq_tree, curr_node, dist_matrix,
  num_branches, reverse_main_path = FALSE)

Arguments

ArgumentDescription
orig_pq_treeThe PQ object to use for ordering
curr_nodeThe node in the PQ tree to use as the start of ordering
dist_matrixA symmetric matrix containing pairwise distances between cells
num_branchesThe number of outcomes allowed in the trajectory.
reverse_main_pathWhether to reverse the direction of the trajectory

Fits a model for each gene in a CellDataSet object.

Description

This function fits a vector generalized additive model (VGAM) from the VGAM package for each gene in a CellDataSet. By default, expression levels are modeled as smooth functions of the Pseudotime value of each cell. That is, expression is a function of progress through the biological process. More complicated formulae can be provided to account for additional covariates (e.g. day collected, genotype of cells, media conditions, etc).

Usage

fitModel(cds, modelFormulaStr = "~sm.ns(Pseudotime, df=3)",
  relative_expr = TRUE, cores = 1)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
modelFormulaStra formula string specifying the model to fit for the genes.
relative_exprWhether to fit a model to relative or absolute expression. Only meaningful for count-based expression data. If TRUE, counts are normalized by Size_Factor prior to fitting.
coresthe number of processor cores to be used during fitting.

Details

This function fits a vector generalized additive model (VGAM) from the VGAM package for each gene in a CellDataSet. By default, expression levels are modeled as smooth functions of the Pseudotime value of each cell. That is, expression is a function of progress through the biological process. More complicated formulae can be provided to account for additional covariates (e.g. day collected, genotype of cells, media conditions, etc).

Value

a list of VGAM model objects

Link to this function

fit_model_helper()

Helper function for parallel VGAM fitting

Description

test

Usage

fit_model_helper(x, modelFormulaStr, expressionFamily, relative_expr,
  disp_func = NULL, verbose = FALSE, ...)

Arguments

ArgumentDescription
xtest
modelFormulaStra formula string specifying the model to fit for the genes.
expressionFamilyspecifies the VGAM family function used for expression responses
relative_exprWhether to transform expression into relative values
disp_functest
verboseWhether to show VGAM errors and warnings. Only valid for cores = 1.
...test
Link to this function

genSmoothCurveResiduals()

Fit smooth spline curves and return the residuals matrix

Description

This function will fit smooth spline curves for the gene expression dynamics along pseudotime in a gene-wise manner and return the corresponding residuals matrix. This function is build on other functions (fit_models and residualsMatrix)

Usage

genSmoothCurveResiduals(cds, trend_formula = "~sm.ns(Pseudotime, df = 3)",
  relative_expr = T, residual_type = "response", cores = 1)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
trend_formulaa formula string specifying the model formula used in fitting the spline curve for each gene/feature.
relative_expra logic flag to determine whether or not the relative gene expression should be used
residual_typethe response desired, as accepted by VGAM's predict function
coresthe number of cores to be used while testing each gene for differential expression

Value

a data frame containing the data for the fitted spline curves.

Link to this function

genSmoothCurves()

Fit smooth spline curves and return the response matrix

Description

This function will fit smooth spline curves for the gene expression dynamics along pseudotime in a gene-wise manner and return the corresponding response matrix. This function is build on other functions (fit_models and responseMatrix) and used in calILRs and calABCs functions

Usage

genSmoothCurves(cds, new_data, trend_formula = "~sm.ns(Pseudotime, df = 3)",
  relative_expr = T, response_type = "response", cores = 1)

Arguments

ArgumentDescription
cdsa CellDataSet object upon which to perform this operation
new_dataa data.frame object including columns (for example, Pseudotime) with names specified in the model formula. The values in the data.frame should be consist with the corresponding values from cds object.
trend_formulaa formula string specifying the model formula used in fitting the spline curve for each gene/feature.
relative_expra logic flag to determine whether or not the relative gene expression should be used
response_typethe response desired, as accepted by VGAM's predict function
coresthe number of cores to be used while testing each gene for differential expression

Value

a data frame containing the data for the fitted spline curves.

Link to this function

get_classic_muscle_markers()

Return the names of classic muscle genes

Description

Returns a list of classic muscle genes. Used to add conveinence for loading HSMM data.

Usage

get_classic_muscle_markers()

Import a seurat or scatter/scran CellDataSet object and convert it to a monocle cds.

Description

This function takes a monocle CellDataSet and converts it to another type of object used in another popular single cell analysis toolkit. It currently supports Scran and Seurat packages.

Usage

importCDS(otherCDS, import_all = FALSE)

Arguments

ArgumentDescription
otherCDSthe object you would like to convert into a monocle cds
import_allWhether or not to import all the slots in seurat or scatter. Default is FALSE (or only keep minimal dataset).

Value

a new monocle cell dataset object converted from other objects (Scatter or Seurat).

Examples

lung <- load_lung()
seurat_lung <- exportCDS(lung)
seurat_lung_all <- exportCDS(lung, export_all = T)
scater_lung <- exportCDS(lung, export_to = 'Scater')
scater_lung_all <- exportCDS(lung, export_to = 'Scater', export_all = T)

importCDS(seurat_lung)
importCDS(seurat_lung, import_all = T)
importCDS(seurat_lung_all)
importCDS(seurat_lung_all, import_all = T)
importCDS(scater_lung)
importCDS(scater_lung, import_all = T)
importCDS(scater_lung_all)
importCDS(scater_lung_all, import_all = T)

Build a CellDataSet from the HSMMSingleCell package

Description

Creates a cellDataSet using the data from the HSMMSingleCell package.

Usage

load_HSMM()
Link to this function

load_HSMM_markers()

Return a CellDataSet of classic muscle genes.

Description

Return a CellDataSet of classic muscle genes.

Usage

load_HSMM_markers()

Value

A CellDataSet object

Build a CellDataSet from the data stored in inst/extdata directory.

Description

Build a CellDataSet from the data stored in inst/extdata directory.

Usage

load_lung()
Link to this function

markerDiffTable()

Test genes for cell type-dependent expression

Description

takes a CellDataSet and a CellTypeHierarchy and classifies all cells into types passed functions passed into the CellTypeHierarchy. The function will remove all "Unknown" and "Ambiguous" types before identifying genes that are differentially expressed between types.

Usage

markerDiffTable(cds, cth, residualModelFormulaStr = "~1", balanced = FALSE,
  reclassify_cells = TRUE, remove_ambig = TRUE, remove_unknown = TRUE,
  verbose = FALSE, cores = 1)

Arguments

ArgumentDescription
cdsA CellDataSet object containing cells to classify
cthThe CellTypeHierarchy object to use for classification
residualModelFormulaStrA model formula string specify effects you want to exclude when testing for cell type dependent expression
balancedWhether to downsample the cells so that there's an equal number of each type prior to performing the test
reclassify_cellsa boolean that indicates whether or not the cds and cth should be run through classifyCells again
remove_ambiga boolean that indicates whether or not ambiguous cells should be removed the cds
remove_unknowna boolean that indicates whether or not unknown cells should be removed from the cds
verboseWhether to emit verbose output during the the search for cell-type dependent genes
coresThe number of cores to use when testing

Value

A table of differential expression test results

Multicore apply-like function for CellDataSet

Description

mcesApply computes the row-wise or column-wise results of FUN, just like esApply. Variables in pData from X are available in FUN.

Usage

mcesApply(X, MARGIN, FUN, required_packages, cores = 1,
  convert_to_dense = TRUE, ...)

Arguments

ArgumentDescription
Xa CellDataSet object
MARGINThe margin to apply to, either 1 for rows (samples) or 2 for columns (features)
FUNAny function
required_packagesA list of packages FUN will need. Failing to provide packages needed by FUN will generate errors in worker threads.
coresThe number of cores to use for evaluation
convert_to_denseWhether to force conversion a sparse matrix to a dense one before calling FUN
...Additional parameters for FUN

Value

The result of with(pData(X) apply(exprs(X)), MARGIN, FUN, ...))

Link to this function

minSpanningTree()

Retrieves the minimum spanning tree generated by Monocle during cell ordering.

Description

Retrieves the minimum spanning tree (MST) that Monocle constructs during orderCells(). This MST is mostly used in plot_spanning_tree to help assess the accuracy of Monocle's ordering.

Usage

minSpanningTree(cds)

Arguments

ArgumentDescription
cdsexpression data matrix for an experiment

Value

An igraph object representing the CellDataSet's minimum spanning tree.

Examples

T <- minSpanningTree(HSMM)
Link to this function

minSpanningTree_set()

Set the minimum spanning tree generated by Monocle during cell ordering.

Description

Sets the minimum spanning tree used by Monocle during cell ordering. Not intended to be called directly.

Usage

minSpanningTree(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valuean igraph object describing the minimum spanning tree.

Value

An updated CellDataSet object

Examples

cds <- minSpanningTree(T)
Link to this function

newCellDataSet()

Creates a new CellDateSet object.

Description

Creates a new CellDateSet object.

Usage

newCellDataSet(cellData, phenoData = NULL, featureData = NULL,
  lowerDetectionLimit = 0.1, expressionFamily = VGAM::negbinomial.size())

Arguments

ArgumentDescription
cellDataexpression data matrix for an experiment
phenoDatadata frame containing attributes of individual cells
featureDatadata frame containing attributes of features (e.g. genes)
lowerDetectionLimitthe minimum expression level that consistitutes true expression
expressionFamilythe VGAM family function to be used for expression response variables

Value

a new CellDataSet object

Examples

sample_sheet_small <- read.delim("../data/sample_sheet_small.txt", row.names=1)
sample_sheet_small$Time <- as.factor(sample_sheet_small$Time)
gene_annotations_small <- read.delim("../data/gene_annotations_small.txt", row.names=1)
fpkm_matrix_small <- read.delim("../data/fpkm_matrix_small.txt")
pd <- new("AnnotatedDataFrame", data = sample_sheet_small)
fd <- new("AnnotatedDataFrame", data = gene_annotations_small)
HSMM <- new("CellDataSet", exprs = as.matrix(fpkm_matrix_small), phenoData = pd, featureData = fd)
Link to this function

newCellTypeHierarchy()

Classify cells according to a set of markers

Description

Creates a CellTypeHierarchy object which can store cell types with the addCellType() function. When classifyCells is used with a CellDataSet and a CellTypeHierarchy cells in the CellDataSet can be classified as cell types found in the CellTypeHierarchy

classifyCells accepts a cellDataSet and and a cellTypeHierarchy. Each cell in the cellDataSet is checked against the functions in the cellTypeHierarchy to determine each cell's type

Usage

newCellTypeHierarchy()
classifyCells(cds, cth, frequency_thresh = NULL, enrichment_thresh = NULL,
  ...)
calculateMarkerSpecificity(cds, cth, remove_ambig = TRUE,
  remove_unknown = TRUE)

Arguments

ArgumentDescription
cdsThe CelllDataSet you want to classify
cthCellTypeHierarchy
frequency_threshIf at least this fraction of group of cells meet a cell types marker criteria, impute them all to be of that type.
enrichment_threshfraction to be multipled by each cell type percentage. Only used if frequency_thresh is NULL, both cannot be NULL
...character strings that you wish to pass to dplyr's groupby routine
remove_ambiga boolean that determines if ambiguous cells should be removed
remove_unknowna boolean that determines whether unknown cells should be removed

Details

CellTypeHierarchy objects are Monocle's mechanism for classifying cells into types based on known markers. To classify the cells in a CellDataSet object according to known markers, first construct a CellTypeHierachy with newCellTypeHierarchy() and addCellType() and then provide both the CellDataSet and the CellTypeHierachy to classifyCells() . Each call to addCellType() registers a classification function that accepts the expression data from a CellDataSet object as input, and returns a boolean vector indicating whether each cell is of the given type. When you call classifyCells() , each cell will be checked against the classification functions in the CellTypeHierachy . If you wish to make a cell type a subtype of another that's already been registered with a CellTypeHierarchy object, make that one the "parent" type with the cell_type_name argument. If you want two types to be mutually exclusive, make them "siblings" by giving them the same parent. The classifcation functions in a CellTypeHierarchy must take a single argument, a matrix of expression values, as input. Note that this matrix could either be a sparseMatrix or a dense matrix. Explicitly casting the input to a dense matrix inside a classification function is likely to drastically slow down classifyCells and other routines that use CellTypeHierarhcy objects. Successive calls to addCellType build up a tree of classification functions inside a CellTypeHierarchy. When two functions are siblings in the tree, classifyCells expects that a cell will meet the classification criteria for at most one of them. For example, you might place classification functions for T cells and B cells as siblings, because a cell cannot be both of these at the same time. When a cell meets the criteria for more than one function, it will be tagged as "Ambiguous". If classifyCells reports a large number of ambiguous cells, consider adjusting your classification functions. For example, some cells are defined by very high expression of a key gene that is expressed at lower levels in other cell types. Raising the threshold for this gene in a classification could resolve the ambiguities. A classification function can also have child functions. You can use this to specify subtypes of cells. For example, T cells express the gene CD3, and there are many subtypes. You can encode each subset by first adding a general T cell classification function that recognizes CD3, and then adding an additional function that recognizes CD4 (for CD4+ helper T cells), one for CD8 (to identify CD8+ cytotoxic T cells), and so on. classifyCells will aim to assign each cell to its most specific subtype in the "CellType" column. By default, classifyCells applies the classification functions to individual cells, but you can also apply it to cells in a "grouped" mode to impute the type of cells that are missing expression of your known markers. You can specify additional (quoted) grouping variables to classifyCells . The function will group the cells according to these factors, and then classify the cells. It will compute the frequency of each cell type in each group, and if a cell type is present at the frquency specified in frequency_thresh , all the cells in the group are classified as that type. If group contains more one cell type at this frequency, all the cells are marked "Ambiguous". This allows you to impute cell type based on unsupervised clustering results (e.g. with clusterCells ) or some other grouping criteria.

Value

newCellTypeHierarchy and addCellType both return an updated CellTypeHierarchy object. classifyCells returns an updated CellDataSet with a new column, "CellType", in the pData table.

For a CellDataset with N genes, and a CellTypeHierarchy with k types, returns a dataframe with N x k rows. Each row contains a gene and a specifity score for one of the types.

Examples

# Initialize a new CellTypeHierachy

# Register a set of classification functions. There are multiple types of T cells
# A cell cannot be both a B cell and a T cell, a T cell and a Monocyte, or
# a B cell and a Monocyte.
cth <- newCellTypeHierarchy()

cth <- addCellType(cth, "T cell",
classify_func=function(x) {x["CD3D",] > 0})

cth <- addCellType(cth, "CD4+ T cell",
classify_func=function(x) {x["CD4",] > 0},
parent_cell_type_name = "T cell")

cth <- addCellType(cth, "CD8+ T cell",
classify_func=function(x) {
|x["CD8A",] > 0 | x["CD8B",] > 0|
},
parent_cell_type_name = "T cell")

cth <- addCellType(cth, "B cell",
classify_func=function(x) {x["MS4A1",] > 0})

cth <- addCellType(cth, "Monocyte",
classify_func=function(x) {x["CD14",] > 0})

# Classify each cell in the CellDataSet "mix" according to these types
mix <- classifyCells(mix, cth)

# Group the cells by the pData table column "Cluster". Apply the classification
functions to the cells groupwise. If a group is at least 5% of a type, make
them all that type. If the group is 5% one type, and 5% a different, mutually
exclusive type, mark the whole cluster "Ambiguous"
mix <- classifyCells(mix, Cluster, 0.05)

Orders cells according to pseudotime.

Description

Learns a "trajectory" describing the biological process the cells are going through, and calculates where each cell falls within that trajectory. Monocle learns trajectories in two steps. The first step is reducing the dimensionality of the data with reduceDimension . The second is this function. function. This function takes as input a CellDataSet and returns it with two new columns: Pseudotime and State , which together encode where each cell maps to the trajectory. orderCells() optionally takes a "root" state, which you can use to specify the start of the trajectory. If you don't provide a root state, one is selected arbitrarily.

Usage

orderCells(cds, root_state = NULL, num_paths = NULL, reverse = NULL)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
root_stateThe state to use as the root of the trajectory. You must already have called orderCells() once to use this argument.
num_pathsthe number of end-point cell states to allow in the biological process.
reversewhether to reverse the beginning and end points of the learned biological process.

Details

The reduction_method argument to reduceDimension determines which algorithm is used by orderCells() to learn the trajectory. If reduction_method == "ICA" , this function uses polygonal reconstruction to learn the underlying trajectory. If reduction_method == "DDRTree" , the trajectory is specified by the principal graph learned by the DDRTree function.

Whichever algorithm you use, the trajectory will be composed of segments. The cells from a segment will share the same value of State . One of these segments will be selected as the root of the trajectory arbitrarily. The most distal cell on that segment will be chosen as the "first" cell in the trajectory, and will have a Pseudotime value of zero. orderCells() will then "walk" along the trajectory, and as it encounters additional cells, it will assign them increasingly large values of Pseudotime.

Value

an updated CellDataSet object, in which phenoData contains values for State and Pseudotime for each cell

Return an ordering for a P node in the PQ tree

Description

Return an ordering for a P node in the PQ tree

Usage

order_p_node(q_level_list, dist_matrix)

Arguments

ArgumentDescription
q_level_listA list of Q nodes in the PQ tree
dist_matrixA symmetric matrix of pairwise distances between cells
Link to this function

package_deprecated()

Plots the minimum spanning tree on cells. This function is deprecated.

Description

This function arranges all of the cells in the cds in a tree and predicts their location based on their pseudotime value

Usage

plot_spanning_tree(cds, x = 1, y = 2, color_by = "State",
  show_tree = TRUE, show_backbone = TRUE, backbone_color = "black",
  markers = NULL, show_cell_names = FALSE, cell_size = 1.5,
  cell_link_size = 0.75, cell_name_size = 2, show_branch_points = TRUE)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
xthe column of reducedDimS(cds) to plot on the horizontal axis
ythe column of reducedDimS(cds) to plot on the vertical axis
color_bythe cell attribute (e.g. the column of pData(cds)) to map to each cell's color
show_treewhether to show the links between cells connected in the minimum spanning tree
show_backbonewhether to show the diameter path of the MST used to order the cells
backbone_colorthe color used to render the backbone.
markersa gene name or gene id to use for setting the size of each cell in the plot
show_cell_namesdraw the name of each cell in the plot
cell_sizeThe size of the point for each cell
cell_link_sizeThe size of the line segments connecting cells (when used with ICA) or the principal graph (when used with DDRTree)
cell_name_sizethe size of cell name labels
show_branch_pointsWhether to show icons for each branch point (only available when reduceDimension was called with DDRTree)

Value

a ggplot2 plot object

Seealso

plot_cell_trajectory

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
plot_cell_trajectory(HSMM)
plot_cell_trajectory(HSMM, color_by="Pseudotime", show_backbone=FALSE)
plot_cell_trajectory(HSMM, markers="MYH3")
Link to this function

plot_cell_clusters()

Plots clusters of cells .

Description

Plots clusters of cells .

Usage

plot_cell_clusters(cds, x = 1, y = 2, color_by = "Cluster",
  markers = NULL, show_cell_names = FALSE, cell_size = 1.5,
  cell_name_size = 2, ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
xthe column of reducedDimS(cds) to plot on the horizontal axis
ythe column of reducedDimS(cds) to plot on the vertical axis
color_bythe cell attribute (e.g. the column of pData(cds)) to map to each cell's color
markersa gene name or gene id to use for setting the size of each cell in the plot
show_cell_namesdraw the name of each cell in the plot
cell_sizeThe size of the point for each cell
cell_name_sizethe size of cell name labels
...additional arguments passed into the scale_color_viridis function

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
HSMM <- reduceD
plot_cell_clusters(HSMM)
plot_cell_clusters(HSMM, color_by="Pseudotime")
plot_cell_clusters(HSMM, markers="MYH3")
Link to this function

plot_cell_trajectory()

Plots the minimum spanning tree on cells.

Description

Plots the minimum spanning tree on cells.

Usage

plot_cell_trajectory(cds, x = 1, y = 2, color_by = "State",
  show_tree = TRUE, show_backbone = TRUE, backbone_color = "black",
  markers = NULL, use_color_gradient = FALSE, markers_linear = FALSE,
  show_cell_names = FALSE, show_state_number = FALSE, cell_size = 1.5,
  cell_link_size = 0.75, cell_name_size = 2, state_number_size = 2.9,
  show_branch_points = TRUE, theta = 0, ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
xthe column of reducedDimS(cds) to plot on the horizontal axis
ythe column of reducedDimS(cds) to plot on the vertical axis
color_bythe cell attribute (e.g. the column of pData(cds)) to map to each cell's color
show_treewhether to show the links between cells connected in the minimum spanning tree
show_backbonewhether to show the diameter path of the MST used to order the cells
backbone_colorthe color used to render the backbone.
markersa gene name or gene id to use for setting the size of each cell in the plot
use_color_gradientWhether or not to use color gradient instead of cell size to show marker expression level
markers_lineara boolean used to indicate whether you want to scale the markers logarithimically or linearly
show_cell_namesdraw the name of each cell in the plot
show_state_numbershow state number
cell_sizeThe size of the point for each cell
cell_link_sizeThe size of the line segments connecting cells (when used with ICA) or the principal graph (when used with DDRTree)
cell_name_sizethe size of cell name labels
state_number_sizethe size of the state number
show_branch_pointsWhether to show icons for each branch point (only available when reduceDimension was called with DDRTree)
thetaHow many degrees you want to rotate the trajectory
...Additional arguments passed into scale_color_viridis function

Value

a ggplot2 plot object

Examples

lung <- load_lung()
plot_cell_trajectory(lung)
plot_cell_trajectory(lung, color_by="Pseudotime", show_backbone=FALSE)
plot_cell_trajectory(lung, markers="MYH3")
Link to this function

plot_clusters()

Plots kinetic clusters of genes.

Description

returns a ggplot2 object showing the shapes of the expression patterns followed by a set of pre-selected genes. The topographic lines highlight the distributions of the kinetic patterns relative to overall trend lines.

Usage

plot_clusters(cds, clustering, drawSummary = TRUE, sumFun = mean_cl_boot,
  ncol = NULL, nrow = NULL, row_samples = NULL, callout_ids = NULL)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
clusteringa clustering object produced by clusterCells
drawSummarywhether to draw the summary line for each cluster
sumFunwhether the function used to generate the summary for each cluster
ncolnumber of columns used to layout the faceted cluster panels
nrownumber of columns used to layout the faceted cluster panels
row_sampleshow many genes to randomly select from the data
callout_idsa vector of gene names or gene ids to manually render as part of the plot

Value

a ggplot2 plot object

Examples

full_model_fits <- fitModel(HSMM_filtered[sample(nrow(fData(HSMM_filtered)), 100),],
modelFormulaStr="~VGAM::bs(Pseudotime)")
expression_curve_matrix <- responseMatrix(full_model_fits)
clusters <- clusterGenes(expression_curve_matrix, k=4)
plot_clusters(HSMM_filtered[ordering_genes,], clusters)
Link to this function

plot_coexpression_matrix()

Not sure we're ready to release this one quite yet: Plot the branch genes in pseduotime with separate branch curves

Description

Not sure we're ready to release this one quite yet: Plot the branch genes in pseduotime with separate branch curves

Usage

plot_coexpression_matrix(cds, rowgenes, colgenes, relative_expr = TRUE,
  min_expr = NULL, cell_size = 0.85, label_by_short_name = TRUE,
  show_density = TRUE, round_expr = FALSE)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
rowgenesGene ids or short names to be arrayed on the vertical axis.
colgenesGene ids or short names to be arrayed on the horizontal axis
relative_exprWhether to transform expression into relative values
min_exprThe minimum level of expression to show in the plot
cell_sizeA number how large the cells should be in the plot
label_by_short_namea boolean that indicates whether cells should be labeled by their short name
show_densitya boolean that indicates whether a 2D density estimation should be shown in the plot
round_expra boolean that indicates whether cds_expr values should be rounded or not

Value

a ggplot2 plot object

Link to this function

plot_complex_cell_trajectory()

Plots the minimum spanning tree on cells.

Description

Plots the minimum spanning tree on cells.

Usage

plot_complex_cell_trajectory(cds, x = 1, y = 2, root_states = NULL,
  color_by = "State", show_tree = TRUE, show_backbone = TRUE,
  backbone_color = "black", markers = NULL, show_cell_names = FALSE,
  cell_size = 1.5, cell_link_size = 0.75, cell_name_size = 2,
  show_branch_points = TRUE, ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
xthe column of reducedDimS(cds) to plot on the horizontal axis
ythe column of reducedDimS(cds) to plot on the vertical axis
root_statesthe state used to set as the root of the graph
color_bythe cell attribute (e.g. the column of pData(cds)) to map to each cell's color
show_treewhether to show the links between cells connected in the minimum spanning tree
show_backbonewhether to show the diameter path of the MST used to order the cells
backbone_colorthe color used to render the backbone.
markersa gene name or gene id to use for setting the size of each cell in the plot
show_cell_namesdraw the name of each cell in the plot
cell_sizeThe size of the point for each cell
cell_link_sizeThe size of the line segments connecting cells (when used with ICA) or the principal graph (when used with DDRTree)
cell_name_sizethe size of cell name labels
show_branch_pointsWhether to show icons for each branch point (only available when reduceDimension was called with DDRTree)
...Additional arguments passed to the scale_color_viridis function

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
plot_complex_cell_trajectory(HSMM)
plot_complex_cell_trajectory(HSMM, color_by="Pseudotime", show_backbone=FALSE)
plot_complex_cell_trajectory(HSMM, markers="MYH3")
Link to this function

plot_genes_branched_heatmap()

Create a heatmap to demonstrate the bifurcation of gene expression along two branchs

@description returns a heatmap that shows changes in both lineages at the same time. It also requires that you choose a branch point to inspect. Columns are points in pseudotime, rows are genes, and the beginning of pseudotime is in the middle of the heatmap. As you read from the middle of the heatmap to the right, you are following one lineage through pseudotime. As you read left, the other. The genes are clustered hierarchically, so you can visualize modules of genes that have similar lineage-dependent expression patterns.

Description

Create a heatmap to demonstrate the bifurcation of gene expression along two branchs

@description returns a heatmap that shows changes in both lineages at the same time. It also requires that you choose a branch point to inspect. Columns are points in pseudotime, rows are genes, and the beginning of pseudotime is in the middle of the heatmap. As you read from the middle of the heatmap to the right, you are following one lineage through pseudotime. As you read left, the other. The genes are clustered hierarchically, so you can visualize modules of genes that have similar lineage-dependent expression patterns.

Usage

plot_genes_branched_heatmap(cds_subset, branch_point = 1,
  branch_states = NULL, branch_labels = c("Cell fate 1", "Cell fate 2"),
  cluster_rows = TRUE, hclust_method = "ward.D2", num_clusters = 6,
  hmcols = NULL, branch_colors = c("#979797", "#F05662", "#7990C8"),
  add_annotation_row = NULL, add_annotation_col = NULL,
  show_rownames = FALSE, use_gene_short_name = TRUE, scale_max = 3,
  scale_min = -3, norm_method = c("log", "vstExprs"),
  trend_formula = "~sm.ns(Pseudotime, df=3) * Branch",
  return_heatmap = FALSE, cores = 1, ...)

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment (normally only the branching genes detected with branchTest)
branch_pointThe ID of the branch point to visualize. Can only be used when reduceDimension is called with method = "DDRTree".
branch_statesThe two states to compare in the heatmap. Mutually exclusive with branch_point.
branch_labelsThe labels for the branchs.
cluster_rowsWhether to cluster the rows of the heatmap.
hclust_methodThe method used by pheatmap to perform hirearchical clustering of the rows.
num_clustersNumber of clusters for the heatmap of branch genes
hmcolsThe color scheme for drawing the heatmap.
branch_colorsThe colors used in the annotation strip indicating the pre- and post-branch cells.
add_annotation_rowAdditional annotations to show for each row in the heatmap. Must be a dataframe with one row for each row in the fData table of cds_subset, with matching IDs.
add_annotation_colAdditional annotations to show for each column in the heatmap. Must be a dataframe with one row for each cell in the pData table of cds_subset, with matching IDs.
show_rownamesWhether to show the names for each row in the table.
use_gene_short_nameWhether to use the short names for each row. If FALSE, uses row IDs from the fData table.
scale_maxThe maximum value (in standard deviations) to show in the heatmap. Values larger than this are set to the max.
scale_minThe minimum value (in standard deviations) to show in the heatmap. Values smaller than this are set to the min.
norm_methodDetermines how to transform expression values prior to rendering
trend_formulaA formula string specifying the model used in fitting the spline curve for each gene/feature.
return_heatmapWhether to return the pheatmap object to the user.
coresNumber of cores to use when smoothing the expression curves shown in the heatmap.
...Additional arguments passed to buildBranchCellDataSet

Value

A list of heatmap_matrix (expression matrix for the branch committment), ph (pheatmap heatmap object), annotation_row (annotation data.frame for the row), annotation_col (annotation data.frame for the column).

Link to this function

plot_genes_branched_pseudotime()

Plot the branch genes in pseduotime with separate branch curves.

Description

Works similarly to plot_genes_in_psuedotime esceptit shows one kinetic trend for each lineage.

Usage

plot_genes_branched_pseudotime(cds, branch_states = NULL, branch_point = 1,
  branch_labels = NULL, method = "fitting", min_expr = NULL,
  cell_size = 0.75, nrow = NULL, ncol = 1, panel_order = NULL,
  color_by = "State", expression_curve_linetype_by = "Branch",
  trend_formula = "~ sm.ns(Pseudotime, df=3) * Branch",
  reducedModelFormulaStr = NULL, label_by_short_name = TRUE,
  relative_expr = TRUE, ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment
branch_statesThe states for two branching branchs
branch_pointThe ID of the branch point to analyze. Can only be used when reduceDimension is called with method = "DDRTree".
branch_labelsThe names for each branching branch
methodThe method to draw the curve for the gene expression branching pattern, either loess ('loess') or VGLM fitting ('fitting')
min_exprThe minimum (untransformed) expression level to use in plotted the genes.
cell_sizeThe size (in points) of each cell used in the plot
nrowNumber of columns used to layout the faceted cluster panels
ncolNumber of columns used to layout the faceted cluster panels
panel_orderThe a character vector of gene short names (or IDs, if that's what you're using), specifying order in which genes should be layed out (left-to-right, top-to-bottom)
color_byThe cell attribute (e.g. the column of pData(cds)) to be used to color each cell
expression_curve_linetype_byThe cell attribute (e.g. the column of pData(cds)) to be used for the linetype of each branch curve
trend_formulaThe model formula to be used for fitting the expression trend over pseudotime
reducedModelFormulaStrA formula specifying a null model. If used, the plot shows a p value from the likelihood ratio test that uses trend_formula as the full model
label_by_short_nameWhether to label figure panels by gene_short_name (TRUE) or feature id (FALSE)
relative_exprWhether or not the plot should use relative expression values (only relevant for CellDataSets using transcript counts)
...Additional arguments passed on to branchTest. Only used when reducedModelFormulaStr is not NULL.

Details

This plotting function is used to make the branching plots for a branch dependent gene goes through the progenitor state and bifurcating into two distinct branchs (Similar to the pitch-fork bifurcation in dynamic systems). In order to make the bifurcation plot, we first duplicated the progenitor states and by default stretch each branch into maturation level 0-100. Then we fit two nature spline curves for each branchs using VGAM package.

Value

a ggplot2 plot object

Link to this function

plot_genes_in_pseudotime()

Plots expression for one or more genes as a function of pseudotime

Description

Plots expression for one or more genes as a function of pseudotime. Plotting allows you determine if the ordering produced by orderCells() is correct and it does not need to be flipped using the "reverse" flag in orderCells

Usage

plot_genes_in_pseudotime(cds_subset, min_expr = NULL, cell_size = 0.75,
  nrow = NULL, ncol = 1, panel_order = NULL, color_by = "State",
  trend_formula = "~ sm.ns(Pseudotime, df=3)", label_by_short_name = TRUE,
  relative_expr = TRUE, vertical_jitter = NULL, horizontal_jitter = NULL)

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment
min_exprthe minimum (untransformed) expression level to use in plotted the genes.
cell_sizethe size (in points) of each cell used in the plot
nrowthe number of rows used when laying out the panels for each gene's expression
ncolthe number of columns used when laying out the panels for each gene's expression
panel_orderthe order in which genes should be layed out (left-to-right, top-to-bottom)
color_bythe cell attribute (e.g. the column of pData(cds)) to be used to color each cell
trend_formulathe model formula to be used for fitting the expression trend over pseudotime
label_by_short_namelabel figure panels by gene_short_name (TRUE) or feature id (FALSE)
relative_exprWhether to transform expression into relative values
vertical_jitterA value passed to ggplot to jitter the points in the vertical dimension. Prevents overplotting, and is particularly helpful for rounded transcript count data.
horizontal_jitterA value passed to ggplot to jitter the points in the horizontal dimension. Prevents overplotting, and is particularly helpful for rounded transcript count data.

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
my_genes <- row.names(subset(fData(HSMM), gene_short_name %in% c("CDK1", "MEF2C", "MYH3")))
cds_subset <- HSMM[my_genes,]
plot_genes_in_pseudotime(cds_subset, color_by="Time")
Link to this function

plot_genes_jitter()

Plots expression for one or more genes as a jittered, grouped points

Description

Accepts a subset of a CellDataSet and an attribute to group cells by, and produces one or more ggplot2 objects that plots the level of expression for each group of cells.

Usage

plot_genes_jitter(cds_subset, grouping = "State", min_expr = NULL,
  cell_size = 0.75, nrow = NULL, ncol = 1, panel_order = NULL,
  color_by = NULL, plot_trend = FALSE, label_by_short_name = TRUE,
  relative_expr = TRUE)

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment
groupingthe cell attribute (e.g. the column of pData(cds)) to group cells by on the horizontal axis
min_exprthe minimum (untransformed) expression level to use in plotted the genes.
cell_sizethe size (in points) of each cell used in the plot
nrowthe number of rows used when laying out the panels for each gene's expression
ncolthe number of columns used when laying out the panels for each gene's expression
panel_orderthe order in which genes should be layed out (left-to-right, top-to-bottom)
color_bythe cell attribute (e.g. the column of pData(cds)) to be used to color each cell
plot_trendwhether to plot a trendline tracking the average expression across the horizontal axis.
label_by_short_namelabel figure panels by gene_short_name (TRUE) or feature id (FALSE)
relative_exprWhether to transform expression into relative values

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
my_genes <- HSMM[row.names(subset(fData(HSMM), gene_short_name %in% c("MYOG", "ID1", "CCNB2"))),]
plot_genes_jitter(my_genes, grouping="Media", ncol=2)
Link to this function

plot_genes_positive_cells()

Plots the number of cells expressing one or more genes as a barplot

Description

@description Accetps a CellDataSet and a parameter,"grouping", used for dividing cells into groups. Returns one or more bar graphs (one graph for each gene in the CellDataSet). Each graph shows the percentage of cells that express a gene in the in the CellDataSet for each sub-group of cells created by "grouping".

Let's say the CellDataSet passed in included genes A, B, and C and the "grouping parameter divided all of the cells into three groups called X, Y, and Z. Then three graphs would be produced called A, B, and C. In the A graph there would be three bars one for X, one for Y, and one for Z. So X bar in the A graph would show the percentage of cells in the X group that express gene A.

Usage

plot_genes_positive_cells(cds_subset, grouping = "State", min_expr = 0.1,
  nrow = NULL, ncol = 1, panel_order = NULL, plot_as_fraction = TRUE,
  label_by_short_name = TRUE, relative_expr = TRUE, plot_limits = c(0,
  100))

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment
groupingthe cell attribute (e.g. the column of pData(cds)) to group cells by on the horizontal axis
min_exprthe minimum (untransformed) expression level to use in plotted the genes.
nrowthe number of rows used when laying out the panels for each gene's expression
ncolthe number of columns used when laying out the panels for each gene's expression
panel_orderthe order in which genes should be layed out (left-to-right, top-to-bottom)
plot_as_fractionwhether to show the percent instead of the number of cells expressing each gene
label_by_short_namelabel figure panels by gene_short_name (TRUE) or feature id (FALSE)
relative_exprWhether to transform expression into relative values
plot_limitsA pair of number specifying the limits of the y axis. If NULL, scale to the range of the data.

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
MYOG_ID1 <- HSMM[row.names(subset(fData(HSMM), gene_short_name %in% c("MYOG", "ID1"))),]
plot_genes_positive_cells(MYOG_ID1, grouping="Media", ncol=2)
Link to this function

plot_genes_violin()

Plots expression for one or more genes as a violin plot

Description

Accepts a subset of a CellDataSet and an attribute to group cells by, and produces one or more ggplot2 objects that plots the level of expression for each group of cells.

Usage

plot_genes_violin(cds_subset, grouping = "State", min_expr = NULL,
  cell_size = 0.75, nrow = NULL, ncol = 1, panel_order = NULL,
  color_by = NULL, plot_trend = FALSE, label_by_short_name = TRUE,
  relative_expr = TRUE, log_scale = TRUE)

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment
groupingthe cell attribute (e.g. the column of pData(cds)) to group cells by on the horizontal axis
min_exprthe minimum (untransformed) expression level to use in plotted the genes.
cell_sizethe size (in points) of each cell used in the plot
nrowthe number of rows used when laying out the panels for each gene's expression
ncolthe number of columns used when laying out the panels for each gene's expression
panel_orderthe order in which genes should be layed out (left-to-right, top-to-bottom)
color_bythe cell attribute (e.g. the column of pData(cds)) to be used to color each cell
plot_trendwhether to plot a trendline tracking the average expression across the horizontal axis.
label_by_short_namelabel figure panels by gene_short_name (TRUE) or feature id (FALSE)
relative_exprWhether to transform expression into relative values
log_scalea boolean that determines whether or not to scale data logarithmically

Value

a ggplot2 plot object

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
my_genes <- HSMM[row.names(subset(fData(HSMM), gene_short_name %in% c("ACTA1", "ID1", "CCNB2"))),]
plot_genes_violin(my_genes, grouping="Hours", ncol=2, min_expr=0.1)
Link to this function

plot_multiple_branches_heatmap()

Create a heatmap to demonstrate the bifurcation of gene expression along multiple branches

Description

Create a heatmap to demonstrate the bifurcation of gene expression along multiple branches

Usage

plot_multiple_branches_heatmap(cds, branches, branches_name = NULL,
  cluster_rows = TRUE, hclust_method = "ward.D2", num_clusters = 6,
  hmcols = NULL, add_annotation_row = NULL, add_annotation_col = NULL,
  show_rownames = FALSE, use_gene_short_name = TRUE,
  norm_method = c("vstExprs", "log"), scale_max = 3, scale_min = -3,
  trend_formula = "~sm.ns(Pseudotime, df=3)", return_heatmap = FALSE,
  cores = 1)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment (normally only the branching genes detected with BEAM)
branchesThe terminal branches (states) on the developmental tree you want to investigate.
branches_nameName (for example, cell type) of branches you believe the cells on the branches are associated with.
cluster_rowsWhether to cluster the rows of the heatmap.
hclust_methodThe method used by pheatmap to perform hirearchical clustering of the rows.
num_clustersNumber of clusters for the heatmap of branch genes
hmcolsThe color scheme for drawing the heatmap.
add_annotation_rowAdditional annotations to show for each row in the heatmap. Must be a dataframe with one row for each row in the fData table of cds_subset, with matching IDs.
add_annotation_colAdditional annotations to show for each column in the heatmap. Must be a dataframe with one row for each cell in the pData table of cds_subset, with matching IDs.
show_rownamesWhether to show the names for each row in the table.
use_gene_short_nameWhether to use the short names for each row. If FALSE, uses row IDs from the fData table.
norm_methodDetermines how to transform expression values prior to rendering
scale_maxThe maximum value (in standard deviations) to show in the heatmap. Values larger than this are set to the max.
scale_minThe minimum value (in standard deviations) to show in the heatmap. Values smaller than this are set to the min.
trend_formulaA formula string specifying the model used in fitting the spline curve for each gene/feature.
return_heatmapWhether to return the pheatmap object to the user.
coresNumber of cores to use when smoothing the expression curves shown in the heatmap.

Value

A list of heatmap_matrix (expression matrix for the branch committment), ph (pheatmap heatmap object), annotation_row (annotation data.frame for the row), annotation_col (annotation data.frame for the column).

Link to this function

plot_multiple_branches_pseudotime()

Create a kinetic curves to demonstrate the bifurcation of gene expression along multiple branches

Description

Create a kinetic curves to demonstrate the bifurcation of gene expression along multiple branches

Usage

plot_multiple_branches_pseudotime(cds, branches, branches_name = NULL,
  min_expr = NULL, cell_size = 0.75, norm_method = c("vstExprs", "log"),
  nrow = NULL, ncol = 1, panel_order = NULL, color_by = "Branch",
  trend_formula = "~sm.ns(Pseudotime, df=3)", label_by_short_name = TRUE,
  TPM = FALSE, cores = 1)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment (normally only the branching genes detected with BEAM)
branchesThe terminal branches (states) on the developmental tree you want to investigate.
branches_nameName (for example, cell type) of branches you believe the cells on the branches are associated with.
min_exprThe minimum level of expression to show in the plot
cell_sizeA number how large the cells should be in the plot
norm_methodDetermines how to transform expression values prior to rendering
nrowthe number of rows used when laying out the panels for each gene's expression
ncolthe number of columns used when laying out the panels for each gene's expression
panel_orderthe order in which genes should be layed out (left-to-right, top-to-bottom)
color_bythe cell attribute (e.g. the column of pData(cds)) to be used to color each cell
trend_formulathe model formula to be used for fitting the expression trend over pseudotime
label_by_short_namelabel figure panels by gene_short_name (TRUE) or feature id (FALSE)
TPMWhether to convert the expression value into TPM values.
coresNumber of cores to use when smoothing the expression curves shown in the heatmap.

Value

a ggplot2 plot object

Link to this function

plot_ordering_genes()

Plots genes by mean vs. dispersion, highlighting those selected for ordering

Description

Each gray point in the plot is a gene. The black dots are those that were included in the last call to setOrderingFilter. The red curve shows the mean-variance model learning by estimateDispersions().

Usage

plot_ordering_genes(cds)

Arguments

ArgumentDescription
cdsThe CellDataSet to be used for the plot.
Link to this function

plot_pc_variance_explained()

Plots the percentage of variance explained by the each component based on PCA from the normalized expression data using the same procedure used in reduceDimension function.

Description

Plots the percentage of variance explained by the each component based on PCA from the normalized expression data using the same procedure used in reduceDimension function.

Usage

plot_pc_variance_explained(cds, max_components = 100, norm_method = c("log",
  "vstExprs", "none"), residualModelFormulaStr = NULL, pseudo_expr = NULL,
  return_all = F, use_existing_pc_variance = FALSE, verbose = FALSE, ...)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment after running reduceDimension with reduction_method as tSNE
max_componentsMaximum number of components shown in the scree plot (variance explained by each component)
norm_methodDetermines how to transform expression values prior to reducing dimensionality
residualModelFormulaStrA model formula specifying the effects to subtract from the data before clustering.
pseudo_expramount to increase expression values before dimensionality reduction
return_allA logical argument to determine whether or not the variance of each component is returned
use_existing_pc_varianceWhether to plot existing results for variance explained by each PC
verboseWhether to emit verbose output during dimensionality reduction
...additional arguments to pass to the dimensionality reduction function

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
plot_pc_variance_explained(HSMM)
Link to this function

plot_pseudotime_heatmap()

Plots a pseudotime-ordered, row-centered heatmap

Description

The function plot_pseudotime_heatmap takes a CellDataSet object (usually containing a only subset of significant genes) and generates smooth expression curves much like plot_genes_in_pseudotime. Then, it clusters these genes and plots them using the pheatmap package. This allows you to visualize modules of genes that co-vary across pseudotime.

Usage

plot_pseudotime_heatmap(cds_subset, cluster_rows = TRUE,
  hclust_method = "ward.D2", num_clusters = 6, hmcols = NULL,
  add_annotation_row = NULL, add_annotation_col = NULL,
  show_rownames = FALSE, use_gene_short_name = TRUE,
  norm_method = c("log", "vstExprs"), scale_max = 3, scale_min = -3,
  trend_formula = "~sm.ns(Pseudotime, df=3)", return_heatmap = FALSE,
  cores = 1)

Arguments

ArgumentDescription
cds_subsetCellDataSet for the experiment (normally only the branching genes detected with branchTest)
cluster_rowsWhether to cluster the rows of the heatmap.
hclust_methodThe method used by pheatmap to perform hirearchical clustering of the rows.
num_clustersNumber of clusters for the heatmap of branch genes
hmcolsThe color scheme for drawing the heatmap.
add_annotation_rowAdditional annotations to show for each row in the heatmap. Must be a dataframe with one row for each row in the fData table of cds_subset, with matching IDs.
add_annotation_colAdditional annotations to show for each column in the heatmap. Must be a dataframe with one row for each cell in the pData table of cds_subset, with matching IDs.
show_rownamesWhether to show the names for each row in the table.
use_gene_short_nameWhether to use the short names for each row. If FALSE, uses row IDs from the fData table.
norm_methodDetermines how to transform expression values prior to rendering
scale_maxThe maximum value (in standard deviations) to show in the heatmap. Values larger than this are set to the max.
scale_minThe minimum value (in standard deviations) to show in the heatmap. Values smaller than this are set to the min.
trend_formulaA formula string specifying the model used in fitting the spline curve for each gene/feature.
return_heatmapWhether to return the pheatmap object to the user.
coresNumber of cores to use when smoothing the expression curves shown in the heatmap.

Value

A list of heatmap_matrix (expression matrix for the branch committment), ph (pheatmap heatmap object), annotation_row (annotation data.frame for the row), annotation_col (annotation data.frame for the column).

Link to this function

plot_rho_delta()

Plots the decision map of density clusters .

Description

Plots the decision map of density clusters .

Usage

plot_rho_delta(cds, rho_threshold = NULL, delta_threshold = NULL)

Arguments

ArgumentDescription
cdsCellDataSet for the experiment after running clusterCells_Density_Peak
rho_thresholdThe threshold of local density (rho) used to select the density peaks for plotting
delta_thresholdThe threshold of local distance (delta) used to select the density peaks for plotting

Examples

library(HSMMSingleCell)
HSMM <- load_HSMM()
plot_rho_delta(HSMM)

Recursively builds and returns a PQ tree for the MST

Description

Recursively builds and returns a PQ tree for the MST

Usage

pq_helper(mst, use_weights = TRUE, root_node = NULL)

Arguments

ArgumentDescription
mstThe minimum spanning tree, as an igraph object.
use_weightsWhether to use edge weights when finding the diameter path of the tree.
root_nodeThe name of the root node to use for starting the path finding.
Link to this function

reduceDimension()

Compute a projection of a CellDataSet object into a lower dimensional space

Description

Monocle aims to learn how cells transition through a biological program of gene expression changes in an experiment. Each cell can be viewed as a point in a high-dimensional space, where each dimension describes the expression of a different gene in the genome. Identifying the program of gene expression changes is equivalent to learning a trajectory that the cells follow through this space. However, the more dimensions there are in the analysis, the harder the trajectory is to learn. Fortunately, many genes typically co-vary with one another, and so the dimensionality of the data can be reduced with a wide variety of different algorithms. Monocle provides two different algorithms for dimensionality reduction via reduceDimension . Both take a CellDataSet object and a number of dimensions allowed for the reduced space. You can also provide a model formula indicating some variables (e.g. batch ID or other technical factors) to "subtract" from the data so it doesn't contribute to the trajectory.

Usage

reduceDimension(cds, max_components = 2, reduction_method = c("DDRTree",
  "ICA", "tSNE", "SimplePPT", "L1-graph", "SGL-tree"), norm_method = c("log",
  "vstExprs", "none"), residualModelFormulaStr = NULL, pseudo_expr = 1,
  relative_expr = TRUE, auto_param_selection = TRUE, verbose = FALSE,
  scaling = TRUE, ...)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
max_componentsthe dimensionality of the reduced space
reduction_methodA character string specifying the algorithm to use for dimensionality reduction.
norm_methodDetermines how to transform expression values prior to reducing dimensionality
residualModelFormulaStrA model formula specifying the effects to subtract from the data before clustering.
pseudo_expramount to increase expression values before dimensionality reduction
relative_exprWhen this argument is set to TRUE (default), we intend to convert the expression into a relative expression.
auto_param_selectionwhen this argument is set to TRUE (default), it will automatically calculate the proper value for the ncenter (number of centroids) parameters which will be passed into DDRTree call.
verboseWhether to emit verbose output during dimensionality reduction
scalingWhen this argument is set to TRUE (default), it will scale each gene before running trajectory reconstruction.
...additional arguments to pass to the dimensionality reduction function

Details

You can choose two different reduction algorithms: Independent Component Analysis (ICA) and Discriminative Dimensionality Reduction with Trees (DDRTree). The choice impacts numerous downstream analysis steps, including orderCells . Choosing ICA will execute the ordering procedure described in Trapnell and Cacchiarelli et al., which was implemented in Monocle version 1. DDRTree is a more recent manifold learning algorithm developed by Qi Mao and colleages. It is substantially more powerful, accurate, and robust for single-cell trajectory analysis than ICA, and is now the default method.

Often, experiments include cells from different batches or treatments. You can reduce the effects of these treatments by transforming the data with a linear model prior to dimensionality reduction. To do so, provide a model formula through residualModelFormulaStr .

Prior to reducing the dimensionality of the data, it usually helps to normalize it so that highly expressed or highly variable genes don't dominate the computation. reduceDimension() automatically transforms the data in one of several ways depending on the expressionFamily of the CellDataSet object. If the expressionFamily is negbinomial or negbinomial.size , the data are variance-stabilized. If the expressionFamily is Tobit , the data are adjusted by adding a pseudocount (of 1 by default) and then log-transformed. If you don't want any transformation at all, set norm_method to "none" and pseudo_expr to 0. This maybe useful for single-cell qPCR data, or data you've already transformed yourself in some way.

Value

an updated CellDataSet object

Get the weights needed to lift cells back to high dimensional expression space.

Description

Retrieves the weights that transform the cells' coordinates in the reduced dimension space back to the full (whitened) space.

Usage

reducedDimA(cds)

Arguments

ArgumentDescription
cdsA CellDataSet object.

Value

A matrix that when multiplied by a reduced-dimension set of coordinates for the CellDataSet, recovers a matrix in the full (whitened) space

Examples

A <- reducedDimA(HSMM)
Link to this function

reducedDimA_set()

Get the weights needed to lift cells back to high dimensional expression space.

Description

Sets the weights transform the cells' coordinates in the reduced dimension space back to the full (whitened) space.

Usage

reducedDimA(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valueA whitened expression data matrix

Value

An updated CellDataSet object

Examples

cds <- reducedDimA(A)

Retrieves the the whitening matrix during independent component analysis.

Description

Retrieves the the whitening matrix during independent component analysis.

Usage

reducedDimK(cds)

Arguments

ArgumentDescription
cdsA CellDataSet object.

Value

A matrix, where each row is a set of whitened expression values for a feature and columns are cells.

Examples

K <- reducedDimW(HSMM)
Link to this function

reducedDimK_set()

Sets the the whitening matrix during independent component analysis.

Description

Sets the the whitening matrix during independent component analysis.

Usage

reducedDimK(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valuea numeric matrix

Value

A matrix, where each row is a set of whitened expression values for a feature and columns are cells.

Examples

cds <- reducedDimK(K)

Retrieves the coordinates of each cell in the reduced-dimensionality space generated by calls to reduceDimension.

Description

Reducing the dimensionality of the expression data is a core step in the Monocle workflow. After you call reduceDimension(), this function will return the new coordinates of your cells in the reduced space.

Usage

reducedDimS(cds)

Arguments

ArgumentDescription
cdsA CellDataSet object.

Value

A matrix, where rows are cell coordinates and columns correspond to dimensions of the reduced space.

Examples

S <- reducedDimS(HSMM)
Link to this function

reducedDimS_set()

Set embedding coordinates of each cell in a CellDataSet.

Description

This function sets the coordinates of each cell in a new (reduced-dimensionality) space. Not intended to be called directly.

Usage

reducedDimS(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valueA matrix of coordinates specifying each cell's position in the reduced-dimensionality space.

Value

An update CellDataSet object

Examples

cds <- reducedDimS(S)

Get the whitened expression values for a CellDataSet.

Description

Retrieves the expression values for each cell (as a matrix) after whitening during dimensionality reduction.

Usage

reducedDimW(cds)

Arguments

ArgumentDescription
cdsA CellDataSet object.

Value

A matrix, where each row is a set of whitened expression values for a feature and columns are cells.

Examples

W <- reducedDimW(HSMM)
Link to this function

reducedDimW_set()

Sets the whitened expression values for each cell prior to independent component analysis. Not intended to be called directly.

Description

Sets the whitened expression values for each cell prior to independent component analysis. Not intended to be called directly.

Usage

reducedDimW(cds) <- value

Arguments

ArgumentDescription
cdsA CellDataSet object.
valueA whitened expression data matrix

Value

An updated CellDataSet object

Examples

#' cds <- reducedDimA(A)

Transform relative expression values into absolute transcript counts.

Description

Converts FPKM/TPM data to transcript counts. This allows for the use for negative binomial as an expressionFamily. These results are often far more accurate than using tobit().

Usage

relative2abs(relative_cds, t_estimate = estimate_t(exprs(relative_cds)),
  modelFormulaStr = "~1", ERCC_controls = NULL, ERCC_annotation = NULL,
  volume = 10, dilution = 40000, mixture_type = 1,
  detection_threshold = 800, expected_capture_rate = 0.25,
  verbose = FALSE, return_all = FALSE, method = c("num_genes",
  "tpm_fraction"), cores = 1)

Arguments

ArgumentDescription
relative_cdsthe cds object of relative expression values for single cell RNA-seq with each row and column representing genes/isoforms and cells. Row and column names should be included
t_estimatean vector for the estimated most abundant FPKM value of isoform for a single cell. Estimators based on gene-level relative expression can also give good approximation but estimators based on isoform FPKM will give better results in general
modelFormulaStrmodelformula used to grouping cells for transcript counts recovery. Default is "~ 1", which means to recover the transcript counts from all cells.
ERCC_controlsthe FPKM matrix for each ERCC spike-in transcript in the cells if user wants to perform the transformation based on their spike-in data. Note that the row and column names should match up with the ERCC_annotation and relative_exprs_matrix respectively.
ERCC_annotationthe ERCC_annotation matrix from illumina USE GUIDE which will be ued for calculating the ERCC transcript copy number for performing the transformation.
volumethe approximate volume of the lysis chamber (nanoliters). Default is 10
dilutionthe dilution of the spikein transcript in the lysis reaction mix. Default is 40, 000. The number of spike-in transcripts per single-cell lysis reaction was calculated from
mixture_typethe type of spikein transcripts from the spikein mixture added in the experiments. By default, it is mixture 1. Note that m/c we inferred are also based on mixture 1.
detection_thresholdthe lowest concentration of spikein transcript considered for the regression. Default is 800 which will ensure (almost) all included spike transcripts expressed in all the cells. Also note that the value of c is based on this concentration.
expected_capture_ratethe expected fraction of RNA molecules in the lysate that will be captured as cDNAs during reverse transcription
verbosea logical flag to determine whether or not we should print all the optimization details
return_allparameter for the intended return results. If setting TRUE, matrix of m, c, k^, b^ as well as the transformed absolute cds will be returned in a list format
methodthe formula to estimate the total mRNAs (num_genes corresponds to the second formula while tpm_fraction corresponds to the first formula, see the anouncement on Trapnell lab website for the Census paper)
coresnumber of cores to perform the recovery. The recovery algorithm is very efficient so multiple cores only needed when we have very huge number of cells or genes.

Details

Transform a relative expression matrix to absolute transcript matrix based on the inferred linear regression parameters from most abundant isoform relative expression value. This function takes a relative expression matrix and a vector of estimated most abundant expression value from the isoform-level matrix and transform it into absolute transcript number. It is based on the observation that the recovery efficient of the single-cell RNA-seq is relative low and that most expressed isoforms of gene in a single cell therefore only sequenced one copy so that the most abundant isoform log10-FPKM (t^) will corresponding to 1 copy transcript. It is also based on the fact that the spikein regression parameters k/b for each cell will fall on a line because of the intrinsic properties of spikein experiments. We also assume that if we perform the same spikein experiments as Treutlein et al. did, the regression parameters should also fall on a line in the same way. The function takes the the vector t^ and the detection limit as input, then it uses the t^ and the m/c value corresponding to the detection limit to calculate two parameters vectors k^ and b^ (corresponding to each cell) which correspond to the slope and intercept for the linear conversion function between log10 FPKM and log10 transcript counts. The function will then apply a linear transformation to convert the FPKM to estimated absolute transcript counts based on the the k^ and b^*. The default m/c values used in the algoritm are 3.652201, 2.263576, respectively.

Value

an matrix of absolute count for isoforms or genes after the transformation.

Examples

HSMM_relative_expr_matrix <- exprs(HSMM)
HSMM_abs_matrix <- relative2abs(HSMM_relative_expr_matrix,
t_estimate = estimate_t(HSMM_relative_expr_matrix))
Link to this function

residualMatrix()

Response values

Description

Generates a matrix of response values for a set of fitted models

Usage

residualMatrix(models, residual_type = "response", cores = 1)

Arguments

ArgumentDescription
modelsa list of models, e.g. as returned by fitModels()
residual_typethe response desired, as accepted by VGAM's predict function
coresnumber of cores used for calculation

Value

a matrix where each row is a vector of response values for a particular feature's model, and columns are cells.

Link to this function

responseMatrix()

Calculates response values.

Description

Generates a matrix of response values for a set of fitted models

Usage

responseMatrix(models, newdata = NULL, response_type = "response",
  cores = 1)

Arguments

ArgumentDescription
modelsa list of models, e.g. as returned by fitModels()
newdataa dataframe used to generate new data for interpolation of time points
response_typethe response desired, as accepted by VGAM's predict function
coresnumber of cores used for calculation

Value

a matrix where each row is a vector of response values for a particular feature's model, and columns are cells.

Link to this function

selectTopMarkers()

Select the most cell type specific markers

Description

This is a handy wrapper function around dplyr's top_n function to extract the most specific genes for each cell type. Convenient, for example, for selecting a balanced set of genes to be used in semi-supervised clustering or ordering.

Usage

selectTopMarkers(marker_specificities, num_markers = 10)

Arguments

ArgumentDescription
marker_specificitiesThe dataframe of specificity results produced by calculateMarkerSpecificity
num_markersThe number of markers that will be shown for each cell type

Value

A data frame of specificity results

Link to this function

setOrderingFilter()

Marks genes for clustering

Description

The function marks genes that will be used for clustering in subsequent calls to clusterCells. The list of selected genes can be altered at any time.

Usage

setOrderingFilter(cds, ordering_genes)

Arguments

ArgumentDescription
cdsthe CellDataSet upon which to perform this operation
ordering_genesa vector of feature ids (from the CellDataSet's featureData) used for ordering cells

Value

an updated CellDataSet object

Spike-in transcripts data.

Description

A dataset containing the information for the 92 ERCC spikein transcripts (This dataset is based on the data from the Nature paper from Stephen Quake group)

Format

A data frame with 92 rows and 9 variables: list(" ", " ", list(list("ERCC_ID"), list("ID for ERCC transcripts")), " ", " ", list(list("subgroup"), list("Subgroup for ERCC transcript")), " ", " ", list(list("conc_attomoles_ul_Mix1"), list("Contration of Mix 1 (attomoles / ul)")), " ", " ", list(list("conc_attomoles_ul_Mix2"), list("Contration of Mix 2 (attomoles / ul)")), " ", " ", list(list("exp_fch_ratio"), list("expected fold change between mix 1 over mix 2")), " ", " ", list(list("numMolecules"), list("number of molecules calculated from concentration and volume")),

"

", " ", list(list("rounded_numMolecules"), list("number in rounded digit of molecules calculated from concentration and volume")), " ")

Usage

spike_df

Return a variance-stabilized matrix of expression values

Description

This function was taken from the DESeq package (Anders and Huber) and modified to suit Monocle's needs. It accpets a either a CellDataSet or the expression values of one and returns a variance-stabilized matrix based off of them.

Usage

vstExprs(cds, dispModelName = "blind", expr_matrix = NULL,
  round_vals = TRUE)

Arguments

ArgumentDescription
cdsA CellDataSet to use for variance stabilization.
dispModelNameThe name of the dispersion function to use for VST.
expr_matrixAn matrix of values to transform. Must be normalized (e.g. by size factors) already. This function doesn't do this for you.
round_valsWhether to round expression values to the nearest integer before applying the transformation.