Popular RNAseq packages often use the formula notation in R. For example, the DESeq package uses it in the design parameter, whereas edgeR creates its design matrix by expanding a formula with “model.matrix”. The formula syntax seems to confuse many users of these libraries.
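To make the notation less mysterious, here is a minimal sketch in base R of what model.matrix does when it expands a formula. The sample table and its "condition" column are made up for illustration and are not taken from any of the packages above.

```r
# Hypothetical sample table (an assumption for illustration):
# two conditions with three replicates each.
samples <- data.frame(
  condition = factor(
    c("control", "control", "control", "treated", "treated", "treated"),
    levels = c("control", "treated")
  )
)

# "~ condition" reads as "model the response as a function of condition".
# model.matrix() expands it into an intercept column plus one indicator
# column for every non-reference level of the factor.
design <- model.matrix(~ condition, data = samples)
print(design)

# Adding "0 +" drops the intercept, so every level gets its own column.
design_no_intercept <- model.matrix(~ 0 + condition, data = samples)
print(colnames(design_no_intercept))
```

The first level of the factor ("control" here) acts as the reference, so the "conditiontreated" column encodes the treated-versus-control difference. A matrix of this kind is what edgeR-style workflows build by hand with model.matrix.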
As mentioned in an earlier post, I have been working on an R library for RNAseq data analysis. The goal of this library is to provide clean, easy-to-remember functions for data analysis. In this post, I will describe the options chosen for the rna_visualize function, which plots the data. I will also discuss the design and coding challenges encountered during the implementation.
Over the last couple of months, I have been working on and off on a new R package for statistical analysis of RNAseq data. A number of popular and excellent packages (e.g. edgeR, DESeq, DESeq2, limma-voom, sleuth, etc.) exist to solve this problem, and they all use different mathematical methods to find statistically significant genes.
Many exciting papers and preprints on RNAseq came out over the last few months. Among them, a recently posted preprint solves an important problem: improving annotations based on new RNAseq data. There were other papers on quantification, compression and search, and we would like to cover them in the next few posts.
Abstract: Transcriptomes are tremendously diverse and highly dynamic; visualizing and analysing this complexity is a major challenge. Here we present superTranscript, a single linear representation for each gene. SuperTranscripts contain all unique exonic sequence, built from any combination of transcripts, including reference assemblies, de novo assemblies and long-read sequencing. Our approach enables visualization of transcript structure and provides increased power to detect differential isoform usage.
Lynn Yi, Harold Pimentel and Lior Pachter published a new RNAseq paper that our
readers will definitely find interesting. In this paper, the authors showcase the
new RNAseq technologies the Pachter lab has been developing over the last few years. We
covered those components (e.g. Kallisto, Sleuth) in earlier posts, but here you can
see a biological application that draws new insights from already published data.
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single
analysis pipeline can be used in all cases. We review all of the major steps
in RNA-seq data analysis, including experimental design, quality control, read
alignment, quantification of gene and transcript levels, visualization,
differential gene expression, alternative splicing, functional analysis, gene
fusion detection and eQTL mapping. We highlight the challenges associated with
each step. We discuss the analysis of small RNAs and the integration of RNA-
seq with other functional genomics techniques. Finally, we discuss the outlook
for novel technologies that are changing the state of the art in
Characterizing transcriptomes in both model and non-model organisms has
resulted in a massive increase in our understanding of biological phenomena.
This boon, largely made possible via high-throughput sequencing, means that
studies of functional, evolutionary and population genomics are now being done
by hundreds or even thousands of labs around the world. For many, these
studies begin with a de novo transcriptome assembly, which is a technically
complicated process involving several discrete steps. Each step may be
accomplished in one of several different ways, using different software
packages, each producing different results. This analytical complexity begs
the question – Which method(s) are optimal? Using reference and non-reference
based evaluative methods, I propose a set of guidelines that aim to
standardize and facilitate the process of transcriptome assembly. These
recommendations include the generation of between 20 million and 40 million
sequencing reads from a single individual where possible, error correction of
reads, gentle quality trimming, assembly filtering using Transrate and/or gene
expression, annotation using dammit, and appropriate reporting. These
recommendations have been extensively benchmarked and applied to publicly
available transcriptomes, resulting in improvements in both content and
contiguity. To facilitate the implementation of the proposed standardized
methods, I have released a set of version controlled open-sourced code, The
Oyster River Protocol for Transcriptome Assembly, available at http://oyster-
An extremely interesting application of RNA-sequencing analysis is to study
samples over a time series. This allows you to identify patterns of expression
in response to a stimulus or over a developmental progression.
If the software license is the only thing that stops you
from using the wonderful Kallisto algorithm/program, maybe this GitHub
code can help.
As another advantage, it comes with a GPL license (it could be BSD if not for the
Jellyfish dependency), and you can build your code on top of it by using RapMap
as a module. Pseudoalignment is a powerful, lightweight concept, and we can
expect more applications to use this module.