In the previous two posts on RNAseq concepts (here and here), we explained the inner workings of programs like Kallisto and Salmon using a simple example. We also created a small simulated dataset identical to that example, ran Kallisto on it, and got results matching the theory.

# RNAseq Analysis

## Debugging RNAseq - (iii) Expectation Maximization Hands-On

We recently started a series on debugging RNAseq analysis. In an earlier post of this series, we explained how programs like Kallisto and Salmon work using a simple example, which covered the basics of Expectation Maximization.
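The core EM loop can be sketched in a few lines of Python. This is a minimal illustration, not Kallisto's or Salmon's actual code. It assumes reads have already been grouped into equivalence classes (sets of transcripts each read is compatible with). It also assumes every transcript has the same effective length, so the estimates are fractions of reads rather than TPM. The function name `em_abundance` and the toy counts are hypothetical.

```python
def em_abundance(eq_classes, transcripts, max_iter=1000, tol=1e-10):
    """Estimate the fraction of reads coming from each transcript by EM.

    eq_classes: list of (compatible_transcripts, read_count) pairs.
    Hypothetical simplification: all transcripts are assumed to have equal
    effective length (real tools weight by effective length).
    """
    total = sum(count for _, count in eq_classes)
    # Start from uniform abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        # E-step: split each class's reads among its compatible transcripts
        # in proportion to the current abundance estimates.
        expected = {t: 0.0 for t in transcripts}
        for compatible, count in eq_classes:
            denom = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += count * theta[t] / denom
        # M-step: new abundance = expected share of all reads.
        new_theta = {t: expected[t] / total for t in transcripts}
        change = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy data (hypothetical): 10 reads unique to A, 30 unique to B, 40 shared.
classes = [(("A",), 10), (("B",), 30), (("A", "B"), 40)]
print(em_abundance(classes, ["A", "B"]))  # converges to about A=0.25, B=0.75
```

At convergence the 40 shared reads are split 10:30, the same ratio as the unique reads, so A ends up with 20 of 80 reads and B with 60.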

## Debugging RNAseq - (ii) Core Concept - Expectation Maximization

We recently started a series on debugging RNAseq analysis. At present, the programs used for analysis are treated as black boxes, and their results are taken on faith. This becomes a problem when known genes do not show up as expected. Does that mean the prior knowledge is wrong, the samples were prepared incorrectly, or some parameters or assumptions in the analysis programs are incorrect? Moreover, who will debug the analysis: bioinformaticians reading all the relevant biology papers, or biologists learning about the analysis steps?

## Debugging RNAseq Analysis - (i)

In my experience, there is a large disconnect between bioinformaticians and biologists regarding RNAseq data analysis. In the current mode of operation, biologists send their samples to core sequencing labs, the core labs send the FASTQ sequence files to ‘expert bioinformaticians’, and the bioinformaticians pass derived tables back to the biologists. Biologists publish, get funded, and the process starts once again.

## RNAseq Questions - How to Load and Combine Salmon Data in R?

Over the last year, I have been meeting many biologists and training them on NGS RNAseq data analysis. Sometimes they even bring their own research data to class, learn to analyze it, and see the results.

## RNAseq Questions - Loading Many Kallisto Count Files in R

This is a continuation of our discussion from yesterday’s post. For general context, over the last year, I have been meeting many biologists and training them on NGS RNAseq data analysis. Sometimes they even bring their own research data to class, learn to analyze it, and see the results.

## RNAseq Questions - How to Combine Gene Annotations (gff/gtf) with Kallisto Counts?

This is a continuation of our discussion from yesterday’s post. For general context, over the last year, I have been meeting many biologists and training them on NGS RNAseq data analysis. Sometimes they even bring their own research data to class, learn to analyze it, and see the results.
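One common way to connect a GTF annotation to Kallisto counts is to build a transcript-to-gene table from the GTF attribute column. The Python sketch below is only an illustration; the post itself works in R. It assumes a standard 9-column, tab-separated GTF whose attribute column contains `transcript_id` and `gene_id` entries, as in Ensembl and GENCODE files. The function name `tx2gene_from_gtf` is hypothetical.

```python
import re

# Matches key "value" pairs in the GTF attribute column, e.g. gene_id "G1";
ATTR_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s+"([^"]*)"')


def tx2gene_from_gtf(lines):
    """Return {transcript_id: gene_id} from an iterable of GTF lines."""
    mapping = {}
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only transcript and exon rows carry a transcript_id.
        if len(fields) != 9 or fields[2] not in ("transcript", "exon"):
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            mapping[tx] = gene
    return mapping
```

With a real file, an open handle can be passed directly, e.g. `tx2gene_from_gtf(open("annotation.gtf"))` (file name hypothetical). Check that the keys match the `target_id` column of Kallisto's output. For example, transcript IDs taken from a FASTA header may carry a version suffix such as `.1` that the GTF `transcript_id` lacks.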

## RNAseq Questions - How to Load and Combine Kallisto Counts in R?

Over the last year, I have been meeting many biologists and training them on NGS RNAseq data analysis. Sometimes they even bring their own research data to class, learn to analyze it, and see the results.
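Loading and combining per-sample Kallisto files comes down to reading each sample's table and lining the rows up by transcript ID. Here is a minimal Python sketch of that idea; the posts themselves work in R. It assumes Kallisto's tab-separated `abundance.tsv` layout, with a header containing `target_id` and `est_counts` columns. The helper names, sample names and values are hypothetical.

```python
import csv
import io


def read_abundance(handle, column="est_counts"):
    """Read one Kallisto abundance.tsv (any file-like object) into {target_id: value}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row[column]) for row in reader}


def combine_samples(handles, column="est_counts"):
    """Combine {sample_name: file_handle} into (sample_names, {target_id: [values]})."""
    per_sample = {name: read_abundance(fh, column) for name, fh in handles.items()}
    names = list(per_sample)
    all_ids = sorted(set().union(*per_sample.values()))
    # Transcripts missing from a sample get 0.0, so every row has one value per sample.
    matrix = {tid: [per_sample[n].get(tid, 0.0) for n in names] for tid in all_ids}
    return names, matrix


# Toy input standing in for two abundance.tsv files (hypothetical values).
s1 = io.StringIO("target_id\tlength\teff_length\test_counts\ttpm\n"
                 "tx1\t1000\t800\t10\t5.0\ntx2\t500\t300\t4\t2.0\n")
s2 = io.StringIO("target_id\tlength\teff_length\test_counts\ttpm\n"
                 "tx1\t1000\t800\t7\t3.5\ntx2\t500\t300\t0\t0.0\n")
names, matrix = combine_samples({"sample1": s1, "sample2": s2})
print(names, matrix)  # ['sample1', 'sample2'] {'tx1': [10.0, 7.0], 'tx2': [4.0, 0.0]}
```

With real files, pass open file handles (for example from `open(path)`) instead of `io.StringIO` objects. Passing `column="tpm"` builds a TPM matrix instead of a count matrix.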

## Papers and Online Tutorials on RNAseq Data Analysis

Over the years, I ran many Google searches to find high-quality online tutorials on RNAseq data analysis. Posting my list (with comments) will likely save some time for others making the same journey.

## Here is How RNAseq.work Works

The R package “rnaseq.work” helps you run differential expression analysis of RNAseq data using different packages through a single command (“rna_diff_expr”). Also, you can make different types of plots using another single command (“rna_visualize”).

## R Code for Adding Isoform Data in RNAseq

In RNAseq analysis, we often need to add the expression estimates for the various isoforms of a gene into a single number. For example, Kallisto and Salmon report expression for each isoform separately, and those numbers need to be aggregated per gene before the subsequent step of finding differentially expressed genes. This task is rather trivial for those using Perl or Python. If you want to do the entire analysis in R, the following code may help.
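The aggregation itself is a group-by-and-sum over a transcript-to-gene table. As a language-neutral illustration, here is a Python sketch; it is not the post's R code. It assumes you already have a transcript-to-gene mapping, for example built from the GTF annotation. The transcript and gene IDs and the `aggregate_to_genes` name are hypothetical. Summing estimated counts across isoforms (or TPM within one sample) gives the gene-level value.

```python
from collections import defaultdict


def aggregate_to_genes(tx_values, tx2gene):
    """Sum per-transcript values into per-gene totals.

    tx_values: {transcript_id: estimate}, e.g. est_counts from one sample.
    tx2gene:   {transcript_id: gene_id}.
    Returns ({gene_id: total}, [transcript_ids with no gene mapping]).
    """
    gene_totals = defaultdict(float)
    unmapped = []
    for tx, value in tx_values.items():
        gene = tx2gene.get(tx)
        if gene is None:
            # Report instead of silently dropping, so ID mismatches are visible.
            unmapped.append(tx)
            continue
        gene_totals[gene] += value
    return dict(gene_totals), unmapped


counts = {"geneA-201": 10.0, "geneA-202": 5.5, "geneB-201": 3.0, "txX": 1.0}
mapping = {"geneA-201": "geneA", "geneA-202": "geneA", "geneB-201": "geneB"}
print(aggregate_to_genes(counts, mapping))
# ({'geneA': 15.5, 'geneB': 3.0}, ['txX'])
```

Returning the unmapped transcripts, instead of discarding them, makes it easy to spot when the annotation and the quantification index use different transcript IDs.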

## Formula Syntax in RNAseq Packages like DESeq2 or edgeR

Popular RNAseq packages often use R's formula notation. For example, DESeq2 uses it in the design parameter, whereas edgeR users create the design matrix by expanding a formula with “model.matrix”. The formula syntax seems to confuse many users of these libraries.
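For a single factor, what `model.matrix(~ condition)` produces under R's default treatment contrasts can be mimicked in a few lines. The result is an intercept column of 1s, plus one 0/1 indicator column for every level except the reference (first) level. The Python sketch below only illustrates that expansion. `design_matrix` is a hypothetical helper that handles just one categorical variable, and its default level order (Python's `sorted`) may differ from R's locale-dependent factor ordering.

```python
def design_matrix(values, name="condition", levels=None):
    """Expand one categorical variable like R's model.matrix(~ name).

    The first level is the reference and gets no column of its own;
    its effect is absorbed into the intercept.
    """
    if levels is None:
        levels = sorted(set(values))
    columns = ["(Intercept)"] + [f"{name}{lvl}" for lvl in levels[1:]]
    rows = [[1] + [1 if v == lvl else 0 for lvl in levels[1:]] for v in values]
    return columns, rows


cols, rows = design_matrix(["control", "control", "treated", "treated"])
print(cols)  # ['(Intercept)', 'conditiontreated']
print(rows)  # [[1, 0], [1, 0], [1, 1], [1, 1]]
```

Read this way, the `conditiontreated` coefficient is the difference of treated relative to the reference level (control). This is why changing the reference level changes the meaning of the fitted coefficients.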

## Rnaseq.work - Current APIs and Design Decisions

As mentioned in an earlier post, I have been working on an R library for RNAseq data analysis. The goal of this library is to provide clean, easy-to-remember functions for data analysis. In this post, I will describe the options chosen for the plotting function rna_visualize, and discuss the design and coding challenges encountered during its implementation.

## Rnaseq.work - Plotting Functions in RNAseq-related Packages

As mentioned in the previous post, I have been working on an R library for RNAseq data analysis. The goal of this library is to provide clean, easy-to-remember functions for analysis. We also offer live online classes on R and RNAseq data analysis. For both efforts, it helps to review the visualization functions in existing RNAseq-related packages.

## Rnaseq.work - A Package with Clean APIs for Statistical Analysis of RNAseq Data

Over the last couple of months, I have been working on and off on a new R package for statistical analysis of RNAseq data. A number of popular and excellent packages (e.g. edgeR, DESeq, DESeq2, limma-voom, sleuth) already exist to solve this problem, and they each use different mathematical methods to find statistically significant genes.

## Live Online Class - RNAseq Data Analysis using R

If you would like to use R for RNAseq data analysis, please join our online class on Dec 1/8/15 at 10AM-1PM Pacific time. This module is designed for those with a biology background.

## GRASS for Rapid Reannotation of RNAseq Data

Many exciting papers and preprints on RNAseq came out over the last few months. Among them, a recently posted preprint solves an important problem: improving annotations based on new RNAseq data. Other papers covered quantification, compression and search, and we would like to discuss them in the next few posts.

## SuperTranscript - a reference for analysis and visualization of the transcriptome

Abstract: Transcriptomes are tremendously diverse and highly dynamic; visualizing and analysing this complexity is a major challenge. Here we present superTranscript, a single linear representation for each gene. SuperTranscripts contain all unique exonic sequence, built from any combination of transcripts, including reference assemblies, de novo assemblies and long-read sequencing. Our approach enables visualization of transcript structure and provides increased power to detect differential isoform usage.

## Lior Pachter's Zika Paper

Lynn Yi, Harold Pimentel and Lior Pachter published a new RNAseq paper that our readers will definitely find interesting. In this paper, the authors showcase the new RNAseq tools the Pachter lab has been developing over the last few years. We covered those components (e.g. Kallisto, Sleuth) in earlier posts, but here you can see them applied to already published data to obtain new biological insights.