Rnaseq.work - A Package with Clean APIs for Statistical Analysis of RNAseq Data


Over the last couple of months, I have been working on and off on a new R package for statistical analysis of RNAseq data. A number of popular and excellent packages (e.g. edgeR, DESeq, DESeq2, limma-voom, sleuth, etc.) already exist to solve this problem, and they all use different mathematical methods to find genes with statistically significant differences in expression.

The library rnaseq.work is unique in this respect, because it offers no innovation. Try to match that, DESeq2 and edgeR!!! :)

Jokes aside, I only mean there is no new math in this library. I am developing it with two objectives.

1. Simplified API

Rnaseq.work has only two functions - rna_diff_expr() and rna_visualize().

The function rna_diff_expr() performs statistical analysis of RNAseq data using any of several approaches (e.g. DESeq, DESeq2, edgeR, limma-voom, sleuth, baySeq, NOISeq, EBSeq, etc.). You just need to supply the count_table and design_table as data frames, and the function takes care of the rest internally. Only DESeq2, edgeR and limma-voom are implemented so far.

rna_diff_expr(count_table, design_table, method="DESeq")
rna_diff_expr(count_table, design_table, method="DESeq2")
rna_diff_expr(count_table, design_table, method="edgeR")
rna_diff_expr(count_table, design_table, method="limma-voom")
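
To make the two inputs concrete, here is a minimal sketch of what count_table and design_table might look like. The layout (genes in rows, one column of raw counts per sample, and a design table with 'sample' and 'condition' columns) is an assumption made for illustration and may not match what the package actually expects; the counts are simulated.

```r
# Toy inputs. The column names and layout are assumptions for illustration,
# not the package's documented format.
set.seed(1)
samples <- c("ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3")

# Assumed layout: one row per gene, one column of raw integer counts per sample.
count_table <- as.data.frame(
  matrix(rnbinom(100 * length(samples), mu = 50, size = 2),
         nrow = 100,
         dimnames = list(paste0("gene", 1:100), samples))
)

# Assumed layout: one row per sample, in the same order as the count columns.
design_table <- data.frame(
  sample    = samples,
  condition = factor(rep(c("control", "treated"), each = 3)),
  stringsAsFactors = FALSE
)

# Sanity check: both tables must describe the same samples in the same order.
stopifnot(identical(colnames(count_table), design_table$sample))

# With the package installed, the call would then be a single line:
# result <- rna_diff_expr(count_table, design_table, method = "edgeR")
```

Checking that the sample names agree before calling the function is cheap and catches the most common mistake, a design table whose rows are in a different order from the count columns.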

The function rna_visualize() will plot RNAseq-related data in various ways (e.g. density plot, MA plot, smear plot, MDS plot, PCA plot, BCV plot, volcano plot, etc.) using either base graphics or ggplot. You only have to change the ‘method’ and ‘lib’ parameters to generate different types of plots.

rna_visualize(count_table, method="hist", lib="base")
rna_visualize(count_table, method="MDS", lib="base")
rna_visualize(count_table, method="smear", lib="ggplot")

This function is not implemented in the publicly available version of the library.
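
Since the real function is not public, the following is only a rough sketch of how a single entry point could dispatch on 'method' and 'lib'. The name visualize_sketch() is hypothetical, only base graphics are covered, and the MDS branch uses classical MDS (cmdscale) on log-transformed counts rather than whatever the package will eventually use.

```r
# Hypothetical stand-in for rna_visualize(); not the package's code.
visualize_sketch <- function(count_table,
                             method = c("hist", "MDS"),
                             lib = c("base", "ggplot")) {
  method <- match.arg(method)
  lib <- match.arg(lib)
  if (lib != "base") stop("only lib = \"base\" is covered in this sketch")

  # Log-transform so that a few highly expressed genes do not dominate.
  log_counts <- log2(as.matrix(count_table) + 1)

  switch(method,
    hist = {
      hist(log_counts, breaks = 30,
           main = "Distribution of counts", xlab = "log2(count + 1)")
      invisible(log_counts)
    },
    MDS = {
      # Classical MDS on Euclidean distances between samples.
      coords <- cmdscale(dist(t(log_counts)), k = 2)
      plot(coords, type = "n", main = "MDS of samples",
           xlab = "Dimension 1", ylab = "Dimension 2")
      text(coords, labels = rownames(coords))
      invisible(coords)
    }
  )
}

# Toy data: 5 genes, 4 samples (made-up numbers).
toy_counts <- data.frame(
  s1 = c(10, 200, 35, 0, 80),
  s2 = c(12, 180, 40, 1, 95),
  s3 = c(50, 20, 33, 7, 300),
  s4 = c(45, 25, 30, 9, 280),
  row.names = paste0("gene", 1:5)
)

visualize_sketch(toy_counts, method = "hist", lib = "base")
mds_coords <- visualize_sketch(toy_counts, method = "MDS", lib = "base")
```

Using match.arg() means a mistyped method name fails immediately with a message listing the valid choices, which is friendlier for newcomers than a silent fallback.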

2. Only data frames for input/output, no S3/S4 classes

The second design objective is to use data frames only, and not S3/S4 classes, for input and output. The functions create S4 classes internally as they are required by the existing libraries (e.g. edgeR, DESeq2, etc.).

The reason behind this choice is that in our online classes on RNAseq analysis with R, we like to make the code easily understandable by biologists. Classes and objects are hard to explain to someone new to programming. Given that our courses are designed for biologists, we want to minimize those aspects requiring CS training.
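
The "data frames in, data frames out" idea described in this section can be sketched in base R as follows. Everything here is hypothetical: ToyFit is a made-up S4 class standing in for the result objects that edgeR or DESeq2 produce, toy_backend() is a stand-in that runs a per-gene Welch t-test on log counts (not the statistics those packages use), and the 'condition' column of design_table is an assumed name.

```r
library(methods)

# Hypothetical S4 class standing in for a backend's result object.
setClass("ToyFit",
         representation(genes = "character", logFC = "numeric", pvalue = "numeric"))

# Stand-in backend: per-gene Welch t-test on log2 counts.
# This is NOT the method used by edgeR, DESeq2 or limma-voom.
toy_backend <- function(counts, group) {
  log_counts <- log2(counts + 1)
  lv <- levels(group)
  a <- log_counts[, group == lv[1], drop = FALSE]
  b <- log_counts[, group == lv[2], drop = FALSE]
  pvalue <- vapply(seq_len(nrow(log_counts)), function(i) {
    # t.test() stops on constant data; report NA for such genes instead.
    tryCatch(t.test(b[i, ], a[i, ])$p.value, error = function(e) NA_real_)
  }, numeric(1))
  new("ToyFit", genes = rownames(counts),
      logFC = unname(rowMeans(b) - rowMeans(a)), pvalue = pvalue)
}

# The wrapper: the caller passes data frames and gets a data frame back.
# The S4 object exists only inside the function.
diff_expr_sketch <- function(count_table, design_table, method = c("toy")) {
  method <- match.arg(method)
  counts <- as.matrix(count_table)
  group <- factor(design_table$condition)  # 'condition' is an assumed column name
  stopifnot(ncol(counts) == nrow(design_table), nlevels(group) == 2)

  fit <- switch(method, toy = toy_backend(counts, group))

  data.frame(gene = fit@genes, logFC = fit@logFC, pvalue = fit@pvalue,
             stringsAsFactors = FALSE)
}

# Simulated toy data: 20 genes, 3 control and 3 treated samples.
set.seed(42)
count_table <- as.data.frame(
  matrix(rnbinom(20 * 6, mu = 100, size = 5), nrow = 20,
         dimnames = list(paste0("gene", 1:20),
                         c("c1", "c2", "c3", "t1", "t2", "t3")))
)
design_table <- data.frame(sample = colnames(count_table),
                           condition = rep(c("control", "treated"), each = 3))

result <- diff_expr_sketch(count_table, design_table)
head(result)
```

The result can be explored with head(), subsetting and order(), which a beginner already knows from working with any other data frame, so accessor methods and slots never need to come up in class.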

Availability

The package rnaseq.work is publicly available on GitHub under the MIT license. It is far from complete at this point. Please feel free to give feedback on GitHub or help with development, if you like.

Working on rnaseq.work forced me to go under the hood of many of the existing RNAseq analysis libraries. I think it will help the community if I share some useful observations. Therefore, over the coming weeks, I will write a series of blog posts on this topic. Please check for ‘rnaseq.work’ in the title or tag to find them.



Written by M.