Homolog.us - Frontier in Bioinformatics

Weird Patterns in Japanese Genome Evolution Explained by Garbage-in Garbage-out Effect

A new preprint titled “Legacy Data Confounds Genomics Studies” was recently posted on bioRxiv. It shows that researchers using data from the 1000 Genomes Project need to be cautious about the garbage-in-garbage-out effect (technical term: batch effect), which can lead to spurious discoveries.

(Remotely Taught Module) - Data Visualization in R

We are offering a new remotely taught module on data visualization in R. You will learn some of the most essential tools needed for exploratory data analysis. In particular, if you have heard about the powerful ggplot library but find its logic complicated, this module is perfect for you.

A Bioinformatics Study Guide for the Biologists - (i)

Increasingly, biologists and biochemists are feeling the need to learn bioinformatics. The required skill set goes way beyond being able to run BLAST searches at NCBI or find information on genes and genomes in online databases. Believe it or not, doing those tasks used to be called “bioinformatics” in biology departments a few years back. That situation changed with next-generation sequencing. Now that sequencing is so cheap, every lab has tons of raw data sitting on its hard drives and needs help analyzing it.

SibeliaZ - An Extremely Fast Aligner for Multiple Genomes

Readers may enjoy a new paper posted on bioRxiv by Ilia Minkin and Paul Medvedev. It presents a method for aligning multiple closely related genomes that is order(s) of magnitude faster than competing approaches. In bioinformatics, such a dramatic improvement in speed is not seen often.

Go is Now the Best Programming Language for Full-fledged Bioinformatics - Really?

Bioinformaticians writing on Twitter appear considerably bemused by a new paper that appeared on bioRxiv late Friday. Here is the abstract.

Unpacking S4 Objects in R

Modern statistics was invented by a doctor, whose income from curing people was just not enough. To make more money on the side from gambling, he came up with the earliest versions of the rules of probability.

Papers and Online Tutorials on RNAseq Data Analysis

Over the years, I have made many Google searches to find high-quality online tutorials on the analysis of RNAseq data. Posting my list (with comments) will likely save some time for others making the same journey.

Here is How RNAseq.work Works

The R package “rnaseq.work” helps you run differential expression analysis of RNAseq data using different packages through a single command (“rna_diff_expr”). Also, you can make different types of plots using another single command (“rna_visualize”).

R Code for Adding Isoform Data in RNAseq

In RNAseq analysis, we often need to add up the expression estimates for the various isoforms of a gene into a single number. For example, Kallisto and Salmon report the expression of every isoform as a separate number. Those numbers need to be aggregated before the subsequent analysis steps that identify differentially expressed genes. This task is rather trivial for those using Perl or Python. If you want to do the entire analysis in R, the code in this post may help.
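To make the aggregation step concrete, here is a minimal sketch of the Python route mentioned above. It is only an illustration, not the R code from the post: the function name `aggregate_isoforms`, the transcript and gene identifiers, and the expression values are all hypothetical, and it assumes a transcript-to-gene mapping is already available as a dictionary.

```python
from collections import defaultdict


def aggregate_isoforms(isoform_expr, tx2gene):
    """Sum isoform-level expression values into one value per gene.

    isoform_expr: dict mapping transcript ID -> expression estimate
    tx2gene:      dict mapping transcript ID -> gene ID (assumed to be given)
    """
    gene_expr = defaultdict(float)
    for transcript, value in isoform_expr.items():
        gene_expr[tx2gene[transcript]] += value
    return dict(gene_expr)


# Hypothetical toy data: two isoforms of geneA and one isoform of geneB.
isoform_tpm = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_tpm = aggregate_isoforms(isoform_tpm, tx2gene)
print(gene_tpm)
```

The same pattern applies to a whole sample table: run the function once per sample, or group the rows by gene ID and sum each column.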

Annual Bioinformatics Contest from the Rosalind Team

Formula Syntax in RNAseq Packages like DESeq2 or edgeR

Popular RNAseq packages often use the formula notation in R. For example, the DESeq2 package uses it in the design parameter, whereas edgeR creates its design matrix by expanding a formula with “model.matrix”. The formula syntax seems to confuse many users of these libraries.

Rnaseq.work - Current APIs and Design Decisions

As mentioned in an earlier post, I have been working on an R library for RNAseq data analysis. The goal of this library is to provide clean, easy-to-remember functions for data analysis. In this post, I will describe the options chosen for the rna_visualize function, which plots the data. I will also discuss the design and coding challenges encountered during this implementation.

Rnaseq.work - Plotting Functions in RNAseq-related Packages

Rnaseq.work - A Package with Clean APIs for Statistical Analysis of RNAseq Data

Over the last couple of months, I have been working on and off on a new R package for statistical analysis of RNAseq data. A number of popular and excellent packages (e.g. edgeR, DESeq, DESeq2, limma-voom, sleuth, etc.) exist to solve this problem, and they all use different mathematical methods to find statistically significant genes.

Live Online Class - RNAseq Data Analysis using R

If you would like to use R for RNAseq data analysis, please join our online class on Dec 1, 8 and 15 at 10AM-1PM Pacific time. This module is designed for those with a biology background.

Illumina Buys PacBio, What Are the Implications?

Puzzling observations from various eukaryotic genomes (part III)

We are continuing our discussion of eukaryotic genome evolution based on Dan Graur’s “Molecular and Genome Evolution”. In this post, we present a number of puzzling observations in various eukaryotic genomes. The title of each section also includes the page number of Graur’s book, where the observation is reported.

How do the eukaryotic genomes evolve? (part II)

We are continuing our discussion of eukaryotic genome evolution based on Dan Graur’s “Molecular and Genome Evolution”. In this post, we look at two key measures - genome size and gene size.

How do the eukaryotic genomes evolve? (part I)

In the previous post, I wrote about the book “Molecular and Genome Evolution” by Dan Graur. It contains thirteen chapters as shown below. Chapters 7-11 may be considered the heart of the book, where Graur discusses how genomes evolve and how new genes come into existence. Among those, chapters 7-9 present three mechanisms of genome evolution, namely DNA duplication, molecular tinkering and mobile elements. Subsequently, chapters 10 and 11 discuss evolutionary aspects of the prokaryotic and the eukaryotic genomes respectively.

Molecular and Genome Evolution by Dan Graur

Over the last two weeks, I have been reading Dan Graur’s book titled “Molecular and Genome Evolution”. This is a fantastic book that everyone should read before starting to work on any genome-related project. For the benefit of our readers, I will share some comments in this short post. If time permits, I will later follow up with a longer post on the book.
