Why Efficiency Matters in Big Data Biology

As a follow-up to yesterday’s commentary, "Are Ultra-low RAM Assemblers Useful for those with Kick-ass Servers?", readers may enjoy this very old (i.e. April 2012 :) ) but relevant commentary from C. Titus Brown.

We fully agree with everything he said, except for one point:

Assemblers kinda suck. Everyone knows it, and recent contests & papers have done a pretty good job of highlighting the limitations (see GAGE and Assemblathon). This is not because the field is full of stupid people, but rather because assembly is a really, really hard problem (see Nagarajan & Pop) – so hard that really smart people have worked for decades on it.

Actually, assembly is the easiest of the hard problems in computational biology. Many researchers came to genome assembly after they got burnt trying to solve protein folding. Other computational problems, such as de novo prediction of genes and folding of long non-coding RNAs, are also reasonably hard, and computational scientists have had less success with them than with genome assembly.

That criticism aside, we do thank CTB for letting us feel good about the time we spent understanding the paper on the succinct de Bruijn graph from a Japanese group. Efficiency matters!!

Written by M.