How Novel Ideas Trickle Down from Computer Science to Biology

In 2001, Pavel Pevzner and colleagues published an interesting paper titled:

An Eulerian path approach to DNA fragment assembly

The paper collected fourteen princely citations in seven years, excluding those from the authors and their ‘friends and families’.


Although the algorithm by Pevzner and colleagues was good, their program was possibly not biologist-friendly (whatever that means). The 2008 program Velvet, which implemented a similar de Bruijn graph-based algorithm, found wider use than the 2007 program EULER-SR by Chaisson and Pevzner.

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs


Although Velvet was excellent for bacterial assembly, it failed spectacularly on the human genome, just like the senior author of the Velvet paper. BGI then wrote SOAPdenovo, a fast and scalable program that found wider use.

De novo assembly of human genomes with massively parallel short read sequencing


SOAPdenovo was good for genome assembly, but not for transcriptomes. Trinity, a de Bruijn graph-based assembler written by researchers from the Broad Institute, worked better for transcriptomes. Their paper was published in 2011.

Full-length transcriptome assembly from RNA-Seq data without a reference genome


Fast forward to 2013, when a biology lab at Stanford presented the procedure in ‘biologist-friendly language’ in their ‘Simple Fool’s Guide’. The PDF guide is available from here. De novo assembly, the most difficult step in the RNA-Seq pipeline, is explained in the following biologist-friendly text.

Building a de novo assembly is a very memory-intensive process. There are many programs for this, some of which are listed in the Resources section of this chapter. In our experience, the one that can be used most effectively on any fairly new Mac computer is CLC genomics workbench, as most others require more RAM memory than typically is available on personal computers (in the 100s of GB, depending on the number of reads). CLC is the only software in this protocol that is not open source (an academic license is currently $4,995), although there is a free two-week trial version available. Unlike the other software in this protocol, CLC has a point-and-click graphical user interface and is very easy to use. CLC uses De Bruijn graphs to join reads together.
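
For readers wondering what "uses De Bruijn graphs to join reads together" means in practice, here is a minimal toy sketch in Python. It is not how CLC, Velvet, or EULER actually work internally; the function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) and the three example reads are made up for illustration, and the sketch assumes error-free reads, a single strand, and a k large enough that no (k-1)-mer repeats. Each distinct k-mer becomes an edge between its (k-1)-mer prefix and suffix, and walking every edge once (an Eulerian path, the idea in the 2001 paper's title) spells out the assembled sequence.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a toy de Bruijn graph: one edge per distinct k-mer,
    going from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Walk every edge exactly once (Hierholzer's algorithm).
    Assumes such a path exists, which real read data rarely guarantees."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path: the first node in full,
    then the last character of every following node."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Three overlapping, error-free toy reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(assemble(reads, 4))  # ATGGCGTGCA
```

Real assemblers spend most of their effort on what this sketch ignores: sequencing errors, repeats, reverse complements, and fitting the graph into memory, which is where the "100s of GB" in the quoted text comes from.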


If, after reading the above narrative, you still have not figured out why Pavel Pevzner is carrying his gun, please read Alice and Bob’s story in the first chapter of this book. You do not need to buy the book. Dr. Pevzner’s thoughts are available on free trial.

