How Novel Ideas Trickle Down from Computer Science to Biology

[The following linear narrative skips a few relevant citations.]

In 2001, Pavel Pevzner and colleagues wrote an interesting paper titled:

An Eulerian path approach to DNA fragment assembly

The paper collected fourteen princely citations in seven years, excluding those from the authors and their ‘friends and families’.


Although the algorithm by Pevzner and colleagues was good, their program was possibly not biologist-friendly (whatever that means). The 2008 program Velvet, which implemented a similar de Bruijn graph-based algorithm, found wider use than the 2007 program EULER-SR by Chaisson and Pevzner.

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs


Although Velvet was excellent for bacterial assembly, it failed spectacularly for the human genome, just like the senior author of the Velvet paper. BGI wrote a fast and scalable program, SOAPdenovo, that found wider use.

De novo assembly of human genomes with massively parallel short read sequencing


SOAPdenovo was good for genome assembly, but not for transcriptomes. Trinity, a de Bruijn graph-based assembler written by researchers from the Broad Institute, worked better for transcriptomes. Their paper was published in 2011.

Full-length transcriptome assembly from RNA-Seq data without a reference genome


Fast forward to 2013, when a biology lab at Stanford presented the procedure in ‘biologist-friendly language’ in their ‘Simple Fools Guide’. The PDF guide is available from here. De novo assembly, the most difficult step in the RNA-seq pipeline, is explained in the following biologist-friendly text.

Building a de novo assembly is a very memory-intensive process. There are many programs for this, some of which are listed in the Resources section of this chapter. In our experience, the one that can be used most effectively on any fairly new Mac computer is CLC genomics workbench, as most others require more RAM memory than typically is available on personal computers (in the 100s of GB, depending on the number of reads). CLC is the only software in this protocol that is not open source (an academic license is currently $4,995), although there is a free two-week trial version available. Unlike the other software in this protocol, CLC has a point-and-click graphical user interface and is very easy to use. CLC uses De Bruijn graphs to join reads together.
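For anyone wondering what "uses De Bruijn graphs to join reads together" might mean in practice, here is a toy sketch of the Eulerian path idea. It is not the code of EULER, Velvet, SOAPdenovo, Trinity or CLC. The function names, the reads and the choice of k are all made up for illustration, and the sketch assumes error-free reads, no repeated k-mers and a graph that actually has an Eulerian path.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it.

    Assumption: every distinct k-mer contributes exactly one edge,
    so repeat copies of a k-mer are collapsed.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (node for node, targets in graph.items() if len(targets) - in_deg[node] == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Three overlapping toy reads from the made-up sequence ATGGCGTGCA
print(assemble(["ATGGCGT", "GGCGTGC", "CGTGCA"], 4))  # prints ATGGCGTGCA
```

Real assemblers spend most of their effort on what this sketch ignores: sequencing errors, repeats, coverage and the memory needed to hold the graph.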


If, after reading the above narrative, you still have not figured out why Pavel Pevzner is carrying his gun, please read Alice and Bob’s story in the first chapter of this book. You do not need to buy the book. Dr. Pevzner’s thoughts are available on a free trial.

Written by M.