An Easy-to-follow Introductory Book on NGS Assembly Algorithms

Dear Readers,

We are putting together an introductory book to help biologists and bioinformaticians (the two terms are increasingly becoming synonymous) understand what goes on inside RNAseq and genome assembly programs. It builds on our popular ‘De Bruijn Graphs for NGS Assembly’ tutorials and several other blog posts. Following the same style as the tutorials, we focus on the algorithms and the big picture, not just on how to run programs.


In addition to presenting the basic concepts covered in the tutorials, we are adding an entire chapter that delves into the latest cutting-edge algorithms and explains them in simple language. The goal is to help you understand conceptually where these new algorithms fit in the big picture and why they speed up assembly 10- to 100-fold or improve assembly quality.

Most of the book does not discuss any specific program, but in the last chapter we go through three commonly used programs (SOAPdenovo2, SPAdes and Trinity), explain how to run them and discuss their code and algorithms. We also mention a number of other commonly used programs and explain what they do.

We are also developing small software modules to explain some of the concepts better. They will be available from our website.


Is assembly an obscure topic most biologists should stay away from?

We do not see how biologists can do any bioinformatics without a rudimentary sense of assembly algorithms, and this view is based on our interactions with collaborators and their students working on RNAseq data. They all realize that by assembling short-read RNAseq libraries, they can explore many organisms beyond the handful with high-quality genome sequences. However, interpreting Trinity or Oases assemblies is a big challenge for them. From time to time, we receive questions about why certain unrelated genes got fused together, what to do with 100 splice forms of a gene, or whether to trust the Trinity, Oases or SOAPdenovo-trans result for a gene. It is impossible to explain why a particular result may be an assembly artifact without drawing the graph structure from the intermediate steps of the assembly.


Timeline and Request Page

The book will be available in electronic form in two to three months. If you find the topics interesting, please sign up here and we will keep you posted on our progress. You can also give us feedback on the choice of topics and suggest anything we may have missed.

We would definitely like to produce a paper version if there is enough interest from readers. Printing involves additional costs, but readers who buy the electronic version will only need to pay the difference between the paper and electronic prices to get the paper version.

Written by M.