Readers may enjoy an old feature article from Nature Methods on how to assess the quality of a genome assembly. ‘Old’, in the fast-moving NGS world, means it was published in March 2012 :)
We will save you some time by giving the punchline first.
People like to talk about an absolute quality, but there is none, he says. You have to ask about the quality relative to likely uses.
So far, this has been our strategy for coping with the data deluge. In all collaborative projects, we usually start with the biological question in hand and make approximate calculations to satisfy those needs. Only then do we go about trying to figure out the most perfect genome assembly, the most perfect transcriptome assembly, etc. That way many heads can start thinking from an early stage of the project, and we do not remain hostages to Trinity running for weeks.
Of course, we do not want to diminish the other aspect of de novo assembly - finding the ‘unknown unknowns’, or as mentioned by C. T. B., but that is a hard project that interests only a few computer geeks like us. Most biologists seem to be happy with known unknowns.
A few other insights from the article:
Ian Korf has a warning for newcomers to de novo genome assembly: This is not an easy science problem. Expect errors and tread carefully.
Wish we had spoken with him two years back :)
No matter how much effort is put into mimicking experimental artifacts and biases, simulated data won’t mirror actual data well, he says. Salzberg agrees. Some assemblers perform beautifully on simulated data but fall down on actual data, he says.
We would go one step further and say that even actual data sets differ quite a bit from one another, but thankfully Salzberg mentioned what we were going to add next as a solution.
“If you are trying to get the best assembly, you should run multiple assemblies multiple times.”
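For readers who want to try this, here is a minimal sketch of how one might compare several assembly runs side by side. It is our own illustration, not something taken from the article. We assume you have already extracted the contig lengths from each run; the run names (`run_k31` and so on) and the toy lengths are hypothetical. We rank by N50 only because it is a familiar contiguity measure, and in keeping with the punchline above, the measure that matters depends on what you plan to do with the assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which the running total of
    contig lengths, taken from longest to shortest, first reaches half
    of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def rank_assemblies(runs):
    """Summarize each run as (name, N50, total length), best N50 first.

    `runs` maps a run name to a list of contig lengths.
    """
    summary = [(name, n50(lengths), sum(lengths)) for name, lengths in runs.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Hypothetical toy data: three runs of an assembler with different settings.
runs = {
    "run_k31": [100, 80, 60, 40, 20],
    "run_k51": [150, 90, 30, 20, 10],
    "run_k71": [70, 70, 70, 50, 40],
}

for name, n50_value, total in rank_assemblies(runs):
    print(f"{name}\tN50={n50_value}\ttotal={total}")
```

A higher N50 alone does not mean a better assembly, since misjoined contigs inflate it, so treat the ranking as a starting point for a closer look rather than a verdict.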
For many other critical insights, we refer you to the article.
Request to readers - would anyone please email us (samanta at homolog.us) the article “Succinct de Bruijn Graphs” by Alexander Bowe, Taku Onodera, Kunihiko Sadakane and Tetsuo Shibuya? Thanks in advance.