
Why are Assembled Contigs Shorter than Read Length?

One feature of de Bruijn graph-based assemblers that puzzles new users is that their output often contains ‘assembled contigs’ shorter than the read length. How can an assembled unit be shorter than the reads it was built from? And what is the significance of those short contigs?

To understand this observation, readers need to realize that de Bruijn graph-based assemblers break short reads into much shorter k-mers, after which all information about the reads is lost as far as the assembler is concerned. Fig. 1 shows the process for a set of reads, all of which contain a short repetitive region in the center.
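The k-mer step can be illustrated with a minimal Python sketch. The two toy reads, the choice of k = 5, and the function names `kmers` and `count_kmers` are assumptions made for illustration only; they are not taken from any particular assembler.

```python
from collections import Counter

def kmers(read, k):
    """Yield every overlapping k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def count_kmers(reads, k):
    """Pool the k-mers of all reads into one table of counts."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

# Hypothetical toy reads (13 bases each) sharing the repeat GATTACA in the center.
reads = ["CCTGATTACAGGC", "AAGGATTACATTC"]
counts = count_kmers(reads, k=5)

for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

The resulting table records only which k-mers exist and how often each was seen. Nothing in it says which read a k-mer came from or where in that read it sat, which is the loss of read information described above.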

The second important point is that the assemblers take two pieces of information into account: the graph structure, which shows the connectivity of the nodes, and the frequency of the nodes.
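To make the two kinds of information concrete, here is a rough Python sketch. It assumes a node-centric graph in which every distinct k-mer is a node and two nodes are connected when they overlap by k-1 bases. The toy reads, k = 5, and the helper names `build_graph` and `unitigs` are hypothetical. Real assemblers also handle reverse complements and sequencing errors, both of which are ignored here.

```python
from collections import Counter

def build_graph(reads, k):
    """Return (counts, succ, pred) for a node-centric de Bruijn graph."""
    # Node frequency: how many times each k-mer occurs across all reads.
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    # Connectivity: u -> v when the last k-1 bases of u are the first k-1 of v.
    succ = {n: [n[1:] + b for b in "ACGT" if n[1:] + b in counts] for n in counts}
    pred = {n: [b + n[:-1] for b in "ACGT" if b + n[:-1] in counts] for n in counts}
    return counts, succ, pred

def unitigs(counts, succ, pred):
    """Merge maximal non-branching paths; return (sequence, mean k-mer count)."""
    def starts_path(n):
        # A node begins a path unless it simply continues its only predecessor.
        return not (len(pred[n]) == 1 and len(succ[pred[n][0]]) == 1)

    seen, out = set(), []
    # Path starts are visited first; nodes left over lie on isolated cycles.
    for n in sorted(counts, key=lambda n: (not starts_path(n), n)):
        if n in seen:
            continue
        path = [n]
        seen.add(n)
        # Extend while the walk is unambiguous in both directions.
        while len(succ[path[-1]]) == 1:
            nxt = succ[path[-1]][0]
            if len(pred[nxt]) != 1 or nxt in seen:
                break
            path.append(nxt)
            seen.add(nxt)
        seq = path[0] + "".join(p[-1] for p in path[1:])
        out.append((seq, sum(counts[p] for p in path) / len(path)))
    return out

# Hypothetical toy reads (13 bases each) sharing the repeat GATTACA in the center.
reads = ["CCTGATTACAGGC", "AAGGATTACATTC"]
contigs = unitigs(*build_graph(reads, k=5))

for seq, cov in contigs:
    print(seq, len(seq), cov)
```

With these toy reads the graph branches on both sides of the shared repeat, so the walk stops there and every resulting piece is 7 bases long, well short of the 13-base reads. The piece covering the repeat has a mean k-mer count of 2.0 while the flanking pieces have 1.0, which is the kind of frequency signal an assembler can combine with the graph structure.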

