Note: These tutorials are incomplete.

De Bruijn Graphs for Highly Polymorphic Genomes

In the previous section, we discussed distortions in de Bruijn graphs caused by differences between haplotypes. For most genomes these differences are minor and affect no more than 1% of the genome. In some organisms, however, the difference between the two haplotypes is far from small. For example, the genomes of many marine organisms (the Pacific oyster Crassostrea gigas, the sea urchin Strongylocentrotus purpuratus) are highly polymorphic: the two copies of the same chromosome carried by one individual can differ at many positions.

The assumption that haplotype differences are minor breaks down when the two chromosomes differ substantially, as they do in highly polymorphic genomes.
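A rough back-of-the-envelope model helps show why. Suppose (an illustrative assumption, not a claim from this tutorial) that SNPs between the two haplotypes occur independently at a per-base rate p. A k-mer then avoids every SNP with probability (1 - p)^k, so a fraction 1 - (1 - p)^k of k-mers differ between the haplotypes and end up on bubble branches. The function name `fraction_kmers_with_snp` and the rates used below are hypothetical.

```python
def fraction_kmers_with_snp(p: float, k: int) -> float:
    """Expected fraction of k-mers that overlap at least one SNP.

    Illustrative model only: SNPs are assumed to occur independently
    at per-base rate p, so a k-mer is SNP-free with probability (1 - p) ** k.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-base heterozygosity rates, not measured values.
    for p in (0.001, 0.01, 0.05):
        frac = fraction_kmers_with_snp(p, 31)
        print(f"p={p}: {frac:.3f} of 31-mers overlap a SNP")
```

Because each SNP is covered by up to k overlapping k-mers, the affected fraction of k-mers grows much faster than p itself: under this model, k = 31 and p = 0.01 already give about 27% of k-mers.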

In that case, to understand what a de Bruijn graph built from NGS reads would look like, one has to construct the de Bruijn graph of each of the two haploid chromosomes separately and then combine them.
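The construct-separately-then-combine idea can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: a single strand only (no reverse complements), error-free reads that cover each haplotype completely, and the combined graph taken as the plain union of the two edge sets. The helper names and the two short haplotypes are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def de_bruijn_edges(seq: str, k: int) -> set[tuple[str, str]]:
    """Edges of a simple de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer.

    Simplification: single strand, no reverse complements, error-free full coverage.
    """
    return {(seq[i:i + k - 1], seq[i + 1:i + k]) for i in range(len(seq) - k + 1)}


def merge_graphs(*edge_sets: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Union of haplotype graphs, approximating the graph built from pooled reads."""
    merged: set[tuple[str, str]] = set()
    for edges in edge_sets:
        merged |= edges
    return merged


def branching_nodes(edges: set[tuple[str, str]]) -> list[str]:
    """Nodes with in-degree or out-degree above 1, i.e. where the graph forks."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    return sorted(n for n in nodes if indeg[n] > 1 or outdeg[n] > 1)


if __name__ == "__main__":
    hap1 = "ACGTTGCA"  # hypothetical haplotype 1
    hap2 = "ACGATGCA"  # hypothetical haplotype 2: one SNP (T -> A) at position 3
    k = 4
    g1, g2 = de_bruijn_edges(hap1, k), de_bruijn_edges(hap2, k)
    print(branching_nodes(g1), branching_nodes(g2))  # [] [] : each haplotype is a simple path
    print(branching_nodes(merge_graphs(g1, g2)))     # ['ACG', 'TGC'] : the SNP opens a bubble
```

Each haplotype on its own gives a simple path, while the combined graph splits at `ACG` and rejoins at `TGC`; each branch of the bubble consists of the k k-mers that cover the SNP. When SNPs lie closer together than k, such bubbles overlap and the combined graph quickly becomes tangled.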

Mathematically, this is a very difficult and still open problem. The combined graph opens so many branches that it is impossible to tell whether a given branch arises from differences between the haplotypes or from repeats.

Then again, nobody says that everything has to be done from short reads. Other technologies, such as long reads or colony sequencing, can be brought in.

