Tutorials

Tips, Bubbles and Crosslinks

Readers may have noticed that two branching structures, tips and bubbles, appear again and again in our discussions of various de Bruijn graph structures. What they represent depends on the context of the graph. In different circumstances, tips and bubbles may represent sequencing errors, alternatively spliced genes, or SNPs and insertions/deletions. Given that the same graph structure can encode many biologically relevant patterns, it is not possible to resolve a de Bruijn graph without knowing the biological context of the measurement. This issue becomes important when a de Bruijn graph-based genome assembler is used to solve the various problems mentioned in this section.
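
To make the two structures concrete, here is a minimal Python sketch, not taken from any particular assembler. The sequences, the choice of k = 4, and the helper names `de_bruijn_edges` and `branch_points` are all made up for illustration. It builds a small de Bruijn graph whose nodes are (k-1)-mers and reports where paths fork and where they merge again. A fork followed by a merge is a bubble; a fork whose short side branch never rejoins is a tip.

```python
from collections import defaultdict

def de_bruijn_edges(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branch_points(graph):
    """Return (forks, merges): nodes with more than one outgoing edge
    and nodes with more than one incoming edge."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            indegree[dst] += 1
    forks = sorted(node for node, targets in graph.items() if len(targets) > 1)
    merges = sorted(node for node, deg in indegree.items() if deg > 1)
    return forks, merges

# Hypothetical toy sequences: the two alleles differ at one base (A vs C).
allele_a = "ACGTTGCATGGA"
allele_b = "ACGTTGCCTGGA"
# Hypothetical read fragment whose last base is wrong (T instead of G).
bad_end = "GCATGT"

# Single-base difference in the middle: paths fork and merge again (bubble).
bubble = branch_points(de_bruijn_edges([allele_a, allele_b], 4))
print(bubble)

# Mismatch at the end of a read: paths fork but never merge (tip).
tip = branch_points(de_bruijn_edges([allele_a, bad_end], 4))
print(tip)
```

Note that nothing in the graph itself says whether the bubble came from a SNP, a sequencing error in the middle of a read, or an alternative exon. The same fork-and-merge pattern appears in all three cases.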

Apart from the structure of the graph, most algorithms can use k-mer frequency information to choose between the various possibilities. The number of reads supporting each branch at a junction is an important factor in that choice. For example, if both branches of a bubble have similar frequency in a genomic library, they possibly arise from a real sequence difference such as a SNP or an insertion/deletion. On the other hand, if one branch has a very small frequency compared to the other, it is most likely the result of a sequencing error. The argument changes for transcriptomes or metagenomes, where expression levels and species abundances vary widely: a low-frequency branch can still be real, and two well-supported branches in a transcriptome may reflect alternative splicing. In any case, most assemblers remove singleton k-mers (k-mers seen only once), which eliminates a large number of spurious k-mers at the cost of a small number of real ones.
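
The frequency argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reads are invented, the helper names (`count_kmers`, `drop_singletons`, `branch_balance`) are hypothetical, and the 0.25 ratio threshold is an arbitrary choice, not a value used by any specific assembler.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def drop_singletons(counts):
    """Keep only k-mers that were seen more than once."""
    return Counter({kmer: n for kmer, n in counts.items() if n > 1})

def branch_balance(cov_a, cov_b, min_ratio=0.25):
    """Compare the coverage of two bubble branches.

    Returns "balanced" if the weaker branch has at least min_ratio of the
    coverage of the stronger one, otherwise "skewed". min_ratio is an
    arbitrary illustrative threshold. How to interpret the result depends
    on the library type (genome, transcriptome, metagenome).
    """
    low, high = sorted((cov_a, cov_b))
    if high == 0:
        raise ValueError("at least one branch must have coverage")
    return "balanced" if low / high >= min_ratio else "skewed"

# Hypothetical genomic reads: two well-supported variants and one
# read carrying a sequencing error at the same position.
reads = (
    ["ACGTTGCATGGA"] * 5    # variant with A
    + ["ACGTTGCCTGGA"] * 4  # variant with C
    + ["ACGTTGCGTGGA"]      # single erroneous read with G
)

counts = count_kmers(reads, 4)
filtered = drop_singletons(counts)

# Branch entry k-mers after the shared node TGC.
print(counts["TGCA"], counts["TGCC"], counts["TGCG"])
print(branch_balance(counts["TGCA"], counts["TGCC"]))  # two real variants
print(branch_balance(counts["TGCA"], counts["TGCG"]))  # variant vs. error
print("TGCG" in filtered)                              # error k-mer removed
```

In this toy genomic example the two real variants give a balanced bubble, while the erroneous branch is both skewed and removed by the singleton filter. A real k-mer covered by only one read would be lost in the same way, which is the cost mentioned above.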