The de Bruijn graphs are immensely helpful in assembling Illumina sequences, but they often occupy massive amounts of memory, especially for large raw datasets. Our readers interested in representing de Bruijn graphs in compact space should not miss a recent paper by Victoria Crawford, Alan Kuhnle, Christina Boucher, Rayan Chikhi and Travis Gagie. The paper is published in Bioinformatics, but the journal link is not open-source.
The core problem addressed in this paper is as follows. Being able to represent de Bruijn graphs in small space is helpful, but it is also important to be able to update the representation dynamically after changes in the original graph structure. This helps in the assembly process, because often the graph is pruned to delete spurious links. In 2016, Belazzougui et al. proposed a method to do so in “Fully Dynamic de Bruijn Graphs”. The current paper is a practical implemention of Belazzougui’s proposed data structure.
We will write more about this paper in a later blog post. Until then, the readers interested in understanding the algorithm will find these slides from the first author helpful.
The de Bruijn graph is fundamental to the analysis of next generation sequencing data and so, as datasets of DNA reads grow rapidly, it becomes more important to represent de Bruijn graphs compactly while still supporting fast assembly. Previous implementations of compact de Bruijn graphs have not supported node or edge deletion, however, which is important for pruning spurious elements from the graph.
Belazzougui et al. (2016b) recently proposed a compact and fully dynamic representation, which supports exact membership queries and insertions and deletions of both nodes and edges. In this paper, we give a practical implementation of their data structure, supporting exact membership queries and fully dynamic edge operations, as well as limited support for dynamic node operations. We demonstrate experimentally that its performance is comparable to that of state-of-the-art implementations based on Bloom filters.
Availability and Implementation
Our source-code is publicly available at https://github.com/csirac/dynamicDBG under an open-source license.