We have finished working through the algorithms described in three SPAdes papers and found them fascinating:
i) The main paper, “SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing”,
ii) The rectangle graph paper, “From de Bruijn Graphs to Rectangle Graphs for Genome Assembly”,
iii) The pathset graph paper, “Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly”.
Before we discuss the algorithms here, we would like to explore the SPAdes code to complete our understanding. As we work through the code, we will post our notes in this forum thread. That way, we can share what we are doing while keeping the blog clean. There are many C++ files, so we picked a subset that, in our opinion, is the most important to start with. Here is the list:
In the debruijn directory under src:
In the include directory:
They also have a separate folder for the ‘hammer’ and ‘quake’ error-correction code, which we are staying away from for now. If you would like to take a look, feel free to explore these programs:
The lists above may change as our understanding improves. Once we finish, we will post a blog commentary summarizing what we have learned about these files.