SOAPdenovo2 Demystified - A Rough Guideline of Functionality



The next few SOAPdenovo-related commentaries will dig deeper into the data structures and algorithms of SOAPdenovo2. Here is a rough guide to what to expect.

A new user who prefers to stay completely oblivious to the details of how SOAPdenovo2 works has the following mental picture of the assembler:

[Figure: SOAPdenovo2 as a black box]

Once you get somewhat familiar with the manual, you will find that the black box consists of four independent stages:

[Figure: the four stages of SOAPdenovo2: pregraph, contig, map and scaffold]

Why do we call them independent? The answer will be clear from the following picture:

[Figure: each stage writes output files that the next stage reads as input]

Each stage of the program generates a set of output files, and the next stage reads them as input to perform the next step of the assembly. So, if you replace the output of a stage with files generated by a different program (say Velvet or Minia) in the format the next stage expects, SOAPdenovo2 will not be able to tell the difference and will run to completion. That could be an interesting way to mix and match various assembly programs.
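To make the stage-by-stage picture concrete, here is a minimal Python sketch that builds one command line per stage. The binary name `SOAPdenovo-63mer`, the config file `lib.cfg`, the output prefix `asm` and the k-mer size are placeholder assumptions; the helper `build_stage_commands` is hypothetical and only assembles argument lists, it does not run anything. Check the flags against the manual of your installed version before relying on them.

```python
from typing import List


def build_stage_commands(binary: str, config: str, prefix: str, k: int) -> List[List[str]]:
    """Return one argv list per SOAPdenovo2 stage, in execution order.

    Every stage shares the same output prefix, which is how a later stage
    finds the files written by the earlier one.
    """
    return [
        [binary, "pregraph", "-s", config, "-K", str(k), "-o", prefix],
        [binary, "contig", "-g", prefix],
        [binary, "map", "-s", config, "-g", prefix],
        [binary, "scaff", "-g", prefix],
    ]


# Placeholder names; replace them with your own binary, config and prefix.
for argv in build_stage_commands("SOAPdenovo-63mer", "lib.cfg", "asm", 31):
    print(" ".join(argv))
    # To actually run a stage: subprocess.run(argv, check=True)
```

Because the stages only communicate through files sharing the prefix, any one of these commands can be rerun on its own, or skipped if the files it would produce already exist.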

More adventurous bioinformaticians will probably look into the code to see what the program does to go from one stage to the next. You will find that the stages run through the following functions.

---

pregraph stage:

```
prlRead2HashTable - parallel code for loading read libraries and splitting into kmers
singleKmer
chopKmer4read - Chops the reads into kmers and stores them in KmerSets
thread_mark
thread_delow - Removes the kmers with low coverage
Mark1in1outNode - Marks the linear kmers
deLowCov - Removes the kmers with low coverage
removeSingleTips - single tips are removed
removeMinorTips - minor tips are removed
kmer2edges - Builds edges by combining linear kmers
prlRead2edge - parallel code for remapping reads on to edges
searchKmer
chopKmer4read
parse1read
search1kmerPlus
thread_add1preArc
```
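The core idea of the pregraph stage listed above (chop reads into kmers, drop low-coverage kmers, merge linear kmers into edges) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not SOAPdenovo2's implementation: it works on one strand only, ignores reverse complements and tips, keeps everything in plain dictionaries, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def chop_kmers(read: str, k: int) -> List[str]:
    """All overlapping k-mers of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer occurs across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(chop_kmers(read, k))
    return counts


def drop_low_coverage(counts: Dict[str, int], min_cov: int) -> Dict[str, int]:
    """Keep only k-mers seen at least min_cov times."""
    return {kmer: c for kmer, c in counts.items() if c >= min_cov}


def successors(kmer: str, kmers: Dict[str, int]) -> List[str]:
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer: str, kmers: Dict[str, int]) -> List[str]:
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def kmers_to_edges(kmers: Dict[str, int]) -> List[str]:
    """Merge chains of linear (1-in, 1-out) k-mers into edge sequences.

    Isolated cycles have no natural start and are skipped in this sketch.
    """
    edges = []
    visited = set()
    for start in sorted(kmers):
        preds = predecessors(start, kmers)
        if len(preds) == 1 and len(successors(preds[0], kmers)) == 1:
            continue  # interior of a chain; it is reached from the chain's start
        path, cur = start, start
        visited.add(cur)
        while True:
            succ = successors(cur, kmers)
            if len(succ) != 1:
                break  # dead end or branch: the edge stops here
            nxt = succ[0]
            if len(predecessors(nxt, kmers)) != 1 or nxt in visited:
                break  # merge point or loop: the edge stops here
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        edges.append(path)
    return edges


# Three clean reads plus one read with an error in its last base.
reads = ["ACGTTGCA"] * 3 + ["ACGTTGCC"]
solid = drop_low_coverage(count_kmers(reads, 4), min_cov=2)
print(kmers_to_edges(solid))
```

In this toy input, the kmer created by the sequencing error is seen only once, so the coverage filter removes it before the edges are built.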

---

contig stage:

```
swapedge
sortedge
freshArc
solveReps
removeWeakEdges
removeLowCovEdges
cutTipsInGraph
Iterate
createFilter
buildGraphHash
addArc
getUnlikeArc - gets arcs that could be processed incorrectly
deleteUnlikeArc - deletes these arcs
freshArc
removeWeakEdges2
removeLowCovEdges2
cutTipsInGraph2
```
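Judging only by their names, `removeLowCovEdges` and `cutTipsInGraph` clean up the edge graph. The Python sketch below shows what such a cleanup can look like in general; it is an assumption about the kind of operation involved, not a transcription of the SOAPdenovo2 code. The `Edge` class, both function names and both cutoffs are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    src: str       # node the edge leaves
    dst: str       # node the edge enters
    length: int    # edge length in bases
    cov: float     # average coverage of the edge


def remove_low_cov_edges(edges: List[Edge], min_cov: float) -> List[Edge]:
    """Drop edges whose coverage is below min_cov."""
    return [e for e in edges if e.cov >= min_cov]


def cut_tips(edges: List[Edge], max_tip_len: int) -> List[Edge]:
    """Repeatedly remove short dead-end branches (tips).

    An edge counts as a tip if it is short and either ends in a node with
    no way out while its source has another outgoing edge, or starts at a
    node with no way in while its target has another incoming edge.
    If several short dead ends share a node, this sketch removes all of them.
    """
    while True:
        out_deg = Counter(e.src for e in edges)
        in_deg = Counter(e.dst for e in edges)

        def is_tip(e: Edge) -> bool:
            dead_end = out_deg[e.dst] == 0 and out_deg[e.src] > 1
            dead_start = in_deg[e.src] == 0 and in_deg[e.dst] > 1
            return e.length <= max_tip_len and (dead_end or dead_start)

        kept = [e for e in edges if not is_tip(e)]
        if len(kept) == len(edges):
            return kept  # nothing removed in this pass, so we are done
        edges = kept


graph = [
    Edge("A", "B", 100, 20.0),
    Edge("B", "C", 100, 20.0),
    Edge("B", "T", 5, 3.0),    # short dead-end branch off B
    Edge("C", "D", 50, 1.0),   # poorly supported edge
]
cleaned = cut_tips(remove_low_cov_edges(graph, min_cov=2.0), max_tip_len=10)
print([(e.src, e.dst) for e in cleaned])
```

The loop matters because removing one tip can change node degrees and expose another; the listing above also runs its cleanup functions more than once, which is consistent with that kind of iteration.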

---

map stage:

```
prlContig2nodes
prlLongRead2Ctg
prlRead2Ctg
```

---

scaffold stage:

```
PE2Links - Updates connections between contigs based on alignment information of paired-end reads
Links2Scaf - Constructs scaffolds based on alignment information
scaffolding
```
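The description of `PE2Links` above (connections between contigs derived from paired-end alignments) can be illustrated with a small Python sketch. It assumes each read pair has already been reduced to the two contigs its mates map to, ignores orientation and insert size, and uses a hypothetical `min_pairs` support threshold; none of this is taken from the SOAPdenovo2 source.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str]


def pairs_to_links(pair_hits: Iterable[Tuple[str, str]], min_pairs: int) -> Dict[Link, int]:
    """Count read pairs that connect two different contigs.

    pair_hits holds one (contig_of_mate1, contig_of_mate2) tuple per read pair.
    Only links supported by at least min_pairs pairs are returned.
    """
    support: Counter = Counter()
    for a, b in pair_hits:
        if a == b:
            continue  # both mates on one contig: no information about neighbours
        key = (a, b) if a < b else (b, a)  # treat c1-c2 and c2-c1 as the same link
        support[key] += 1
    return {link: n for link, n in support.items() if n >= min_pairs}


hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"), ("c1", "c1"), ("c2", "c3")]
print(pairs_to_links(hits, min_pairs=3))
```

A real scaffolder also has to use mate orientation and the library insert size to order the contigs and estimate the gaps between them, which is part of what makes the scaffold stage elaborate.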

On our blog, we have posted many commentaries on building and resolving de Bruijn graphs, and readers can understand the first three stages by going through them. The scaffold stage is novel (and quite elaborate) in SOAPdenovo. It will take us some time to go through the algorithm, but curious readers may find the following diagram helpful. It explains the Links2Scaf function. Functions in red have many sub-functions not shown in the chart, whereas functions in black have few sub-functions.

[Figure: call chart of the Links2Scaf function]



Written by M. //