Heng Li, who moved to the Broad Institute, posted an interesting article on arXiv about a genome assembler that combines assembly and variant calling. We have not finished reading it, but it is the most important paper on our reading list today. (Note: The article has already been published in the journal Bioinformatics. Thanks to Nick Loman for pointing this out.)
To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method on 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. In the methodological aspects, we proposed FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches and one-pass construction of unitigs from an FMD-index.
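To make the term "super-maximal exact match" (SMEM) concrete, here is a minimal brute-force sketch in Python. It is not the FMD-index algorithm from the paper and not fermi's code. It relies on plain substring search, so it only suits toy inputs. The function names and the `min_len` parameter are our own, chosen for illustration. We assume the usual definition: an SMEM is an exact match between the query and the reference (on either strand) that cannot be extended in either direction and is not contained in any other such match on the query.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def naive_smems(query, reference, min_len=1):
    """Brute-force SMEM finder (hypothetical helper, for illustration only).

    Returns a list of (start, end, substring) tuples with half-open
    query coordinates [start, end).
    """
    # Search both strands. The FMD-index gets the same effect by indexing
    # each sequence together with its reverse complement.
    strands = (reference, reverse_complement(reference))
    smems = []
    prev_end = 0  # furthest query position reached by any earlier match
    for start in range(len(query)):
        # query[start:prev_end] is a substring of an earlier match (or
        # empty), so it is already known to occur; continue from there.
        end = max(prev_end, start)
        while end < len(query) and any(query[start:end + 1] in s for s in strands):
            end += 1
        # A match that reaches no further right than an earlier one is
        # contained in it, so it is not super-maximal.
        if end > prev_end and end - start >= min_len:
            smems.append((start, end, query[start:end]))
        prev_end = max(prev_end, end)
    return smems
```

For example, `naive_smems("TTACATGTAA", "GATTACAGG", min_len=3)` reports one match on the forward strand (`TTACA`) and one on the reverse strand (`TGTAA`). The appeal of an index such as the FMD-index is that it avoids this repeated scanning of the reference.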
Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. There are two main executables, being developed in parallel streams: cortex_con (primary contact Mario Caccamo) is for consensus genome assembly, and cortex_var (primary contact Zamin Iqbal) is for variation and population assembly.
For a slightly different approach to deriving variants from short reads, also check out the HapCompass algorithm.