Exploring Genome Characteristics and Sequence Quality Without a Reference


We wrote about Jared Simpson’s preprocessing module here:

Very Helpful Preprocessing Module for Those Interested in Assembling Genomes

If you have used the module and do not know what to cite, Jared has posted the related paper on arXiv. Its abstract follows.

The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly. This paper addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of DNA sequence reads. The software implementation calculates per-base error rates, paired-end fragment size histograms and coverage metrics in the absence of a reference genome. Additionally, the software will estimate characteristics of the sequenced genome, such as repeat content and heterozygosity, that are key determinants of assembly difficulty. The software described is freely available and open source under the GNU Public License.
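To give a feel for what "coverage metrics in the absence of a reference genome" can mean in practice, here is a minimal Python sketch of a k-mer count histogram, a common reference-free starting point for estimating coverage. It is not the module's implementation. The function names `kmer_histogram` and `coverage_peak`, the `min_count` cutoff, and the toy numbers are all made up for illustration, and the sketch ignores reverse complements and ambiguous bases.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Collapse per-k-mer counts into a histogram of multiplicities.
    return dict(Counter(kmer_counts.values()))


def coverage_peak(histogram, min_count=2):
    """Return the most frequent multiplicity at or above min_count, or None.

    Low-multiplicity k-mers are assumed here to be dominated by sequencing
    errors, so they are skipped when looking for the coverage peak.
    """
    candidates = {c: n for c, n in histogram.items() if c >= min_count}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    toy_reads = ["ACGTA", "ACGTA", "ACGTT"]  # illustrative data only
    print(kmer_histogram(toy_reads, k=4))
    print(coverage_peak({1: 50, 2: 3, 10: 20, 11: 18}))
```

On real data the histogram would be built from millions of reads with a much larger k, and the position of the main peak gives a rough idea of k-mer coverage without aligning anything to a reference.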



Written by M.