Reducing Assembly Complexity of Microbial Genomes with Single-molecule Sequencing

Many PacBio experts and one novice group (us) joined a Twitter chat at #SMRTseq this morning. If you are interested, click on the hashtag #SMRTseq to see what was discussed. Many thanks to @GenomeBiology for arranging it and for building a Storify article on the chat. It is exciting to see the editors of Genome Biology thinking outside the box and using the latest social media tools to bring so many PacBio enthusiasts together. Another good example of using social media came from Carl Zimmer of National Geographic, who arranged a Google Hangout in April to discuss the coelacanth genome paper.

In the PacBio chat, Adam Phillippy gave an update on his PacBio-related paper, which is available from arXiv. Its abstract follows.

Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.

Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These assemblies are also comparable in accuracy to hybrid assemblies including second-generation data.

Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to below $2,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of complete genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.

Written by M.