HGAP Assembler Paper for PacBio Reads is [....]

HGAP Assembler Paper for PacBio Reads is [....]

PR-reviewed journals are doing an incredible job of catching up !! Last week we reported about publication of David H. Silver’s paper on efficient pre- processing of PE Reads. Today we found out that a very good assembly approach for PacBio reads, which we wrote about several times over the last year, is finally published in Nature Methods. Congratulations to its authors including first author Chen-Shan Chin, who is incredibly happy. You can follow him at the highly informative twitter feed @infoecho. He also comments at our blog from time to time.


Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graphbased consensus procedure, which we follow with assembly using off-the- shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.


To read rest of the commentary, you will have to pay $150 for one year’s subscription. You have other options - $2.99 to rent our brains or $3.99 to buy them. However, if you really want to share the incredible story about renting the brains of online bloggers, be ready to mail us a check of $32.


Sorry for our lame attempt at humor, but that is what we got after trying to read the Nature Methods paper. Instead we will provide you with various other openly accessible online sources on HGAP, including our previous commentaries.

1. HGAP Very Accurate de novo Genome Assembly from PacBio Data

2. Is the Era of Short Reads and de Bruijn Assembly Over?

It contains a poster from Chen-Shan on HGAP presented at AGBT13. The commentary generated heated discussion among readers on various aspects of 2GS and 3GS technologies, and whether we need short reads to correct PacBio reads.

3. Github repositories on HGAP, old and new -


Hierarchical genome assembly process (hgap)

4. Other non-HGAP commentaries that may help PacBio researchers -

A Simulator for Pacbio Reads

PacBioToCA for Error-correcting Pacbio Reads

5. This paper got submitted to arxiv.org today.

Reducing Assembly Complexity of Microbial Genomes with Single-molecule Sequencing


The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.


To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These assemblies are also comparable in accuracy to hybrid assemblies including second-generation data.


Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to below $2,000 for most genomes, and future advances in this technology are expected to drive the cost lower. Thisis expected to increase the number of complete genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan- genomes and chromosomal organization.


If you are wondering about empty box in the title, we toyed with different words and did not find them to be accurate descriptions of reality.


What is ‘published’ given that the method was presented in our blog posts in the Jurassic past?

Nature Journal Recognizes arXiv.org (or its own Wile E. Coyote Moment)

Finally available in print

Al Gore became mega-millionaire preaching the virtues of saving the planet, and are you saying you still read printed versions of papers?

Electronically available from Nature’s website

We did not have $150, or even $32, to verify the claim.

Written by M. //