Chromosome-scale Shotgun Assembly using an in vitro Method for Long-range Linkage
This looks like a promising paper for chromosome-scale scaffolding (h/t: @lexnederbragt). The following paragraph from the paper explains the main technology.
We demonstrate here that DNA linkages up to several hundred kilobases can be produced in vitro using reconstituted chromatin rather than living chromosomes as the substrate for the production of proximity ligation libraries. The resulting libraries share many of the characteristics of Hi-C data that are useful for long-range genome assembly and phasing, including a regular relationship between within-read-pair distance and read count. Combining this in vitro long-range mate-pair library with standard whole genome shotgun and jumping libraries, we generated a de novo human genome assembly with long-range accuracy and contiguity comparable to more expensive methods, for a fraction of the cost and effort. This method, called Chicago (Cell-free Hi-C for Assembly and Genome Organization), depends only on the availability of modest amounts of high molecular weight DNA, and is generally applicable to any species. Here we demonstrate the value of this Chicago data not only for de novo genome assembly using human and alligator, but also as an efficient tool for the identification of structural variations and the phasing of heterozygous variants.
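To make the "regular relationship between within-read-pair distance and read count" a bit more concrete, here is a minimal sketch of how one might tabulate read pairs by the separation of their two ends. The function name, the order-of-magnitude binning, and the toy separations are all assumptions made for illustration; none of them come from the paper or from HiRise.

```python
from collections import Counter


def count_pairs_by_decade(separations_bp):
    """Count read pairs per order-of-magnitude bin of separation.

    separations_bp: positive integer distances (in bp) between the two
    reads of each pair, as mapped to a reference or draft assembly.
    Returns a dict mapping the lower edge of each bin (100, 1000, ...)
    to the number of pairs whose separation falls in that decade.
    """
    counts = Counter()
    for sep in separations_bp:
        if sep <= 0:
            continue  # skip pairs without a usable separation
        # The digit count gives the decade without floating-point logs.
        decade = len(str(int(sep))) - 1
        counts[10 ** decade] += 1
    return dict(sorted(counts.items()))


# Toy separations (hypothetical values, not data from the paper).
toy_separations = [350, 900, 2_500, 8_000, 40_000, 150_000, 300_000]
print(count_pairs_by_decade(toy_separations))
```

On a real library one would use much finer bins and plot the counts against distance on log-log axes to see how quickly the signal falls off with separation.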
Here is the abstract:
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem. These data dramatically increase the scaffold contiguity of assemblies and provide haplotype phasing information. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago datasets with human DNA and used a new software pipeline (“HiRise”) to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 30 Mb. We also demonstrated the utility of Chicago for improving existing assemblies by re-assembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kb to 10 Mb. Our method uses established molecular biology procedures and can be used to analyze any genome, as it requires only about 5 micrograms of DNA as the starting material.
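Since the headline numbers in the abstract are scaffold N50 values, a quick reminder of how that statistic is computed may help. The sketch below uses the common definition (the length L such that scaffolds of length L or longer contain at least half of the assembled bases). The function name and the toy lengths are assumptions for illustration only and are unrelated to the assemblies in the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest, and the N50 is the
    length of the scaffold at which the running total first reaches
    half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths in arbitrary units (hypothetical, not from the paper).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150, half of the 300 total
```

Because N50 depends only on the length distribution, joining scaffolds correctly raises it, but so does joining them incorrectly, which is why the abstract pairs the N50 claims with statements about accuracy.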
The technology will most likely be offered as a commercial service.
Competing financial interests
The authors have applied for patents on technology described in this manuscript, and Dovetail Genomics LLC is established to commercialize this technology. R.E.G. is Founder and Chief Scientific Officer of Dovetail Genomics. D.H. and D.S.R. are members of the Scientific Advisory Board.