Second-Generation Linkage Maps for the Pacific Oyster Crassostrea gigas Reveal Errors in Assembly of Genome Scaffolds


Genome centers publish genome papers in glam journals and then move on to more genome assemblies and more glam papers. In the meantime, researchers trying to do biology with the published genomes are stuck with defective assemblies and noise-prone downstream analyses. The biggest source of noise is ‘clean’ Illumina reads. Short reads are noisy precisely because they are short, and in genome assembly that noise shows up as incorrect scaffolding. Unless researchers recognize this problem and actively work on it, instead of generating more and more (1K, 10K, etc.) ‘genome assemblies’ and genome papers, we will be stuck with a massive number of erroneous genome assemblies.

The Pacific oyster Crassostrea gigas, a widely cultivated marine bivalve mollusc, is becoming a genetically and genomically enabled model for highly fecund marine metazoans with complex life-histories. A genome sequence is available for the Pacific oyster, as are first-generation, low density, linkage and gene-centromere maps mostly constructed from microsatellite DNA markers. Here, higher density, second-generation, linkage maps are constructed from more than 1100 coding (exonic) single-nucleotide polymorphisms (SNPs), as well as 66 previously mapped microsatellite DNA markers, all typed in five families of Pacific oysters (nearly 172,000 genotypes). The map comprises 10 linkage groups, as expected, has an average total length of 588 centiMorgans (cM), an average marker-spacing of 1.0 cM, and covers 86% of a genome estimated to be 616 cM. All but seven of the mapped SNPs map to 618 genome scaffolds; 260 scaffolds contain two or more mapped SNPs, but for 100 of these scaffolds (38.5%), the contained SNPs map to different linkage groups, suggesting widespread errors in scaffold assemblies. The 100 misassembled scaffolds are significantly longer than those that map to a single linkage group. On the genetic maps, marker orders and inter-marker distances vary across families and mapping methods, owing to an abundance of markers segregating from only one parent, to widespread distortions of segregation ratios caused by early mortality, as previously observed for oysters, and to genotyping errors. Maps made from framework markers provide stronger support for marker orders and reasonable map lengths and are used to produce a consensus high-density linkage map containing 656 markers.
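The scaffold check described in the abstract is easy to picture in code. The sketch below is only an illustration, not the authors' pipeline: the function name `find_conflicting_scaffolds`, the record layout `(snp_id, scaffold_id, linkage_group)`, and the toy records are all made up here. It flags any scaffold that carries two or more mapped SNPs assigned to different linkage groups, and then reproduces the 100-of-260 percentage quoted above.

```python
from collections import defaultdict


def find_conflicting_scaffolds(snp_records):
    """Return (testable, conflicting) sets of scaffold IDs.

    snp_records: iterable of (snp_id, scaffold_id, linkage_group) tuples.
    This layout is a hypothetical one chosen for illustration.
    A scaffold is 'testable' if it carries at least two mapped SNPs and
    'conflicting' if those SNPs fall in more than one linkage group.
    """
    groups_by_scaffold = defaultdict(set)
    snps_per_scaffold = defaultdict(int)
    for _snp_id, scaffold_id, linkage_group in snp_records:
        groups_by_scaffold[scaffold_id].add(linkage_group)
        snps_per_scaffold[scaffold_id] += 1

    testable = {s for s, n in snps_per_scaffold.items() if n >= 2}
    conflicting = {s for s in testable if len(groups_by_scaffold[s]) > 1}
    return testable, conflicting


# Toy records, invented for this example (not data from the paper).
toy_records = [
    ("snp1", "scaffold_A", 1),
    ("snp2", "scaffold_A", 1),   # consistent: both SNPs on linkage group 1
    ("snp3", "scaffold_B", 2),
    ("snp4", "scaffold_B", 7),   # conflict: one scaffold, two linkage groups
    ("snp5", "scaffold_C", 4),   # single SNP: cannot be tested
]

testable, conflicting = find_conflicting_scaffolds(toy_records)
print(sorted(testable))      # scaffolds with two or more mapped SNPs
print(sorted(conflicting))   # scaffolds whose SNPs disagree on linkage group

# The percentage quoted in the abstract: 100 of 260 testable scaffolds.
pct_conflicting = round(100 / 260 * 100, 1)
print(pct_conflicting)
```

Note that a scaffold with a single mapped SNP can never be flagged this way, so the 38.5% figure describes only the 260 testable scaffolds, not all 618.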



Written by M. //