Genome Assembly is a Nearly Solved Problem with Long Reads

The “genome assembly era” is finally over. This is clear from the titles of recent talks such as “Question, is de novo genome assembly a solved problem with long reads, yet?” and “40 years of genome assembly, are we done yet?”. Such titles suggest a perception among the audience that genome assembly is a nearly solved problem, or rather that the “big bucks are elsewhere”.

This situation was foreseen in a 2013 article posted on this blog. In those days, the most common comment we heard about PacBio was “Pacbio Instruments, Are they Still Around?”. I wrote:

I will go out on a limb and make a bold call. The world of genomics is on the verge of seeing another set of major transformations, and many algorithms, tools, pipelines and methodologies developed for short reads over the last 3-4 years will be useless. In my opinion, the era of short-read sequencing is reaching a peak, or to be kind to its users, short read technologies are shining like the full moon. Related to peaking of the short read era, we will see two other changes - (i) end of “genome sequencing and genome paper” era and (ii) end of big data bioinformatics. For further explanation of the last sentence, please read the detailed explanation in the later part of the commentary.

PacBio makes assembly so easy that there will be no glory in genome assembly. The genome sequencing era started in the mid-90s with the publication of genome papers of various model organisms, but reached massive media frenzy in 2000-2001, when human genome papers were published. Biologists are generally rewarded for publishing in Science and Nature and genome papers had been the surest way to get there.

Essentially the genome papers got into Nature or Science for picking a cool organism and completing (ii). With long reads from PacBio making step (ii) easy, I do not see why genome papers should get any importance for merely completing the engineering task of assembling high-quality genomes. We also noticed that a paper comparing the genomes of white Bengal tiger, African lion, white African lion and snow leopard did not make it into a glam journal. So, possibly BGI saturated the field even prior to the arrival of PacBio.

Of the two talks mentioned above, Rayan Chikhi's makes a solid case that genome assembly is not a “solved problem”. This is correct when seen through the eyes of computer scientists, and I continue to expect novel algorithms that improve assembly quality and speed. However, those interested in using their computational skills for biology will likely find understanding genome architecture more fascinating. The articles “Rules of the Genome - 1 and 2” in our Expert Membership section are good places to get started.

Written by M.