Discover Novel Viruses Without Leaving Your Couch
Over the last year, you have probably seen news articles about how researchers from Wuhan collected bat poop from remote caves in Yunnan province and found novel viruses. One of those viruses closely matched SARS-CoV-2 (or so they claimed), giving us some idea about the origin of the pandemic. We will talk about the origin question in a later post. This one is about a much easier way to discover new viruses, one that you can safely try at home. A group of bioinformaticians discovered two novel coronaviruses without leaving their sofa or kissing the posterior of any bat or other dirty animal.
In a recent preprint titled “Unexpected novel Merbecovirus discoveries in agricultural sequencing datasets from Wuhan, China”, Daoyu Zhang and collaborators reported downloading a bunch of raw sequence data files from the NCBI SRA database and magically finding coronaviruses and other deadly pathogens that nobody had reported before. Here is what is odd about their finding. They looked into innocuous sequencing files from rice and cotton, where nobody would expect such viruses. All the datasets they analyzed came from China.
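For readers curious what “finding viruses in someone else's raw data” involves at its very simplest, here is a minimal Python sketch. It is not the authors' pipeline. It assumes the reads have already been downloaded from SRA and loaded as (name, sequence) pairs, and it flags any read that shares k-mers (short substrings of length k) with one known viral genome, on either DNA strand. The function names and toy sequences are hypothetical; real screens typically use dedicated aligners or assemblers and much larger reference collections.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement, i.e. the sequence of the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, reference, k=21, min_shared=1):
    """Hypothetical screen: report reads sharing >= min_shared k-mers with the reference.

    Reads can come from either strand, so k-mers from the reference and
    its reverse complement are both indexed.
    """
    ref_kmers = kmers(reference, k) | kmers(revcomp(reference), k)
    hits = []
    for name, seq in reads:
        shared = len(kmers(seq, k) & ref_kmers)
        if shared >= min_shared:
            hits.append((name, shared))
    return hits


# Toy example: read1 matches the forward strand, read3 the reverse strand.
reference = "ATGCGTACGTTAGCCGATAGGCTA"
reads = [
    ("read1", "CGTACGTTAGCC"),
    ("read2", "TTTTTTTTTTTT"),
    ("read3", "GGCTAACGTACG"),
]
print(screen_reads(reads, reference, k=8))  # [('read1', 5), ('read3', 5)]
```

Exact k-mer sharing only catches reads that are nearly identical to something already known. That is one reason a truly novel virus, like the ones in the preprint, usually needs assembly and more sensitive similarity searches rather than a lookup like this.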
To give you some perspective on what is going on, let me share one of my experiences from 2006-07. At that time, I was working on the sea urchin genome and found two short contigs containing honey bee genes that were fully conserved at the nucleotide level. Given the large evolutionary distance between sea urchin and honey bee, two fully conserved chromosomal segments would have been major news. But my observation turned out to be just an artifact: the same lab sequencing the sea urchin genome was also sequencing the honey bee genome nearby. Contamination happened.
In that light, agricultural sequences contaminated with deadly viruses suggest that the plant biologists were probably sharing a lab or office with someone else who was messing around with those viruses without following proper safety protocols. Given that many different raw data files from different places in China showed such contamination, one could argue that working with nasty viruses without adequate protection was the norm, not the exception. Moreover, some of those data files were from as early as 2017 and contained completely unknown viruses. How wild was this “wild west” of virus research?
The authors had time to go through only a small subset of raw data files. I expect enterprising bioinformaticians to dig deeper into SRA to track down more unknown viruses and, who knows, maybe some Martians.