Bioinformatics


Three Major Breakthroughs on the Origin of Covid

The scientists looking for the true origin of the Wuhan flu are excited about three major developments over the last few weeks. Let me list them here.

Pushback against Using PCA, tSNE and UMAP in Biology

Shotgun Development Biology

In an earlier post, I wrote about five open problems in bioinformatics. In the next several posts, I will take up each of them in turn and discuss it in some detail. The current post is on shotgun development biology experiments and related challenges.

Top Five Open Problems in Bioinformatics (2021)

On Twitter, a number of researchers are discussing open problems in bioinformatics. Therefore, I wanted to share a set of unsolved problems I am curious about. Please tweet your suggestions in reply to this tweet, and I will add them below with your name.

Genome Assembly Experts - Was RaTG13 Fraudulently Constructed?

I would like to make our readers aware of the Chinese publications in which the claim that the virus came from bats originated. The key sequence for understanding the origin of Covid is RaTG13, which you can download from here. You can also download the raw data files from NCBI SRA (SRX7724752 and SRX8357956).

Covid Coverup Party (CCP) Invades Zenodo

Over the last eighteen months, biologists funded by the NIH participated in a massive coverup of the origin of the Covid virus. Now that they are exposed, these people are acting rather strangely, reminding me of cockroaches running away from the shining light. We would like to provide our readers with a detailed overview so that they can be entertained by the actions of these lowly creatures. I have not been so amused since Dan Graur vanquished the ENCODE team in 2013 (check “On the Immortality of Television Sets: ‘Function’ in the Human Genome According to the Evolution-Free Gospel of ENCODE”), but ENCODE, to its credit, did not get anyone killed (apart from science itself).

Important Covid-related Datasets Disappeared from NCBI SRA

The origin of the SARS coronavirus causing the pandemic is still a mystery due to a paucity of early data. This is puzzling because Wuhan, where the pandemic started, is equipped with world-class virology labs. A recent finding by Jesse Bloom, a virologist from Fred Hutch, suggests that we are likely being deliberately misled. By checking internet caches, he recovered an entire set of early measurements deleted from the NCBI SRA database in March 2020, possibly based on an order from the Chinese government. Incorporating these early measurements points to a progenitor of SARS-CoV-2 different from the commonly accepted one.

Discover Novel Viruses Without Leaving Your Couch

Over the last year, you probably saw news articles about how the Wuhan researchers collected bat poop from the remote caves of Yunnan province and found novel viruses. One of those viruses matched closely with SARS-CoV-2 (or that is what they claimed), giving us some idea about the origin of the pandemic. We will talk about this origin question in a later post. This one is about a much easier way to discover new viruses that you can safely try at home. A group of bioinformaticians discovered two novel coronaviruses without leaving their sofa or kissing the posterior of any bat or other dirty animal.

Pangolin - Guilty or Not Guilty?

This is a short post highlighting two recent papers making contradictory claims. Are they both correct?

Did SARS-1.0 Come from Bats through Civets?

“If you tell a lie big enough and keep repeating it, people will eventually come to believe it.”

Strains of SARS-CoV-2

In the previous post, we covered the basics of genetic analysis. The tools discussed there will go a long way to help you follow various scientific discussions involving SARS-CoV-2 genetic data. Today we will quickly review that post, and then look into different “strains” of SARS-CoV-2 coronavirus.

Is Wuhan Coronavirus a Bioweapon?

Over the last few weeks, I received many questions related to the genetics of the new coronavirus. Some of them are about genome-based tracking of this virus by the Nextstrain team. Others are on claims about two strains of the virus (“L” and “S”), whether the virus is mutating rapidly into a more deadly form, how the tests are made, how scientists know that it came from bats or pangolins and finally whether it is a bioweapon.

Counting Quotient Filter and SeqOthello

Prashant Pandey, Rob Patro and collaborators published a number of excellent papers on a new kind of “compound” hashing scheme. The original paper discussing the idea is available at “A General-Purpose Counting Filter: Making Every Bit Count”, but they published other papers linking their idea to bioinformatics. We wrote about Mantis last year in this blog.

A Minimalist R Cheatsheet for NGS Biology

While teaching R to biologists, a common complaint I hear is that “there are too many functions”. Therefore, I decided to take a minimalist approach and not teach students new functions unless those are absolutely necessary. Using existing functions for new tasks has two benefits - (i) it keeps the brain clutter-free from too many function names, (ii) it gives students more practice on the existing functions thus reinforcing their knowledge.

Thousand Dollar Server for NGS Biology

These days, many biologists are performing RNAseq and other NGS experiments. The immediate challenges after collecting the data are (i) where to store them, (ii) where to analyze them and (iii) how to give access to all lab members in an efficient and secure manner.

Git Tricks to be Dangerous

Our Expert content for this week is posted here. You need to become an Expert Member to access it.

Please Join Our Expert Membership Section

Dear readers, over the years many of you requested more organized content and complete tutorials on bioinformatics. Three years ago, we started posting them in our membership section. All content in the membership section was free with registration.

The Hardest Easy Problem in Bioinformatics

Based on my experience of teaching bioinformatics to new programmers, the question - “extract the coding sequence of a multi-exon gene from the human (or other large eukaryotic) genome and translate it to find the protein sequence.” - can be classified as the hardest easy problem. Experienced bioinformaticians can answer the question without blinking, but those in this game for the first time find it extremely challenging.
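To make the question concrete, here is a minimal Python sketch of the steps involved: slice each coding exon out of the chromosome sequence, join the pieces, reverse-complement them for minus-strand genes, and translate. The function names, the toy sequence and the exon coordinates are all made up for illustration. The sketch assumes 1-based inclusive coordinates (as in the CDS rows of a GTF/GFF file), a complete CDS that starts in frame, and the standard genetic code. It is not the method from the post itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(chrom_seq, cds_exons, strand):
    """Join the coding parts of a gene into one CDS string.

    cds_exons: list of (start, end) pairs, 1-based inclusive (assumed
    GTF/GFF-style), given in any order. strand: "+" or "-".
    """
    # Slice in genomic order; convert 1-based inclusive to Python slices.
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(cds_exons)]
    cds = "".join(pieces).upper()
    # Minus-strand genes are read from the opposite strand.
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for codons with N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy "chromosome": two coding exons separated by a 10-base intron.
toy_chrom = "GG" + "ATGGCC" + "GTAAGTCCAG" + "TTTTAA" + "GG"
toy_exons = [(3, 8), (19, 24)]  # hypothetical coordinates for this toy case

toy_cds = extract_cds(toy_chrom, toy_exons, "+")
toy_protein = translate(toy_cds)
print(toy_cds, toy_protein)
```

On a real genome, the exon coordinates would come from the annotation file and the chromosome sequence from an indexed FASTA. Most of the difficulty beginners run into is in the off-by-one coordinate conversion and the strand handling shown above.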

R is the Most Powerful Language, but not for Bioinformatics

Tutorials - An Absolute Beginner's Guide to Bioinformatics
