Top Five Open Problems in Bioinformatics (2021)
On Twitter, a number of researchers are discussing open problems in bioinformatics, so I wanted to share a set of unsolved problems I am curious about. Please tweet your suggestions in reply to this tweet, and I will add them below with your name.
This post uses a narrow definition of bioinformatics that includes only problems involving biomolecules (DNA, RNA, protein). A broader definition could cover all aspects of biology and raise other fascinating questions, such as the analysis of neural signals or images. If you are interested in the latter topic, please check Gene Myers’ excellent work.
Here are my top five open problems.
1. Shotgun Development Experiments (Single cell)
Genome assembly from shotgun sequencing got many computer scientists interested in bioinformatics. The alternative method of genome assembly (clone by clone) was very expensive, and an assembly from randomly located fragments dramatically lowered the cost of genome sequencing.
Developmental biology is currently at a stage similar to the early days of genome sequencing. Researchers can measure gene expression levels in randomly picked cells through single-cell RNA-seq, and they would like to reconstruct the developmental process from these shotgun single-cell expression profiles. This is a fascinating problem that Colin Trapnell is working on.
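To make the reconstruction idea concrete, here is a toy sketch that orders cells along a one-dimensional trajectory by repeatedly stepping to the nearest unvisited expression profile. The cell names, the two-gene expression values, the choice of root cell, and the greedy nearest-neighbor rule are all assumptions made for illustration; this is not the method used by any published tool, and real data would need normalization, dimensionality reduction, and support for branching.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def order_cells(profiles, root):
    """Greedy ordering: start at `root`, then keep moving to the closest
    cell that has not been placed yet (hypothetical toy heuristic)."""
    remaining = set(profiles) - {root}
    order = [root]
    while remaining:
        last = profiles[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda c: euclidean(profiles[c], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Made-up expression levels for two genes in five randomly sampled cells.
cells = {
    "cell_c": [5.0, 5.2],
    "cell_a": [0.0, 9.8],
    "cell_d": [8.1, 2.0],
    "cell_b": [2.1, 8.0],
    "cell_e": [10.0, 0.1],
}

trajectory = order_cells(cells, root="cell_a")
print(trajectory)
```

The cells are stored in shuffled order on purpose: the point is that the ordering is recovered from expression similarity alone, much as shotgun assembly recovers genome order from overlaps between reads.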
2. T-cell Receptor and Immunoglobulin Repertoire Diversity (and Evolution of Adaptive Immune System in General)
How our adaptive immune system learns and generates antibodies to counter future attacks is a fascinating question. Only recently has it become possible to obtain sequence data and reconstruct receptor diversity before and after invasion by a foreign pathogen.
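One simple way to put a number on receptor diversity before and after an infection is the Shannon entropy of clonotype frequencies. The sketch below is only an illustration: the receptor sequences are made-up strings, not real data, and the function name is hypothetical. Actual repertoire analysis also has to deal with sequencing errors and uneven sampling depth.

```python
import math
from collections import Counter

def shannon_diversity(clonotypes):
    """Shannon entropy (natural log) of clonotype frequencies.
    Higher values mean a more even, more diverse repertoire."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Made-up receptor sequences, one entry per sequenced cell.
before = ["CASSLGQ", "CASSPGT", "CASRDRG", "CASSQET"]
# After a hypothetical infection, one clone has expanded.
after = ["CASSLGQ", "CASSLGQ", "CASSLGQ", "CASSPGT"]

diversity_before = shannon_diversity(before)
diversity_after = shannon_diversity(after)
print(diversity_before, diversity_after)
```

In this toy case the diversity drops after infection because a single clone dominates the sample, which is the kind of shift one would look for when comparing repertoires across time points.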
3. Origin of Eukaryotes
The evolutionary origin of eukaryotes remains a mystery. Metagenomic sequencing of ocean samples is filling gaps in our knowledge of unicellular eukaryotes as well as bacteria and archaea, and now may be the right time to shed light on this question using the available sequences.
4. Origin of Translation/Genetic Code
The evolutionary origin of translation remains a mystery. Carl Woese spent most of his research life trying to answer this question and ended up revolutionizing microbiology. With the discovery of new microbes from thousands of metagenomic sequencing projects, it is now possible to address this question through computational analysis.
5. Protein Folding
This problem attracted many physicists to biology in the mid-1990s, when extensive structural data started to become available. Despite decades of work, predicting protein structure from sequence remains an open problem. Here is an excellent review from 2012.