Scientific Questions (and Maybe Answers) for 2021-2038
In an earlier post, I divided the modern era of genetics into 18-year periods (eras). The discoveries of each era opened new questions and provided fuel for the next. In the most recent era (2003-2020), biologists moved from working on individual genes to whole-genome experiments, from measuring gene expression in many cells in aggregate to single-cell experiments, and from “model organisms” to a wide variety of organisms, all thanks to inexpensive sequencing. The intellectual and commercial drives for these shifts came from the experiences and questions posed by the previous era (1985-2002).
What are the questions posed by 2003-2020 to drive scientific progress in the subsequent 18 years (2021-2038)? We are already in the sixth year, and therefore I will summarize the key directions based on my understanding of the literature. I should also mention that this division into fixed time-spans of 18 years is not perfect, and you will see plenty of overlap in ideas between the eras. It is just a way to summarize observations into something understandable.
Here are the key questions of the current period, based on the reading I have done so far. I will update this post as my understanding improves.
Pan-genomics and a Unified Understanding of Genome Evolution
With cheaper sequencing, it is now possible to sequence thousands to hundreds of thousands of genomes of model organisms, as well as large numbers of genomes from related organisms. What patterns do we see in all these genomic data?
Gene-regulatory Networks
Unlike genome sequencing, RNA sequencing (RNAseq) provides dynamic information on an organism. You can compare different tissues, developmental stages, or diseased and normal conditions, and get a kind of information never available from genome sequencing.
How do we make sense of information generated from RNAseq? One way is to look into the transcriptional control of genes. In the context of developmental biology, Eric Davidson created a systematic method of understanding based on cascades of transcriptional controls, or gene regulatory networks. We posted an interview with him here before he unfortunately passed away in 2015.
With the availability of massive data from single-cell RNAseq (scRNAseq), can we mathematically derive the gene regulatory networks?
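To make the question concrete, here is a minimal sketch of one of the simplest starting points: linking genes whose expression levels rise and fall together across cells. Everything in it is an assumption made for illustration. The data are synthetic, the gene names and the `coexpression_edges` helper are hypothetical, and the 0.8 cutoff is arbitrary. Plain Pearson correlation is swapped in here as a stand-in for real network inference; it is not Davidson's method.

```python
import numpy as np


def coexpression_edges(expr, gene_names, threshold=0.8):
    """Return (gene_a, gene_b, r) for gene pairs with |Pearson r| >= threshold.

    expr: 2-D array, one row per cell and one column per gene.
    The function name and the threshold are illustrative choices.
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
    edges = []
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):
            if abs(corr[i, j]) >= threshold:
                edges.append((gene_names[i], gene_names[j], float(corr[i, j])))
    return edges


# Synthetic toy data (not real measurements): one regulator drives two
# targets, and a fourth gene varies independently of the others.
rng = np.random.default_rng(0)
n_cells = 500
tf = rng.normal(size=n_cells)
target_up = 2.0 * tf + 0.3 * rng.normal(size=n_cells)     # activated by tf
target_down = -1.5 * tf + 0.3 * rng.normal(size=n_cells)  # repressed by tf
unrelated = rng.normal(size=n_cells)

expr = np.column_stack([tf, target_up, target_down, unrelated])
genes = ["TF", "target_up", "target_down", "unrelated"]

edges = coexpression_edges(expr, genes)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:+.2f}")
```

Note what the sketch cannot do. The edges have no direction, and the two targets end up linked to each other even though neither regulates the other in the simulation. Going from correlation to an actual cascade of transcriptional controls is the hard part of the question.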
Using AI to Model Massive Data
AI (using the third later here) is not a scientific question. It is a set of mathematical tools made feasible by cheap computing, or rather a technology in search of questions. In a way, it is similar to DNA sequencing in the early 2000s. People anticipated that sequencing would get cheaper over time, and attempted to use it to answer all kinds of long-standing problems.
In the same way, many scientists with mathematical or computing backgrounds are using AI methods to answer biological problems, and this will continue to grow over the years.
I will continue to update this post with links to many examples.