Gene Regulatory Network Reconstruction with Single-cell Data
On the subject of reconstructing gene regulatory networks (GRNs) from RNAseq and scRNAseq data, I am working through a number of review papers and the techniques described therein.
- Gene Regulatory Network Inference in the Era of Single-cell Multi-omics - review paper published in 2023.
- Gene Regulatory Network Reconstruction: Harnessing the Power of Single-cell Multi-omic Data - published in 2023.
- Gene Regulatory Network Inference: an Introductory Survey - published in 2019.
As I learn from them, I will post my notes here for the benefit of others. In this context, I will also discuss how AI tools such as ChatGPT can speed up learning bioinformatics tools.
In an earlier post titled Scientific Questions (and Maybe Answers) for 2021-2038, I mentioned reconstructing GRNs as one of the questions that follow from generating large amounts of RNAseq and scRNAseq data.
How do we make sense of the information generated by RNAseq? One way is to examine the transcriptional control of genes. In developmental biology, Eric Davidson created a systematic approach to understanding development in terms of cascades of transcriptional controls, or gene regulatory networks. We posted an interview with him here before he sadly passed away in 2015.
The approach followed by Davidson was slow because sequencing technologies were expensive at the time. scRNAseq now produces large amounts of data, from which GRNs can, in principle, be derived mathematically. I am not sure how effective these methods are, which is why I decided to go through the review papers. Here I will give an outline and mention the key mathematical tools described in them. In future articles, I will dig deeper into many of them.
Phase 1 - RNAseq data only
The first RNAseq datasets appeared around 2008, so it is no surprise that tools for analyzing them also started appearing around the same time. Similar mathematical methods had, however, already been applied to data from the earlier microarray era.
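The prominent methods from the “RNAseq only” era were WGCNA and GENIE3, which I will cover in a following post. GENIE3 was later sped up by GRNBoost (and its successor GRNBoost2), a gradient-boosting variant developed by the Aerts lab for the SCENIC pipeline rather than by the original GENIE3 authors.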
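To make the two ideas concrete before the dedicated post, here is a minimal sketch in Python, assuming numpy and scikit-learn are installed. `wgcna_adjacency` computes a WGCNA-style unsigned adjacency, where each gene pair gets |cor|^beta (soft thresholding), and `genie3_like` follows the GENIE3 recipe of regressing each gene on candidate TFs with a random forest and reading feature importances as edge weights. Both function names, the default `beta`, and `n_trees` are illustrative choices, not the APIs of the real WGCNA R package or the GENIE3/arboreto implementations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def wgcna_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|^beta.

    expr is a samples (or cells) x genes matrix. Hypothetical helper for illustration.
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta              # soft thresholding shrinks weak correlations toward 0
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    return adj


def genie3_like(expr, gene_names, tf_names, n_trees=200, seed=0):
    """GENIE3-style ranking: one random-forest regression per target gene.

    Returns (tf, target, importance) tuples sorted by importance.
    Hypothetical helper for illustration, not the GENIE3 package API.
    """
    std = expr.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing constant genes by zero
    x = (expr - expr.mean(axis=0)) / std     # unit variance per gene, as GENIE3 does
    idx = {g: i for i, g in enumerate(gene_names)}
    edges = []
    for target in gene_names:
        regulators = [tf for tf in tf_names if tf != target]  # no self-regulation
        rf = RandomForestRegressor(
            n_estimators=n_trees, max_features="sqrt", random_state=seed
        )
        rf.fit(x[:, [idx[tf] for tf in regulators]], x[:, idx[target]])
        # feature_importances_ (impurity decrease) serve as putative edge weights
        for tf, w in zip(regulators, rf.feature_importances_):
            edges.append((tf, target, float(w)))
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

In a toy dataset where a gene G is driven by TF1 plus a little noise, TF1 should get both the strongest adjacency to G and the largest importance among G's candidate regulators. Note the design difference: WGCNA produces undirected co-expression modules, while the GENIE3 approach yields directed TF-to-target scores.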
Phase 2 - RNAseq data and TF-binding sites
Predicting GRNs from RNAseq data alone proved extremely difficult, so software tools began to incorporate transcription factor (TF) binding sites into their analyses. TF-binding site identification improved with ATAC-seq, which maps the accessible chromatin where TFs can bind. As a result, most tools combining RNAseq and TF-binding sites (ATAC2GRN, LISA, SPIDER) were published in or after 2018.
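The basic idea of combining the two data types can be sketched as a filter: keep an expression-derived TF-to-target edge only if there is binding evidence for that TF near the target. This is a minimal Python sketch under stated assumptions, not the algorithm of ATAC2GRN, LISA or SPIDER, each of which uses its own model. The `binding` dictionary is a hypothetical input, assumed to come from scanning ATAC-seq peaks near each gene for TF motifs, and `filter_by_binding` and `min_score` are illustrative names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (tf, target, score), e.g. an importance from a GENIE3-style method
Edge = Tuple[str, str, float]


def filter_by_binding(
    edges: Iterable[Edge], binding: Dict[str, Set[str]], min_score: float = 0.0
) -> List[Edge]:
    """Keep expression-derived edges that are supported by TF-binding evidence.

    Hypothetical helper: `binding` maps each TF to the set of genes whose
    accessible regulatory region (assumed: an ATAC-seq peak) contains its motif.
    """
    kept = [
        (tf, target, score)
        for tf, target, score in edges
        if score >= min_score and target in binding.get(tf, set())
    ]
    # strongest supported edges first
    return sorted(kept, key=lambda e: e[2], reverse=True)


edges = [("TF1", "G1", 0.9), ("TF1", "G2", 0.8), ("TF2", "G1", 0.7), ("TF3", "G3", 0.05)]
binding = {"TF1": {"G1"}, "TF2": {"G1", "G2"}, "TF3": {"G3"}}
print(filter_by_binding(edges, binding, min_score=0.1))
# [('TF1', 'G1', 0.9), ('TF2', 'G1', 0.7)]
```

Here TF1 to G2 is dropped because no TF1 binding site was found near G2, and TF3 to G3 is dropped for falling below the score cutoff. A hard filter like this is the simplest choice; real tools typically weight or score the binding evidence rather than using it as a yes/no gate.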
Phase 3 - scRNAseq data and TF-binding sites
Single-cell RNAseq improved the data significantly by measuring expression in individual cells, and the tools improved with it. In later articles, I will start by covering the following tools: SCENIC/SCENIC+, FigR and CellOracle.