Here is a great opportunity to learn cutting-edge algorithms in bioinformatics. Heng Li, who developed several popular NGS bioinformatics programs such as Samtools, BWA and Minimap, is moving to the Dana-Farber Cancer Institute. He is hiring new post-docs to work with him.
I always find Heng Li’s programs and papers very elegant. During his time at the Broad Institute, he wrote a number of single-author papers on new bioinformatics algorithms. I read each of them several times and found them thought-provoking. I also often go through his GitHub account to see what he is currently working on. In addition to his popular programs, his GitHub account has many efficient libraries (e.g. klib) that others can use as a starting point.
I asked Heng Li what he plans to work on at Dana-Farber, and he pointed me to his NIH proposal titled “Advanced Computational Methods in Analyzing High-throughput Sequencing Data”. The abstract is posted below.
“Sequencing technologies have become an essential tool to the study of human evolution, to the understanding of the genetic bases of diseases and to the clinical detection and treatment of genetic disorders. Computational algorithms are indispensible to the analysis of large-scale sequencing data and have received broad attention. However, developed several years ago, many mainstream software packages for sequence alignment, assembly and variant calling have gradually lagged behind the rapid development of sequencing technologies. They are unable to process the latest long reads or assembled contigs, and will be outpaced by upcoming technologies in terms of throughput. The development of advanced algorithms is critical to the applications of sequencing technologies in the near future. This project will address this pressing need with four proposals: (1) developing a fast and accurate aligner that accelerates short-read alignment and can map megabase-long assemblies against large sequence collections of over 100 gigabases in size; (2) developing an integrated caller for small sequence variations that is faster to run, more sensitive to moderately longer insertions and more accessible to biologists without extended expertise in bioinformatics; (3) developing a generic variant filtering tool that uses a novel deep learning model to achieve human-level accuracy on identifying false positive calls; (4) developing a new de novo assembler that works with the latest nanopore reads of ~100 kilobases in length and may achieve good contiguity at low coverage. Upon completion, the proposed studies will dramatically reduce the computational cost of data processing in most research labs and commercial entities, and will enable the applications of long reads in genome assembly, in the study of structural variations and in cancer researches.”
In addition, he plans to continue collaborating with Sunney Xie et al. on single-cell genomics, including single-cell Hi-C and single-cell SNV calling. Sequence graphs are another topic that interests him. If you would enjoy working on those topics with him, feel free to send Heng Li an email. You will find his address in this paper.
Minimap2 and the Future of BWA
Our readers may also find Heng’s blog post titled “Minimap2 and the future of BWA” informative. He is planning to discontinue BWA-MEM because Minimap2 works significantly better, especially for ultra-long noisy reads.