Increasingly, biologists and biochemists are feeling the need to learn bioinformatics. The required skill sets go well beyond running BLAST searches at NCBI or looking up genes and genomes in online databases. Believe it or not, a few years back those tasks were what biology departments called “bioinformatics”. Next-generation sequencing changed that. Now that sequencing is so cheap, every lab has tons of raw data sitting on its hard drives and needs help analyzing it.
Readers may enjoy a new paper posted on bioRxiv by Ilia Minkin and Paul Medvedev. It presents a method for aligning against multiple closely related genomes that is one or more orders of magnitude faster than competing approaches. Speedups that dramatic are rarely seen in bioinformatics.
Modern statistics traces back to a doctor whose income from curing people was just not enough. To make more money on the side from gambling, he worked out some of the earliest rules of probability.
Here is a great opportunity to learn cutting-edge bioinformatics algorithms. Heng Li, who developed several popular NGS programs such as Samtools, BWA and Minimap, is moving to the Dana-Farber Cancer Institute and is hiring postdocs to work with him.
For those interested in trying out cutting-edge ancestry-research tools on real data, I
am open-sourcing my own genotype data in this GitHub project,
along with all analysis steps. You need to install two programs, PLINK and ADMIXTURE. Then, by following
the steps given in the README file, you should be able to find the geographic origin of the sample
(which is me).
This is a condensed version of our longer tutorial on minimizer algorithms, available here.
Many bioinformatics algorithms use short substrings of a longer sequence, commonly
known as k-mers, for indexing, search or assembly. Minimizers allow efficient binning of
those k-mers so that some information about sequence contiguity is preserved. In one common
definition, the minimizer of a k-mer is its smallest m-mer (m < k) under a chosen ordering;
because overlapping k-mers usually share the same minimizer, consecutive k-mers tend to land
in the same bin.
There have been a number of interesting recent developments on minimizers that are likely to make
bioinformatics algorithms even more efficient. In this post, we would like to mention three papers by Y.
Orenstein, G. Marçais and collaborators.
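To make the binning idea concrete, here is a minimal Python sketch; it is not taken from our tutorial or from the papers mentioned. It assumes plain lexicographic order on m-mers (practical tools usually prefer a hash-based or other ordering, because lexicographic order over-selects low-complexity m-mers such as poly-A runs) and splits a sequence into "super-k-mers": maximal runs of consecutive k-mers that share the same minimizer. The names `minimizer_of` and `super_kmers` are hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def minimizer_of(kmer: str, m: int) -> str:
    """Smallest m-mer inside `kmer` under lexicographic order (an assumption;
    real tools often rank m-mers by a hash instead)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(seq: str, k: int, m: int) -> List[Tuple[str, str]]:
    """Split `seq` into super-k-mers: maximal runs of consecutive k-mers with
    the same minimizer. Returns (minimizer, substring) pairs, so each
    substring can be written to the bin keyed by its minimizer."""
    n_kmers = len(seq) - k + 1
    parts: List[Tuple[str, str]] = []
    current = None  # minimizer of the run currently being extended
    start = 0       # index of the first k-mer in that run
    for i in range(n_kmers):
        mz = minimizer_of(seq[i:i + k], m)
        if current is None:
            current, start = mz, i
        elif mz != current:
            # k-mers start..i-1 together span seq[start : (i - 1) + k]
            parts.append((current, seq[start:i - 1 + k]))
            current, start = mz, i
    if current is not None:
        parts.append((current, seq[start:n_kmers - 1 + k]))
    return parts


if __name__ == "__main__":
    seq = "GATTACAGATTACACCGT"
    for mz, part in super_kmers(seq, k=5, m=3):
        print(mz, part)
```

On this toy sequence the 14 overlapping 5-mers collapse into 6 super-k-mers, and the minimizers ATT and ACA each occur in two separate runs, so both runs of each land in the same bin. Storing one super-k-mer instead of several overlapping k-mers is where the space saving comes from.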
Two bioRxiv papers address the important problem of making CRISPR analysis user-friendly. In this
context, we have also included references to several other CRISPR analysis tools for the
benefit of our readers.
1. Correcting Long Noisy Reads Using de Bruijn Graphs
Great news: the algorithmic concepts for short-read assembly developed over the last decade need
not be unlearned. In the two papers presented below, Myers, Pevzner and their colleagues use de
Bruijn graphs for assembly and error correction of long noisy reads.
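The methods in those papers are considerably more involved, but the basic structure they build on can be sketched in a few lines of Python. This minimal sketch is an illustration of the generic idea, not either group's algorithm: it counts k-mers across reads, keeps only "solid" k-mers seen at least `min_count` times (random sequencing errors mostly create rare k-mers), builds a de Bruijn graph whose nodes are (k-1)-mers and whose edges are solid k-mers, and spells a sequence by following unambiguous edges. The parameter `min_count` and all function names are hypothetical. With error rates as high as in long noisy reads, few long k-mers are error-free, which is one reason real methods need more than this simple filter.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_de_bruijn(reads: Iterable[str], k: int, min_count: int) -> Dict[str, Set[str]]:
    """De Bruijn graph: nodes are (k-1)-mers; each k-mer seen at least
    `min_count` times adds an edge prefix -> suffix. Rarer k-mers, which
    are often caused by sequencing errors, are dropped."""
    graph: Dict[str, Set[str]] = {}
    for kmer, c in count_kmers(reads, k).items():
        if c >= min_count:
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


def walk(graph: Dict[str, Set[str]], start: str) -> str:
    """Follow nodes that have exactly one outgoing edge, appending one base
    per step; stop at a branch, a dead end, or a revisited node."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGAAC"]  # third read carries an error (T->A)
    g = solid_de_bruijn(reads, k=4, min_count=2)
    print(walk(g, "ACG"))
```

In this toy example the three 4-mers from the erroneous third read each occur only once and are filtered out, so the walk from ACG spells ACGTAC, the sequence supported by the majority of reads.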
Yesterday we looked into the newly released ‘kmc tools’. Today we will work through another
simple problem so that you become familiar with it. We really love this powerful program
because, as the authors have shown, they could reproduce the results of many previously
published bioinformatics papers with only a few commands.
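‘kmc tools’ works on k-mer databases produced by the KMC counter. Rather than guess at its command-line syntax, here is a small in-memory Python sketch of the kind of set operations involved: counting canonical k-mers (a k-mer and its reverse complement are treated as one, as k-mer counters commonly do), then intersecting two count tables or keeping only the k-mers unique to one sample. It assumes inputs small enough to fit in memory and sequences containing only A, C, G and T; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Translation table for complementing DNA (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_canonical(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers over all sequences (assumes A/C/G/T only)."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[canonical(s[i:i + k])] += 1
    return counts


def intersect(a: Counter, b: Counter) -> Counter:
    """k-mers present in both tables, keeping the smaller count."""
    return a & b


def only_in(a: Counter, b: Counter) -> Counter:
    """k-mers present in `a` but absent from `b`, with their counts from `a`."""
    return Counter({km: c for km, c in a.items() if km not in b})


if __name__ == "__main__":
    a = count_canonical(["ACGTTT"], k=3)
    b = count_canonical(["GGACG"], k=3)
    print(intersect(a, b))
    print(only_in(a, b))
```

Python's `Counter` already implements a min-count intersection through the `&` operator, which keeps the sketch short. Tools like KMC keep their k-mer databases on disk rather than in a Python dict, which is what lets them scale to real sequencing datasets.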
Happy New Year! Here is a great way to bring some fun and challenge to your new year. We got
a note from Nikolay Vyahhi, who helped build Rosalind and Stepik, that their organization is hosting a bioinformatics competition. The details are posted below:
A number of recent papers propose using multidimensional Bloom filters to search for genes across a
large collection of RNA-seq libraries. This post provides a general perspective on these papers. In a later
post, we will go in depth and explain the algorithm of the recent preprint by carrying out an