Counting Quotient Filter and SeqOthello

Prashant Pandey, Rob Patro and collaborators published a number of excellent papers on a new kind of “compound” hashing scheme. The original paper discussing the idea is available at “A General-Purpose Counting Filter: Making Every Bit Count”, but they published other papers linking their idea to bioinformatics. We wrote about Mantis last year in this blog.

A Minimalist R Cheatsheet for NGS Biology

While teaching R to biologists, a common complaint I hear is that “there are too many functions”. Therefore, I decided to take a minimalist approach and not teach students new functions unless they are absolutely necessary. Using existing functions for new tasks has two benefits: (i) it keeps the brain free from the clutter of too many function names, and (ii) it gives students more practice with the existing functions, thus reinforcing their knowledge.

Thousand Dollar Server for NGS Biology

These days, many biologists are performing RNAseq and other NGS experiments. The immediate challenges after collecting the data are (i) where to store them, (ii) where to analyze them and (iii) how to give access to all lab members in an efficient and secure manner.

Git Tricks to be Dangerous

Our Expert content for this week is posted here. You need to become an Expert Member to access it.

Please Join Our Expert Membership Section

Dear readers, over the years many of you requested more organized content and complete tutorials on bioinformatics. Three years back, we started posting them in our membership section. All content in the membership section had been free with registration.

The Hardest Easy Problem in Bioinformatics

Based on my experience of teaching bioinformatics to new programmers, the question “extract the coding sequence of a multi-exon gene from the human (or other large eukaryotic) genome and translate it to find the protein sequence” can be classified as the hardest easy problem. Experienced bioinformaticians can answer the question without blinking, but those in this game for the first time find it extremely challenging.
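To make the task concrete, here is a minimal Python sketch of its core steps, under stated assumptions: the chromosome is already loaded as a string, the coding exon coordinates are given as 0-based half-open intervals on the forward strand (in practice they would come from an annotation file), and the standard genetic code applies. The function names `extract_cds` and `translate` are illustrative, not taken from the original post.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_cds(chrom_seq, exons, strand="+"):
    """Join coding exons into one coding sequence.

    exons: (start, end) pairs, 0-based half-open, forward-strand
    coordinates (an assumption of this sketch).
    """
    cds = "".join(chrom_seq[s:e] for s, e in sorted(exons)).upper()
    # Genes on the minus strand are read from the opposite strand.
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy "chromosome": two coding exons (upper case) split by an intron.
chrom = "ggATGGCCgtaagtcagTTTTAAcc"
cds = extract_cds(chrom, [(2, 8), (17, 23)])
print(cds, translate(cds))
```

The difficulty for beginners usually lies in the bookkeeping rather than the translation itself: coordinate conventions (0-based versus 1-based), strand handling, and splitting the coding part of an exon from its untranslated part.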

R is the Most Powerful Language, but not for Bioinformatics

Tutorials - An Absolute Beginner's Guide to Bioinformatics

Python Sandbox and Other Helpful Resources for Biology/Bioinformatics

A student in our online class on bioinformatics mentioned that she would have to learn Python/R/Linux within a month to be allowed to work at her research lab. This is the new reality in biology. Almost every researcher I know is collecting massive amounts of NGS data, whereas the skills to make sense of the data are in short supply.

(Remotely Taught Module) - Data Visualization in R

We are offering a new remotely taught module on data visualization in R. You will learn some of the most essential tools needed for exploratory data analysis. In particular, if you have heard about the powerful ggplot library but find its logic complicated, this module is perfect for you.

A Bioinformatics Study Guide for the Biologists - (i)

Increasingly, all biologists and biochemists are feeling the need to learn bioinformatics. The required skill sets go way beyond being able to run BLAST searches at NCBI or find information on genes and genomes from the online databases. Believe it or not, doing those tasks used to be called “bioinformatics” in biology departments a few years back. That situation changed with next-generation sequencing. Now that sequencing is so cheap, every lab has tons of raw data sitting on its hard drives and needs help with the analysis.

SibeliaZ - An Extremely Fast Aligner for Multiple Genomes

Readers may enjoy a new paper posted on bioRxiv by Ilia Minkin and Paul Medvedev. It presents a method for aligning multiple closely related genomes that is one or more orders of magnitude faster than competing approaches. In bioinformatics, such a dramatic improvement in speed is not often seen.

Go is Now the Best Programming Language for Full-fledged Bioinformatics - Really?

Bioinformaticians writing on Twitter appear considerably bemused by a new paper that appeared on bioRxiv late Friday. Here is the abstract.

Unpacking S4 Objects in R

Modern statistics was invented by a doctor, whose income from curing people was just not enough. To make more money on the side from gambling, he came up with the earliest versions of the rules of probability.

Annual Bioinformatics Contest from the Rosalind Team

A Terrific Post-doc Opportunity to Learn Bioinformatics

Here is a great opportunity to learn cutting-edge algorithms in bioinformatics. Heng Li, who developed several popular NGS bioinformatics programs like Samtools, BWA and Minimap, is moving to the Dana-Farber Cancer Institute. He is hiring new post-docs to work with him.

Mantis and the Counting Quotient Filter

Bioinformatics Contest - 2018

It is that time of the year again. Our friends from Rosalind, Stepik and Bioinformatics Institute are hosting another bioinformatics contest, with the qualifying round starting on Feb. 3rd. Details below.

DIY Ancestry Analysis using the GPS Algorithm

For those interested in trying out the cutting-edge tools in ancestry research on real data, I am open-sourcing my own genotype information in this GitHub project along with all analysis steps. You need to install two programs: plink and admixture. Then, by following the steps given in the README file, you should be able to find the geographic origin of the given sample (which is me).

Minimizer - An Introductory Tutorial

This is a condensed version of our longer tutorial on minimizer algorithms available here. Many bioinformatics algorithms use short substrings of a longer sequence, commonly known as k-mers, for indexing, search or assembly. Minimizers allow efficient binning of those k-mers so that some information about the sequence contiguity is preserved.
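As a rough illustration of the idea, the sketch below picks one minimizer per window of `w` consecutive k-mers. It assumes plain lexicographic ordering with the leftmost k-mer winning ties; real tools typically order k-mers by a hash value and use canonical (strand-independent) k-mers, so treat this as a toy version rather than the method of any particular program. The function name `minimizers` is hypothetical.

```python
def minimizers(seq, k, w):
    """Return the (position, k-mer) minimizers of seq.

    Each window of w consecutive k-mers contributes its
    lexicographically smallest k-mer (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Compare by k-mer first, then by position to break ties.
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add(best)
    return [(i, kmers[i]) for i in sorted(picked)]


print(minimizers("ACGTTGCA", k=3, w=3))
```

Neighbouring windows overlap in all but one k-mer, so they frequently select the same minimizer. That is why far fewer k-mers are kept than exist in the sequence, and why adjacent stretches of sequence tend to land in the same bin.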
