RNAseq Questions - How to Load and Combine Kallisto Counts in R?

Over the last year, I have met many biologists and trained them in NGS RNAseq data analysis. Sometimes they even bring their own research data to the class, learn to analyze it and see the results.

A Minimalist R Cheatsheet for NGS Biology

While teaching R to biologists, a common complaint I hear is that “there are too many functions”. Therefore, I decided to take a minimalist approach and not teach students new functions unless they are absolutely necessary. Using existing functions for new tasks has two benefits - (i) it keeps the brain free of clutter from too many function names, (ii) it gives students more practice with the existing functions, thus reinforcing their knowledge.

Thousand Dollar Server for NGS Biology

These days, many biologists are performing RNAseq and other NGS experiments. The immediate challenges after collecting the data are (i) where to store them, (ii) where to analyze them and (iii) how to give access to all lab members in an efficient and secure manner.

Git Tricks to be Dangerous

Hybrid Metagenomic Assembler OPERA-MS

Readers working on metagenome assembly will enjoy a new paper by Denis Bertrand et al. that came out in Nature Biotech. I have not gone through the algorithm yet, but would like to do so once the authors make a PDF copy available.

Genome Assembly is a Nearly Solved Problem with Long Reads

Dan Graur's Excellent Book Sold Only One Copy so Far

Recently I requested Dan Graur’s book (“Molecular and Genome Evolution”) through interlibrary loan (ILL). Little did I realize that I had taken away the only copy available in US university libraries. As proof, I attach the request slip hidden inside. My copy came from Reed College, which is not far from where I live, but look where else it went. The request slip shows that someone from Harvard University borrowed the same copy through ILL. Given the distance between Harvard (east coast) and Reed College (west coast), I came to the conclusion that no other copy was available in any library in between.

Campbell Biology is a Horrendously Expensive Worthless Book

I recently picked up the latest “Campbell Biology” (11th edition) after reading two excellent books on genome evolution, namely “The Origins of Genome Architecture” by Michael Lynch and “Molecular and Genome Evolution” by Dan Graur. Descriptions of the latter two are posted for our expert members.

Uncertainty over Pacbio-Illumina Deal May Spill over to Oxford Nanopore

A couple of warnings before we begin - (i) this article is for entertainment purposes only and no part of it should be considered investment advice, (ii) we have no financial position in the mentioned companies.

Trouble in the Software Crowdsourcing Paradise

The world of software crowdsourcing is experiencing a new threat that is far more serious than the existing nuisances (e.g. dependency hell, the Heartbleed bug). In this case, a malicious programmer included code in a popular and widely deployed JavaScript library to steal cryptocurrency wallets. To explain the significance, let me quickly review the history of this model for software development.

The Hardest Easy Problem in Bioinformatics

Based on my experience of teaching bioinformatics to new programmers, the question - “extract the coding sequence of a multi-exon gene from the human (or other large eukaryotic) genome and translate it to find the protein sequence.” - can be classified as the hardest easy problem. Experienced bioinformaticians can answer the question without blinking, but those in this game for the first time find it extremely challenging.
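To make the task concrete, here is a minimal Python sketch of the two steps named in the question: splice the coding segments together, then translate. It is an illustration under stated assumptions, not the method from the post. The function names (`extract_cds`, `translate`), the toy sequence and the exon coordinates are all made up for this example, coordinates are assumed to be 1-based and inclusive (as in GFF/GTF annotation), and reading the genome FASTA and the annotation file is left out.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def extract_cds(chrom_seq, exons, strand="+"):
    """Join coding segments into one coding sequence.

    exons: list of (start, end) pairs, assumed 1-based and inclusive,
    given on the forward strand of chrom_seq (hypothetical input format).
    """
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    cds = "".join(pieces).upper()
    # For a minus-strand gene, the spliced sequence must be reverse complemented.
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: two coding segments separated by a 6-base intron.
toy_chrom = "CC" + "ATGGCC" + "GTAAGT" + "AAATAA" + "GG"
toy_exons = [(3, 8), (15, 20)]
toy_cds = extract_cds(toy_chrom, toy_exons, strand="+")
toy_protein = translate(toy_cds)
print(toy_cds, toy_protein)
```

Most of the difficulty for newcomers sits in the details this sketch only assumes away: off-by-one errors between coordinate conventions, sorting the segments, and remembering to reverse complement genes on the minus strand.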

I Wish Ryan Wick Does not "Publish" his Long Read Assembler Comparison

We encourage our readers to take a look at the comparison of long read assemblers by Ryan Wick and Kathryn Holt. The authors benchmarked five different assemblers, namely Canu, Flye, Ra, Unicycler and Wtdbg2.

Using Synteny in Genome Assembly, an Interesting New Direction?

In this week’s commentary in the membership section, we reviewed the recent advances in the genome assembly field. One paper mentioned there is an excellent PLOS Computational Biology review on scaffolding by Jay Ghurye and Mihai Pop. I will skip over the discussion of various long-read technologies and mention a topic with the potential to make a substantial improvement in genome assembly.

R is the Most Powerful Language, but not for Bioinformatics


Clarification to those easily offended - the title of this post refers to the Swedish word ‘genomfart’, meaning “place of passage” or “the way forward”. More relevant to our blog, it is the last chapter of Michael Lynch’s 2007 book - “The Origins of Genome Architecture”. I enjoy Lynch’s papers on genome architecture, but must admit that his catchy chapter title compelled me to request his book from the library. So, if you are click-baited into this blog post, I am in a similar boat.

When Will Citing Blog Posts be a Norm in Bioinformatics Publishing?

For many years, bioinformaticians defined the publishing trend in biology. This started with the influx of physicists around the completion of the Human Genome Project. I remember that in the early 2000s, my papers with physicists went straight to preprint servers before publication, whereas the papers with biologists had to go through military-level secrecy. Biologists were not ready to share their papers even with close friends due to the fear of “getting scooped”.

Scrubbing Tools for Long Noisy Reads from Rayan Chikhi and Collaborators

Will Companies Like Oxford Nanopore be at the Epicenter of the Next Financial Crisis?

A couple of warnings before we begin - (i) this article is for entertainment purposes only and no part of it should be considered investment advice, (ii) we have no financial position in the mentioned companies.

