Homolog.us - Frontier in Bioinformatics

Uncertainty over PacBio-Illumina Deal May Spill over to Oxford Nanopore

A couple of warnings before we begin - (i) this article is for entertainment purposes only and no part of it should be considered investment advice, (ii) we have no financial position in the mentioned companies.

Trouble in the Software Crowdsourcing Paradise

The world of software crowdsourcing is facing a new threat that is far more serious than the existing nuisances (e.g. dependency hell, the Heartbleed bug). In this case, a malicious programmer inserted code into a popular and widely deployed JavaScript library to steal cryptocurrency wallets. To explain the significance, let me quickly review the history of this model of software development.

The Hardest Easy Problem in Bioinformatics

Based on my experience teaching bioinformatics to new programmers, the task “extract the coding sequence of a multi-exon gene from the human (or other large eukaryotic) genome and translate it to find the protein sequence” can be classified as the hardest easy problem. Experienced bioinformaticians can solve it without blinking, but those in this game for the first time find it extremely challenging.
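To make the task concrete, here is a minimal Python sketch of its core steps: slice the coding exons out of a chromosome sequence, join them, reverse-complement if the gene is on the minus strand, and translate. It is an illustration under stated assumptions, not the method taught in the article. The function names (`extract_cds`, `translate`), the 0-based half-open coordinate convention, and the toy sequence are all hypothetical. Real data would require parsing a genome FASTA and an annotation file, which is where most beginners get stuck.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Translation map used for complementing bases (case is preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(chrom_seq, cds_exons, strand):
    """Join the coding exon segments of one gene into a single CDS.

    Assumptions (hypothetical conventions for this sketch):
    - cds_exons holds (start, end) pairs, 0-based and half-open, given on
      the forward strand and covering only the coding part of each exon.
    - strand is "+" or "-".
    """
    pieces = [chrom_seq[start:end] for start, end in sorted(cds_exons)]
    cds = "".join(pieces).upper()
    if strand == "-":
        # Minus-strand genes are read from the opposite strand, so the
        # joined forward-strand segments must be reverse-complemented.
        cds = reverse_complement(cds)
    return cds


def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: two coding segments separated by a lowercase "intron".
toy_chrom = "GGATGGCtttttCTAAGG"
toy_cds = extract_cds(toy_chrom, [(2, 7), (12, 16)], "+")
toy_protein = translate(toy_cds)
```

In the toy example the codon GCC spans the exon boundary, which is why the exons must be joined before translation rather than translated one at a time.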

I Wish Ryan Wick Does not "Publish" his Long Read Assembler Comparison

We encourage our readers to take a look at the comparison of long read assemblers by Ryan Wick and Kathryn Holt. The authors benchmarked five different assemblers, namely Canu, Flye, Ra, Unicycler and Wtdbg2.

Using Synteny in Genome Assembly, an Interesting New Direction?

In this week’s commentary in the membership section, we reviewed the recent advances in the genome assembly field. One paper mentioned there is an excellent PLOS Computational Biology review on scaffolding by Jay Ghurye and Mihai Pop. I will skip over the discussion of various long-read technologies and mention a topic with the potential to make substantial improvements in genome assembly.

R is the Most Powerful Language, but not for Bioinformatics


Clarification for those easily offended - the title of this post refers to the Swedish word ‘genomfart’, meaning “place of passage” or “the way forward”. More relevant to our blog, it is the title of the last chapter of Michael Lynch’s 2007 book, “The Origins of Genome Architecture”. I enjoy Lynch’s papers on genome architecture, but must admit that his catchy chapter title compelled me to request his book from the library. So, if you were click-baited into this blog post, I am in a similar boat.

When Will Citing Blog Posts be a Norm in Bioinformatics Publishing?

For many years, bioinformaticians defined the publishing trends in biology. This started with the influx of physicists around the completion of the Human Genome Project. I remember the early 2000s, when my papers with physicists went straight to preprint servers before publication, whereas the papers with biologists had to go through military-level secrecy. Biologists were not ready to share their papers even with close friends for fear of “getting scooped”.

Scrubbing Tools for Long Noisy Reads from Rayan Chikhi and Collaborators

Will Companies Like Oxford Nanopore be at the Epicenter of the Next Financial Crisis?

A couple of warnings before we begin - (i) this article is for entertainment purposes only and no part of it should be considered investment advice, (ii) we have no financial position in the mentioned companies.

Tutorials - An Absolute Beginner's Guide to Bioinformatics

Pay Attention to These Three New and Impressive Genome Assemblers

The genome assembly field continues to be highly active, and researchers are still coming up with algorithms that deliver significant speed improvements. The following three projects are definitely worth your attention.

A Decision Point Arrives for Oxford Nanopore

A couple of warnings before we begin - (i) this article is for entertainment purposes only and no part of it should be considered investment advice, (ii) we have no financial position in the mentioned companies.

Python Sandbox and Other Helpful Resources for Biology/Bioinformatics

A student in our online class on bioinformatics mentioned that she would have to learn Python/R/Linux within a month to be allowed to work at her research lab. This is the new reality in biology. Almost every researcher I know is collecting massive amounts of NGS data, while the skills to make sense of that data are in short supply.

Contamination Nightmare in Microbial Genome Assemblies

While working on RNase P in microbial genomes, I noticed something very puzzling. An archaeal protein that had never been seen before in bacteria was present (and even annotated) in a newly sequenced bacterial genome. If true, it could completely change the evolutionary understanding of the RNase P protein families.

Weird Patterns in Japanese Genome Evolution Explained by Garbage-in Garbage-out Effect

A new preprint titled “Legacy Data Confounds Genomics Studies” was recently posted on bioRxiv. It shows that researchers using data from the 1000 Genomes Project need to be cautious about the garbage-in, garbage-out effect (technical term: batch effect) leading to spurious discoveries.

(Remotely Taught Module) - Data Visualization in R

We are offering a new remotely taught module on data visualization in R. You will learn some of the most essential tools needed for exploratory data analysis. In particular, if you have heard about the powerful ggplot library but find its logic complicated, this module is perfect for you.

A Bioinformatics Study Guide for the Biologists - (i)

Increasingly, biologists and biochemists are feeling the need to learn bioinformatics. The required skill set goes way beyond being able to run BLAST searches at NCBI or find information on genes and genomes in the online databases. Believe it or not, doing those tasks used to be called “bioinformatics” in biology departments a few years back. That situation changed with next-generation sequencing. Now that sequencing is so cheap, every lab has tons of raw data sitting on its hard drives and needs help analyzing it.

SibeliaZ - An Extremely Fast Aligner for Multiple Genomes

Readers may enjoy a new paper posted on bioRxiv by Ilia Minkin and Paul Medvedev. It presents a method for aligning multiple closely related genomes that is order(s) of magnitude faster than the competing approaches. In bioinformatics, such a dramatic improvement in speed is not seen often.

Go is Now the Best Programming Language for Full-fledged Bioinformatics - Really?

Bioinformaticians writing on Twitter appear considerably bemused by a new paper that appeared on bioRxiv late Friday. Here is the abstract.
