Yesterday we looked into the newly released ‘kmc tools’. Today we will work through another simple problem so that you become familiar with it. We really love this powerful program because, as the authors have shown, it can reproduce the results of many previously published bioinformatics papers with only a few commands.

A tutorial on KMC tools

The new version of kmc includes a number of really cool utilities. You need to run the executable ‘kmc_tools’ to access them. Let us demonstrate some uses.

Genome assembly algorithms through jigsaw puzzles

We are developing this tutorial to explain genome assembly algorithms in a simple manner. In fact, rather than explaining, we expect you to discover the answer by manually solving a jigsaw puzzle. Later we show you how your solutions are related to the commonly used algorithms and their variations.

Monday review - KMC3 and other seXY topics

1. KMC3 is out

KMC2 is the best kmer counting tool and is included in our Pandora’s Toolbox. Newly published KMC3 packs many improvements to make the program even better. Here are the updates -

Online Bioinformatics Contest from Stepik/Rosalind

Dear Readers, Happy New Year! Here is a great way to bring some fun and challenges to your new year. We got a note from Nikolay Vyahhi, who helped build Rosalind and Stepik, that their organization is hosting a bioinformatics competition. The details are posted below -

Roche Dumps Pacbio

Yesterday, Pacbio received its Christmas present for 2016. Roche decided to abruptly terminate its three-year-long alliance with the company. During this collaboration, Roche paid Pacbio to develop the Sequel instrument and reserved the exclusive right to sell it in the human clinical market.

Business analysis - Oxford Nanopore

Investor warning: The following post is for entertainment purposes only, and should not be considered financial advice of any sort. Please consult your favorite government-certified investment adviser or central banker regarding decisions on investing your life savings.

GRASS for Rapid Reannotation of RNAseq Data

Many exciting papers/preprints on RNAseq came out over the last few months. Among them, a recently posted preprint solves an important problem - improving annotations based on new RNAseq data. There were other papers on quantification, compression and search, and we would like to cover them in the next few posts.

Using Multidimensional Bloom filters to Search RNAseq Libraries - (i)

A number of recent papers propose using multidimensional Bloom filters to identify genes from a large collection of RNAseq libraries. This post provides a general perspective on these papers. In a later post, we will go in depth and explain the algorithm of the recent preprint by working through an example.

Postdoctoral Scholar Position in Comparative Plant Genomics and Bioinformatics

The Computational Plant Genomics Lab invites applications for a Postdoctoral position in the Department of Ecology and Evolutionary Biology at the University of Connecticut. We focus on developing computational approaches that integrate next generation sequence data to address questions in non-model plants, particularly forest trees. The lab has the following ongoing projects: 1) understanding the evolution of alternative translation initiation using RNA-seq data; 2) integrating new and existing approaches to gene prediction to improve the annotation of complex genomes; 3) analysis of gene family evolution and related comparative genomics questions; 4) detecting variation in populations from GBS and related sequence data.

Zipper plot for visualizing transcriptional activity of genomic regions

Abstract: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. The Zipper plot is an application that enables users to interrogate putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These features are obtained from publicly available datasets including CAGE-sequencing (CAGE-seq), ChIP-sequencing (ChIP-seq) for histone marks and DNase-sequencing (DNase-seq). The Zipper plot application requires three input fields (chromosome, genomic coordinate (hg19) of the TSS and strand) and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot.

SuperTranscript - a reference for analysis and visualization of the transcriptome

Abstract: Transcriptomes are tremendously diverse and highly dynamic; visualizing and analysing this complexity is a major challenge. Here we present superTranscript, a single linear representation for each gene. SuperTranscripts contain all unique exonic sequence, built from any combination of transcripts, including reference assemblies, de novo assemblies and long-read sequencing. Our approach enables visualization of transcript structure and provides increased power to detect differential isoform usage.

Designing Molecular LEGOs in Lisp Language

This is a fascinating talk that our readers from both the computational and life sciences sides will enjoy. The author recognized the shortcomings of common programming languages in solving his domain-specific task and developed Clasp, starting from Common Lisp.

Upgrade to the blog

We are back after making extensive changes to the blog software being used here. Most important among the changes, we got rid of Wordpress and made a commitment to never use Wordpress again. Wordpress is easy to install, but a nightmare to maintain with its entire panoply of buggy plugins. Moreover, it sucks up time by failing at the most unfortunate moments.

Lior Pachter's Zika Paper

Lynn Yi, Harold Pimentel and Lior Pachter published a new RNAseq paper that our readers will definitely find interesting. In this paper, the authors showcase the new RNAseq technologies Pachterlab has been developing over the last few years. We covered those components (e.g., Kallisto, Sleuth) in earlier posts, but here you can see a biological application that draws new insights from already published data.

Ongoing Pacbio bioinformatics meeting (#SMRTBFX)

Readers may keep an eye on the #SMRTBFX hashtag on Twitter to follow an ongoing conference. This is the best place to learn about the latest bioinformatics algorithms for long reads. Gene Myers is again the star of the show. He has been distributing a lot of goodies through his Dazzlerblog, such as -

Qudaich - a Smart Sequence Aligner

Suicide Epidemic: Since NIH-funded Clowns Do Not Want to Discuss It, We Will

A large number of NIH-funded parasites waste taxpayers’ money with the excuse that they are working toward improving the health of Americans. Francis Collins, the head of NIH, uses every opportunity to tell everyone how research funded by NIH helps in improving the life expectancy of Americans (a flat-out lie). Yet, when research by Deaton and Case uncovered that the life expectancy of Americans of prime age (45-54) was falling, primarily due to rising suicides, Collins and his minions went completely silent.

Population Genetics of Ancient Jewish Population in India

‘Ancient’ Bene Israel Jews and late-arrived Baghdadi Jews in India started the Bollywood movie industry. Many famous early Indian actresses also came from these communities. This is not common knowledge in India, because those actresses took Muslim (Firoza Begum) or Hindu (Sulochana, Pramila) screen names.

Was Google Really Censoring Elhaik's Khazar Research in 2013?

In 2013, Dr. Elhaik complained that his home page at Johns Hopkins University had mysteriously disappeared from Google searches right after his first Jewish genomics paper started to gain attention. We reproduced his complaint here, and his page came back on top after a few days.

