Tuesday Review - SAVE your day for CRISPR, Nature Fake News and Other Stories

1. SAVE your day for CRISPR

Two bioRxiv papers cover the important topic of making CRISPR analysis user-friendly. For the benefit of our readers, we have also included references to several other available CRISPR analysis tools.

SAVE: A secure cloud-based pipeline for CRISPR pooled screen deconvolution

We present a user-friendly, cloud-based, data analysis pipeline for the deconvolution of pooled screening data. This tool, termed SAVE for Screening Analysis Visual Explorer, serves a dual purpose of extracting, clustering and analyzing raw next generation sequencing files derived from pooled screening experiments while at the same time presenting them in a user-friendly way on a secure web-based platform. Moreover, SAVE serves as a useful web-based analysis pipeline for reanalysis of pooled CRISPR screening datasets. Taken together, the framework described in this study is expected to accelerate development of web-based bioinformatics tools for handling all studies which include next generation sequencing data. SAVE is available at http://save.nrihub.org.
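For readers wondering what "deconvolution" means in practice: at its core, each sequencing read is mapped back to the guide that produced it, and the reads are tallied per guide. Below is a minimal Python sketch of that counting step. It is our own illustration, not SAVE's implementation; the function name, the toy library, and the assumption that the guide sits at a fixed offset and matches exactly are all hypothetical.

```python
from collections import Counter


def count_guides(reads, library, start=0, guide_len=20):
    """Tally reads per guide by exact match of a fixed-position window.

    reads: iterable of read sequences (strings).
    library: dict mapping guide sequence -> guide name (hypothetical format).
    start, guide_len: assumed offset and length of the guide within each read.
    Returns (counts per guide name, number of unmatched reads).
    """
    # Start every guide at zero so dropouts stay visible in the output.
    counts = Counter({name: 0 for name in library.values()})
    unmatched = 0
    for read in reads:
        window = read[start:start + guide_len].upper()
        name = library.get(window)
        if name is None:
            unmatched += 1
        else:
            counts[name] += 1
    return dict(counts), unmatched


# Toy data with 4-base "guides" to keep the example readable.
toy_library = {"ACGT": "geneA_sg1", "TTGA": "geneB_sg1"}
toy_reads = ["ACGTAA", "acgtCC", "TTGAGG", "GGGGGG"]
print(count_guides(toy_reads, toy_library, start=0, guide_len=4))
```

Real pipelines usually also tolerate mismatches and variable guide positions, which this sketch leaves out.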

CRISPRAnalyzeR: Interactive analysis, annotation and documentation of pooled CRISPR screens

Pooled CRISPR/Cas9 screens are a powerful and versatile tool for the systematic investigation of cellular processes in a variety of organisms. Such screens generate large amounts of data that present a new challenge to analyze and interpret. Here, we developed a web application to analyze, document and explore pooled CRISPR/Cas9 screens using a unified single workflow. The end-to-end analysis pipeline features eight different hit calling strategies based on state-of-the-art methods, including DESeq2, MAGeCK, edgeR, sgRSEA, Z-Ratio, Mann-Whitney test, ScreenBEAM and BAGEL. Results can be compared with interactive visualizations and data tables. CRISPRAnalyzeR integrates metainformation from 26 external data resources, providing a wide array of options for the annotation and documentation of screens. The application was developed with user experience in mind, requiring no previous knowledge in bioinformatics. All modern operating systems are supported.

Availability and online documentation: The source code, a pre-configured docker application, sample data and documentation can be found on our GitHub page (http://www.github.com/boutroslab/CRISPRAnalyzeR). A tutorial video can be found at http://www.crispr-analyzer.org.
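For readers new to "hit calling", here is a toy Python sketch of the general idea: compare guide counts between two conditions, summarize per gene, and rank genes by how far they deviate from the rest. It is deliberately simplistic and is not any of the eight methods listed above. The function name, input format, counts-per-million normalization and pseudocount are our own assumptions.

```python
import math
from statistics import mean, pstdev


def gene_z_scores(control, treated, guide_to_gene, pseudocount=1.0):
    """Per-gene z-score of the mean log2 fold change (treated vs. control).

    control, treated: dict guide name -> raw read count.
    guide_to_gene: dict guide name -> gene name.
    """
    ctrl_total = sum(control.values())
    trt_total = sum(treated.values())
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        # Normalize to counts per million, then add a pseudocount
        # so guides with zero reads do not break the logarithm.
        c = control.get(guide, 0) / ctrl_total * 1e6 + pseudocount
        t = treated.get(guide, 0) / trt_total * 1e6 + pseudocount
        per_gene.setdefault(gene, []).append(math.log2(t / c))
    gene_lfc = {gene: mean(values) for gene, values in per_gene.items()}
    mu = mean(list(gene_lfc.values()))
    sd = pstdev(list(gene_lfc.values()))
    # With no spread between genes there is nothing to rank.
    return {g: (lfc - mu) / sd if sd > 0 else 0.0 for g, lfc in gene_lfc.items()}


# Toy screen: both guides for gene A drop out after treatment.
guides = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
before = {g: 100 for g in guides}
after = dict(before, a1=10, a2=10)
print(gene_z_scores(before, after, guides))
```

In the toy data gene A gets a negative score and genes B and C share the same positive one. The published methods differ mainly in how they model count noise and how they combine guides into a gene-level statistic.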

Here are some other CRISPR analysis programs. Please feel free to suggest more.

  • Shao, D.D., et al., ATARiS: computational quantification of gene suppression phenotypes from multisample RNAi screens. Genome Res, 2013. 23(4): p. 665-78. Link.
  • ScreenProcessing. 2016; Link.
  • Dai,Z. et al.(2014) edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens. F1000Research, 3, 95.
  • Diaz,A.A. et al.(2014) HiTSelect: a comprehensive tool for high-complexity-pooled screen analysis. Nucleic Acids Res.
  • Gene Ontology Consortium (2015) Gene Ontology Consortium: going forward. Nucleic Acids Res., 43, D1049-56.
  • Hart,T. and Moffat,J. (2016) BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics, 17, 164
  • Heigwer,F. et al.(2016) CRISPR library designer (CLD): Software for multispecies design of single guide RNA libraries. Genome Biol., 17.
  • Heigwer,F. et al.(2014) E-CRISP: fast CRISPR target site identification. Nat. Methods, 11, 122–123.
  • Li,W. et al.(2014) MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol., 15, 554.
  • Li,W. et al.(2015) Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol., 16, 281. Link.
  • Noh,J. and Beibei,C. (2015) sgRSEA: Enrichment Analysis of CRISPR/Cas9 Knockout Screen Data.
  • Rauscher,B. et al.(2017) GenomeCRISPR - a database for high-throughput CRISPR/Cas9 screens. Nucleic Acids Res., 45, D679–D686. Link.
  • Winter,J. et al.(2015) caRpools: An R package for exploratory data analysis and documentation of pooled CRISPR/Cas9 screens. Bioinformatics. Link

2. Make Phylogeny Analysis Great Again

PhyD3: a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization

Motivation: Comparative and evolutionary studies utilise phylogenetic trees to analyse and visualise biological data. Recently, several web-based tools for the display, manipulation, and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualisation in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to implement in third parties’ web sites. Availability: More information about PhyD3 itself, installation procedures, and implementation links are available at http://phyd3.bits.vib.be and at http://github.com/vibbits/phyd3/.

IcyTree: Rapid browser-based visualization for phylogenetic trees and networks

IcyTree is an easy-to-use application which can be used to visualize a wide variety of phylogenetic trees and networks. While numerous phylogenetic tree viewers exist already, IcyTree distinguishes itself by being a purely online tool, having a responsive user interface, supporting phylogenetic networks (ancestral recombination graphs in particular), and efficiently drawing trees that include information such as ancestral locations or trait values. IcyTree also provides intuitive panning and zooming utilities that make exploring large phylogenetic trees of many thousands of taxa feasible. Availability and Implementation: IcyTree is a web application and can be accessed directly at http://tgvaughan.github.io/icytree. Currently-supported web browsers include Mozilla Firefox and Google Chrome. IcyTree is written entirely in client-side JavaScript (no plugin required) and, once loaded, does not require network access to run. IcyTree is free software, and the source code is made available at http://github.com/tgvaughan/icytree under version 3 of the GNU General Public License.

3. PacBio Sequel and Long Read Algorithms

Iso-Seq Template Preparation for Sequel Systems

Unicycler algorithm for hybrid assembly

MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

The last paper is old (by three months) but also gold.

4. Algorithm delight?

We have not read the following two papers yet, and are listing them here as bookmarks to check soon.

GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly

The identification of genomic rearrangements, particularly in cancers, with high sensitivity and specificity using massively parallel sequencing remains a major challenge. Here, we describe the Genome Rearrangement IDentification Software Suite (GRIDSS), a high-speed structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph assembler. By combining assembly, split read and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line and patient tumour data, recently winning SV sub-challenge #5 of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods. GRIDSS identifies non-template sequence insertions, micro-homologies and large imperfect homologies, and supports multi-sample analysis. GRIDSS is freely available at https://github.com/PapenfussLab/gridss.
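As a bookmark for ourselves, here is our rough understanding of what makes a de Bruijn graph "positional": each node is a k-mer paired with the position where it is expected to sit, so the same k-mer seen at two different positions yields two distinct nodes. The Python sketch below is a simplified illustration under that assumption. It is not GRIDSS code, the function name and input format are hypothetical, and the real assembler is considerably more involved.

```python
from collections import defaultdict


def positional_de_bruijn(reads, k):
    """Build edges between (k-mer, position) nodes.

    reads: iterable of (sequence, start) pairs, where start is the assumed
    reference position of the read's first base.
    Returns dict: node -> set of successor nodes.
    """
    graph = defaultdict(set)
    for seq, start in reads:
        # Each k-mer links to the k-mer that begins one base later.
        for i in range(len(seq) - k):
            node = (seq[i:i + k], start + i)
            succ = (seq[i + 1:i + 1 + k], start + i + 1)
            graph[node].add(succ)
    return dict(graph)


# "ACG" occurs twice in this read. A plain de Bruijn graph would merge the
# two occurrences into one node and create a cycle; here they stay apart.
graph = positional_de_bruijn([("ACGACG", 0)], k=3)
for node, successors in sorted(graph.items(), key=lambda item: item[0][1]):
    print(node, "->", sorted(successors))
```

Keeping repeated k-mers apart by position is what lets such a graph stay acyclic within a local window, at the cost of needing a position estimate for every read.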

A Flow Procedure for the Linearization of Genome Sequence Graphs

Efforts to incorporate human genetic variation into the reference human genome have converged on the idea of a graph representation of genetic variation within a species, a genome sequence graph. A sequence graph represents a set of individual haploid reference genomes as paths in a single graph. When that set of reference genomes is sufficiently diverse, the sequence graph implicitly contains all frequent human genetic variations, including translocations, inversions, deletions, and insertions. In representing a set of genomes as a sequence graph one encounters certain challenges. One of the most important is the problem of graph linearization, essential both for efficiency of storage and access, as well as for natural graph visualization and compatibility with other tools. The goal of graph linearization is to order nodes of the graph in such a way that operations such as access, traversal and visualization are as efficient and effective as possible. A new algorithm for the linearization of sequence graphs, called the flow procedure, is proposed in this paper. Comparative experimental evaluation of the flow procedure against other algorithms shows that it outperforms its rivals in the metrics most relevant to sequence graphs.
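To make "linearization" a little more concrete for ourselves: given a left-to-right ordering of the nodes, one can score it by how many edges point backwards and by how many edges cross each gap between neighbouring nodes. The abstract does not name its metrics, so the two measures in this Python sketch are our own assumption, and the code only scores a given ordering; it is not the flow procedure itself.

```python
def linearization_cost(order, edges):
    """Score a node ordering of a directed graph.

    order: list of node ids, left to right.
    edges: list of (source, target) pairs.
    Returns (number of backward edges, average cut width).
    """
    pos = {node: i for i, node in enumerate(order)}
    # A backward edge runs from a later node to an earlier one.
    backward = sum(1 for u, v in edges if pos[u] > pos[v])
    # Cut width at gap i = number of edges spanning positions i and i + 1.
    gaps = len(order) - 1
    cuts = [0] * gaps
    for u, v in edges:
        lo, hi = sorted((pos[u], pos[v]))
        for i in range(lo, hi):
            cuts[i] += 1
    avg_cut = sum(cuts) / gaps if gaps else 0.0
    return backward, avg_cut


# A simple "bubble": two alternative alleles b and c between a and d.
bubble = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(linearization_cost(["a", "b", "c", "d"], bubble))  # (0, 2.0)
print(linearization_cost(["a", "d", "b", "c"], bubble))  # worse on both measures
```

A good linearization keeps both numbers low, so that walking the graph mostly moves left to right and few edges have to be drawn or stored across any one point.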

5. Our Friend Ruibang on Variant Calling

16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model

16GT is a variant caller for Illumina WGS and WES germline data. It uses a new 16-genotype probabilistic model to unify SNP and indel calling in a single variant calling algorithm. In benchmark comparisons with five other widely used variant callers on a modern 36-core server, 16GT ran faster and demonstrated improved sensitivity in calling SNPs, and it provided comparable sensitivity and accuracy in calling indels as compared to the GATK HaplotypeCaller.

6. Linking Genes to the Organs They Affect - What a Novel Concept

Gene ORGANizer: Linking Genes to the Organs They Affect

One of the biggest challenges in studying how genes work is understanding their effect on the physiology and anatomy of the body. Existing tools try to address this using indirect features, such as expression levels and biochemical pathways. Here, we present Gene ORGANizer (geneorganizer.huji.ac.il), a phenotype-based tool that directly links human genes to the body parts they affect. It is built upon an exhaustive curated database that links more than 7,000 genes to ~150 anatomical parts using >150,000 gene-organ associations. The tool offers user-friendly platforms to analyze the anatomical effects of individual genes, and identify trends within groups of genes. We demonstrate how Gene ORGANizer can be used to make new discoveries, showing that chromosome X is enriched with genes affecting facial features, that positive selection targets genes with more constrained phenotypic effects, and more. We expect Gene ORGANizer to be useful in a variety of evolutionary, medical and molecular studies aimed at understanding the phenotypic effects of genes.

7. TeachEnG: a Teaching Engine for Genomics


Bioinformatics is a rapidly growing field that has emerged from the synergy of computer science, statistics, and biology. Given the interdisciplinary nature of bioinformatics, many students from diverse fields struggle with grasping bioinformatic concepts only from classroom lectures. Interactive tools for helping students reinforce their learning would be thus desirable. Here, we present an interactive online educational tool called TeachEnG (acronym for Teaching Engine for Genomics) for reinforcing key concepts in sequence alignment and phylogenetic tree reconstruction. Our instructional games allow students to align sequences by hand, fill out the dynamic programming matrix in the Needleman-Wunsch global sequence alignment algorithm, and reconstruct phylogenetic trees via the maximum parsimony and Unweighted Pair Group Method with Arithmetic mean (UPGMA) algorithms. With an easily accessible interface and instant visual feedback, TeachEnG will help promote active learning in bioinformatics. TeachEnG is freely available at http://song.igb.illinois.edu/TeachEnG/. It is written in JavaScript and compatible with Firefox, Safari, Chrome, and Microsoft Edge.
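For readers who would rather check their hand-filled matrix against code, here is a small Python sketch of the Needleman-Wunsch matrix fill mentioned above. It is our own illustration, not TeachEnG's code (which is JavaScript), and the scoring values (match +1, mismatch -1, gap -1) are assumed defaults rather than anything the tool prescribes.

```python
def needleman_wunsch_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the global alignment score matrix for sequences a and b.

    Rows correspond to prefixes of a, columns to prefixes of b.
    The bottom-right cell holds the optimal global alignment score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score


for row in needleman_wunsch_matrix("GAT", "GT"):
    print(row)
# [0, -1, -2]
# [-1, 1, 0]
# [-2, 0, 0]
# [-3, -1, 1]
```

The final score of 1 corresponds to aligning GAT against G-T: two matches and one gap. Recovering the alignment itself needs a traceback from the bottom-right cell, which we leave out here.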

8. “University is not a Megaphone to Amplify this or that Political View” - Stanford ex-provost


But I’m actually more worried about the threat from within. Over the years, I have watched a growing intolerance at universities in this country – not intolerance along racial or ethnic or gender lines – there, we have made laudable progress. Rather, a kind of intellectual intolerance, a political one-sidedness, that is the antithesis of what universities should stand for. It manifests itself in many ways: in the intellectual monocultures that have taken over certain disciplines; in the demands to disinvite speakers and outlaw groups whose views we find offensive; in constant calls for the university itself to take political stands. We decry certain news outlets as echo chambers, while we fail to notice the echo chamber we’ve built around ourselves.

This results in a kind of intellectual blindness that will, in the long run, be more damaging to universities than cuts in federal funding or ill-conceived constraints on immigration. It will be more damaging because we won’t even see it: We will write off those with opposing views as evil or ignorant or stupid, rather than as interlocutors worthy of consideration. We succumb to the all-purpose ad hominem because it is easier and more comforting than rational argument. But when we do, we abandon what is great about this institution we serve.

It will not be easy to resist this current. As an institution, we are continually pressed by faculty and students to take political stands, and any failure to do so is perceived as a lack of courage. But at universities today, the easiest thing to do is to succumb to that pressure. What requires real courage is to resist it. Yet when those making the demands can only imagine ignorance and stupidity on the other side, any resistance will be similarly impugned.

The university is not a megaphone to amplify this or that political view, and when it does it violates a core mission. Universities must remain open forums for contentious debate, and they cannot do so while officially espousing one side of that debate.

But we must do more. We need to encourage real diversity of thought in the professoriate, and that will be even harder to achieve. It is hard for anyone to acknowledge high-quality work when that work is at odds, perhaps opposed, to one’s own deeply held beliefs. But we all need worthy opponents to challenge us in our search for truth. It is absolutely essential to the quality of our enterprise.

I fear that the next few years will be difficult to navigate. We need to resist the external threats to our mission, but in this, we have many friends outside the university willing and able to help. But to stem or dial back our academic parochialism, we are pretty much on our own. The first step is to remind our students and colleagues that those who hold views contrary to one’s own are rarely evil or stupid, and may know or understand things that we do not. It is only when we start with this assumption that rational discourse can begin, and that the winds of freedom can blow.

9. Nature Fake News

The BBC claims the tabloid Nature is going to make ‘reproducibility’ a requirement for publishing, because most scientists ‘can’t replicate studies by their peers’.

According to a survey published in the journal Nature last summer, more than 70% of researchers have tried and failed to reproduce another scientist’s experiments.

Worry no more. According to the BBC, the editors of Nature came up with the following intelligent strategy to fix the problem.

For its part, the journal Nature is taking steps to address the problem. It’s introduced a reproducibility checklist for submitting authors, designed to improve reliability and rigour.

Sadly, this problem is unfixable, because the fixers (Nature editors) themselves are the disease and irreproducibility is just a symptom. Curious readers may take a look at this video and this paper to learn why Nature’s problems are not fixable.

10. Separating Multiple Plasmodium Strains from High-throughput Data

Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data

Motivation: The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analysing and interpreting such data is challenging because of the high rate of multiple infections present.

Results: We have developed a statistical method and implementation for deconvolving multiple genome sequences present in an individual with mixed infections. The software package DEploid uses haplotype structure within a reference panel of clonal isolates as a prior for haplotypes present in a given sample. It estimates the number of strains, their relative proportions and the haplotypes presented in a sample, allowing researchers to study multiple infection in malaria with an unprecedented level of detail.

Availability and implementation: The open source implementation DEploid is freely available at https://github.com/mcveanlab/DEploid under the conditions of the GPLv3 license. An R version is available at https://github.com/mcveanlab/DEploid-r.
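To see why strain proportions and haplotypes can be estimated from read counts at all, it helps to write down the simplest version of the mixture idea: at each site, the expected fraction of alternative-allele reads is the proportion-weighted sum of the strains' alleles. The Python sketch below uses a plain binomial read model with a small symmetric error rate. That model, the function names and the toy numbers are our own simplifying assumptions for illustration, not DEploid's actual implementation.

```python
import math


def expected_alt_frequency(proportions, haplotypes):
    """Expected alt-allele read fraction per site for a mixed sample.

    proportions: strain proportions, summing to 1.
    haplotypes: one list per strain with 0 (ref) or 1 (alt) per site.
    """
    n_sites = len(haplotypes[0])
    return [
        sum(w * hap[i] for w, hap in zip(proportions, haplotypes))
        for i in range(n_sites)
    ]


def log_likelihood(ref_counts, alt_counts, proportions, haplotypes, error=0.01):
    """Binomial log-likelihood of the read counts (assumed toy read model)."""
    total = 0.0
    freqs = expected_alt_frequency(proportions, haplotypes)
    for ref, alt, q in zip(ref_counts, alt_counts, freqs):
        # Fold in a small read error so p is never exactly 0 or 1.
        p = q * (1 - error) + (1 - q) * error
        total += math.lgamma(ref + alt + 1) - math.lgamma(ref + 1) - math.lgamma(alt + 1)
        total += alt * math.log(p) + ref * math.log(1 - p)
    return total


# Two strains at three sites; reads roughly follow a 75% / 25% mix.
haps = [[0, 1, 1], [1, 1, 0]]
ref_reads, alt_reads = [75, 0, 25], [25, 100, 75]
print(expected_alt_frequency([0.75, 0.25], haps))  # [0.25, 1.0, 0.75]
print(log_likelihood(ref_reads, alt_reads, [0.75, 0.25], haps))
print(log_likelihood(ref_reads, alt_reads, [0.25, 0.75], haps))
```

The first set of proportions explains the toy read counts far better than the swapped one, which is the signal an inference method can exploit. The hard part, and the contribution of the paper, is searching over haplotypes and proportions jointly with a reference panel as a prior.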

Written by M.