YOAVGILAD, a large Chicago-based consortium of human geneticists, is taking a critical look at some of the papers published by ENCODE and GTEx.
We are joking about the ‘consortium’ part. Yoav Gilad is just one researcher at the University of Chicago, and his papers have at most two to four authors. How they have the audacity to criticize a consortium of scientists, whose papers get covered by the Washington Post (of all places), is something we cannot understand.
He and his colleagues were surprised to find that certain mouse tissues had more in common with each other than with their human analogues, for example.
“So a mouse liver is a lot more similar to a mouse kidney, in terms of gene expression, than a human liver, and that was a surprise,” Snyder said. “In hindsight, this makes a lot of sense.”
One thing is for sure: a paper with ‘reanalysis’ in the title is expected to dig up a lot of dirt (for example, here is our modest effort), and Gilad’s paper is no exception. It goes after the most newsworthy part of the ENCODE paper. His other paper, on post-mortem tissues of GTEx, is linked below, but please start with Dan Graur’s brief summary on that topic.
Recently, the Mouse ENCODE Consortium reported that comparative gene expression data from human and mouse tend to cluster more by species rather than by tissue. This observation was surprising, as it contradicted much of the comparative gene regulatory data collected previously, as well as the common notion that major developmental pathways are highly conserved across a wide range of species, in particular across mammals. Here we show that the Mouse ENCODE gene expression data were collected using a flawed study design, which confounded sequencing batch (namely, the assignment of samples to sequencing flowcells and lanes) with species. When we account for the batch effect, the corrected comparative gene expression data from human and mouse tend to cluster by tissue, not by species.
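To make the confounding argument in the abstract above concrete, here is a minimal sketch of how one might check a sample sheet for this kind of design flaw. The sample records, the `batch` labels and the function names are all hypothetical and invented for illustration; they are not taken from the Mouse ENCODE metadata or from Gilad's analysis. The idea is simply that if no sequencing batch ever contains more than one species, a species effect and a batch effect cannot be told apart from the data alone.

```python
from collections import defaultdict


def species_per_batch(samples):
    """Map each sequencing batch to the set of species sequenced in it."""
    batches = defaultdict(set)
    for sample in samples:
        batches[sample["batch"]].add(sample["species"])
    return dict(batches)


def is_fully_confounded(samples):
    """Return True when no batch contains more than one species.

    In that situation species and batch are perfectly aligned, so any
    difference between species could equally be a difference between batches.
    """
    return all(len(species) == 1 for species in species_per_batch(samples).values())


# Hypothetical design A: each flowcell/lane holds a single species.
confounded_design = [
    {"species": "human", "tissue": "liver", "batch": "flowcell1_lane1"},
    {"species": "human", "tissue": "kidney", "batch": "flowcell1_lane1"},
    {"species": "mouse", "tissue": "liver", "batch": "flowcell2_lane1"},
    {"species": "mouse", "tissue": "kidney", "batch": "flowcell2_lane1"},
]

# Hypothetical design B: both species appear in every batch.
balanced_design = [
    {"species": "human", "tissue": "liver", "batch": "flowcell1_lane1"},
    {"species": "mouse", "tissue": "liver", "batch": "flowcell1_lane1"},
    {"species": "human", "tissue": "kidney", "batch": "flowcell2_lane1"},
    {"species": "mouse", "tissue": "kidney", "batch": "flowcell2_lane1"},
]

print(is_fully_confounded(confounded_design))
print(is_fully_confounded(balanced_design))
```

A check like this only flags the problem. It does not fix it, and when the design is fully confounded any statistical batch correction necessarily leans on extra assumptions.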
The use of low quality RNA samples in whole-genome gene expression profiling remains controversial. It is unclear if transcript degradation in low quality RNA samples occurs uniformly, in which case the effects of degradation can be corrected via data normalization, or whether different transcripts are degraded at different rates, potentially biasing measurements of expression levels. This concern has rendered the use of low quality RNA samples in whole-genome expression profiling problematic. Yet, low quality samples (for example, samples collected in the course of fieldwork) are at times the sole means of addressing specific questions.
We sought to quantify the impact of variation in RNA quality on estimates of gene expression levels based on RNA-seq data. To do so, we collected expression data from tissue samples that were allowed to decay for varying amounts of time prior to RNA extraction. The RNA samples we collected spanned the entire range of RNA Integrity Number (RIN) values (a metric commonly used to assess RNA quality). We observed widespread effects of RNA quality on measurements of gene expression levels, as well as a slight but significant loss of library complexity in more degraded samples.
While standard normalizations failed to account for the effects of degradation, we found that by explicitly controlling for the effects of RIN using a linear model framework we can correct for the majority of these effects. We conclude that in instances in which RIN and the effect of interest are not associated, this approach can help recover biologically meaningful signals in data from degraded RNA samples.
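The RIN correction described in the abstract above can be illustrated with a toy example. This is only a sketch under strong assumptions: it fits a separate simple linear regression of expression on RIN for each gene and subtracts the fitted RIN trend, whereas the authors' actual model may well include other covariates. The function name `regress_out_rin` and the numbers are made up for illustration and are not data from the paper.

```python
from statistics import mean


def regress_out_rin(expression, rin):
    """Remove the linear RIN trend from each gene's expression values.

    expression: list of genes, each a list of per-sample expression values.
    rin: list of per-sample RIN values, in the same sample order.
    Each gene keeps its own mean; only the RIN-associated slope is removed.
    """
    rin_mean = mean(rin)
    rin_centered = [r - rin_mean for r in rin]
    rin_ss = sum(c * c for c in rin_centered)
    if rin_ss == 0:
        raise ValueError("RIN does not vary across samples; nothing to fit")

    corrected = []
    for gene in expression:
        gene_mean = mean(gene)
        # Ordinary least-squares slope of expression on RIN for this gene.
        slope = sum(c * (y - gene_mean) for c, y in zip(rin_centered, gene)) / rin_ss
        corrected.append([y - slope * c for y, c in zip(gene, rin_centered)])
    return corrected


# Made-up values: five samples spanning a range of RIN scores.
rin = [2.0, 4.0, 6.0, 8.0, 10.0]
expression = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene whose measured level tracks RIN exactly
    [5.0, 5.1, 4.9, 5.0, 5.0],  # gene with almost no relationship to RIN
]

corrected = regress_out_rin(expression, rin)
for gene_values in corrected:
    print([round(v, 3) for v in gene_values])
```

As the abstract itself cautions, this only helps when RIN and the effect of interest are not associated. If degraded samples all come from one experimental group, removing the RIN trend removes the biological signal along with it.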