Random DNA Sequence Mimics #ENCODE !!
When we chose ‘Nothing is so Alien to the Human Mind as the Idea of Randomness’ as the title of our commentary on Dan Graur’s talk, we had no idea about a PNAS paper that is being widely forwarded on Twitter today. The unofficial name of PNAS is ‘Papers not accepted in Science’. So, it is a safe bet that this brilliant paper got rejected from Nature and Science, and we will tell you why in a minute, but first let us explain what it showed.
Transcription factors (TFs) recognize short sequence motifs that are present in millions of copies in large eukaryotic genomes. TFs must distinguish their target binding sites from a vast genomic excess of spurious motif occurrences; however, it is unclear whether functional sites are distinguished from nonfunctional motifs by local primary sequence features or by the larger genomic context in which motifs reside. We used a massively parallel enhancer assay in living mouse retinas to compare 1,300 sequences bound in the genome by the photoreceptor transcription factor Cone-rod homeobox (Crx), to 3,000 control sequences. We found that very short sequences bound in the genome by Crx activated transcription at high levels, whereas unbound genomic regions with equal numbers of Crx motifs did not activate above background levels, even when liberated from their larger genomic context. High local GC content strongly distinguishes bound motifs from unbound motifs across the entire genome. Our results show that the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context.
If all that is Greek to you, Mike White, the first author of the paper, explains it in plain English on his blog.
Finding function in the genome with a null hypothesis
Last September, there was a wee bit of a media frenzy over the Phase 2 ENCODE publications. The big story was supposed to be that junk DNA is debunked: ENCODE had allegedly shown that instead of being filled with genetic garbage, our genomes are stuffed to the rafters with functional DNA. In the backlash against this storyline, many of us pointed out that the problem with this claim is that it conflates biochemical and organismal definitions of function: ENCODE measured biochemical activities across the human genome, but those biochemical activities are not by themselves strong proof that any particular piece of DNA actually does something useful for us.
The claim that ENCODE results disprove junk DNA is wrong because, as I argued back in the fall, something crucial is missing: a null hypothesis. Without a null hypothesis, how do you know whether to be surprised that ENCODE found biochemical activities over most of the genome? What do you really expect non-functional DNA to look like?
In our paper in this week’s PNAS, we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function. We tested 1,300 randomly generated DNAs (more than 100 kb total) for regulatory activity. It turns out that most of those random DNA sequences are active. Conclusion: distinguishing function from non-function is very difficult.
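To make the null-hypothesis idea concrete, here is a minimal sketch of our own, not code from the paper or from Mike's blog. It generates random DNA at a chosen GC content and compares how often a short motif turns up by chance with the count expected if bases were independent. The motif `TAATCC`, the function names, and the 100 kb length are illustrative assumptions. The experiment itself measured reporter activity in mouse retinas, which no script can stand in for.

```python
import random


def random_dna(length, gc_content=0.5, rng=None):
    """Return a random DNA string whose expected GC fraction is gc_content."""
    rng = rng or random.Random()
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    # Weights are listed in the same order as the alphabet "ACGT".
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def count_motif(seq, motif):
    """Count occurrences of motif on the given strand, overlaps included."""
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def expected_count(length, motif, gc_content=0.5):
    """Expected number of motif hits if every base is drawn independently."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    base_prob = {"A": at, "C": gc, "G": gc, "T": at}
    prob = 1.0
    for base in motif:
        prob *= base_prob[base]
    return (length - len(motif) + 1) * prob


if __name__ == "__main__":
    motif = "TAATCC"  # placeholder 6-base motif, chosen only for illustration
    seq = random_dna(100_000, gc_content=0.5, rng=random.Random(0))
    print("observed hits:", count_motif(seq, motif))
    print("expected hits:", round(expected_count(len(seq), motif), 2))
```

The point of the sketch is only that short motifs are plentiful in sequence nobody designed, so finding motifs (or binding) in a stretch of genome is weak evidence of function unless it is compared against what random DNA does.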
Mike showed a most unexpected thing. Random DNA not only binds transcription factors, it also affects gene expression:
It turns out that most of the 1,300 random DNA sequences cause reproducible regulatory effects on the reporter gene. You can see this in these results from 620 random DNA sequences below, in what I call a Tie Fighter plot:
(check his blog for the plot)
Why is such an interesting finding not worthy of a Nature paper? Here is the answer.
Nature is too committed to junk scientists and ENCODE results to acknowledge that they blew it big time on real science.
Edit.
Mike White’s tweet response shows exactly what we suspected.