Two More #ENCODE Heretics Came Out

Declaring a genomic site non-functional, even though a transcription factor binds to it? In a new paper published in PLOS Biology, Mike Eisen and colleagues commit what can only be considered heresy in the ENCODE world.

Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.

(Note: emphasis ours)

Of course, ENCODE fanatics may explain that result away by saying that the human genome is more complex than those of fruit flies and other lesser creatures. Check “Human Transcriptome is Extremely Complex and Snyderome is the Most Complex of All” for an example.

But there is more. Lior Pachter commits what can only be considered absolute sacrilege in his new blog post. He questions some of the tools used by ENCODERs, GENCODERs and other big-science consortia and concludes (emphasis ours):

It is outrageous that multiple journals and consortia have published work based on a method that is essentially a black box. This degrades the quality of the science and undermines scientists who do work hard to diligently validate, benchmark and publish their methods.

More specifically, Pachter found that the ‘Flux Capacitor’ method used by the GTEx project for quantifying expression effectively throws away 90% of the reads when compared with established programs such as Cufflinks, RSEM and eXpress.

Flux Capacitor has very poor performance. With 100 million reads, its performance is equivalent to other software programs at 10 million reads, and similarly, with 10 million reads, it has the performance of other programs at 1 million reads. I think it's fair to say that using Flux Capacitor is equivalent to throwing out 90% of the data!
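
The arithmetic behind that 90% figure is simple enough to spell out. Here is a minimal Python sketch; the function name is ours (hypothetical) and the read counts are just the two pairs quoted above. It computes only the ratio of reads and is in no way a benchmark of any of the tools.

```python
def effective_fraction(reads_used: int, equivalent_reads: int) -> float:
    """Fraction of the sequenced reads that is effectively put to use,
    given that a tool run on `reads_used` reads performs like other
    tools run on only `equivalent_reads` reads."""
    return equivalent_reads / reads_used


# (reads given to the tool, reads other tools need for the same performance)
# Both pairs come straight from the quoted passage.
quoted_pairs = [(100_000_000, 10_000_000), (10_000_000, 1_000_000)]

for used, equivalent in quoted_pairs:
    kept = effective_fraction(used, equivalent)
    print(f"{used:,} reads perform like {equivalent:,}: "
          f"{1 - kept:.0%} of the data effectively discarded")
```

In both quoted cases the ratio is one tenth, which is where the "throwing out 90%" comes from.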

Imagine what fraction of the human genome ENCODE could ‘prove’ to be functional if its tools performed 10x better!!

Written by M.