Isn't a Serious Discussion Needed before Throwing Money into Million Genome Sequencing Projects?

According to NIH Director Francis Collins, NIH is so short of money that it took funds away from an almost-sure Ebola vaccine development effort to support other projects. Three months ago, he claimed that NIH could have found an Ebola vaccine by now if it had had a little more money.

“NIH has been working on Ebola vaccines since 2001. It’s not like we suddenly woke up and thought, ‘Oh my gosh, we should have something ready here,’” Collins told The Huffington Post on Friday. “Frankly, if we had not gone through our 10-year slide in research support, we probably would have had a vaccine in time for this that would’ve gone through clinical trials and would have been ready.”

It’s not just the production of a vaccine that has been hampered by money shortfalls. Collins also said that some therapeutics to fight Ebola “were on a slower track than would’ve been ideal, or that would have happened if we had been on a stable research support trajectory.”

This month, the same agency announced a new mega-project on sequencing a million human genomes and ‘precision medicine’, which is utterly wasteful, as argued by Professor Ken Weiss in three informative blog posts. Dr. Weiss has been making similar arguments since the late 1990s (see “How many diseases does it take to map a gene with SNPs?”) and has been right so far. The promises made by Francis Collins prior to the human genome sequencing project remain mostly unfulfilled, despite there being no shortage of money and technology. The only thing that came out of the personalized genomics hype is the new name ‘precision medicine’ and more hype.

In the light of all this, we wonder whether it is worth having a serious discussion before throwing billions of dollars into a new mega boondoggle. Has anyone addressed the scientific objections made by Dr. Weiss? Have we determined why the claims made by Francis Collins since the mid-90s remain pipe dreams?

Parts of Dr. Weiss’s blog posts are reproduced below.

Your money at work…er, waste: the million genomes project

Bulletin from the Boondoggle Department

In desperate need for a huge new mega-project to lock up even more NIH funds before the Republicans (or other research projects that are actually focused on a real problem) take them away, or before individual investigators who actually have some scientific ideas to test can claim them, we read that Francis Collins has apparently persuaded someone who’s not paying attention to fund the genome sequencing of a million people! Well, why not? First we had the (one) human genome project. Then after a couple of iterations, the 1000 genomes project, then the hundred thousand genomes ‘project’. So, what next? Can’t just go up by dribs and drabs, can we? This is America, after all! So let’s open the bank for a cool million. Dr Collins has, apparently, never met a genome he didn’t like or want to peer into. It’s not lascivious exactly, but the emotion that is felt must be somewhat similar.

We now know enough to know just what we’re (not) getting from all of this sequencing, but what we are getting (or at least some people are getting) is a lot of funds sequestered for a few in-groups or, more dispassionately perhaps, for a belief system, the belief that constitutive genome sequence is the way to conquer every disease known to mankind. Why, this is better than what you get by going to communion every week, because it’ll make you immortal so you don’t have to worry that perhaps there isn’t any heaven to go to after all.

What’s ‘precise’ about ‘precision’ medicine (besides desperate spin)?

The million genomes project

In the same breath, we’re hearing that we’ll be funding a million genomes project. The implication is that if we have a million whole genome sequences, we will have ‘precision medicine’ (personalized, too!). But is that a serious claim or is it a laugh?

A million is a large number, but if most variation in gene-based risk is due, as mountains of evidence shows, to countless very rare variants, many of them essentially new, and hordes of them perhaps per person, then even a million genome sequences will not be nearly enough to yield much of what is being promised by the term ‘precision’! We’d need to sequence everybody (I’m sure Dr Collins has that in mind as the next Major Slogan, and I know other countries are talking that way).

Don’t be naive enough to take this for something other than what it really is: (1) a ploy to secure continued funding perpetrated on his Genome Dream, but in the absence of new ideas and the presence of promises any preacher would be proud of, and results that so far clearly belie it; (2) a way to protect influential NIH clients with major projects that no longer really merit continued protection, but which will be included in this one; and (3) to guarantee congressional support from our representatives who really don’t know enough to see through it or who simply believe, or just want cover for the idea, that these sorts of things (add Defense contracting and NASA mega-projects as other instances) are simply good for local business and sound good to campaign on.

Yes, Francis Collins is born-again with perhaps a simplistic one-cause worldview to go with that. He certainly knows what he’s doing when it comes to marketing based on genetic promises of salvation. This idea is going to be very good for a whole entrenched segment of the research business, because he’s clever enough to say that it will not just be one ‘project’ but is apparently going to have genome sequencing done on an olio of existing projects. Rationales for this sort of ‘project’ are that long-standing, or perhaps long-limping, projects will be salvaged because they can ‘inexpensively’ be added to this new effort. That’s justified because then we don’t have to collect all that valuable data over again.

But if you think about what we already know about genome sequences and their evolution, and about what’s been found with cruder data, from those very projects to be incorporated among others, a million genome sequences will not generate anything like what we usually understand the generic term ‘precision’ to mean. Cruder data? Yes, for example, the kinds of data we have on many of these ongoing studies, based on inheritance, on epidemiological risk assessment, or on other huge genomewide mapping, have consistently shown that there is scant new serious information to be found by simply sequencing between mapping-marker sites. The argument that the significance level will rise when we test the actual site doesn’t mean the signal will be strong enough to change the general picture. That picture is that there simply are not major risk factors except, certainly, some rare strong ones hiding in the sequence leaf-litter of rare or functionless variants.

Of course, there will be exceptions, and they’ll be trumpeted to the news media from the mountain top. But they are exceptions, and finding them is not the same as a proper cost-benefit assessment of research priorities. If we have paid for so many mega-GWAS studies to learn something about genomic causation, then we should heed the lessons we ourselves have learned.

Secondly, the data collected or measures taken decades ago in these huge long-term studies are often no longer state of the art, and many people followed for decades are now pushing up daisies, and can’t be followed up.

Thirdly, there is the fact that the epidemiological (e.g., lifestyle, environment…) data have clearly been shown largely to yield findings that get reversed by the next study down the pike. That’s the daily news that the latest study has now shown that all previous studies had it wrong: factor X isn’t a risk factor after all. Again, major single-factor causation is elusive already, so just pouring funds on detailed sequencing will mainly be finding reasons for existing programs to buy more gear to milk cows that are already drying up.

Fourth, for many if not even most of the major traits whose importance has justified mega-epidemiological long-term follow-up studies, those studies have failed to find consistent risk factors to begin with. But for many of the traits, the risk (incidence) has risen faster than the typical response to artificial selection. In that case, if genomic causation were tractably simple, such strong ‘selection’ should reflect those few genes whose variants respond to the changed environmental circumstances. But these are the same traits (obesity, stature, diabetes, autism, …) for which mapping shows that single, simple genetic causation does not obtain (and, again, that assumes that the environmental risk factors purportedly responsible are even identified, and the yes-no results just mentioned above show otherwise).

Worse than this, what about the microbiome or the epigenome, which are supposedly so important? Genome sequencing, a convenient way to carry on just as before, simply cannot generally turn miracles in those areas, because they require other kinds of data (which are not available from current sequencing samples nor, of course, from deceased subjects even if we had stored their blood samples).

Somatic mutation: does it cut both ways?

Beware, million genome project!

What has this got to do with the million genome project? An important fact is that SoMu’s (somatic mutations) are in body tissues but are not part of the constitutive (inherited) genome, as is routinely sampled from, say, a cheek swab or blood sample. The idea underlying the massive attempts at genomewide mapping of complex traits, and the new culpably wasteful ‘million genomes’ project by which NIH is about to fleece the public and ensure that even fewer researchers get grants because the money’s all been soaked up by DNA sequencing, Big Data induction labs, is that we’ll be able to predict disease precisely, from whole genome sequence, that is, from constitutive genome sequence of hordes of people. We discussed this yesterday, perhaps to excess. Increasing sample size, one might reason, will reduce measurement error and make estimates of causation and risk ‘precise’. That is in general a bogus self-promoting ploy, among other reasons because rare variants and measurement and sample errors or issues may not yield a cooperating signal-to-noise ratio.

So I think that the idea of wholesale, mindless genome sequencing will yield some results but far less than is promised and the main really predictable result, indeed precisely predictable result, is more waste thrown onto mega-labs, to keep them in business.

Anyway, we’re pretty consistent with our skepticism, nay, cynicism about such Big Data fads as mainly grabs in tight times for funding that’s too long-lasting or too big to kill, regardless of whether it’s generating anything really useful.
