An interesting article on genetic privacy, or the lack thereof, in the New York Times:
The genetic data posted online seemed perfectly anonymous: strings of billions of DNA letters from more than 1,000 people. But all it took was some clever sleuthing on the Web for a genetics researcher to identify five people he randomly selected from the study group. Not only that, he found their entire families, even though the relatives had no part in the study, identifying nearly 50 people.
The researcher did not reveal the names of the people he found, but the exercise, published Thursday in the journal Science, illustrates the difficulty of protecting the privacy of volunteers involved in medical research when the genetic information they provide needs to be public so scientists can use it.
Abstract of the Science article:
Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin, Yaniv Erlich
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
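To make the two steps in the abstract concrete, here is a toy sketch in Python: first guess a surname by matching a Y-STR profile against a genealogy database, then narrow down candidates using age and state. Everything in it is an assumption made for illustration. The repeat counts, surnames, records, the `max_mismatches` threshold, and the function names `infer_surname` and `triangulate` are all made up, and the simple mismatch count stands in for whatever matching and confidence scoring the authors actually used.

```python
from dataclasses import dataclass

# Toy "genealogy database": (Y-STR profile, surname) pairs.
# Marker names are real Y-STR loci; repeat counts and surnames are invented.
GENEALOGY_DB = [
    ({"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}, "Smith"),
    ({"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 12}, "Jones"),
    ({"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}, "Smith"),
]


@dataclass
class Person:
    name: str
    surname: str
    age: int
    state: str


# Toy stand-in for public records searchable by surname, age, and state.
PUBLIC_RECORDS = [
    Person("A. Smith", "Smith", 62, "UT"),
    Person("B. Smith", "Smith", 35, "UT"),
    Person("C. Smith", "Smith", 62, "CA"),
    Person("D. Jones", "Jones", 62, "UT"),
]


def mismatches(a, b):
    """Count markers present in both profiles whose repeat counts differ."""
    shared = a.keys() & b.keys()
    return sum(a[m] != b[m] for m in shared)


def infer_surname(profile, db, max_mismatches=1):
    """Return the surname of the closest database profile, or None if too distant."""
    best_profile, best_surname = min(db, key=lambda rec: mismatches(profile, rec[0]))
    if mismatches(profile, best_profile) > max_mismatches:
        return None
    return best_surname


def triangulate(surname, age, state, records):
    """Keep only records that agree on surname, age, and state."""
    return [p for p in records
            if p.surname == surname and p.age == age and p.state == state]


# Hypothetical target: a Y-STR profile plus the age and state metadata
# that the abstract says can accompany a shared genome.
target_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
surname = infer_surname(target_profile, GENEALOGY_DB)
candidates = triangulate(surname, 62, "UT", PUBLIC_RECORDS)
```

In this toy data the surname alone leaves three candidates, while adding age and state leaves one, which is the "triangulation" the abstract describes.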
Would someone repeat the study on dogs to see whether they can find anything new about Craig Venter’s dog? :)