Geneticists pan paper that claims to predict a person's face from their DNA

Real_S · on Sept 9, 2017

Interesting conflict between these researchers.

From Erlich [0]:

"The take home message should be that identifying someone in a group of ten people requires very little effort. Anyone with access to even low dimensional data, such as basic demographic, can do that. This is not very surprising."

"To summarize this point, the title says: “Identification of individuals by trait prediction using whole-genome sequencing data” but most of the trait predictions is carried by ethnicity of the individual (genomic PCs) rather than the trait specific SNPs."

Summarizing: DNA can estimate ethnicity, which can be used to predict a range of traits, including facial structure. These results would be far more impressive if they were able to predict faces in data composed of a single ethnicity. HLI may have overstated their results, but they nevertheless raise important points about re-identification.
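Erlich's point that coarse, low-dimensional data already narrows a group of ten can be illustrated with a simple consistency filter. This is a toy sketch, not HLI's matching procedure: the pool, the attribute names, the labels "A" to "E", and the helper `consistent_candidates` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical pool of ten people described only by two coarse attributes
# that are easy to infer from a genome: an ancestry cluster and sex.
POOL: Dict[str, Dict[str, str]] = {
    "p01": {"ancestry": "A", "sex": "F"},
    "p02": {"ancestry": "A", "sex": "M"},
    "p03": {"ancestry": "B", "sex": "F"},
    "p04": {"ancestry": "B", "sex": "M"},
    "p05": {"ancestry": "C", "sex": "F"},
    "p06": {"ancestry": "C", "sex": "M"},
    "p07": {"ancestry": "D", "sex": "F"},
    "p08": {"ancestry": "A", "sex": "M"},
    "p09": {"ancestry": "B", "sex": "M"},
    "p10": {"ancestry": "E", "sex": "F"},
}


def consistent_candidates(predicted: Dict[str, str],
                          pool: Dict[str, Dict[str, str]]) -> List[str]:
    """Return IDs of everyone in the pool who matches every predicted attribute."""
    return [pid for pid, attrs in pool.items()
            if all(attrs.get(key) == value for key, value in predicted.items())]


if __name__ == "__main__":
    # Two genome-derived attributes, and no face prediction at all,
    # are enough to single out one person in this toy pool.
    print(consistent_candidates({"ancestry": "C", "sex": "F"}, POOL))
```

In a more mixed pool a filter like this leaves a few candidates rather than one, and that residual is what trait-specific predictions (like faces) would actually have to resolve.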

[0] http://www.biorxiv.org/content/early/2017/09/07/185330.1

carbocation · on Sept 9, 2017

The fact that identical twins look similar suggests that most of the variation in facial identity is going to be driven by a heritable element. If you could fully execute the program encoded by a person's DNA, I'd expect to see a reproduction of their face.

However, that depends on correctly mapping the program to its output. I'm skeptical that we're at the point where we can model this correctly. I think this is why you see that all of the predicted faces in the Venter paper look like a generic/averaged face.

Ygg2 · on Sept 9, 2017

Yeah, but they also grew in the same uterine environment.

mbreese · on Sept 10, 2017

They also probably grow up together, have similar diets, speak the same language... there are lots of correlated environmental factors with twins aside from their DNA.

A better comparison would be separated twins (widely separated). But I'm not sure if anyone has looked at the relative differences between twins raised together vs apart.

Mayzie · on Sept 11, 2017

> A better comparison would be separated twins (widely separated).

The two people in The Parent Trap are twins and were widely separated, yet they still looked identical, so much so that their parents could not even tell them apart.

_q3iz · on Sept 10, 2017

There is a documentary, on Netflix I think, about twins who found each other by chance on the internet. I think they were Korean and were adopted by two families, one in California and another in France.

bmsran · on Sept 10, 2017

It is common to compare traits in monozygotic vs dizygotic twins to help control for shared environments.
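One classical way to turn that MZ vs. DZ comparison into numbers is Falconer's formula. The sketch below assumes the simple ACE twin model (additive genes, shared environment, unique environment), equal shared environments for MZ and DZ pairs, and no dominance or assortative mating; the input correlations are made-up values, not estimates for any facial trait.

```python
from typing import NamedTuple


class AceEstimate(NamedTuple):
    a2: float  # additive genetic share of variance (narrow-sense heritability)
    c2: float  # shared-environment share (same womb, same household)
    e2: float  # unique environment plus measurement error


def falconer(r_mz: float, r_dz: float) -> AceEstimate:
    """Falconer's estimates from MZ and DZ twin-pair trait correlations.

    Under the ACE model r_mz = a2 + c2 and r_dz = a2 / 2 + c2, because MZ
    twins share all segregating variants and DZ twins about half, while
    both kinds of twin are assumed to share their environment equally.
    Solving the two equations gives the expressions below.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return AceEstimate(a2, c2, e2)


if __name__ == "__main__":
    # Hypothetical twin correlations for some facial measurement.
    est = falconer(r_mz=0.8, r_dz=0.5)
    print({name: round(value, 2) for name, value in est._asdict().items()})
```

The shared uterus and upbringing raised above show up in `c2`; because both twin types share them, they cancel out of the MZ minus DZ difference that drives `a2`.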

carbocation · on Sept 9, 2017

True! But this seems to set some limits on the timeline of significant environmental influence.

mchahn · on Sept 10, 2017

> The fact that identical twins look similar suggests that most of the variation in facial identity is going to be driven by a heritable element.

I don't see how you get to that conclusion. If a pair of twins inherited nothing from their parents, they could still be identical. The DNA could be generated randomly and then duplicated.

This is just a logical argument, as clearly everyone inherits many things from their parents.

aptwebapps · on Sept 10, 2017

You can argue about the effects of different genes but you can't reasonably argue that children do not get their genes from their parents.

daughart · on Sept 11, 2017

Anonymous participants in genome research have been re-identified in the past [0]. In one of these examples, people were re-identified merely using zip code, date of birth and gender. Everyone should assume that they can be identified once they have revealed even a few personal datapoints. The genome contains millions of datapoints.
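The zip code, date of birth and gender example comes down to counting how many records share each combination of those quasi-identifiers. A minimal sketch follows; the records are hypothetical, and `unique_records` and `smallest_group` are illustrative helpers, with the latter being the "k" of k-anonymity.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A record's quasi-identifiers: (zip code, date of birth, gender).
QuasiId = Tuple[str, str, str]


def unique_records(records: Iterable[QuasiId]) -> List[QuasiId]:
    """Return records whose quasi-identifier combination occurs exactly once.

    Anyone in this list could be re-identified by joining the "anonymous"
    dataset against any other list (e.g. a public roll) with the same fields.
    """
    rows = list(records)
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


def smallest_group(records: Iterable[QuasiId]) -> int:
    """Size of the smallest group sharing a combination (k in k-anonymity)."""
    return min(Counter(records).values(), default=0)


if __name__ == "__main__":
    # Hypothetical records, not real people.
    data = [
        ("02138", "1970-01-01", "F"),
        ("02138", "1970-01-01", "F"),
        ("02139", "1985-06-15", "M"),
        ("02139", "1985-06-15", "F"),
    ]
    print(unique_records(data))
    print(smallest_group(data))
```

A genome adds millions of such fields, so in practice almost every record ends up in a group of size one.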

[0] https://www.forbes.com/sites/adamtanner/2013/04/25/harvard-p...

mrfusion · on Sept 9, 2017

I'd really like to hear from the resident Hacker News skeptic/curmudgeon on this. This really shouldn't be possible with current technology.

For example, we have no idea what genes control the length of someone's nose or forehead.

evolve2017 · on Sept 9, 2017

On mobile, so I may not get to reply fully!

We actually do have some idea about necks and, I believe, noses: the important elements are DNA enhancers. There was a presentation on this topic at the Society for Developmental Biology meeting in 2012, though I can't recall which scientist gave it...

aaron695 · on Sept 10, 2017

Of course it's bunk; we are not even close to this level with DNA.

We don't even know if whole races are physically different at certain levels. Yet we can get complex individual data like a face from DNA?

https://medium.com/words-escape-us/are-japanese-intestines-l...

Why would Nature publish a rebuttal? What next, proof that flat earthers (aka funny trolls) are wrong?

But when we look closer, we see it's people claiming privacy is still intact, a much more dangerous push than the original article's rubbish.

andy_ppp · on Sept 9, 2017

Could you just throw loads of genomes (about 1.5 GB of data each) and photos of people's faces at a deep learning model and train it? It would probably do okay, unless of course environmental factors are more important than genetics here. Judging by how different siblings tend to look, I'd say genetics isn't the whole story.
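The "about 1.5 GB" figure holds up as a back-of-the-envelope estimate. The sketch assumes roughly 3.1 billion bases per haploid genome, two copies (one per parent), and 2 bits per base for A/C/G/T; the constants and the `genome_bytes` helper are illustrative, not any real file format.

```python
# Assumptions: ~3.1e9 bases per haploid human genome, a diploid sample
# (two copies), and a fixed 2-bit code per base with no compression.
HAPLOID_BASES = 3.1e9
COPIES = 2
BITS_PER_BASE = 2


def genome_bytes(haploid_bases: float = HAPLOID_BASES,
                 copies: int = COPIES,
                 bits_per_base: int = BITS_PER_BASE) -> float:
    """Uncompressed size in bytes of a full sequence at a fixed bit width."""
    return haploid_bases * copies * bits_per_base / 8


if __name__ == "__main__":
    print(f"{genome_bytes() / 1e9:.2f} GB")
```

Most of those positions are identical across people, which is why analyses usually work with variant calls rather than raw bases.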

allenz · on Sept 9, 2017

In theory, yes. In reality, we don't have enough data for that to work. A model that takes the whole genome as input is excessively expressive and would overfit, finding spurious correlations everywhere. In the near term, we still need to preprocess the genome to extract lower-dimensional features of interest.
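The "spurious correlations everywhere" problem is easy to reproduce with pure noise: keep the number of samples fixed and add features, and the best correlation with an unrelated outcome keeps climbing. A minimal sketch; the sample size of 20 and the helper names are arbitrary assumptions, and nothing here models real genotypes.

```python
import math
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_spurious_correlation(n_samples: int, n_features: int, seed: int = 0) -> float:
    """Largest |r| between a pure-noise outcome and pure-noise features.

    Nothing is related to anything else, so every correlation found is
    spurious; what grows with n_features is how strong the best one looks.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    for p in (10, 1_000, 10_000):
        print(p, round(best_spurious_correlation(n_samples=20, n_features=p), 2))
```

With a fixed seed, the first features drawn are the same across runs, so the maximum can only grow as features are added. A model fed millions of raw genome positions faces the same effect at far larger scale.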

posterboy · on Sept 10, 2017

You could use Monte Carlo methods to search for an effective compression of the input data. That's singular value decomposition, if I'm not mistaken. Dimensionality reduction is a hot topic in coding theory. Optimally, an understanding of the biological process involved would certainly help here. DNA is thought to be highly compressed and self-modifying, so a smaller encoding is unlikely. Therefore, separating out the DNA sequences involved in the effect under scrutiny might fail on the possibly highly random inputs without good techniques and heuristics. Effectively, preprocessing could involve recompilation and all the techniques used in software analysis, only here "live debugging" would have to be taken literally.

Reduction to spectra can be used to achieve sparseness, covering the error margin with probabilistic precision, including invariants and the external factors mentioned above, and maybe also data acquisition errors from cost-effective (read: cheap) methods.
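For the SVD side of this, the "genomic PCs" Erlich mentions are principal components of a genotype matrix. A minimal pure-Python sketch, assuming a hypothetical toy matrix of six people by six SNPs coded as 0/1/2 alternate-allele counts, drawn from two made-up groups; it finds the first PC by power iteration, whereas real pipelines would typically call a library routine such as `numpy.linalg.svd`.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical genotypes: rows are people, columns are SNPs (0/1/2 counts).
# The first three rows come from one made-up group, the last three from another.
GENOTYPES: Matrix = [
    [2, 2, 0, 0, 1, 1],
    [2, 1, 0, 0, 1, 2],
    [2, 2, 0, 1, 1, 1],
    [0, 0, 2, 2, 1, 1],
    [0, 1, 2, 2, 2, 1],
    [1, 0, 2, 2, 1, 1],
]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each column's mean, the usual first step of PCA."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_pc_scores(X: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Per-person scores on the first principal component.

    Repeatedly applying X^T X to a random start vector converges to the top
    right singular vector v of the centered matrix; the scores are X v.
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in Xc]       # u = X v
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]  # w = X^T u
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # all rows identical: no variation to find
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]


if __name__ == "__main__":
    print([round(s, 2) for s in first_pc_scores(GENOTYPES)])
```

The sign of a PC is arbitrary, but the two toy groups land on opposite sides of zero, which is the sense in which a few PCs summarize ancestry.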

skosuri · on Sept 9, 2017

Identical twins look pretty similar. Non-twin siblings share, on average, only about half their genome with one another.
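Where the "half" comes from can be shown with a small simulation. It assumes loci are inherited independently (real chromosomes are inherited in linked blocks, so actual sibling pairs vary more than this) and counts alleles shared identical by descent; `ibd_fraction` is an illustrative helper.

```python
import random


def ibd_fraction(n_loci: int, identical_twins: bool = False, seed: int = 0) -> float:
    """Average fraction of alleles two siblings share identical by descent.

    At each locus, each child receives one of the father's two copies and
    one of the mother's two copies at random. Identical twins come from a
    single zygote, so the second child simply repeats the first's draws.
    Simplifying assumption: loci are independent (no linkage).
    """
    rng = random.Random(seed)
    shared = 0
    for _ in range(n_loci):
        pat1, mat1 = rng.randrange(2), rng.randrange(2)
        if identical_twins:
            pat2, mat2 = pat1, mat1
        else:
            pat2, mat2 = rng.randrange(2), rng.randrange(2)
        shared += (pat1 == pat2) + (mat1 == mat2)  # 0, 1 or 2 alleles shared
    return shared / (2 * n_loci)


if __name__ == "__main__":
    print(round(ibd_fraction(100_000), 3))
    print(ibd_fraction(1_000, identical_twins=True))
```

Each parental copy matches with probability one half, so ordinary siblings average 0.5 while identical twins always score 1.0.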

userbinator · on Sept 10, 2017

Reminds me of this satirical paper: http://languagelog.ldc.upenn.edu/nll/?p=18315

joshfraser · on Sept 9, 2017

The technology may not be quite there yet, but there's little doubt in my mind that we'll get there within the next decade or two.

dogruck · on Sept 9, 2017

And this is just a relatively uncontroversial prediction. Wait until they move on to SAT scores.