I’m curious how they assign percentages to geographic locations. For me it fails the sniff test. How far back along your line of ancestry (which has migrated all over the world) do you go before you stop and say, this is the place 20% of my bloodline is from?

I imagine they do some sort of cluster analysis to find correlations with self-identification. If so, then this is undeniably junk science based on junk-in, junk-out statistical models (which is often the case with cluster analysis).

I’m simply curious here; cluster analysis is the only method I can think of (other than guessing or categorizing arbitrarily).




> I imagine they do some sort of cluster analysis to find correlation along with self-identification.

I don't think it's "some" -- I think that's 100% of it.


I'd think bones from a couple hundred, a thousand, or ten thousand years ago would provide an okay trail.


I don’t think the ancestry tests compare the DNA with archeological finds. If they did, I wouldn’t trust them, given the relatively small number of archeological finds with intact DNA.


The usual tests certainly don’t. But some scientific studies do: https://www.nature.com/articles/ncomms15694


I thought those were done to investigate migratory patterns, movements, and inter-connectivity of historic populations, not to establish the ancestral lineage of living populations, and certainly not to assign a geographical region to existing ethnic groups.


Yeah I think it's clustering and dimensionality reduction. E.g. if French people form a rough cluster in DNA space and Japanese people form another cluster, someone with a French mom and Japanese dad would show up as a data point roughly halfway between those two clusters.
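To make the "halfway between two clusters" intuition concrete, here is a toy simulation, not anything a testing company is known to run. Everything in it is an assumption for illustration: the two groups get independent random allele frequencies (far more divergent than real human populations), the helper names are made up, and the first principal component is found with plain power iteration. It reports where children with one parent from each group land on that axis, where 0 is the centre of group A and 1 is the centre of group B.

```python
import random

def simulate_genotypes(freqs, n, rng):
    """Draw n diploid genotypes: 0, 1 or 2 copies of an allele per marker."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]

def make_child(parent_a, parent_b, rng):
    """Each parent passes on one of its two alleles at every marker."""
    return [(rng.random() < ga / 2) + (rng.random() < gb / 2)
            for ga, gb in zip(parent_a, parent_b)]

def first_principal_axis(rows, rng, iters=30):
    """Return (column means, unit vector of the top principal component)."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    axis = [rng.gauss(0, 1) for _ in range(m)]  # random starting direction
    for _ in range(iters):
        # Power iteration: axis <- X^T (X axis), then renormalise.
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(m)]
        norm = sum(a * a for a in axis) ** 0.5
        axis = [a / norm for a in axis]
    return mean, axis

def project(row, mean, axis):
    """Coordinate of one genotype along the principal axis."""
    return sum((x - mu) * a for x, mu, a in zip(row, mean, axis))

def mixed_position(seed=0, markers=300, per_group=50, children=20):
    """Average position of mixed children: 0 = group A centre, 1 = group B centre."""
    rng = random.Random(seed)
    # Assumption: independent random allele frequencies per group.
    freqs_a = [rng.uniform(0.1, 0.9) for _ in range(markers)]
    freqs_b = [rng.uniform(0.1, 0.9) for _ in range(markers)]
    group_a = simulate_genotypes(freqs_a, per_group, rng)
    group_b = simulate_genotypes(freqs_b, per_group, rng)
    # Fit the axis on the reference panel only, then project the children.
    mean, axis = first_principal_axis(group_a + group_b, rng)
    centre_a = sum(project(r, mean, axis) for r in group_a) / per_group
    centre_b = sum(project(r, mean, axis) for r in group_b) / per_group
    kids = [make_child(rng.choice(group_a), rng.choice(group_b), rng)
            for _ in range(children)]
    centre_kids = sum(project(k, mean, axis) for k in kids) / children
    return (centre_kids - centre_a) / (centre_b - centre_a)

if __name__ == "__main__":
    print(f"mixed children sit at {mixed_position():.2f} of the way from A to B")
```

Note that the axis and both cluster centres depend entirely on who is in the reference panel, which is the point the rest of the thread argues about.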


I went on a little Wikipedia expedition hoping to find the methodology (and failed), but I did find an interesting (though not surprising) quote from one scientist (Adam Rutherford): “[These tests] don’t necessarily show your geographical origins in the past. They show with whom you have common ancestry today.”

So when our thread’s ancestor (pun unfortunate) says “45% Levantine”, it means 45% of their DNA matches people alive today who are from that region (which includes many European immigrants). I bet this gets very messy given immigration patterns. Which immigrants count as part of the ancestry sample, and which don’t? For this problem I would personally use cluster analysis, but I would probably simply give up, knowing that cluster analysis would give me junk (and ultimately arbitrary) results with such noisy (and potentially skewed) data.

EDIT: The answer was right there next to this quote, in one of the aside pictures for the same article[1]. They use Principal Component Analysis. Which IMO is even more fraught than cluster analysis, as you have even more control over what you want to get out of the model. If I remember correctly, PCA (along with factor analysis) is used heavily in personality psychology and intelligence testing, the latter of which is very famous for radicalized pseudoscience.

1: https://en.wikipedia.org/wiki/Genealogical_DNA_test#/media/F...



