
Another Spurious Correlations [0]. The message will always bear repeating.

[0] https://tylervigen.com/spurious-correlations



No, most of those are "actually spurious" in the sense of the two time series having nothing to do with each other. A lot of them are pretty short series as well, so it's easy for them to be correlated.
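
To put a rough number on the "short series" point, here is a minimal sketch that draws pairs of completely independent noise series and counts how often they still look strongly correlated. The threshold of 0.5, the lengths 10 and 100, and the function names are all assumptions chosen for illustration, not anything taken from the linked site.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def share_strongly_correlated(length, trials=2000, threshold=0.5, seed=0):
    """Fraction of independent series pairs whose |r| exceeds the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Two series that have nothing to do with each other by construction.
        a = [rng.gauss(0, 1) for _ in range(length)]
        b = [rng.gauss(0, 1) for _ in range(length)]
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for length in (10, 100):
        share = share_strongly_correlated(length)
        print(f"length {length}: {share:.1%} of unrelated pairs have |r| > 0.5")
```

With only 10 points per series, a noticeable share of unrelated pairs clears the threshold, while at 100 points it almost never happens. Trending series (most real-world yearly data) tend to be even worse than this pure-noise case.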

What we are looking at is Simpson's Paradox, where the true causal relationship is obscured by information that isn't obvious from the plot.

Now, before you say "correlation != causation": there is actually a causal relationship here that you can get at with statistics.


Simpson’s paradox is when a trend or correlation is observable in each of the sub-populations, but vanishes or even reverses when the data is aggregated. For example, a drug that has a strong effect on men and on women when each group is analyzed separately, but shows little effect at the population level.

This example is not Simpson’s paradox, it is simply the misuse of statistics. Statistics, being mechanical transformations of data, only have semantics within a causal model. Simply picking variables randomly and then assuming causality when the statistics behave that way is inverting the process of knowledge formation.

EDIT: Thanks for the corrections—the real data that this fictional example is based on does show Simpson’s paradox, as the dependent variable (death rates) appears to show a positive correlation with vaccination status when aggregating the population, but a negative correlation for every age group individually.


> This example is not Simpson’s paradox, it is simply the misuse of statistics.

If you read to the end of the post, you'll see that the author was using this correlation to show that the mistake is identical to the one behind another claim, related to COVID [1]. The COVID-related correlation doesn't seem as spurious as the Ghostbusters one, but that's because it's much harder to spot errors like this when the variables aren't so "random".

[1]: "Vaccinated English adults under 60 are dying at twice the rate of unvaccinated people the same age"


Isn't that what's happening here? There's a guy who claims vaccination has the opposite effect to what people normally claim, that vaccines actually cause people to die. And it seems to be because people from age 10 to 59 have been aggregated. The sub-populations, for instance ages 10-19, 20-29, etc., would not show vaccinated individuals dying more often. It's only by aggregating them that you get the wrong conclusion.
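
A small sketch of how that aggregation effect can arise. The counts below are entirely made up for illustration (they are not the English data, and the two age bands are an assumption); they are only chosen so that the older band is both more vaccinated and more likely to die of anything.

```python
# Hypothetical counts, invented for illustration only:
# band -> status -> (people, deaths)
DATA = {
    "10-39": {"unvaccinated": (900_000, 90), "vaccinated": (100_000, 5)},
    "40-59": {"unvaccinated": (100_000, 100), "vaccinated": (900_000, 450)},
}


def rate_per_100k(people, deaths):
    """Deaths per 100,000 people."""
    return deaths / people * 100_000


def band_rates(data):
    """Death rate per 100k for each age band and vaccination status."""
    return {
        band: {status: rate_per_100k(*counts) for status, counts in groups.items()}
        for band, groups in data.items()
    }


def aggregated_rates(data):
    """Death rate per 100k per status after pooling all age bands together."""
    totals = {}
    for groups in data.values():
        for status, (people, deaths) in groups.items():
            prev_people, prev_deaths = totals.get(status, (0, 0))
            totals[status] = (prev_people + people, prev_deaths + deaths)
    return {status: rate_per_100k(*counts) for status, counts in totals.items()}


if __name__ == "__main__":
    for band, rates in band_rates(DATA).items():
        print(band, {status: round(r, 1) for status, r in rates.items()})
    print("all ages", {s: round(r, 1) for s, r in aggregated_rates(DATA).items()})
```

In these invented numbers the vaccinated die at half the unvaccinated rate inside each band, yet the pooled "10-59" figure shows them dying at more than twice the rate, simply because most of the vaccinated are in the older, higher-risk band.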


You’re right, I did read to the end of the post to see that but for some reason I fixated on the “contrived” example and the fallacy that it entailed.


It's still interesting to note that we assume "nothing to do with each other" in some cases and "obscured" relationships in others, like the OP's second point.

Which category a given case falls into depends on the viewer's knowledge of the problem space, which can itself be limited enough to lure them into false causations.

My point would be that a single graph showing two trends without any further info should never be taken as more information than "there are two trends". You'll still be free to decide there is true causality based on other information you believe.


For more on Simpson’s Paradox, see https://www.forrestthewoods.com/blog/my_favorite_paradox/, which I happened to reread this week, and which essentially serves as a spoiler for this article.



