Statistics are meaningless without a rigorously examined causal model of the phenomenon under investigation. In my experience of statistics education, the art of crafting causal theories was scarcely addressed.
This is very true. Systematic bias in observational data collection (astronomy etc.) as well as systematic bias in experimental data collection (particle physics etc.) often isn't accounted for in the statistical analysis of that data.
A classic example I recall is a Feynman story, where a group of researchers were getting very statistically sound and repeatable results of very unusual and unexplained particle track behavior in a cloud chamber. Feynman looked at the data and said "you probably have a tiny piece of metal in the cloud chamber somewhere", and that turned out to be the explanation.
Similar examples in the social sciences include systematic bias in the preparation and administration of IQ tests to different groups of people (see Charles Murray's 'The Bell Curve' vs. Stephen Jay Gould's 'The Mismeasure of Man').
Hundreds of other examples can be found across all scientific disciplines, unfortunately. To quote the smartest PI I ever worked for: "There's a lot of BS in statistical analysis."
I don't get this attitude. If things are correlated, it should at least make a scientist wonder why. It certainly could be random chance, but correlation can also lead to establishing a causal model or discovering a third variable. If two things keep happening in conjunction, it at least merits further investigation.
It seems like there's this extreme reaction against people behaving like correlation equals causation, but instead of over-emphasizing correlation, it gets dismissed entirely.
Yes, the classic example being the connection of lung cancer to smoking. Initially it was just a correlation, but the correlation encouraged scientists to see if there was actually a causal relationship (which of course there was). Yes, many (probably most) correlations are spurious, but the existence of a correlation is very useful for scientists looking to find a hypothesis to test.
I share this criticism. It's almost like you have to scream at people that although correlation doesn't imply _direct_ causation, it most certainly does imply some causal chain.
Ruling out chance through significance and power, what phenomenon in the world is correlated without being causally related _somehow_?
Take the worst intersection in the country this week for traffic accidents. Close the roads to the intersection at 3am and organise a contemporary expressive dance performance to rid the intersection of its evil and re-open it 30 minutes later.
Repeat every week with the worst traffic intersection for accidents.
What you find is: IT WORKS! HURRAH! More often than not, these intersections are no longer the worst the following week! The evidence is clear. The correlation is utterly compelling. It is significant. It has power. How could it possibly be unrelated?
Now, if we stop the example being comically silly and make the intervention a red light & speed camera, see how much more difficult the issue becomes. There is a clear line of potential causation in fixing dangerous intersections. But is it really better than folk dancing for 30 minutes at 3am? [1]
[1] This example should not be interpreted as opposition to all red light and speed cameras.
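For anyone who wants to see the dance example run, here is a minimal simulation sketch (the numbers and names are my own illustration, not from the thread): a pool of intersections with fixed accident rates, a do-nothing "intervention" applied to whichever one was worst, and the apparent improvement that follows purely from regression to the mean.

```python
# A minimal sketch of the "dance at the worst intersection" example:
# the apparent improvement is pure regression to the mean, because the
# intervention does nothing at all.
import random

random.seed(0)

N_INTERSECTIONS = 200
N_WEEKS = 500

# Each intersection has a fixed underlying accident rate.
rates = [random.uniform(0.5, 3.0) for _ in range(N_INTERSECTIONS)]

def weekly_accidents(rate):
    # Crude Poisson-ish draw: sum of Bernoulli trials with the right mean.
    return sum(random.random() < rate / 10 for _ in range(10))

improved = 0
trials = 0
counts = [weekly_accidents(r) for r in rates]
for _ in range(N_WEEKS):
    worst = max(range(N_INTERSECTIONS), key=lambda i: counts[i])
    before = counts[worst]
    # "Intervention": a 3am dance performance that changes nothing.
    counts = [weekly_accidents(r) for r in rates]
    after = counts[worst]
    trials += 1
    improved += after < before

print(f"The 'treated' intersection improved in {improved / trials:.0%} of weeks")
# Typically well above 80%: the correlation between the ritual and the
# improvement is real, but the cause is selection on an extreme value,
# not the dance.
```

The correlation between the ritual and the subsequent improvement is strong and repeatable, yet the ritual causes nothing; selecting on an extreme value guarantees the regression.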
There is a scale of certainty with causality. Evidently we aren't always good at expressing the difference between light evidence of causality and heavy evidence of causality. Do we have the right words? Are we using them?
So in other words, several of the commenters here are right. On the one hand we shouldn't jump to conclusions, but on the other hand we should listen to the clues.
Consider that we can’t conclusively identify true causation even through scientific experiment alone. Correlations and conjectures are ultimately all we have.
Nevertheless, the fallacy is so commonplace. You will easily find a seemingly educated person selectively balking either at the notion that a causal relationship is ultimately a conjecture or at the notion that a causal relationship is possible at all, depending on their pre-existing beliefs.
Being emotionally attached to a purported causal relationship X->Y, they will count all correlational evidence in its favor; when it is pointed out that the evidence is correlational, they will wave the objection off with more correlational evidence.
If that causal relationship does not happen to align with their world view, of course, they will be right onto you with the old correlation-does-not-imply-causation mantra.
You should still be curious as to why those correlations exist. It's still important even when the reason is p-hacking, since then you can identify dubious statistics.
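To make the p-hacking point concrete, here is a rough sketch (my own illustration, with made-up sample sizes) of how testing many unrelated variables against one outcome manufactures "significant" correlations by chance alone.

```python
# Test enough pure-noise predictors against one outcome and some will
# clear p < 0.05 by chance alone.
import random
import math

random.seed(1)

N = 100          # samples
N_VARS = 200     # candidate predictors, all pure noise

outcome = [random.gauss(0, 1) for _ in range(N)]

def pearson_r(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Rough critical |r| for p ~ 0.05 at N = 100: about 2 / sqrt(N).
threshold = 2 / math.sqrt(N)

hits = 0
for _ in range(N_VARS):
    noise = [random.gauss(0, 1) for _ in range(N)]
    if abs(pearson_r(outcome, noise)) > threshold:
        hits += 1

print(f"{hits} of {N_VARS} unrelated variables look 'significant'")
# Expect roughly 5% of them, i.e. around 10 hits, with no causal
# relationship anywhere in sight.
```

Roughly 5% of pure-noise predictors clear the threshold, which is exactly why a surprising correlation is worth asking about: sometimes the answer is a causal chain, and sometimes it is the multiplicity of tests.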
I don't remember who said it (pretty sure it was Randall Munroe in xkcd, as usual): correlation is not causation, but the data is heavily winking at you (it was funnier in his words)
Of course it was xkcd: https://xkcd.com/552/. The exact text is: "Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'."
The average citizen doesn't need to craft causal theories, they need to be able to look at them critically. Yes, an introductory statistics course doesn't cover everything, but it's absolutely the 20% of effort that gets 80% of the results. Most of the worst misinformation I've seen lately surrounding covid, vaccines, etc, would be solved if everyone had a basic understanding of introductory stats as currently taught.
If we can get there, then we can talk about what we can do to improve things from there.
> The average citizen doesn't need to craft causal theories, they need to be able to look at them critically
Yes, I agree, and this is precisely my point. My experience with introductory stats was a heavy focus on the technical details, when in fact what would be more effective is focusing on statistical logic.
I’m specifically thinking that something like Judea Pearl’s The Book of Why would be good to introduce early in stats education.
I think the biggest thing is whether people have a desire to dig deeper or not. It seems there are many people who just want to believe what they are told as long as it matches their beliefs.
I think you underestimate how much most of the people we are talking about are accustomed to "magical thinking". Any stats education that didn't align with their core beliefs would be dismissed as wrong. Even if you could prove everything out, and they had the core intellectual horsepower to understand it, there is a level of belief in belief underlying their worldview that you can't overcome.
I think I agree. Instead of stats I would propose critical thinking. Critical thinking was one of the more engaging classes for me. It covered a vast array of fallacies and how they can be exploited in the real world. I am not sure if it is a common requirement now, but it certainly should be.
> Most of the worst misinformation I've seen lately surrounding covid, vaccines, etc, would be solved if everyone had a basic understanding of introductory stats as currently taught.
I don't know; most of the misinformation on the matter I've seen is just flat-out wrong. Not wrong in the sense of misinterpreting statistics, just flat-out lying, using false numbers or statements, etc. Having a cursory knowledge of statistics won't help you if you're incapable of Googling to check whether a statistic is true or not.
"Most of the worst misinformation I've seen lately surrounding covid, vaccines, etc, would be solved if everyone had a basic understanding of introductory stats as currently taught."
I think a lot of it is more ideological at the "average" citizen level - on both sides. Most people aren't looking at the data, they're believing whatever they're told that aligns with their personal beliefs.
Can you point to a good source on that? The stats books that I've seen seem to treat it as a collection of tools, and causality in computer science (AI) seems like a separate subfield with the do operator and all that.
I guess you may be familiar with Judea Pearl’s work already, but he did write a popular treatment of the subject in The Book of Why. I’m not trying to put computer scientists on a pedestal, but there is something about the uncompromising rigor that comes with putting abstractions inside brainless machines.
Thanks for the tip. I'd read some material of his from the 1980s that helped me start understanding Bayesian networks; it inspired me to revisit my university statistics course, which was fruitful. I am a sucker for pop sci!
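For readers curious what the do-operator adds in practice, here is a minimal sketch (a toy model of my own, not Pearl's code or notation) of the gap between observing X and intervening on X when a hidden common cause Z drives both X and Y.

```python
# Observational vs. interventional regimes behind the do-operator idea.
# A hidden common cause Z drives both X and Y; X has no effect on Y.
import random

random.seed(2)

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

N = 10_000

# Observational regime: we just watch P(X, Y), with Z confounding both.
z = [random.gauss(0, 1) for _ in range(N)]
x_obs = [zi + random.gauss(0, 0.5) for zi in z]
y_obs = [zi + random.gauss(0, 0.5) for zi in z]

# Interventional regime: do(X) sets X at random, cutting the Z -> X edge.
x_do = [random.gauss(0, 1) for _ in range(N)]
y_do = [zi + random.gauss(0, 0.5) for zi in z]

print(f"observational corr(X, Y):  {pearson_r(x_obs, y_obs):.2f}")  # ~0.8
print(f"interventional corr(X, Y): {pearson_r(x_do, y_do):.2f}")    # ~0.0
```

Observationally, X and Y are strongly correlated; under do(X) the correlation vanishes, because cutting the Z → X edge is exactly what the intervention does. The correlation was still pointing at something causal, just not X → Y.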