Seems to me numbers 3 and 4 are quite unlikely to develop unless you've studied statistics. Something like the Monty Hall problem is not obvious to most people I've met. Heck, even PhDs argue about it. Conditional probability is really tricky and counterintuitive.
The other example is this old error, which is attributed to doctors (for some reason): you have a test which is 99% accurate (i.e. shows the correct status) for the presence of some illness. 1 in 1M people actually have the illness. You find someone with a positive result. What's the chance they have it? Well, actually, it's not as high as you think.
The main reason Monty Hall is non-intuitive is that the shift in probability is rather small (relatively speaking).
To make it more intuitive, we can simply make the shift in probability absurdly high!
Imagine you have 1000 doors; behind one there is a car, and behind the others there are goats. You pick one door. The show host opens 998 of the remaining doors, leaving only yours and one other closed, and reveals only goats. Do you think it's more likely the car is behind your door or behind the other unopened door?
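If it helps, here's a quick Monte Carlo sketch of the 1000-door variant (Python; the door and trial counts are just numbers I picked), assuming the host knows where the car is and deliberately leaves it closed:

    import random

    def play(switch, n_doors=1000, trials=100_000):
        """1000-door Monty Hall: the host, who knows where the car is,
        opens every door except your pick and one other."""
        wins = 0
        for _ in range(trials):
            car = random.randrange(n_doors)
            pick = random.randrange(n_doors)
            # The door left closed is the car's door, unless you already
            # picked the car, in which case it's a random goat door.
            other = car if car != pick else random.choice(
                [d for d in range(n_doors) if d != pick])
            if switch:
                pick = other
            wins += (pick == car)
        return wins / trials

    print(play(switch=True))   # ~0.999
    print(play(switch=False))  # ~0.001

Switching wins roughly 999 times out of 1000; staying wins roughly 1 in 1000.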
It still depends on what the show host is doing. Maybe he only opens 998 other doors if you happened to pick the right one in the first place. To a monkey brain that's more likely than him trying to help you get the car.
At the time it was first popularized, there was no need to speculate about what the show host was doing. Statements of it often leave out explicit discussion of the broader context of the problem, but saying "Monty Hall" relates it directly back to a game show where the host always revealed a goat.
If he had the option of not opening a door and just opening the one the contestant picked immediately, then that makes it a rather different problem from the mathematical one. And AIUI that was how the gameshow worked: sometimes Monty would reveal a goat and offer you the chance to switch doors, and sometimes he wouldn't.
The problem is in the phrasing, and you're actually not improving on it. The way you explain the problem, it would still be 50/50. What changes the situation is that the show host CANNOT open the door with the car in it. The way you phrase it, it's not clear that the show host has any information that the participant doesn't have, but he does.
He does kind of improve on it. You do have information about whether the host knows which door the car is behind. If the host does not know where the car is, he just did something very unlikely, so you should update your model based on this information. Assuming you have reasonable priors about whether the host knows where the car is, then after he did what he did it is very likely that he does.
Though actually I don't think you can then use this information to answer the question because it is tainted with the assumption that the car is not behind your door. :(
Yes, but that's not what this problem is about. Monty Hall actually knew where the car was and never picked it; that's how the show worked. The problem is that whenever the Monty Hall problem comes up, this is never clear in the phrasing. For someone who has never seen "Let's Make a Deal", this is not obvious, and thus people are confounded by the problem when really it's not very confounding if you know the facts.
On a side note, I think what you talked about would be a 0.2% chance: the product of (n-1)/n for n = 3 to 1000, which telescopes down to 2/1000.
I'm not a stats/probability wiz, but I suppose if you need to decide between Monty Hall using his knowledge or not, you'd be fairly certain by this point...
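For what it's worth, a quick check of that figure (Python; this assumes a host who opens the 998 doors blindly and just happens to reveal no car):

    from math import prod

    # Telescoping product (n-1)/n for n = 3..1000 collapses to 2/1000.
    print(prod((n - 1) / n for n in range(3, 1001)))   # ~0.002, i.e. 0.2%

    # Direct check: a blind host reveals no car only if the car is behind
    # your door (1/1000) or behind the one door he leaves shut (999/1000 * 1/999).
    print(1/1000 + 999/1000 * 1/999)                   # 0.002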
A lot of these errors are more about confusing phrasing or unstated assumptions.
In Monty Hall the ambiguity is around whether the host is randomly opening doors or not.
With the doctor example, they'd normally only face that question when they've ordered a test for someone, so the prior probability of illness is much higher.
The other classic 'unintuitive' result is the prisoner's dilemma, because people have emotional bonds to colleagues, and if they break them those emotions can lead to revenge and retribution. These have to be ignored in the classic formulation, but recast it as a drug deal or spy exchange and it makes more sense to people.
At least the way I've heard it, the doctor problem is given as an answer to the question "Why don't we test everyone for HIV/cancer/horrible disease X?" See for example the discussions people had when the American Cancer Society recommended that women with average cancer risk delay their first mammogram to age 45.
It's relevant to that question, but it's also cited regularly as an example of how "even doctors can't do stats".
Which, like most people, they probably can't. But asking someone with lots of experience of a situation a question about a superficially (but not actually) similar situation adds another level of confusion, beyond the inability to work out the numbers logically.
Thinking in probabilities is certainly possible without formal training, and I'd argue most people do it, just cutting corners and not actually doing the math.
Example calculation that I (and every other kid) did every day after school:
- it takes 40 minutes on foot to get home; buses go every 20 minutes on average and get you there in 10 minutes, but sometimes they are late, and sometimes a bus is broken and you end up waiting as long as 40 minutes
- it takes 20 minutes on foot to get to the next bus stop, so if you set off walking you may miss the bus if it comes while you're too far from both stops
Depending on whether there are people at the bus stop (meaning there hasn't been a bus recently), and whether you see buses going the other way (meaning you'll wait at most 20 minutes), it makes more or less sense to wait instead of going on foot. But take into account that if you didn't leave right after classes, the bus stop may be empty even if there hasn't been a bus recently.
It even makes you pay (with time or ticket money) for miscalculations :)
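Written down, it's basically an expected-value comparison. A rough sketch (Python; the headway, ride time, and "bus is broken" probability are made-up numbers, not the real ones from my commute):

    import random

    WALK_HOME = 40     # minutes to walk the whole way
    RIDE = 10          # minutes by bus once aboard
    HEADWAY = 20       # average minutes between buses
    P_BROKEN = 0.1     # assumed chance the next bus never shows up

    def wait_for_bus(trials=100_000):
        """Expected door-to-door time if you just wait at the stop."""
        total = 0.0
        for _ in range(trials):
            wait = random.uniform(0, HEADWAY)   # arrive at a random point in the headway
            if random.random() < P_BROKEN:
                wait += HEADWAY                 # missing bus: wait one more headway
            total += wait + RIDE
        return total / trials

    print(wait_for_bus())   # ~22 minutes on average, vs a guaranteed 40-minute walk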
> Human perception and memory are often explained as optimal statistical inferences, informed by accurate prior probabilities. In contrast, cognitive judgments are usually viewed as following error-prone heuristics, insensitive to priors. We examined the optimality of human cognition in a more realistic context than typical laboratory studies, asking people to make predictions about the duration or extent of everyday phenomena such as human life spans and the box-office take of movies. Our results suggest that everyday cognitive judgments follow the same optimal statistical principles as perception and memory, and reveal a close correspondence between people’s implicit probabilistic models and the statistics of the world.
Maybe my Bayesian stats are rusty, but wouldn't you need to clarify what "accurate" means to get a correct answer here? i.e. tell us if, by accurate, you mean sensitivity or specificity.
EDIT: Oh, never mind. Just noticed you specifically say 99% for the presence. :)
Totally not a statistician, but I'll give it a shot.
For the sake of the argument:
test accuracy: exactly 99.0% accurate
disease incidence: exactly 1 in 1 million
Calculation:
For the sake of simple calculations, let's assume we test exactly 1 million people.
tests positive = (1 * 0.99) + (999999 * 0.01)
tests positive = (.99) + (9999.99)
tests positive = 10000.98
We'll round up for the sake of argument to 10,001 positive results. And we know that only 1 person (remember that we're testing 1 million people) is actually sick. So we have roughly 1 actually sick person among 10 thousand positive tests, and the probability that the positive test in front of you belongs to a truly sick person is about 1 in 10 thousand.
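Same result straight from Bayes' rule, for anyone who wants to check (assuming, as above, that the 99% applies equally to false positives and false negatives):

    accuracy = 0.99
    incidence = 1 / 1_000_000

    true_pos = incidence * accuracy                # ill and correctly flagged
    false_pos = (1 - incidence) * (1 - accuracy)   # healthy but flagged anyway
    print(true_pos / (true_pos + false_pos))       # ~9.9e-05, roughly 1 in 10,000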
Beware, you are making a strong assumption: that the test's accuracy is the same for false negatives and false positives. For example, a test may not find enough of an "anomaly" in an ill person to trigger a positive, thus yielding a false negative; at the same time, it might never find any "anomaly" in a healthy person, and as a consequence never give a false positive.
Back to your example: it's obvious that a test with a 0.01 probability of giving a false positive is completely useless for an illness that affects 1 in a million people.
Actual descriptions of medical tests routinely give both rates. They often call them "sensitivity" and "specificity". Good luck remembering which is which.
But if only one rate is given, that indicates they're equal. If they're not, then it's reasonable to describe the documentation as incorrect.
Consider a population of 100M people, of which 100 would have the illness. Of them, 99% = 99 would test positive and 1% = 1 would test negative. For the other 99,999,900 healthy people, 99% = 98,999,901 would test negative and 1% = 999,999 would test positive.
In total, 99 + 999,999 people would test positive. Given that a person tests positive, then, there is only a 99 / (99 + 999,999) ~= 0.01% chance that person has the illness.
For sure, this assumes that false positive rate = false negative rate = 1%, but it suffices to illustrate how a highly accurate test can produce misleading results.
A solution would be repeated retesting, as the 1st, 2nd, 3rd, and 4th consecutive positive test results would lead to 0.01%, 1%, 50%, and 99% chances. (Each additional positive test reduces the false positives by 100-fold, whereas the ill patients are very likely to get continually positive results.)
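In the odds form of Bayes' rule this is easy to see: each positive test multiplies the odds by 0.99/0.01 = 99. A little sketch (assuming the repeat tests really are independent, which real retests may not be):

    prior = 1 / 1_000_000
    odds = prior / (1 - prior)
    for k in range(1, 5):
        odds *= 0.99 / 0.01          # each positive multiplies the odds by 99
        print(k, odds / (1 + odds))  # ~0.0001, 0.0097, 0.49, 0.99 (0.01%, 1%, 50%, 99%)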