Seems to me numbers 3 and 4 are quite unlikely to develop unless you've studied statistics. Something like the Monty Hall problem is not obvious to most people I've met. Heck, even PhDs argue about it. Conditional probability is really tricky and counterintuitive.
The other example is this old error, which is attributed to doctors (for some reason): you have a test which is 99% accurate (i.e. shows the correct status) for the presence of some illness. 1 in 1M people actually have the illness. You find someone with a positive result. What's the chance they have it? Well, actually, it's not as high as you think.
The main reason Monty Hall is non-intuitive is that the shift in probability is rather small (relatively speaking).
To make it more intuitive, we can simply make the shift in probability absurdly high!
Imagine you have 1000 doors; behind one there is a car, and behind the others there are goats. You pick one door. The show host opens 998 of the remaining doors, leaving only yours and one other closed, and reveals only goats. Do you think it's more likely the car is behind your door or behind the other unopened door?
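If it helps, here's a quick Monte Carlo sketch of the 1000-door variant (Python; the door and trial counts are just numbers I picked), assuming the host knows where the car is and deliberately leaves it closed:

    import random

    def play(switch, n_doors=1000, trials=100_000):
        """1000-door Monty Hall: the host, who knows where the car is,
        opens every door except your pick and one other."""
        wins = 0
        for _ in range(trials):
            car = random.randrange(n_doors)
            pick = random.randrange(n_doors)
            # The door left closed is the car's door, unless you already
            # picked the car, in which case it's a random goat door.
            other = car if car != pick else random.choice(
                [d for d in range(n_doors) if d != pick])
            if switch:
                pick = other
            wins += (pick == car)
        return wins / trials

    print(play(switch=True))   # ~0.999
    print(play(switch=False))  # ~0.001

Switching wins roughly 999 times out of 1000; staying wins roughly 1 in 1000.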
It still depends on what the show host is doing. Maybe he only opens 998 other doors if you happened to pick the right one in the first place. To a monkey brain that's more likely than him trying to help you get the car.
At the time it was first popularized, there was no need to speculate about what the show host was doing. Statements of it often leave out explicit discussion of the broader context of the problem, but saying "Monty Hall" relates it directly back to a game show where the host always revealed a goat.
If he had the option of not opening a door and just opening the one the contestant picked immediately, then that makes it a rather different problem from the mathematical one. And AIUI that was how the gameshow worked: sometimes Monty would reveal a goat and offer you the chance to switch doors, and sometimes he wouldn't.
The problem is in the phrasing, and you're actually not improving on it. The way you explain the problem, it would still be 50/50. What changes the situation is that the show host CANNOT open the door with the car in it. The way you phrase it, it's not clear that the show host has any information that the participant doesn't have, but he does.
He does kind of improve on it. You do have information about whether the host knows which door the car is behind. If the host does not know where the car is, he just did something very unlikely, so you should update your model based on this information. Assuming you have reasonable priors about whether the host knows where the car is, then after he did what he did it is very likely that he does.
Though actually I don't think you can then use this information to answer the question because it is tainted with the assumption that the car is not behind your door. :(
Yes, but that's not what this problem is about. Monty Hall actually knew where the car was and never picked it; that's how the show worked. The problem is that whenever the Monty Hall problem comes up, this is never clear in the phrasing. For someone who has never seen "Let's Make a Deal", this is not obvious, and thus people are confounded by the problem when really it's not very confounding if you know the facts.
On a side note, I think what you talked about would be a 0.2% chance: the product of (n-1)/n for n = 3 to 1000, which telescopes down to 2/1000.
I'm not a stats/probability wiz, but I suppose if you need to decide between Monty Hall using his knowledge or not, you'd be fairly certain by this point...
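For what it's worth, a quick check of that figure (Python; this assumes a host who opens the 998 doors blindly and just happens to reveal no car):

    from math import prod

    # Telescoping product (n-1)/n for n = 3..1000 collapses to 2/1000.
    print(prod((n - 1) / n for n in range(3, 1001)))   # ~0.002, i.e. 0.2%

    # Direct check: a blind host reveals no car only if the car is behind
    # your door (1/1000) or behind the one door he leaves shut (999/1000 * 1/999).
    print(1/1000 + 999/1000 * 1/999)                   # 0.002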
A lot of these errors are more about confusing phrasing or unstated assumptions.
In Monty Hall the ambiguity is around whether the host is randomly opening doors or not.
With the doctor example, they'd normally only face that question when they've ordered a test for someone, so the prior probability of illness is much higher.
The other classic 'unintuitive' result is the prisoner's dilemma, because people have emotional bonds to colleagues, and if they break them those emotions can lead to revenge and retribution. These have to be ignored in the classic formulation, but recast it as a drug deal or spy exchange and it makes more sense to people.
At least the way I've heard it, the doctor problem is given as an answer to the question "Why don't we test everyone for HIV/cancer/horrible disease X?" See for example the discussions people had when the American Cancer Society recommended that women with average cancer risk delay their first mammogram to age 45.
It's relevant to that question, but it's also cited regularly as an example of how "even doctors can't do stats".
Which, like most people, they probably can't. But asking someone with lots of experience of a situation a question about a superficially (but not actually) similar situation adds another level of confusion, beyond the inability to work out the numbers logically.
Thinking in probabilities is certainly possible without formal training, and I'd argue most people do it, just cutting corners and not actually doing the math.
Example calculation that I (and every other kid) did every day after school:
- it takes 40 minutes on foot to get home; buses go every 20 minutes on average and get you there in 10 minutes, but sometimes they are late, and sometimes a bus is broken and you end up waiting as long as 40 minutes
- it takes 20 minutes on foot to get to the next bus stop, so if you set off walking you may miss the bus if it comes while you're too far from both stops
Depending on whether there are people at the bus stop (meaning there hasn't been a bus recently), and whether you see buses going the other way (meaning you'll wait at most 20 minutes), it makes more or less sense to wait instead of going on foot. But take into account that if you didn't leave right after classes, the bus stop may be empty even if there hasn't been a bus recently.
It even makes you pay (with time or ticket money) for miscalculations :)
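Written down, it's basically an expected-value comparison. A rough sketch (Python; the headway, ride time, and "bus is broken" probability are made-up numbers, not the real ones from my commute):

    import random

    WALK_HOME = 40     # minutes to walk the whole way
    RIDE = 10          # minutes by bus once aboard
    HEADWAY = 20       # average minutes between buses
    P_BROKEN = 0.1     # assumed chance the next bus never shows up

    def wait_for_bus(trials=100_000):
        """Expected door-to-door time if you just wait at the stop."""
        total = 0.0
        for _ in range(trials):
            wait = random.uniform(0, HEADWAY)   # arrive at a random point in the headway
            if random.random() < P_BROKEN:
                wait += HEADWAY                 # missing bus: wait one more headway
            total += wait + RIDE
        return total / trials

    print(wait_for_bus())   # ~22 minutes on average, vs a guaranteed 40-minute walk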
> Human perception and memory are often explained as optimal statistical inferences, informed by accurate prior probabilities. In contrast, cognitive judgments are usually viewed as following error-prone heuristics, insensitive to priors. We examined the optimality of human cognition in a more realistic context than typical laboratory studies, asking people to make predictions about the duration or extent of everyday phenomena such as human life spans and the box-office take of movies. Our results suggest that everyday cognitive judgments follow the same optimal statistical principles as perception and memory, and reveal a close correspondence between people’s implicit probabilistic models and the statistics of the world.
Maybe my Bayesian stats are rusty, but wouldn't you need to clarify what "accurate" means to get a correct answer here? i.e. tell us if, by accurate, you mean sensitivity or specificity.
EDIT: Oh, never mind. Just noticed you specifically say 99% for the presence. :)
Totally not a statistician, but I'll give it a shot.
For the sake of the argument:
test accuracy: exactly 99.0% accurate
disease incidence: exactly 1 in 1 million
Calculation:
For the sake of simple calculations, let's assume we test exactly 1 million people.
tests positive = (1 * 0.99) + (999999 * 0.01)
tests positive = (.99) + (9999.99)
tests positive = 10000.98
We'll round up for the sake of argument to 10,001 positive results. And we know that only 1 person (remember that we're testing 1 million people) is actually sick. So we have roughly 1 actually sick person among 10 thousand positive tests, and the probability that the positive test in front of you belongs to a truly sick person is about 1 in 10 thousand.
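Same result straight from Bayes' rule, for anyone who wants to check (assuming, as above, that the 99% applies equally to false positives and false negatives):

    accuracy = 0.99
    incidence = 1 / 1_000_000

    true_pos = incidence * accuracy                # ill and correctly flagged
    false_pos = (1 - incidence) * (1 - accuracy)   # healthy but flagged anyway
    print(true_pos / (true_pos + false_pos))       # ~9.9e-05, roughly 1 in 10,000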
Beware, you are making a strong assumption: that the test's accuracy is the same for false negatives and false positives. For example, a test may not find enough of an "anomaly" in an ill person to trigger a positive, thus yielding a false negative; at the same time, it might never find any "anomaly" in a healthy person, and as a consequence never give a false positive.
Back to your example: it's obvious that a test with a 0.01 probability of giving a false positive is completely useless for an illness that affects 1 in a million people.
Actual descriptions of medical tests routinely give both rates. They often call them "sensitivity" and "specificity". Good luck remembering which is which.
But if only one rate is given, that indicates they're equal. If they're not, then it's reasonable to describe the documentation as incorrect.
Consider a population of 100M people, of which 100 would have the illness. Of them, 99% = 99 would test positive and 1% = 1 would test negative. For the other 99,999,900 healthy people, 99% = 98,999,901 would test negative and 1% = 999,999 would test positive.
In total, 99 + 999,999 people would test positive. Given that a person tests positive, then, there is only a 99 / (99 + 999,999) ~= 0.01% chance that person has the illness.
For sure, this assumes that false positive rate = false negative rate = 1%, but it suffices to illustrate how a highly accurate test can produce misleading results.
A solution would be repeated retesting, as the 1st, 2nd, 3rd, and 4th consecutive positive test results would lead to 0.01%, 1%, 50%, and 99% chances. (Each additional positive test reduces the false positives by 100-fold, whereas the ill patients are very likely to get continually positive results.)
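In the odds form of Bayes' rule this is easy to see: each positive test multiplies the odds by 0.99/0.01 = 99. A little sketch (assuming the repeat tests really are independent, which real retests may not be):

    prior = 1 / 1_000_000
    odds = prior / (1 - prior)
    for k in range(1, 5):
        odds *= 0.99 / 0.01          # each positive multiplies the odds by 99
        print(k, odds / (1 + odds))  # ~0.0001, 0.0097, 0.49, 0.99 (0.01%, 1%, 50%, 99%)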