> I have two children. At least one of them is a boy born on Tuesday. What is the probability that both children are boys?
An interesting thing about this problem is the unspoken assumption of what happens in other counterfactual worlds. If the person always answers the question "is one of your kids a boy born on Tuesday?" then the problem is solvable. But if a different family history would've caused the person to answer a different question ("born on a Monday" instead of Tuesday), then the answer would depend on the person's algorithm. Eliezer gave a dramatized explanation here: https://www.lesswrong.com/posts/Ti3Z7eZtud32LhGZT/my-bayesia...
Further on this path, there are seemingly basic questions that cause disagreement among actual statisticians. For example, see the voltmeter story in https://en.wikipedia.org/wiki/Likelihood_principle:
> An engineer draws a random sample of electron tubes and measures their voltages. The measurements range from 75 to 99 Volts. A statistician computes the sample mean and a confidence interval for the true mean. Later the statistician discovers that the voltmeter reads only as far as 100 Volts, so technically, the population appears to be "censored". If the statistician is orthodox this necessitates a new analysis. However, the engineer says he has another meter reading to 1000 Volts, which he would have used if any voltage had been over 100. This is a relief to the statistician, because it means the population was effectively uncensored after all. But later, the statistician ascertains that the second meter was not working at the time of the measurements. The engineer informs the statistician that he would not have held up the original measurements until the second meter was fixed, and the statistician informs him that new measurements are required. The engineer is astounded: "Next you'll be asking about my oscilloscope!"
Maybe I'm missing something about the voltmeter example. My assumption is that the 100-volt-maximum meter can distinguish between 100 volts and more than 100 volts, in which case there's no problem. If the voltmeter doesn't accurately indicate whether or not a measurement is outside of its range then the statistician is correct that everything should be re-measured.
Do some people think that the possibility of not being able to take accurate measurements is the same as not having taken accurate measurements?
EDIT: Maybe the ambiguity is in what the engineer would have recorded if finding a voltage >100 volts while the other meter was broken? It's like undefined behavior in programming; if you know your software will have undefined behavior when encountering certain data then you can't trust whether the output is valid unless there's independent confirmation that the data won't cause undefined behavior. If the statistician doesn't have certainty that the engineer will have defined behavior (e.g. say "I couldn't complete the measurements" vs. undefined behavior like writing down "99" or exploding) then they of course want to re-measure.
I have a box with 1000 tubes, exactly 1% of which are beyond my capacity to measure.
I take a random sample of 20; the chance of being able to measure all of them is 0.99^20 ≈ 82%, call it roughly 80%.
Let's say the true average is 50V and the tubes beyond my measurement average 150V. We can use this info to solve for the average of the tubes we can measure: (50 − 0.01 × 150) / 0.99 ≈ 49V.
So, in 80% of the cases we'll get an average measurement of about 49V.
In the other 20% of cases we're going to notice the mistake, buy a better voltmeter, and redo everything. In those cases we can measure everything, so we get the correct number: 50V.
49V × 0.8 + 50V × 0.2 ≈ 49.2V
With a single sample of 20 this bias won't be statistically significant. But let's get a lot of researchers together to measure a lot of tubes. And let's say 100V-limit meters are common, so half make the same mistake.
Now we have a statistically significant ~49.6V. Wrong, wrong, wrong.
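A quick Monte Carlo sketch of this selection effect, recomputing the bias from the stated assumptions (1% of tubes around 150V, the rest around 49V, so the true mean is ~50V; the distributions and researcher counts are illustrative, not from the comment):

```python
import random
import statistics

random.seed(42)

def tube_voltage():
    # ~1% of tubes sit around 150 V; the rest around 49 V, so the true mean is ~50 V
    if random.random() < 0.01:
        return random.gauss(150, 5)
    return random.gauss(49, 10)

def researcher_estimate(cheap_meter):
    sample = [tube_voltage() for _ in range(20)]
    if cheap_meter and any(v > 100 for v in sample):
        # the 100 V meter topped out: buy a better one and redo the whole run
        sample = [tube_voltage() for _ in range(20)]
    return statistics.mean(sample)

cheap = [researcher_estimate(True) for _ in range(5000)]   # biased low, ~49.2 V
good = [researcher_estimate(False) for _ in range(5000)]   # unbiased, ~50.0 V
pooled = statistics.mean(cheap + good)                     # ~49.6 V
print(statistics.mean(cheap), statistics.mean(good), pooled)
```

The cheap-meter estimates are biased low even though the mistake is caught and corrected 20% of the time: the samples that get kept are exactly the ones conditioned on containing no high-voltage tube.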
> In the correct version of this story, the mathematician says "I have two children", and you ask, "Is at least one a boy?", and she answers "Yes". Then the probability is 1/3 that they are both boys.
I don’t understand this reasoning. If at least one is a boy, the only configurations I can think of are 1 boy 1 girl or 2 boys. Where does the 1/3 come from?
With 2 children, there are 4 configurations of equal probability. The one with 1 boy 1 girl occurs twice. Take away the 2 girl case, then 2 boys is 1 in 3.
Yeah, but the way the problem is formulated there’s absolutely no indication that order matters, so how are there two configurations with 1 boy and 1 girl?
Order doesn't matter in the sense that the observed data set is unordered (just counts of girls and boys). What matters is how many ways the universe can give rise to those unordered data sets. And in fact, there are more ways the universe can give rise to the unordered state 1 boy 1 girl than to the unordered state 2 boys. For similar reasons, there are more ways your papers can be in a mess across your desk than ways they can be neatly piled up.
And to count how many ways the universe can give rise to the unordered data sets, the usual technique is to expand the unordered data sets into all the equivalent ordered data sets, and count the latter.
Another way to think about it is counting the probability of getting k boys out of 2 children.
0 boys - 1/4
1 boy - 1/2
2 boys - 1/4
There's a half chance of getting exactly one boy, and one way to calculate this is by noticing there are two different ways to get one boy if we take order into account. You are right that the orderings don't matter in this case, so we could also e.g. model this with a binomial distribution. Once you know there is at least one boy, the chance you have two is 0.25/(0.25+0.5) = 1/3.
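The enumeration can be checked mechanically. A small sketch (assuming only uniform, independent sex and weekday): expand each unordered family into equally likely ordered (sex, weekday) pairs and condition. The same loop also gives the 13/27 answer to the Tuesday-boy version quoted at the top of the thread.

```python
from fractions import Fraction
from itertools import product

# Each child is an equally likely (sex, weekday) pair; day 0 stands in for Tuesday
children = list(product("BG", range(7)))       # 14 pairs
families = list(product(children, repeat=2))   # 196 equally likely ordered families

def prob_both_boys(condition):
    worlds = [f for f in families if condition(f)]
    both = sum(all(c[0] == "B" for c in f) for f in worlds)
    return Fraction(both, len(worlds))

# "At least one is a boy"
print(prob_both_boys(lambda f: any(c[0] == "B" for c in f)))    # 1/3
# "At least one is a boy born on Tuesday"
print(prob_both_boys(lambda f: any(c == ("B", 0) for c in f)))  # 13/27
```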
Because the order exists even if it doesn't matter (at least for two children, maybe not for two quantum particles).
At the risk of being accused of binarism, there are four distinct possibilities with (close to) equal a priori probability of 25%: older boy/younger boy, older boy/younger girl, older girl/younger boy, and older girl/younger girl.
Discarding the girl/girl case leaves three equally probable cases.
I immediately modelled the problem like you did, then I thought of this interesting variation:
"I have two children, Michael and Alex. Michael is a boy. What's the probability of both being boys?"
If you make a truth table with names as columns, you clearly have only two possibilities for Michael=1.
However if you pick older/younger again you're back to 3 possible states.
I think the answer is still 1/3, but it's a trickier one to reason about immediately.
It seems the question adds information by naming the children, but there's a hidden statement in the form "at least one of them is Michael", which invalidates a truth table with names as columns.
I can only conclude that birth order is an underlying property of the entity. A strict, real differentiator as much as sex is. Names aren't, so names don't add information in this case.
In the original problem you start by assuming 4 equally-probable cases Bb Bg Gb Gg [1] and you ask a question. Depending on the answer you are in the Bb/Bg/Gb or the Gg subsets.
In your variant you need additional assumptions. Will the person always tell you the sex and name of the eldest? Or the names of the boys?
“Michael is a boy” is not really different from “the youngest is a boy”. The probability of both being boys depends on why you are being told that.
[1] Depending on the context the assumption may not be appropriate (an extreme example may be China).
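The "it depends on why you are being told" point can be made concrete by comparing two reporting protocols over the same four uniform families (the protocol names and setup here are mine, just to illustrate):

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # ('B','B'), ('B','G'), ('G','B'), ('G','G')

# Protocol 1: you ask "is at least one a boy?" and the answer is yes
yes = [f for f in families if "B" in f]
p1 = Fraction(sum(f == ("B", "B") for f in yes), len(yes))

# Protocol 2: the parent picks one child uniformly at random and announces its sex
worlds = [(f, c) for f in families for c in f]  # 8 equally likely (family, pick) worlds
told_boy = [w for w in worlds if w[1] == "B"]
p2 = Fraction(sum(f == ("B", "B") for f, _ in told_boy), len(told_boy))

print(p1, p2)  # 1/3 1/2
```

Same utterance ("...is a boy"), different generating process, different posterior.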