Here is statistical hypothesis testing 101 in a nutshell:

Say you have a kitty cat and your vet does a blood count, whatever that is, and gets a number.

Now you want to know if your cat is sick or healthy.

Okay. From a lot of data on what appear to be healthy cats, we know what the probability distribution is for the blood count number.

So, we make a hypothesis that our cat is healthy. So, with this hypothesis, presto, bingo, we know the distribution of the number we got. We call this the null hypothesis because we are assuming that the situation is null, that is, nothing wrong, that is, that our cat is healthy.

Now, suppose our number falls way out in a tail of that distribution.

So, we say, either (A) our cat is healthy and we have observed something rare or (B) the rare is too rare for us to believe, and we reject the null hypothesis and conclude that our cat is sick.
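Here is that little procedure as a minimal Python sketch. The numbers are made up for illustration: I assume healthy blood counts are normal with mean 10 and standard deviation 2, I look only at the upper tail, and I pick 1% as the cutoff for "too rare to believe". The names `HEALTHY`, `upper_tail_p_value`, and `reject_null` are mine, not anything standard.

```python
from statistics import NormalDist

# Assumption for illustration only: healthy counts ~ Normal(mean=10, sd=2).
HEALTHY = NormalDist(mu=10.0, sigma=2.0)


def upper_tail_p_value(observed: float, null: NormalDist = HEALTHY) -> float:
    """Probability, under the null (healthy cat), of a count >= observed."""
    return 1.0 - null.cdf(observed)


def reject_null(observed: float, alpha: float = 0.01) -> bool:
    """Reject 'our cat is healthy' when the observation is rarer than alpha."""
    return upper_tail_p_value(observed) < alpha


if __name__ == "__main__":
    for count in (11.0, 16.0):
        p = upper_tail_p_value(count)
        verdict = "reject: sick" if reject_null(count) else "keep: healthy"
        print(f"count={count:5.1f}  p={p:.4f}  {verdict}")
```

A count of 16 is three standard deviations out, so case (B) applies; a count of 11 is nothing unusual for a healthy cat.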

Historically that worked great for testing a roulette wheel that was crooked.

So, like many before you, if you think about that little procedure too long, you start to have questions! A lot of good math people don't believe in statistical hypothesis testing; typically, if it is their father, mother, wife, cat, son, or daughter, they DO start to believe!

Issues:

(1) Which tail of the distribution, the left or the right? Maybe in some context with some more information, we will know. E.g., for blood pressure for the elderly, we consider the upper tail, that is, blood pressure too high. For a sick patient, maybe we consider blood pressure too low unless they are sick from, say, cocaine, in which case we may consider too high. So, which tail to use is not part of the little two-step dance I gave. Hmm, purists may be offended, as is often the case when statistics is looked at too carefully! But, again, if it's your dear, total angel of a perfect daughter, then ...!
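To see how much the choice of tail matters, here is a small sketch that computes the lower-tail, upper-tail, and two-sided probabilities for the same observation. The blood pressure distribution, Normal(120, 15), is a made-up assumption, and `tail_p_values` is just a name I picked.

```python
from statistics import NormalDist


def tail_p_values(observed: float, null: NormalDist) -> dict:
    """Return lower-tail, upper-tail, and two-sided p-values under the null."""
    lower = null.cdf(observed)          # P(X <= observed)
    upper = 1.0 - lower                 # P(X >= observed) for a continuous X
    two_sided = min(1.0, 2.0 * min(lower, upper))
    return {"lower": lower, "upper": upper, "two_sided": two_sided}


if __name__ == "__main__":
    # Assumption for illustration only: healthy systolic pressure ~ Normal(120, 15).
    healthy = NormalDist(mu=120.0, sigma=15.0)
    for name, p in tail_p_values(150.0, healthy).items():
        print(f"{name:9s} p = {p:.4f}")
```

With a cutoff of 3%, the same reading of 150 is a rejection if we only watch the upper tail but not if we watch both tails, so the extra information about which tail matters has to come from outside the procedure.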

(2) If we have data on healthy kitty cats, what about also sick ones? Could we use that data? Yes, and we should. But in some real situations all we have a shot at getting is the data on the healthy -- e.g., maybe we have oceans of data on the healthy case (e.g., a high end server farm) but darned little data on the sick cases, e.g., the next really obscure virus attack.

(3) Why the tails at all? Why not just any area of low probability? Hmm .... Partly because we worship at the altar of central tendency?

Another reason is a bit heuristic: By going for the tails, where the healthy distribution is thinnest, for any selected false alarm rate, we maximize the area of the region where we declare a detection.

Okay, then we could generalize that to multidimensional data, e.g., as we might get from several variables from a kitty cat, dear, angel perfect daughter, or a big server farm. That is, the distribution of the data in the healthy case looks like the Catskill Mountains. Then we pour in water to create lakes (assume they all seek the same level). The false alarm rate is the probability of the ground area under the lakes. A detection is a point in a lake. For a lower false alarm rate, we drain out some of the water. That way, for the false alarm rate we are willing to tolerate, the lakes cover the maximum geographical area.
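The lakes picture can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the healthy "mountains" are an equal mixture of two round 2-D Gaussian humps, the water level is estimated by simulation rather than computed exactly, and the names `PEAKS`, `density`, `sample_healthy`, `water_level`, and `is_detection` are mine.

```python
import math
import random

# Assumption for illustration only: healthy data is an equal mixture of two
# 2-D Gaussian humps with unit variance in each coordinate.
PEAKS = [(-2.0, 0.0), (2.0, 0.0)]


def density(x: float, y: float) -> float:
    """Height of the healthy 'mountains' at the point (x, y)."""
    total = sum(
        math.exp(-0.5 * ((x - cx) ** 2 + (y - cy) ** 2)) / (2.0 * math.pi)
        for cx, cy in PEAKS
    )
    return total / len(PEAKS)


def sample_healthy(rng: random.Random) -> tuple:
    """Draw one healthy observation from the mixture."""
    cx, cy = rng.choice(PEAKS)
    return rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)


def water_level(alpha: float, n: int = 20000, seed: int = 0) -> float:
    """Estimate the level at which a fraction alpha of healthy points is under water."""
    rng = random.Random(seed)
    heights = sorted(density(*sample_healthy(rng)) for _ in range(n))
    return heights[int(alpha * n)]


def is_detection(point: tuple, level: float) -> bool:
    """A detection is a point in a lake, i.e. where the ground is below the water."""
    return density(*point) < level


if __name__ == "__main__":
    level = water_level(alpha=0.05)
    for point in [(2.0, 0.0), (0.0, 0.0), (0.0, 6.0)]:
        print(point, "detection" if is_detection(point, level) else "healthy")
```

Lowering `alpha` drains water: the level drops, the lakes shrink, and fewer healthy points raise false alarms.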

Well, I cheated -- that same nutshell also covers some of semester 102.

For more, the big name is E. Lehmann, long at Berkeley.

Go for it!