Look at the various analyses in the threads. Mine was a frequentist analysis based on independent categorizations, which yields a p-value of ~0.0002. Others have posted more sophisticated frequentist and Bayesian analyses that build in priors and the fact that the subject had advance knowledge of how many Parkinson's samples were present.
But no matter what assumptions were made, no p-value came out greater than 0.001, which is quite low for n=12 and a single test; our generally accepted threshold is p < 0.05. She literally had a perfect score.
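For anyone who wants to reproduce that first number: under the independent-categorizations null, each of the 12 samples is called correctly with probability 1/2, so a perfect score has probability (1/2)^12 ≈ 0.00024. A minimal sketch in Python (assuming scipy is available; this is my framing, and the other analyses in the threads differ):

```python
from scipy.stats import binomtest

# Null hypothesis: each of the 12 samples is called correctly with
# probability 1/2, independently (the "independent categorizations" model).
# A perfect score is 12 correct out of 12.
result = binomtest(k=12, n=12, p=0.5, alternative="greater")
print(result.pvalue)   # 0.000244... ~= the ~0.0002 figure above

# Same number by hand:
print(0.5 ** 12)       # 0.000244140625
```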
Also, saying "An actual test would need to allow any possible sample including those that had zero Parkinson's patients" indicates you don't understand experimental design. Splitting the data into equal groups maximizes your chances of detecting something when effect sizes are small, since sensitivity is limited by the size of the smaller group. (P-values suffer more from a small group than they gain from a large one, which is why a 3/9 split is less powerful than a 6/6 split.)
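Here's a toy illustration of that point, using a Fisher-exact/hypergeometric framing in which the subject is told the group sizes (my framing, not necessarily the one the other posters used). Even a perfect sort can't produce a p-value smaller than 1/C(12, k), and that floor is lowest when the split is balanced:

```python
from math import comb

# Smallest attainable one-sided p-value for a perfect sort when the
# subject knows how many samples are in each group: under the null
# (random assignment), every way of picking the "Parkinson's" pile is
# equally likely, so a perfect pick has probability 1 / C(12, k).
for k in (3, 6):
    floor_p = 1 / comb(12, k)
    print(f"{k}/{12 - k} split: best possible p = {floor_p:.5f}")

# 3/9 split: best possible p = 0.00455
# 6/6 split: best possible p = 0.00108
```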
Not only did she have a perfect score, but she "adamantly" corrected a mistake in the control group. That must have some impact on the Bayesian estimate too. Her correction is worth something, and her confidence in providing that correction is worth something.
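As a toy illustration of how far a result like this moves a skeptic (every number here is an assumption of mine, not something from the thread): give her 95% per-sample accuracy under the "she can smell it" hypothesis, pure guessing under the null, and a 1% prior that the ability is real. The perfect score alone produces a Bayes factor in the thousands, before you give any extra credit for the correction or the confidence behind it.

```python
# Toy Bayes-factor sketch; all numbers are illustrative assumptions,
# not taken from any analysis in the threads.
# H1: she can smell Parkinson's and calls each sample correctly with
#     probability 0.95.
# H0: she's guessing (probability 0.5 per sample).
# Data: a perfect 12/12 score.
p_data_h1 = 0.95 ** 12          # likelihood of a perfect score under H1
p_data_h0 = 0.5 ** 12           # likelihood under pure guessing
bayes_factor = p_data_h1 / p_data_h0

prior_h1 = 0.01                 # assumed skeptical prior on the ability
posterior_odds = bayes_factor * prior_h1 / (1 - prior_h1)
posterior_h1 = posterior_odds / (1 + posterior_odds)
print(f"Bayes factor ~ {bayes_factor:.0f}, posterior P(H1) ~ {posterior_h1:.2f}")
# Bayes factor ~ 2213, posterior P(H1) ~ 0.96
```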