
Nassim Taleb wrote extensively on this.

> P-values are shown to be extremely skewed and volatile, regardless of the sample size n, and vary greatly across repetitions of exactly same protocols under identical stochastic copies of the phenomenon; such volatility makes the minimum p value diverge significantly from the "true" one. Setting the power is shown to offer little remedy unless sample size is increased markedly or the p-value is lowered by at least one order of magnitude.

https://arxiv.org/abs/1603.07532

Video: https://www.youtube.com/watch?v=8qrfSh07rT0




Are you sure this answers my question? Note that I was neither asking why p-hacking is bad nor even why p-values are bad.


Yes, it answers your question.

What you call "p-value" is a sample from the "p-value distribution" of your experiment.

Taleb shows you can sample a p-value of 0.05 when the actual "true" p-value is 0.12.
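To see what that volatility looks like, here is a minimal sketch (my own, not Taleb's code): repeat the exact same experiment many times under identical conditions and watch how much the reported p-value jumps around. The sample size, effect size, and choice of a two-sample t-test are illustrative assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, effect, reps = 30, 0.5, 10_000

    pvals = []
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)  # identical "true" effect every run
        pvals.append(stats.ttest_ind(treated, control).pvalue)

    pvals = np.array(pvals)
    print("median p:", np.median(pvals))
    print("5th-95th percentile:", np.percentile(pvals, [5, 95]))
    print("fraction below 0.05:", (pvals < 0.05).mean())

The spread between the 5th and 95th percentile is huge even though every run is a stochastic copy of the same phenomenon, which is the point of the paper.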


You are not answering his question, which is about the situation where the null hypothesis is true (what Taleb would call a "true" p-value of 0.5; for some reason he decided to define the "true" p-value as the average of the distribution, but it's worth noticing that there is no such thing as a "true" p-value).

The only thing that matters to answer his question is the sampling distribution of the p-value when the null hypothesis is true, which is uniform on [0, 1].

> Could someone clarify what is meant here?

If the null hypothesis is true, every p-value is equally likely, by construction. However, getting an extreme value (below a small threshold) is less likely than getting a not-so-extreme value (below a not-so-small threshold). The probability of getting a p-value below 0.05 is 5%, the probability of getting a p-value below 0.25 is 25%, the probability of getting a p-value below 1 is 100%.
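A quick simulation makes this concrete (my own sketch, not from the original comment; the normal data and two-sample t-test are just convenient assumptions): when the null is exactly true, the p-value is uniform on [0, 1], so P(p < t) is about t for any threshold t.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, reps = 30, 20_000

    # Both groups are drawn from the same distribution: the null is true.
    pvals = np.array([
        stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
        for _ in range(reps)
    ])

    for t in (0.05, 0.25, 1.0):
        print(f"P(p < {t}): {(pvals < t).mean():.3f}")  # ~0.05, ~0.25, 1.00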

If you show me a drug that cured 2 out of 10 ebola patients, when it's known that 10% of patients recover without treatment, I won't be impressed (high p-value). If you show me a drug that cured 9 out of 10 patients you're onto something (low p-value).
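A rough check of that cure-rate example with a one-sided exact binomial test (the 10% spontaneous-recovery rate is the null here; the exact test is my assumption for illustration):

    from math import comb

    def binom_pvalue(k, n, p0):
        # P(at least k recoveries out of n) if the true recovery rate is p0
        return sum(comb(n, j) * p0**j * (1 - p0)**(n - j) for j in range(k, n + 1))

    print(binom_pvalue(2, 10, 0.10))  # ~0.26  -> not impressive
    print(binom_pvalue(9, 10, 0.10))  # ~9e-9  -> you're onto something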

> At what point would this author say something is not consistent with the null hypothesis?

The author just doesn't like hypothesis testing. His view is that the null hypothesis is always false and everything is consistent with it.


> The probability of getting a p-value below 0.05 is 5%, the probability of getting a p-value below 0.25 is 25%, the probability of getting a p-value below 1 is 100%.

But getting a p-value between 0.82 and 0.87 also has just a 5% probability, so it could equally be seen as a probably-won't-happen-by-chance event. Sure, it's clear most of the time which 5% is the significant one, but not always. For example, paradoxically, the same result can be significant or not depending on what your intended stopping criterion was, even if you happened to stop at the same point under either criterion. This is because the two variants declare a different 5% portion of the null's possible outcomes to be the significant part.
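Here is the classic illustration of that stopping-rule dependence (my example, not the commenter's): the same data, 9 heads and 3 tails against H0 "the coin is fair", gets a different p-value depending on whether the plan was "flip exactly 12 times" or "flip until the 3rd tail".

    from math import comb

    heads, tails = 9, 3
    n = heads + tails

    # Plan A: n = 12 fixed in advance; p-value = P(>= 9 heads in 12 flips)
    p_fixed_n = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

    # Plan B: flip until the 3rd tail; p-value = P(needing >= 12 flips),
    # i.e. P(at most 2 tails in the first 11 flips)
    p_stop_at_tails = sum(comb(n - 1, k) for k in range(tails)) / 2**(n - 1)

    print(p_fixed_n)        # ~0.073 -> not significant at 0.05
    print(p_stop_at_tails)  # ~0.033 -> significant at 0.05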


I guess I have to sleep on this because at a quick glance I can't really make sense of how it answers my question.

I do find it ironic though that this is so difficult to explain that I apparently have to read a paper to understand it... I would've thought the blog post was trying to explain things in simple terms...


Well, if this stuff was easy to understand, we wouldn't be having a crisis because of it, would we?

Probability is highly non-intuitive (kind of like quantum mechanics), so most people (including scientists) don't understand it and just memorize "protocols" and "formulas" and "p-value stuff".

https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1...


The crisis isn't caused by ignorance. It isn't as if having people take some stats courses, or just choosing another system, would resolve everything. Replace p-values with any other system for distinguishing publishable work from non-publishable work and you end up in the same place.

Science is messy but eventually consistent. Everybody should be cautious of single results and instead rely on the community to synthesize those results into predictive models.


How about evaluating a scientist beyond how many "publishable" discoveries he makes? There are so many other roles a scientist can play in the community beyond that.

We've learned not to value developers based on lines of code written, and to value refactoring and elimination of code; scientists can do the equivalent. Scrutinize, weed out, reinterpret old things, mentor others, etc. There's plenty of work beyond finding p < 0.05.

But such things are rather seen as side quests, secondary pursuits, almost like a hobby (e.g. writing a textbook).



