"Not significant" means that the probability is >=5% their result was obtained by chance.
We've settled as a community on a convention that we don't claim an effect is real until it is supported by data ("statistically significant") ie. <5% likely to be explained by chance in your results.
"Significant" does not mean big or important in this context. It means better than 5% unlikely to be (un)lucky data.
The threshold for significance lies in the eye of the beholder. A particle physicist might not be satisfied with anything over 0.01 %. A social scientist might be happy to see 10 %.
The 5 % number you mention is completely arbitrary and often woefully inappropriate.
Look at it from a betting perspective: can you earn more than 10× your investment if the null hypothesis is false? Then, for your purposes, anything less than 10% likely to be a fluke is significant.
It's a convention for scientific reporting. Your trades are not bound by this convention.
The parameter value is not arbitrary. It's a convention arrived at after hundreds of years. If it were arbitrary, p=0.999 or p=0.00001 would be just as good. We've settled on p=0.05 being usefully convincing but not crazy demanding to obtain by experiment with noisy measurements.
Null hypothesis testing was invented less than 100 years ago by Fisher, who completely arbitrarily picked 0.05 [0]. That value was not arrived at through wisdom of experience, and certainly not after hundreds of years of practice.
Though it has now indeed become conventional to test with p=0.05, there is nothing wrong with reporting an effect that fails the null hypothesis test. At least that is the position of the American Statistical Association [1].
Thanks for these refs. I read [1] carefully and I take your point that it’s ok to strictly report whatever the data says.
On the value itself, we are quibbling about the meaning of ‘arbitrary’: Fisher certainly could have chosen another value, but not all values would be considered useful. Some expertise about the nature of real world data and the minds of statisticians is encoded in the chosen value.
If I propose that we change the convention to use 1e-12 instead and you think ‘that’s too small, I prefer it the way it is’, then it’s not arbitrary in the sense I mean.
The thing you seem to be missing is that there's no one number that's a meaningful limit for all purposes.
What probability you accept as significant should depend entirely on how you plan to use the results. Something with a p-value of a staggering 70 % (i.e. it's more likely not true than true) is significant if the payoff is good when it's true, and the cost is small when it's not true.
And 70 % is very far from 5 %!
Then again, if the payoff is tiny compared to the cost, you might ask for a p-value of less than 0.01 %, in order for it to make sense to take the chance on it.
Think like a poker player: a hand that has 1/4 chance of winning needs better than a 3-to-1 payout when it wins to be playable. Conversely, when the pot offers you a 3-to-1 payout, you better make sure your hand has more than a 1/4 chance of winning.
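The break-even arithmetic behind that analogy is easy to sketch; this is a toy calculation, not a recipe for real trades. A bet that pays W times your stake is worth taking only when the chance of winning is better than 1/(W+1).

    def break_even_prob(payout):
        # minimum win probability for a payout-to-1 bet to have non-negative expected value
        return 1 / (payout + 1)

    def expected_value(p_win, payout, stake=1.0):
        # win payout*stake with probability p_win, lose the stake otherwise
        return p_win * payout * stake - (1 - p_win) * stake

    print(break_even_prob(3))       # 0.25: a 3-to-1 pot needs better than a 1/4 chance to call
    print(break_even_prob(10))      # ~0.09: a 10x payoff is playable from roughly 9% confidence upward
    print(expected_value(0.30, 3))  # +0.20 per unit staked: playable
    print(expected_value(0.20, 3))  # -0.20 per unit staked: fold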
They didn't claim it was real, did they? They just reported the result, which was lower but not in a significant way. I've read hundreds of papers that do the same.
By convention, this means "indistinguishable from", so reporting that it is lower is an unsupported claim. They would be equally justified in reporting that it was higher, i.e. not at all.
It was lower though, just not significantly so (depending on your threshold for significance); that's the standard way of reporting it. You can't just pull out part of the sentence and take issue with it, the sentence in its entirety is accurate.
But if "result [was] lower but not in a significant way" means "result was not proven to be lower", how does saying "lower but not really lower" make ever any sense? It seems to me that such nonsensical formulation ought not to be ever used by anyone.
Because significance thresholds can vary pretty dramatically. Plenty of experiments in physics, for instance, have reported results even though they didn't yet reach the 5-sigma threshold (about 3x10^-7). In physics something can be extremely likely yet still not 'significant' enough to claim a discovery. They simply couch it as: here was the result, and even though it isn't 'significant', the high likelihood may warrant additional research. Reporting a binary significant/not significant is far less useful.
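For reference, converting a sigma threshold into a p-value only needs the normal tail function; this little sketch reproduces the usual textbook numbers, including the 5-sigma figure quoted above.

    from math import erfc, sqrt

    def one_sided_p(sigma):
        # one-sided tail probability of a normal distribution beyond sigma standard deviations
        return 0.5 * erfc(sigma / sqrt(2))

    for s in (2, 3, 5):
        print(f"{s} sigma -> p ~ {one_sided_p(s):.2e}")
    # 2 sigma -> p ~ 2.28e-02
    # 3 sigma -> p ~ 1.35e-03
    # 5 sigma -> p ~ 2.87e-07  (the particle-physics 'discovery' threshold)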
> "Not significant" means that the probability is >=5% their result was obtained by chance.
Ackchually... the p-value represents the probability that results at least as extreme as these would be observed even if there were no real difference between the two choices, simply due to chance.
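A rough way to see that definition in action is to simulate it (all numbers hypothetical): generate many fake experiments in which the two choices really are identical, and count how often chance alone produces a gap at least as large as the one observed. That fraction is approximately the p-value.

    import random

    def simulated_p_value(observed_diff, n_per_group, base_rate, trials=10_000, seed=0):
        # fraction of null-hypothesis simulations whose group difference is at least
        # as large as the observed one; both groups share base_rate, so any gap is chance
        rng = random.Random(seed)
        extreme = 0
        for _ in range(trials):
            a = sum(rng.random() < base_rate for _ in range(n_per_group))
            b = sum(rng.random() < base_rate for _ in range(n_per_group))
            if abs(a - b) / n_per_group >= observed_diff:
                extreme += 1
        return extreme / trials

    # hypothetical A/B test: 12% vs 9% conversion, 500 users per arm, pooled rate ~10.5%
    print(simulated_p_value(observed_diff=0.03, n_per_group=500, base_rate=0.105))  # ~0.13

A gap that large shows up by chance roughly one time in seven or eight even when nothing differs, which is why such a result gets reported as "lower, but not significantly so".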