Statistically speaking, 'a significant negative impact' doesn't mean what it means in everyday English. It says little about how large the difference is, only that there is enough data to show that there _is_ a difference. So when the abstract adds that the difference wasn't much to talk about, that means a lot more than you give it credit for.
Now, judging from their test setup, and also from the very low number of testers, I find it very hard to agree with what you seem to be taking away from this article. Static typing has more benefit in large code-bases, with multiple programmers, and for avoiding hard-to-find bugs related to dynamic typing. None of that seems to be well reflected in the setup they had.
Also, I don't think you should repeat the same comment throughout the HN thread; you are not replying to individuals, but to general readers.
In the first case, the use of a statically typed programming language had a significant negative impact
Considering the size of the sample, I'd guess the difference has to be rather large to be considered significant. Eyeballing the numbers, it seems to be around 25%. I'm a bit shocked, in fact, because in my own experience the difference is much larger, but this experiment controls for language and my experience doesn't.
Yes, I was able to read that in both your comments, as well as in the paper.
I suspect you are not understanding my point, or what I said about the meaning of the word "significance" as used in statistics.
edit:
Say you flip a loaded coin that is 50.1% likely to come up heads. Now you want to test whether it is loaded, so you flip it a certain number of times and count the outcomes. If the number of flips is too low, you won't be able to say the coin is 'significantly loaded': it might be either way, you simply don't have enough data. If you flip it enough times, you will eventually have enough data to show that it _is_ significantly loaded.
However, in vernacular English, you would still say that the difference isn't very significant. Who cares if it is 50% or 50.1%.
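To make the coin example concrete, here is a minimal sketch in Python (assuming scipy is available; the 50.1% bias and the flip counts are just made-up numbers from the thought experiment):

    # Minimal sketch: the same tiny bias (50.1% heads) is statistically
    # invisible with few flips, but highly "significant" with many.
    from scipy.stats import binomtest

    TRUE_P = 0.501  # the coin's actual bias - tiny in practical terms

    for n in (1_000, 10_000_000):
        heads = round(n * TRUE_P)            # the expected outcome, for illustration
        result = binomtest(heads, n, p=0.5)  # H0: the coin is fair
        print(f"flips={n:>10,}  heads={heads:>9,}  p-value={result.pvalue:.3g}")

With 1,000 flips the p-value is near 1; with 10,000,000 it drops far below 0.05. Same coin both times - that's the gap between statistical and vernacular 'significance'.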
There is no reason to quarrel about the meaning of 'significance' in statistical testing vs. in natural language. Just look at figs. 4 and 5 in the paper, and see that the mean times spent on the scanner task were:

  dynamically typed: ~5 hours
  statically typed: ~8 hours

And that the difference is statistically significant (p=0.04, Mann-Whitney U-test). Whether 5 vs. 8 hours is significant in the natural-language sense, everyone can decide for themselves.
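For the curious, this is all a Mann-Whitney U-test is in practice. A minimal sketch in Python, with hypothetical completion times standing in for the paper's raw data (which isn't reproduced here; only the choice of test matches the paper):

    # Sketch of a Mann-Whitney U-test on two groups of task times.
    # The hours below are made up for illustration.
    from scipy.stats import mannwhitneyu

    dynamic_hours = [4.2, 5.1, 4.8, 5.5, 4.9, 5.3, 4.6]  # hypothetical
    static_hours = [7.8, 8.4, 7.5, 8.9, 8.1, 7.9, 8.6]   # hypothetical

    stat, p = mannwhitneyu(dynamic_hours, static_hours, alternative="two-sided")
    print(f"U={stat}, p={p:.4f}")  # small p: the groups likely differ

It's a rank-based test, so it makes no normality assumption about the times - a sensible choice for small samples like this.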
Yes, but I'd wager that a 25% deviation is significant even when the sample is 49 students. The smaller the sample, the larger the deviation must be to come out significant, but 25% is quite a difference (the quick simulation below illustrates this).
Also, it's worth noticing that they controlled for language - they used the same language in two flavors - to isolate the effect of the type system. It's not a Lisp vs. C thing.
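Here's that quick simulation - a sketch with assumed numbers (an 8-hour baseline, a 2-hour spread, 49 students split roughly in half), checking how often a true 25% difference comes out significant:

    # Sketch: how often does a true 25% difference reach p < 0.05 with
    # ~49 subjects split into two groups? All numbers are assumptions.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(0)
    n_a, n_b = 25, 24            # 49 students, two groups
    mean_a, mean_b = 8.0, 6.0    # hours; 6 is 25% less than 8
    sd = 2.0                     # assumed spread of completion times

    trials, hits = 2_000, 0
    for _ in range(trials):
        a = rng.normal(mean_a, sd, n_a)
        b = rng.normal(mean_b, sd, n_b)
        _, p = mannwhitneyu(a, b, alternative="two-sided")
        hits += p < 0.05

    print(f"significant in {hits / trials:.0%} of simulated experiments")

Under these assumptions the difference is detected in the large majority of runs; shrink the sample or the gap and that fraction falls, which is exactly the trade-off in question.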