I normally dislike working with survey data since there
is a high possibility of selection bias among
the respondents.[..] For this reason, I will
show confidence intervals whenever possible to
reflect the proportionate uncertainty for
groupings with insufficient data [..]
That... is not how statistics works(?). I mean – confidence intervals help with small sample sizes, but they do nothing for systematic errors such as those introduced by selection bias.
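To make that concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the population (normal, mean 5, sd 2), the response rates (0.9 for above-average people, 0.3 for everyone else) and the function names. It is not the article's data or method.

```python
import math
import random
import statistics

def biased_sample(n, rng, true_mean=5.0, sd=2.0):
    """Collect n responses where above-average people respond more often.

    The response rates (0.9 vs 0.3) are hypothetical, chosen only to
    produce a visible selection effect.
    """
    sample = []
    while len(sample) < n:
        x = rng.gauss(true_mean, sd)
        p_respond = 0.9 if x > true_mean else 0.3
        if rng.random() < p_respond:
            sample.append(x)
    return sample

def mean_ci(sample, z=1.96):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

rng = random.Random(0)
for n in (50, 500, 5000):
    lo, hi = mean_ci(biased_sample(n, rng))
    print(f"n={n:5d}  CI=({lo:.2f}, {hi:.2f})  true mean=5.00")
```

The interval gets narrower as n grows, but it narrows around the biased mean, not the true one. More respondents buy you precision, not accuracy.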
[continued from above] and to also account for the
possibility that a minority of respondents may
be dishonest and nudge their programming ability
a few points higher than the truth.
There's a surprising number of assumptions packed into this sentence. I'd question the assertions that:
- people are "dishonest" (my intuition would point to a subconscious bias more than actual dishonesty)
- it's a minority (the second chart shows that >50% of respondents with one year of experience or less rate themselves as better than average)
- subconscious biases only work in one direction
... and once again I have no idea how confidence intervals can help. A large interval may indicate bad measurement. It may also indicate high variability in the actual data.
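A rough sketch of why the interval width alone can't tell those two cases apart. The numbers are invented: in one sample people genuinely differ a lot, in the other they barely differ but each answer carries measurement noise, with the noise level picked so the total spread matches.

```python
import math
import random
import statistics

def ci_width(sample, z=1.96):
    """Width of a normal-approximation 95% CI for the mean."""
    return 2 * z * statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(0)
n = 2000

# Case A: people really do differ a lot (sd 2), measured perfectly.
varied = [rng.gauss(5, 2.0) for _ in range(n)]

# Case B: people barely differ (sd 0.5), but each answer is noisy.
# Noise sd is chosen so the total sd is also 2: sqrt(0.5**2 + noise**2) == 2.
noise_sd = math.sqrt(2.0**2 - 0.5**2)
noisy = [rng.gauss(5, 0.5) + rng.gauss(0, noise_sd) for _ in range(n)]

print(f"high variability:  width={ci_width(varied):.3f}")
print(f"noisy measurement: width={ci_width(noisy):.3f}")
```

Both widths come out about the same, because the interval only sees the observed spread, not where it came from.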
Also, keep in mind that these groupings alone
do not imply a causal relationship between
the two variables.
... someone paid attention in his middle-school statistics club...
Employing traditional regression analysis to
build a model for predicting programming ability
would be tricky: does having more experience
cause programming skill to improve, or does having
strong innate technical skill cause developers to
remain in the industry and grow?
... but failed statistics 101. A regression analysis doesn't care about causality. If "Mac users are more likely to be college-educated" it doesn't matter that "buying a Mac may not actually make you smarter". I can still make the prediction "a given Mac user is more likely to have a college degree".
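Here is a toy version of the Mac example, as a sketch with made-up numbers. I'm assuming a hidden "income" variable that drives both Mac ownership and having a degree; the probabilities and the `ols` helper are hypothetical. Owning a Mac never enters the formula for the degree, yet the regression still predicts it.

```python
import random

rng = random.Random(0)
rows = []
for _ in range(20000):
    income = rng.random()  # hidden confounder in [0, 1), invented for the demo
    mac = 1 if rng.random() < 0.2 + 0.5 * income else 0
    # Note: `mac` does not appear here; only income affects the degree.
    degree = 1 if rng.random() < 0.2 + 0.6 * income else 0
    rows.append((mac, degree))

def ols(points):
    """Simple least-squares fit y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = ols(rows)
print(f"predicted P(degree | no Mac) = {intercept:.3f}")
print(f"predicted P(degree | Mac)    = {intercept + slope:.3f}")
```

The slope is positive, so "this person owns a Mac" is a useful predictor, even though handing someone a Mac in this simulation would change nothing about their education.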
Microaggressions aside, these are fair counterpoints. I spent far less time editing the body of the post than optimizing the visualizations/Jupyter Notebook (especially in this particular post). I've taken more care in the posts I've written since.
If you are looking for feedback, here is another suggestion for future posts: I found the use of violin plots for discrete data to be confusing. To be honest, I still am not sure how to interpret the unlabeled Y axis. I think a histogram would have been easier for me (and others) to interpret.
But suggestions aside, I found your article to be interesting. Thank you for it.
I second the criticism of using violin plots here. A violin plot is a mirrored kernel density plot. It is designed to show a distribution at a scale much larger than the small-scale variations in the data. Its raison d'être is to smooth out and aggregate these small-scale variations.
But in this survey data, the values in the distribution are spaced far apart. The discretization is so large that the violin plots show meaningless and weirdly inconsistent curves between x-values which actually have data. A bar plot would be much clearer.
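A small sketch of the effect, using a hand-rolled Gaussian kernel density estimate on invented ratings (the `ratings` list and both bandwidths are assumptions for illustration, not the article's data or whatever bandwidth its plotting library picked). Nobody in this fake survey answered 6, yet the smoothed curve can still claim plenty of density there.

```python
import math
from collections import Counter

# Hypothetical 1-10 self-ratings; note that nobody answered 6.
ratings = [3] * 5 + [5] * 20 + [7] * 40 + [8] * 25 + [10] * 10

def gaussian_kde(data, x, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at point x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

counts = Counter(ratings)
for x in (5, 6, 7):
    print(
        f"rating {x}: count={counts[x]:2d}  "
        f"kde(h=1.0)={gaussian_kde(ratings, x, 1.0):.3f}  "
        f"kde(h=0.3)={gaussian_kde(ratings, x, 0.3):.3f}"
    )
```

With the wide bandwidth the curve invents responses at 6; with the narrow one it collapses into bumps at the integers, and the shape in between depends entirely on the bandwidth. The plain counts are what a bar plot would show, with no such ambiguity.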
OP, I think you have done a fine job of styling your plots tastefully. But I recommend taking another look at the visual language you have chosen to communicate the data.