The option is opt-in (which is good; the fickle Reddit community would revolt otherwise), which means almost nobody uses it. If Reddit reminded its users frequently (e.g. as an aside at the end of popular Reddit Blog posts) or rewarded people for enabling the option (free Reddit Gold for a week, etc.), I'm sure many more people would sign up.
I'm not so sure about that. I believe the problem has been solved; the solution just isn't widely known yet. I read a paper on arXiv probably a year ago that describes a method that seems pretty straightforward and secure, but I've never seen anything about it since. It involved, essentially, throwing out any records that could actually contribute to a change in a statistical measure. You basically end up finding which aspects of the data are actually identifiable, and throw out any records that contain them. It's guaranteed not to screw up your observations because, by definition, if something is statistically significant it has to show up often enough that it CAN'T be used to single out a source.
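I don't have the paper in front of me, so the following is only a rough sketch of the idea as I remember it, not the paper's actual algorithm. It assumes "identifiable" means "an attribute value that appears in fewer than `k` records"; the function name `suppress_rare`, the threshold `k`, and the toy records are all made up for illustration.

```python
from collections import Counter

def suppress_rare(records, k):
    """Drop every record that has at least one attribute value
    appearing in fewer than k records (hypothetical reading of the method)."""
    # Count how many records carry each (field, value) pair.
    counts = Counter(
        (field, value) for rec in records for field, value in rec.items()
    )
    # Keep a record only if all of its values are common enough.
    return [
        rec for rec in records
        if all(counts[(field, value)] >= k for field, value in rec.items())
    ]

# Toy data, invented for this example.
records = [
    {"zip": "10001", "age_band": "30-39"},
    {"zip": "10001", "age_band": "30-39"},
    {"zip": "10001", "age_band": "40-49"},
    {"zip": "10001", "age_band": "40-49"},
    {"zip": "99950", "age_band": "30-39"},  # the only record with this zip
]

kept = suppress_rare(records, k=2)
print(len(kept), "of", len(records), "records kept")
```

Note that this sketch checks each attribute on its own; it does not look at combinations of attributes.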
Given the dismal history of anonymization, a paper on arXiv is roughly up there with a blogger saying 'I've proven P != NP'...
> It's guaranteed not to screw up your observations because, by definition, if something is statistically significant it has to show up often enough that it CAN'T be used to single out a source.
What's 'statistically significant' here? The usual p<0.05 convention? You realize that there can be multiple measurements or pieces of data all of which individually have p>0.05 but together have p<<0.05... Information leakage should be measured in bits, not p-values.
(This kind of aggregation is one of the benefits of approaches like meta-analysis.)
EDIT: (Sorry for all the parentheticals.)
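To make the point about combining weak signals concrete, here is a small sketch of two things: Fisher's method (a classic meta-analysis way to combine p-values) and counting identifying information in bits. The assumptions are mine: the tests and attributes are treated as independent, the five p-values of 0.06 and the attribute fractions are invented, and the population is taken as roughly 8 billion.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-squared
    distribution with 2k degrees of freedom under the null,
    assuming the k tests are independent."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-squared survival function for even df = 2k.
    return math.exp(-half_stat) * sum(
        half_stat**i / math.factorial(i) for i in range(k)
    )

def bits_leaked(fraction_sharing_value):
    """Self-information, in bits, of learning an attribute value
    shared by the given fraction of the population."""
    return -math.log2(fraction_sharing_value)

# Five results, none individually below the 0.05 threshold.
p_values = [0.06] * 5
combined = fisher_combined_p(p_values)
print("combined p-value:", combined)

# Hypothetical attributes: one shared by 1 in 2, 1 in 100, 1 in 365 people.
fractions = [1 / 2, 1 / 100, 1 / 365]
total_bits = sum(bits_leaked(f) for f in fractions)  # adds up if independent
bits_needed = math.log2(8e9)  # bits to single out one person among ~8 billion
print("bits leaked:", total_bits, "of about", bits_needed, "needed")
```

None of the individual p-values clears 0.05, but the combined one does, and each attribute is far from unique on its own while the bits still add up toward the total needed to single someone out.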