
If I understand correctly, they just call it a limitation of the approach and invite you to sample the user base rather than collect data systematically. (Plausible deniability is a bit of a moot point: once you have been exposed as a dog with 95% confidence, pleading that there is still a small chance you might not be one doesn't really help.)

I am not sure how any of that helps in a mass collection of data like OS telemetry.




I suspect you did not understand correctly? I just read their section 6, and they neither "just call it a limitation" (the section is three pages long, with several recommendations) nor invite anyone to sample the user base (this generally just focuses the privacy loss on the sampled people).

Which text were you reading that led you to this conclusion?


On sampling:

> It is likely that some attackers will aim to target specific users by isolating and analyzing reports from that user, or a small group of users that includes them. Even so, some randomly-chosen users need not fear such attacks at all...

For the limitation, the whole of section 6.1 explains that this only protects a single question. If you collect more than a single question, you must rely on other techniques to protect privacy.


Yes, I think you've misunderstood.

The text you've quoted is about how a random subset of the population is already immune to the issue of repeated queries, not that subsampling the population helps in any way. If you don't interrupt the quotation mid-sentence, it reads:

> Even so, some randomly-chosen users need not fear such attacks at all: with probability (1/2 f)^h, clients will generate a Permanent randomized response B with all 0s at the positions of set Bloom filter bits. Since these clients are not contributing any useful information to the collection process, targeting them individually by an attacker is counter-productive.

The whole of section 6.1 is not about how it only protects a single question; it is about how one ensures that the single-question protections generalize to larger surveys, concluding that

> This issue, however, can be mostly handled with careful collection design.
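
For anyone who wants to sanity-check the (1/2 f)^h figure in the quote above, here is a minimal sketch, assuming the usual RAPPOR permanent randomized response rule (each bit becomes 1 with probability f/2, 0 with probability f/2, and keeps its true value otherwise) and assuming the h hash functions set h distinct bits. The function names, the filter size `k`, and the values f = 0.5 and h = 2 are my own illustrative choices, not taken from the paper's deployment.

```python
import random


def permanent_randomized_response(bloom_bits, f, rng):
    """Apply the assumed PRR rule independently to each Bloom filter bit."""
    out = []
    for b in bloom_bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)      # forced to 1 with probability f/2
        elif u < f:
            out.append(0)      # forced to 0 with probability f/2
        else:
            out.append(b)      # true bit kept with probability 1 - f
    return out


def prob_all_set_bits_zeroed(f, h):
    """Closed form from the quote: every one of the h set bits is forced to 0."""
    return (f / 2) ** h


def simulate(f, h, k, trials, seed=0):
    """Estimate the same probability empirically (hypothetical parameters)."""
    rng = random.Random(seed)
    bloom = [1] * h + [0] * (k - h)   # assumption: h distinct set bits
    hits = 0
    for _ in range(trials):
        prr = permanent_randomized_response(bloom, f, rng)
        if not any(prr[:h]):          # all originally-set positions are 0
            hits += 1
    return hits / trials


if __name__ == "__main__":
    f, h = 0.5, 2
    print("closed form:", prob_all_set_bits_zeroed(f, h))
    print("simulated:  ", simulate(f, h, k=16, trials=20000))
```

The point of the sketch is only that this fraction comes out of each client's own coin flips; nobody collecting or attacking the data gets to choose who lands in it.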


Sampling and taking a random subset of the population are synonymous.

But my point is precisely that this technique helps with a single question. As soon as you are doing continuous mass collection you don't really get any privacy protection from this technique, and you have to rely on other techniques (encryption, etc).


There are two kinds of sampling involved here: selecting whom to ask a question, and individuals selecting their responses. The random subset of the population is determined by their own choices, which lead them to never say anything useful. An attacker has no influence on this, so if their target is within that group, the attack can't succeed.



