If I understand correctly, they just call it a limitation of the approach and in...

frankmcsherry · on Sept 12, 2017

I suspect you did not understand correctly? I just read their section 6, and they neither "just call it a limitation" (the section is three pages long, with several recommendations) nor invite anyone to sample the user base (this generally just focuses the privacy loss on the sampled people).

Which text were you reading that led you to this conclusion?

cm2187 · on Sept 12, 2017

On sampling:

> It is likely that some attackers will aim to target specific users by isolating and analyzing reports from that user, or a small group of users that includes them. Even so, some randomly-chosen users need not fear such attacks at all...

For the limitation, the whole of section 6.1 explains that this only protects a single question. If you collect more than a single question, you must rely on other techniques to protect privacy.

frankmcsherry · on Sept 12, 2017

Yes, I think you've misunderstood.

The text you've quoted is about how a random subset of the population is already immune to the issue of repeated queries, not that subsampling the population helps in any way. If you don't interrupt the quotation mid-sentence, it reads:

> Even so, some randomly-chosen users need not fear such attacks at all: with probability (1/2 f)^h, clients will generate a Permanent randomized response B with all 0s at the positions of set Bloom filter bits. Since these clients are not contributing any useful information to the collection process, targeting them individually by an attacker is counter-productive.

The whole of section 6.1 is not about how it only protects a single question, it is about how one ensures that the single-question protections generalize to larger surveys, concluding that

> This issue, however, can be mostly handled with careful collection design.

cm2187 · on Sept 12, 2017

Sampling and taking a random subset of the population are synonymous.

But my point is precisely that this technique helps with a single question. As soon as you are doing continuous mass collection you don't really get any privacy protection from this technique, and you have to rely on other techniques (encryption, etc).

yorwba · on Sept 12, 2017

There are two kinds of sampling involved here: selecting whom to ask a question, and individuals selecting their responses. The random subset of the population is determined by their own choices, which lead them never to say anything useful. An attacker has no influence on this, so if their target is within that group, the attack can't succeed.
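
To make the quoted (1/2 f)^h figure concrete, here is a rough Python sketch of how that self-selected group arises. It assumes the Permanent randomized response rule as I read it in the paper (each bit becomes 1 with probability f/2, 0 with probability f/2, and keeps its true value otherwise) and that the h hash functions set h distinct Bloom filter bits. The function names and the values f = 0.5, h = 2 are purely illustrative, not taken from any RAPPOR implementation.

```python
import random


def all_zero_probability(f, h):
    """Closed form from the quote: each of the h set bits is forced to 0
    with probability f/2, independently, so all are 0 with (f/2)^h."""
    return (f / 2) ** h


def permanent_bits_for_set_positions(f, h, rng):
    """Permanent randomized response, restricted to the h positions where
    the true Bloom filter bit is 1 (assumed distinct)."""
    bits = []
    for _ in range(h):
        u = rng.random()
        if u < f / 2:
            bits.append(1)   # forced to 1
        elif u < f:
            bits.append(0)   # forced to 0
        else:
            bits.append(1)   # keeps its true value, which is 1 here
    return bits


def simulate_uninformative_fraction(f, h, trials, seed=0):
    """Fraction of simulated clients whose set bits all came out as 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if not any(permanent_bits_for_set_positions(f, h, rng)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    f, h = 0.5, 2  # illustrative parameters only
    print("closed form:", all_zero_probability(f, h))
    print("simulated:  ", simulate_uninformative_fraction(f, h, 200_000))
```

The point is that membership in this group is decided by each client's own coin flips, once, when the permanent response is generated. Neither the collector nor an attacker gets to pick who lands in it.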