
IMHO an opt-out system is never going to work for this.

Story time: while I still worked at Facebook, there was a company-wide project for data attribution to comply with opt-outs from personalization (I believe to comply with an EU directive, but don't quote me on that). The idea was to identify the source of any data by essentially tagging it at a granular level. This affected all the offline data processing and ML processes, but also code (online and offline), where various tools and systems were being built to analyze data-usage sites, detect use of personalized data, and respect opt-outs where applicable.
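For a rough sense of what "tagging at a granular level" means, here's a minimal sketch (not the actual system; all names are illustrative): each value carries the set of sources it was derived from, and any operation combining values unions their tags, so tags only ever grow as data flows through pipelines.

```python
# Hypothetical provenance tagging: values carry source tags, and
# derived values inherit the union of their inputs' tags.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    value: object
    sources: frozenset  # e.g. {"ad_clicks", "page_likes"}

def combine(a: Tagged, b: Tagged, op) -> Tagged:
    # Tags are monotone: combining data only adds sources, never removes them.
    return Tagged(op(a.value, b.value), a.sources | b.sources)

clicks = Tagged(3, frozenset({"ad_clicks"}))
likes = Tagged(7, frozenset({"page_likes"}))
score = combine(clicks, likes, lambda x, y: x + y)
print(sorted(score.sources))  # ['ad_clicks', 'page_likes']
```

The monotone union is the crux: after enough joins across enough pipelines, almost every derived value ends up tagged with almost every source.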

I made two predictions at the early stages of this:

1. The tools and systems would tell us "all data is used for everything" and

2. Creating new non-personalized data pipelines and adding things to them would be far easier and faster than trying to remove personalized data from existing pipelines. Then, things like ad performance become an optimization problem with a clear benchmark (e.g. new unpersonalized ad-serving pipeline vs. the old personalized pipeline).

A lot of work did, I believe, basically confirm (1). Untangling that seems, at least to me, to be a Sisyphean task. I don't know where this project ended up since I left while it was still ongoing, but I stand by (2).

My point is that (IMHO) opt-out just doesn't work for this kind of thing. If we really care about data privacy and authorized use of data, at some point we will need to take the opposite approach and enumerate what data we're allowed to use.




When I'm met with the response/resistance of "if you care so much then just OPT-OUT!" my usual retort is: "if it weren't just about money, it'd be OPT-IN."

Almighty Dollar™, god...


> The tools and systems would tell us "all data is used for everything"

Reminds me of the tendency in program analysis to just spit out top (i.e. all possible values) because the state space overwhelms automated reasoning, and the only way back is annotating/rewriting code to help the analyzer along.
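To make that concrete, here's a toy sketch (my own illustration, not any particular analyzer) of a flat constant lattice: two different constants have no common refinement, so their join is top, and folding over enough program paths collapses to "could be anything" — the analogue of "all data is used for everything."

```python
# Flat lattice of constants: the join of two distinct values is TOP.
TOP = "TOP"

def join(a, b):
    # Least upper bound: equal values stay precise; anything else is TOP.
    return a if a == b else TOP

# Values of some variable observed along different program paths.
paths = [1, 1, 2, 5]
result = paths[0]
for v in paths[1:]:
    result = join(result, v)
print(result)  # TOP
```

One disagreeing path is all it takes; precision never comes back without restructuring the program (or the pipelines) so the analyzer can keep the cases apart.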

But, as someone who doesn't think about or deal with data much, this is a surprise to me. It makes sense though. Does this mean our data is forever tainted, as it were?

Since you've been proximate to this work: do you think there's any hope in flipping the script such that possessing/processing this data is a liability rather than an asset? That's one of the few ways I can see to align incentives between highly monied interests and the targets of data collection. (I doubt this will happen in the US in my lifetime, barring a major, 9/11-level event where abuse of this data is directly tied to the catastrophe.)


Processing data is already a liability. The compute power isn't free. The storage isn't free. The engineering time isn't free. It's just that it produces more value than what it costs or people wouldn't do it.

I believe this project was in response to the EU's Digital Services Act ("DSA"). Now, IANAL, but it always struck me as ambiguous what constitutes "personalization". Like, can your data be used to train an ML system? What if identity is scrubbed from that data? What if it's aggregated in some way so there's no individual user activity at all?

It also raises questions about every aspect of product behavior. Let's say you see a video post on your news feed. You get shown a snippet of the comments. Usually this is from friends but maybe it's just a comment that we thought might interest you. Is that personalization? What about a shared link where a snippet is shown from the article?

Also, the DSA seems to apply to on-site contextual personalization. What about off-site? Is that covered? Is there a separate law for that? Can I use derived demographics from a display ad pixel but not what pages you've liked and what groups you participate in on site?

Can I use your friends' activity to customize your experience?

The list goes on.

I'm a fan of clear legislation. If the goal is, for example, to allow opting out of personalized ads based on contextual on-site behaviour, we should enumerate which applications (e.g. feed ads) are covered and what information you're allowed to use when someone opts out.



