> The tools and systems would tell us "all data is used for everything"
Reminds me of the tendency in program analysis to just spit out top (i.e. all possible values) because the state space overwhelms automated reasoning, and the only way back is annotating/rewriting the code to help the analyzer along.
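For the non-PL folks, here's a toy sketch of what "collapsing to top" looks like in a sign-analysis style abstract domain, and why an annotation is the only way to recover precision. The names and the domain are made up for illustration, not taken from any real analyzer:

```python
# Toy sketch (hypothetical names): a sign-analysis domain where the analyzer
# gives up and answers TOP ("could be anything") as soon as control-flow
# paths disagree, and only a programmer-supplied annotation narrows it back.

from enum import Enum

class Sign(Enum):
    NEG = "negative"
    ZERO = "zero"
    POS = "positive"
    TOP = "any value"   # the analyzer's "all possible values" answer

def join(a: Sign, b: Sign) -> Sign:
    """Merge facts from two control-flow paths; disagreement collapses to TOP."""
    return a if a == b else Sign.TOP

def analyze_branches(facts: list[Sign]) -> Sign:
    """Combine every path's fact; a single mismatch drags the result to TOP."""
    result = facts[0]
    for f in facts[1:]:
        result = join(result, f)
    return result

def assume(annotated: Sign, inferred: Sign) -> Sign:
    """An explicit annotation is the only way to recover precision from TOP."""
    return annotated if inferred == Sign.TOP else inferred

# Two branches agree -> precise answer; add a disagreeing third -> TOP,
# and only the annotation gets us back to something useful.
print(analyze_branches([Sign.POS, Sign.POS]))            # Sign.POS
print(analyze_branches([Sign.POS, Sign.POS, Sign.NEG]))  # Sign.TOP
print(assume(Sign.POS, Sign.TOP))                        # Sign.POS
```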
But, as someone who doesn't think about or deal with data much, this is a surprise to me. It makes sense though. Does this mean our data is forever tainted, as it were?
Since you've been proximate to this work: do you think there's any hope of flipping the script so that possessing/processing this data is a liability rather than an asset? That's one of the few ways I can see to align incentives between highly monied interests and the targets of data collection. (I doubt this will happen in the US in my lifetime, barring a major, 9/11-level event where abuse of this data is directly tied to the catastrophe.)
Processing data is already a liability. The compute power isn't free. The storage isn't free. The engineering time isn't free. It's just that it produces more value than it costs, or people wouldn't do it.
I believe this project was in response to the EU's Digital Services Act ("DSA"). Now, IANAL, but it always struck me as ambiguous what constitutes "personalization". Like, can your data be used to train an ML system? What if identity is scrubbed from that data? What if it's aggregated in some way so there's no individual user activity at all?
It also raises questions about every aspect of product behavior. Let's say you see a video post on your news feed. You get shown a snippet of the comments. Usually this is from friends, but maybe it's just a comment we thought might interest you. Is that personalization? What about a shared link where a snippet is shown from the article?
Also, the DSA seems to apply to on-site contextual personalization. What about off-site? Is that covered? Is there a separate law for that? Can I use derived demographics from a display ad pixel but not what pages you've liked and what groups you participate in on site?
Can I use your friends' activity to customize your experience?
The list goes on.
I'm a fan of clear legislation. If the goal is, for example, to allow opting out of personalized ads based on contextual on-site behavior, we should enumerate which applications (e.g. feed ads) are covered and what information you're allowed to use if someone opts out.