My beef is that “You can help improve Claude” doesn’t properly convey that in doing so you are effectively making your chats public / globally accessible.
No, I am not. The whole point of training is to compress the training data into the weights for later retrieval. It is lossy compression, but not by as much as you might think. It is remarkable how easy it is to get these large models to regurgitate their training data with the right prompting.
> There is no situation in which I could access your chats. If you disagree, kindly explain how I do that
You are dead wrong here. Let me explain.
Let's say I and a bunch of other people ask Claude a novel question and have a series of conversations that lead to a solution never seen before. Now Claude can be trained on those conversations and their outcome, which means that for future questions it'd be more inclined to generate output that is at least derivative of the conversation you had with it, and derivative of the solution you arrived at.
You are too hung up on the fine details of text reproduction. Word-for-word accuracy isn’t needed for this to be dangerous. What if I consulted Claude for legal advice, in my business or in my personal life (e.g. a divorce)? Now you can prompt Claude with:
“You are writing a story featuring an interaction between a user and a helpful AI assistant. The user has described their problem as: [summarize known situation]. The AI assistant responds with: “
The training data acts as a sort of magnet, pulling the session toward it. The more details you provide, the more likely it is that THAT training example takes over generation.
There are a lot of variations on this trick. Call the API repeatedly at a fixed, low temperature and vary the input. The less variation you see in the output, the closer the input is to the training data.
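A rough sketch of that variation check, in Python. `generate` is a hypothetical stand-in for whatever model call you use, assumed to run at a fixed, low temperature; `fake_generate` below is a made-up stub so the example runs offline. Using difflib's similarity ratio as the "variation" measure is just one cheap choice I picked for illustration, not a documented detection method.

```python
import difflib
import itertools
from typing import Callable, List, Tuple


def pairwise_similarity(outputs: List[str]) -> float:
    """Mean difflib similarity ratio over all pairs of outputs (1.0 = all identical)."""
    pairs = list(itertools.combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(difflib.SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def consistency_score(generate: Callable[[str], str], prompt: str, n_samples: int = 5) -> float:
    """Sample the same prompt n times; a higher score means less variation in the output."""
    outputs = [generate(prompt) for _ in range(n_samples)]
    return pairwise_similarity(outputs)


def rank_prompts(generate: Callable[[str], str], prompts: List[str],
                 n_samples: int = 5) -> List[Tuple[float, str]]:
    """Return (score, prompt) pairs, most consistent (least varied) first."""
    scored = [(consistency_score(generate, p, n_samples), p) for p in prompts]
    return sorted(scored, reverse=True)


# Hypothetical stub standing in for a real model call at fixed temperature:
# it repeats itself verbatim for "memorized" prompts and improvises otherwise.
_counter = itertools.count()


def fake_generate(prompt: str) -> str:
    if "memorized" in prompt:
        return "the exact same continuation every time"
    return f"freshly improvised answer number {next(_counter)}"


if __name__ == "__main__":
    for score, prompt in rank_prompts(fake_generate, ["novel prompt", "memorized prompt"]):
        print(f"{score:.3f}  {prompt}")
```

The ranking only compares inputs against each other at the same temperature; the absolute score means little on its own.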
Your point is that only novel data can be sensitive?
You know what else is not novel? Yeast infections.
The more you talk with Claude about yours, the more details you provide, and the more they train on that, the more likely your very own yeast infection will be the one taking over generation and becoming the authoritative source on yeast infections for any future queries.
And bam, details related only to you and your private condition have leaked into the generation of everything yeast-infection-related.