Hacker News

I don't quite understand one thing. They seem to think that keeping their past research papers out of the training set is too hard, so they rely on post-training to try to undo the effects, or they want to include "canary strings" in future papers. But in my experience, basically any naturally written English text automatically becomes a canary string beyond about ten words. It's very easy to uniquely locate a document on the internet just by searching for a long enough sentence from it.

In this case, the opening sentence "People sometimes strategically modify their behavior to please evaluators" appears to be sufficient. I searched on Google for this and every result I got was a copy of the paper. Why do Anthropic think special canary strings are required? Is the training pile not indexed well enough to locate text within it?
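To illustrate the point, here is a minimal sketch of treating an exact sentence as its own canary: an exact-substring lookup over a corpus. The mini-corpus and document names are made up for the example; a real training pile would obviously be far larger and need an index rather than a linear scan.

```python
# Hypothetical mini-corpus standing in for an indexed training pile.
corpus = {
    "paper.txt": "People sometimes strategically modify their behavior to please evaluators.",
    "blog.txt": "People sometimes modify their behavior.",
    "news.txt": "Evaluators were pleased by the behavior on display.",
}

def find_documents(corpus, sentence):
    """Return the ids of documents containing the exact sentence."""
    return sorted(doc_id for doc_id, text in corpus.items() if sentence in text)

# A long enough phrase pins down a single document...
print(find_documents(corpus, "strategically modify their behavior to please evaluators"))
# ...while a short generic phrase matches several.
print(find_documents(corpus, "modify their behavior"))
```

The longer the quoted span, the fewer documents match, which is exactly why a ten-word sentence tends to be unique.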



Perhaps they want to include online discussions/commentaries about their paper in the training data without including the paper itself


Most online discussion doesn't contain the entire text. You can pick almost any sentence from such a document and it'll be completely unique on the internet.

I was thinking it might be related to the difficulty of building a search engine over the huge training sets, but if you don't care about scaling or query performance it shouldn't be too hard to set one up internally that's good enough for the job. Even sharded grep could work, or filters done at the time the dataset is loaded for model training.
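The load-time filter mentioned above could be as simple as a generator that drops any document containing a blocked sentence. A minimal sketch, with hypothetical names; a production version would shard this and use an index rather than substring scans:

```python
def filter_training_stream(documents, blocked_sentences):
    """Yield only documents that contain none of the blocked sentences.

    `documents` is any iterable of strings (e.g. a streamed dataset);
    a sketch of filtering at dataset-load time rather than rebuilding the set.
    """
    for doc in documents:
        if not any(sentence in doc for sentence in blocked_sentences):
            yield doc

docs = ["the quick brown fox", "contains the canary sentence here", "plain text"]
kept = list(filter_training_stream(docs, blocked_sentences=["canary sentence"]))
print(kept)  # the document with the blocked sentence is dropped
```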


Why use a search engine when you can use an LLM? ;)


Well, because the goal is to locate the exact documents in the training set and remove them, not answer a question...


So you stream the training set through the context window of the LLM, and ask it if it contains the requested document (also in the context window).

The advantage is that it can also detect variations of the document.
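For what it's worth, catching variations doesn't strictly need an LLM either: classic near-duplicate detection gets most of the way there. A minimal sketch using word shingles and Jaccard similarity; the function names and the 0.5 threshold are arbitrary choices for the example:

```python
def shingles(text, k=5):
    """Set of k-word shingles from lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def looks_like_copy(candidate, target, threshold=0.5):
    """Flag a candidate as a likely variant of the target document."""
    return jaccard(shingles(candidate), shingles(target)) >= threshold

target = "People sometimes strategically modify their behavior to please evaluators"
near = "people sometimes strategically modify their behavior to please the evaluators"
print(looks_like_copy(near, target))  # a lightly edited copy is flagged
```

The trade-off is that shingling only catches surface-level edits; an LLM in the loop could in principle catch paraphrases too, at vastly higher cost per document.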



