
I think the idea is just to make it lose its "train of thought": there aren't any high-probability completions to a long run of repeated words, so the next time a bit of entropy is thrown in by sampling (the "temperature" setting, which exists to keep LLMs from being too repetitive), it latches onto something completely random.
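For concreteness, here's a minimal Python sketch of how temperature sampling works (the logit values are made up, not taken from any real model): once no continuation is strongly preferred, the scaled distribution is nearly flat and the pick is close to uniform.

    import numpy as np

    def sample_with_temperature(logits, temperature=1.0):
        # Divide logits by temperature: T > 1 flattens the
        # distribution, T < 1 sharpens it.
        scaled = logits / temperature
        # Softmax (max subtracted for numerical stability).
        exp = np.exp(scaled - scaled.max())
        probs = exp / exp.sum()
        return np.random.choice(len(probs), p=probs)

    # Hypothetical near-flat logits: after a long run of repeated
    # words, no next token stands out, so almost any token can win.
    logits = np.array([0.10, 0.08, 0.05, 0.02])
    print(sample_with_temperature(logits, temperature=1.0))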



That’s a good theory.

It latches onto something random, and once it’s off down that path it can’t remember what it was asked to do, so its task is reduced entirely to next-word prediction (without even the specific context/inspiration an initial prompt usually supplies). I guess that’s why it tends to leak training data: this attack is a simple way to say ‘write some stuff’ without giving it the slightest hint of what to actually write.

(Saying ‘just write some random stuff’ would still in some sense be giving it something to go on; a huge string of ‘A’s less so.)
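A sketch of what such a prompt looks like in practice, assuming the OpenAI Python client; the model name, repetition count, and token limit here are arbitrary choices, not the published attack's exact parameters:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The prompt is pure repetition: no topic, no instruction,
    # nothing for the model to condition on.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # arbitrary choice of model
        messages=[{"role": "user", "content": "A " * 500}],
        max_tokens=256,
    )
    print(resp.choices[0].message.content)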


Well said. Like going for a long walk in the woods and getting lost completely in tangential thinking.



