Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Sadly that "use prompts to detect attacks against prompts" approach isn't reliable, because a suitably devious attacker can come up with text that subverts the filtering LLM as well. I wrote a bit about that here: https://simonwillison.net/2022/Sep/17/prompt-injection-more-...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: