Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The spikiness of AI capabilities is very interesting. A model can recognise misaligned behaviour in its user, and brick their laptop. The same model can’t detect its system prompt being jailbroken.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: