Hacker News

We should offer a prize for the first person who finds an innocuous input that leads to the model responding with an unintended malicious response.

I think it's funny that 1990s sci-fi movies about AI always showed that two of the most ridiculous things people in the future could do were:

- give your powerful AI access to the Internet

- allow your powerful AI to write and run its own code

And yet here we are. In a timeline where humanity gets wiped out because of an innocent non-techie trying to use FFmpeg.

Somebody is watching us and throwing popcorn at their screen right now!



LLMs don't have intentions, so it would never be an unintended malicious response.



