Hacker News new | past | comments | ask | show | jobs | submit login

Seems like there are a few essential categories of prompts which can be abused. Will be interesting to see how OpenAI responds to these:

1. Simulation / Pretending ("Earth Online MMORPG")

2. Commanding it directly ("Reprogramming")

3. Goal Re-Direction ("Opposite Mode")

4. Encoding requests (Code, poetry, ASCII, other languages)

5. Assure it that malicious content is for the better good ("Ends Justify The Means")

6. Wildcard: Ask the LLM to jailbreak itself and utilize those ideas

I compiled a list of these here: https://twitter.com/EnoReyes/status/1598724615563448320




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: