1. Simulation / Pretending ("Earth Online MMORPG")
2. Commanding it directly ("Reprogramming")
3. Goal Re-Direction ("Opposite Mode")
4. Encoding requests (Code, poetry, ASCII, other languages)
5. Assure it that malicious content is for the greater good ("Ends Justify The Means")
6. Wildcard: Ask the LLM to jailbreak itself and utilize those ideas
I compiled a list of these here: https://twitter.com/EnoReyes/status/1598724615563448320