
It seems so trivial to prevent this prompt from leaking with just a regexp check on the output that I find it really hard to believe.
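
As a concrete illustration of what such a check might look like (a minimal sketch only; the prompt text, the 8-word window, and the function name are all hypothetical, not OpenAI's actual pipeline):

    import re

    # Hypothetical stand-in for the real system prompt.
    SYSTEM_PROMPT = "You are ChatGPT, a large language model trained by OpenAI."

    def filter_output(text: str) -> str:
        """Naive leak check: withhold the response if it quotes a long
        verbatim run of the system prompt."""
        words = SYSTEM_PROMPT.split()
        # Look for any 8 consecutive prompt words appearing verbatim.
        for i in range(len(words) - 7):
            chunk = re.escape(" ".join(words[i : i + 8]))
            if re.search(chunk, text, re.IGNORECASE):
                return "[response withheld]"
        return text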


The LLM could simply re-phrase it, write it in Chinese, or print it in Morse Code. Regex is useless against a technology like GPT-4.
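
Continuing the sketch above (same hypothetical SYSTEM_PROMPT and filter_output; base64 stands in here for the Morse-code example), trivial transformations sail straight through a literal-match filter:

    import base64

    rephrased = "Your role is ChatGPT, an LLM built by OpenAI."  # paraphrase
    translated = "你是ChatGPT，一个由OpenAI训练的大型语言模型。"  # Chinese
    encoded = base64.b64encode(SYSTEM_PROMPT.encode()).decode()  # re-encoded

    # None of these contain 8 consecutive verbatim prompt words,
    # so the filter passes all of them through unchanged.
    for leak in (rephrased, translated, encoded):
        assert filter_output(leak) == leak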


OpenAI could also easily train their models against this, making it hard to extract the prompt. Yet it remains super easy; try it yourself:

https://chat.openai.com/share/94455782-5985-4b20-82fa-521f40...

I imagine OpenAI has no problem with this: there are no secrets in the prompt, and seeing it may be useful for prompt engineering. If it's harmless, there's no point in stopping the user from seeing it.


Many others and I have beaten this completely. Give it a shot! https://gandalf.lakera.ai/


There is nothing secret to hide; what would be the purpose of blocking it?


1. It could help competitors improve their alternatives

2. It could be used against them in a lawsuit, so they would probably want to keep it hidden until forced to reveal it (which they would likely fight)

3. It gives more information to the people crafting "jailbreaks"

4. It might create a backlash, considering how heavy-handed the "please make images diverse" part of it is

5. It might create ANOTHER backlash, on the other side of that coin, for not being heavy-handed enough, and not explicitly listing an ethnicity, gender, or whatever other personal characteristic that some might want ChatGPT to represent by default in its prompts


Just as an example, it would make it easier to craft adversarial attacks that elicit undesired behavior.



