I anticipate we’ll shortly have PAFs, “Prompt Application Firewalls”, on the market that externalise some of the detection and prevention from model publishers and act as abstracted barriers in front of applications. Don’t leave it to model makers just as you don’t leave SQL injection prevention to developers alone. Not an easy task but it seems tractable. Unsolved, but soluble.
Zero Google results for the term. Perhaps there is another term and they already exist, eg baked into next gen WAFs.
It started originally as a way to limit costs (the proxy would intercept requests, estimate the token sizes, and block requests before they are sent to OpenAI). However, at the request of some early users, I’ve expanded it to include things like keyword detection/blocking, moderation enforcement, etc.
I’m not entirely convinced you can ever fully block prompt attacks, but right now most companies are just asking for visibility into it. So you could monitor for things like: do certain malicious phrases appear in the request? Or does a significant percentage of the original prompt text also appear in the response (a signal that the prompt is leaking).
yea, you're right. It's really pre-MVP. Basically an API in between your user facing input and OpenAI that detects prompt injections and flags them for you so you can abort sending to OpenAI.
I believe they will exist, but I don’t think they will be effective at stopping the threat, but a good money making opportunity for someone who wants to sell the feeling of reassurance.
Zero Google results for the term. Perhaps there is another term and they already exist, eg baked into next gen WAFs.