This is neat and largely spot on, but I do want to call out one specific thing t...

CrLf · on Dec 19, 2011

"using a regex to try and match bad things"

And that's the core failure, not the use of regexes.

Whatever you are using to filter user input, you always filter the good things in, and not the bad things out.
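
To make "filter the good things in" concrete, here is a minimal sketch in Python. The username rule (3 to 16 ASCII letters, digits, or underscores) and the names `USERNAME_RE` and `is_valid_username` are made up for illustration; the point is only that the pattern describes what is allowed and everything else is rejected.

```python
import re

# Hypothetical allowlist rule: 3-16 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")

def is_valid_username(value: str) -> bool:
    # fullmatch requires the entire string to fit the allowlist,
    # so there is no list of "bad" characters to keep up to date.
    return USERNAME_RE.fullmatch(value) is not None
```

Anything the pattern does not explicitly describe, including a trailing newline or a quote character, fails the check without having been anticipated.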

ajross · on Dec 19, 2011

Exactly. For example, the common case of filtering "text to be displayed via the web" is probably best expressed as (1) a validating conversion to UTF-8 followed by (2) a regex that translates characters outside the safe range to XML entities.
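
A rough sketch of those two steps in Python. The choice of "safe range" here (tab, newline, and printable ASCII, minus the markup-significant characters `& < > " '`) is an assumption, as are the names `UNSAFE_RE` and `to_safe_markup`; a real application may want a different range.

```python
import re

# Assumed safe range: tab, newline, and printable ASCII, excluding the
# markup-significant characters & < > " '. Everything else is "unsafe".
UNSAFE_RE = re.compile(r"""[^\t\n\x20-\x7e]|[&<>"']""")

def to_safe_markup(raw: bytes) -> str:
    # Step 1: validating conversion. Strict decoding raises
    # UnicodeDecodeError on malformed UTF-8 instead of guessing.
    text = raw.decode("utf-8", errors="strict")
    # Step 2: translate every character outside the safe range
    # into a numeric character reference.
    return UNSAFE_RE.sub(lambda m: f"&#{ord(m.group())};", text)
```

For instance, `to_safe_markup(b"<b>caf\xc3\xa9</b>")` yields `&#60;b&#62;caf&#233;&#60;/b&#62;`, while malformed input such as `b"\xff"` is rejected at the decoding step.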

simonw · on Dec 19, 2011

I agree that the solution isn't to validate incoming input.

The way to avoid injection attacks is to make sure you're correctly escaping user input when you use it to compose queries - preferably using a separate abstraction layer rather than calling the escaping function manually.

The most effective tools handle escaping by default and make you work hard to avoid it - Django escapes anything output into a template, for example, and the Django ORM handles SQL escaping for you.
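
As a small illustration of letting a layer below you deal with the query, here is a sketch using Python's standard `sqlite3` module rather than Django. Strictly speaking this is parameter binding, not escaping: the value is handed to the database separately from the SQL text. The `users` table and the `find_user_ids` helper are hypothetical.

```python
import sqlite3

def find_user_ids(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder sends `name` to SQLite as data; it is never
    # spliced into the SQL string, so quotes in it have no special meaning.
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
    return [row[0] for row in rows]

# Hypothetical in-memory table used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

With this setup, a lookup for `"alice"` finds her row, and the classic `"' OR '1'='1"` payload is simply treated as an odd name that matches nothing.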