
This is neat and largely spot on, but I do want to call out one specific thing that I strongly, strongly disagree with:

> Validate user input used in SSJS commands with regular expressions.

If I had a dollar for every time I saw someone trying and failing to solve things like SQL injection, XSS, and path injections with regexes, I'd be a millionaire. It's ridiculously hard to get it right. The best solution is: don't ever, ever put user input into code. Ever. Build it based on user input, but don't put user input into the code.

Edit: Just because a lot of people may not be familiar with the pitfalls, here are a couple things:

- What character set is your input string, what character set is your regex engine using, and what character set is the consumer of the input expecting? An impedance mismatch at any of these points could allow malicious strings to go right through, even if the regex would normally match them just fine. For instance, if you're emitting XML and you're using a regex to try and match bad things in that XML, consider that your regex might miss the same content encoded as, say, UTF-16, which could be totally valid in the XML if you change the encoding in the XML declaration.

- You're building something that generates a file path; you want to restrict the user from moving up the directory structure and also want to make sure they're not writing into the 'foo' directory. So you do (in effect): if path =~ /\.\.\// then bail else return_file(path.replace('foo/', '')) end -- if a malicious user passes in '../../../../etc/passwd' then it'll get caught by your filter; if they pass in 'foo/bar.txt' then it'll really read 'bar.txt'; but if they pass in '.foo/./.foo/./.foo/./.foo/./etc/passwd' then it'll read '../../../../etc/passwd'. If you're modifying things after the regex, be very careful that you're not compromising the regex.
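
To make the second pitfall concrete, here is a minimal Python sketch. `naive_clean` mimics the check-then-modify filter described above. `safe_resolve` is one possible alternative, not the only one: it resolves the final path first and only then checks that it is still inside a base directory. Both function names and the base directory are made up for illustration.

```python
import re
from pathlib import Path

def naive_clean(path: str) -> str:
    """Flawed filter from the example: reject '../', then strip 'foo/'."""
    if re.search(r"\.\./", path):
        raise ValueError("traversal attempt")
    # Modifying the string after the check can recreate what the check rejected.
    return path.replace("foo/", "")

def safe_resolve(base: Path, user_path: str) -> Path:
    """Resolve the final path first, then verify it is still under base."""
    base_resolved = base.resolve()
    candidate = (base_resolved / user_path).resolve()
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        raise ValueError("path escapes base directory") from None
    return candidate

# The filter sees no '../' here, but stripping 'foo/' produces one per segment.
sneaky = ".foo/./.foo/./.foo/./.foo/./etc/passwd"
print(naive_clean(sneaky))  # ../../../../etc/passwd
```

The point of the second function is that the check runs on the value that is actually used, after every transformation has been applied.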




"using a regex to try and match bad things"

And that's the core failure, not the use of regexes.

Whatever you are using to filter user input, you always filter the good things in, and not the bad things out.
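
A minimal sketch of "filtering the good things in", assuming a hypothetical rule that usernames are 3 to 20 ASCII letters, digits, or underscores. The rule and the function name are invented for the example.

```python
import re

# Hypothetical allowlist: 3-20 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")

def is_valid_username(value: str) -> bool:
    # fullmatch requires the whole string to match, so anything outside
    # the allowed set (including a trailing newline) is rejected.
    return USERNAME_RE.fullmatch(value) is not None
```

Anything not explicitly allowed is rejected, so there is no list of "bad" strings to keep up to date.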


Exactly. For example, the common case of filtering "text to be displayed via the web" is probably best expressed as (1) a validating conversion to UTF-8 followed by (2) a regex that translates characters outside the safe range to XML entities.
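
A rough Python sketch of that two-step approach. The "safe range" below (ASCII letters, digits, space, and a few punctuation marks) is an assumption picked for the example, and `to_safe_html` is a made-up name. A real implementation would also have to decide what to do with control characters, which cannot be expressed as character references in XML 1.0.

```python
import re

# Assumed safe range: everything else becomes a numeric character reference.
UNSAFE = re.compile(r"[^A-Za-z0-9 .,!?-]")

def to_safe_html(raw: bytes) -> str:
    # Step 1: validating conversion; invalid UTF-8 raises UnicodeDecodeError.
    text = raw.decode("utf-8", errors="strict")
    # Step 2: translate every character outside the safe range to an entity.
    return UNSAFE.sub(lambda m: f"&#{ord(m.group())};", text)

print(to_safe_html(b"<b>hi</b>"))  # &#60;b&#62;hi&#60;&#47;b&#62;
```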


I agree that the solution isn't to validate incoming input.

The way to avoid injection attacks is to make sure you're correctly escaping user input when you use it to compose queries - preferably using a separate abstraction layer rather than calling the escaping function manually.

The most effective tools handle escaping by default and make you work hard to avoid it - Django escapes anything output into a template, for example, and the Django ORM handles SQL escaping for you.
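
As a small illustration of letting a layer below you handle this, here is a sketch using Python's standard `sqlite3` module with parameter binding. Binding goes a step further than escaping, since the value is passed separately and never becomes part of the SQL text. The table, the data, and `find_user` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder passes name as data; it is not spliced into the query.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

print(find_user(conn, "alice"))        # [('alice',)]
print(find_user(conn, "' OR '1'='1"))  # []
```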



