To me, "sanitizing inputs" implies a transformation of the data into a string that can be "safely" evaluated as code which hopefully yields the input data. Instead you should be able to just mark a piece of the code as data, that will never be tokenized or parsed or anything, just dropped directly into a buffer.
"Prepared statements" sounds EXACTLY like what I was thinking! I don't understand why people would ever use anything else.
Ah, I see! It's a cool idea, but .. let's try to be maximally obtuse and pedantic today. I'm a developer and it's HN after all.
[4]tree is also code that yields data. At the end of the day some kind of parser needs to decide what to do with your data and [ ] is just another way of escaping special characters. In this case it escapes entire strings instead of individual characters. It's your special way of sanitizing the input.
Questions: Who is responsible for the number? What is this number: bytes, "characters", runes? What happens if the number is wrong? (If you expose this number to external factors of any kind you get a special, interesting new breed of SQL injection.)
In practice you'd probably do something like:
my_special_superduper_safety_syntax_preprocessor("SELECT * FROM users WHERE username=$$$", "peter")
Which will yield something like:
"SELECT * FROM users WHERE username=[5]peter"
.. so you don't have to calculate the number. If we're doing this, why not just go for:
exec("SELECT * FROM users WHERE username=?", "peter")
.. and be done with it.
> I don't understand why people would ever use anything else.
Yes, I agree. Usually it's some interesting combination of laziness and ignorance.
Yes, I almost certainly would write that wrapper to make it impossible for me to pair the wrong length with a string. And yes, it technically is still code, but at least the bytes are not being "looked at" in a way where they might get executed in some way. Again, a "Prepared Statement" is what I was hoping would exist (also for perf reasons) so I'm happy to know it does.
If I were writing it, I would exclusively use the "Prepared Statements" technique, and for people typing in SQL queries by hand, the sole string construct would be as I described.
I'm a Ziguana, so in my design, the number would be "bytes". You would have to "calculate" the number of bytes in languages like Swift or JavaScript. But still, overall it's a better idea to me than turning ' into \' and many other convoluted transformations that are often incorrect and are also just throwing away CPU cycles for no reason whatsoever.
In any case, I would still be semi-offended if I learned that "Prepared Statements" transformed the data in any way whatsoever. In the compromise solution where SQL has a string construct, I want the SQL tokenizer to do this:
if (cur_token.kind == .data) {
// copy data out
@memcpy(
some_buffer,
cur[0..cur_token.value]
);
// skip over the whole thing
// before generating next token
cur = cur[cur_token.value..];
}
Even better if the data is not sent alongside code at all.