Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Could someone please explain to me why "sanitizing database inputs" was ever considered a good idea? Why not just add a feature in SQL like so?

  SELECT * FROM users WHERE username = [<LEN_OF_TEXT>]raw-text-of-len-not-parsed-at-all
E.g.

  SELECT * FROM users WHERE username = [21]flyin' and wavin' guy
                                           ^^^^^^^^^^^^^^^^^^^^^ 
                          these 21 chars are NOT parsed AT ALL, just taken as data

I am not very familiar with SQL so you might need a different prefix but hopefully the idea is obvious.


How is this not "sanitizing inputs"?

The basic idea behind your proposal exists and is called prepared statements. It's actually, I hope, the normal way to write queries these days.

You write your query like: "SELECT * FROM users WHERE username = ?" and execute your query like "execute(query, username)".

The problem? It's optional.


To me, "sanitizing inputs" implies a transformation of the data into a string that can be "safely" evaluated as code which hopefully yields the input data. Instead you should be able to just mark a piece of the code as data, that will never be tokenized or parsed or anything, just dropped directly into a buffer.

"Prepared statements" sounds EXACTLY like what I was thinking! I don't understand why people would ever use anything else.


Ah, I see! It's a cool idea, but .. let's try to be maximally obtuse and pedantic today. I'm a developer and it's HN after all.

[4]tree is also code that yields data. At the end of the day some kind of parser needs to decide what to do with your data and [ ] is just another way of escaping special characters. In this case it escapes entire strings instead of individual characters. It's your special way of sanitizing the input.

Questions: Who is responsible for the number? What is this number: bytes, "characters", runes? What happens if the number is wrong? (If you expose this number to external factors of any kind you get a special, interesting new breed of SQL injection.)

In practice you'd probably do something like:

my_special_superduper_safety_syntax_preprocessor("SELECT * FROM users WHERE username=$$$", "peter")

Which will yield something like:

"SELECT * FROM users WHERE username=[5]peter"

.. so you don't have to calculate the number. If we're doing this, why not just go for:

exec("SELECT * FROM users WHERE username=?", "peter")

.. and be done with it.

> I don't understand why people would ever use anything else.

Yes, I agree. Usually it's some interesting combination of laziness and ignorance.


Yes, I almost certainly would write that wrapper to make it impossible for me to pair the wrong length with a string. And yes, it technically is still code, but at least the bytes are not being "looked at" in a way where they might get executed in some way. Again, a "Prepared Statement" is what I was hoping would exist (also for perf reasons) so I'm happy to know it does.

If I were writing it, I would exclusively use the "Prepared Statements" technique, and for people typing in SQL queries by hand, the sole string construct would be as I described.

I'm a Ziguana, so in my design, the number would be "bytes". You would have to "calculate" the number of bytes in languages like Swift or JavaScript. But still, overall it's a better idea to me than turning ' into \' and many other convoluted transformations that are often incorrect and are also just throwing away CPU cycles for no reason whatsoever.

In any case, I would still be semi-offended if I learned that "Prepared Statements" transformed the data in any way whatsoever. In the compromise solution where SQL has a string construct, I want the SQL tokenizer to do this:

  if (cur_token.kind == .data) {
    // copy data out
    @memcpy(
      some_buffer, 
      cur[0..cur_token.value]
    );
    // skip over the whole thing
    // before generating next token
    cur = cur[cur_token.value..];
  }
Even better if the data is not sent alongside code at all.


Most (all?) SQL client libraries will allow you indicate a parameter placeholder and supply that parameter value separately.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: