Could someone please explain to me why "sanitizing database inputs" was ever con...

whatnow37373 · 2025-03-18T11:09:35 1742296175

How is this not "sanitizing inputs"?

The basic idea behind your proposal exists and is called prepared statements. It's actually, I hope, the normal way to write queries these days.

You write your query like: "SELECT * FROM users WHERE username = ?" and execute your query like "execute(query, username)".

The problem? It's optional.

Validark · 2025-03-18T11:28:22 1742297302

To me, "sanitizing inputs" implies a transformation of the data into a string that can be "safely" evaluated as code which hopefully yields the input data. Instead you should be able to just mark a piece of the code as data, that will never be tokenized or parsed or anything, just dropped directly into a buffer.

"Prepared statements" sounds EXACTLY like what I was thinking! I don't understand why people would ever use anything else.

whatnow37373 · 2025-03-18T12:04:56 1742299496

Ah, I see! It's a cool idea, but .. let's try to be maximally obtuse and pedantic today. I'm a developer and it's HN after all.

[4]tree is also code that yields data. At the end of the day some kind of parser needs to decide what to do with your data and [ ] is just another way of escaping special characters. In this case it escapes entire strings instead of individual characters. It's your special way of sanitizing the input.

Questions: Who is responsible for the number? What is this number: bytes, "characters", runes? What happens if the number is wrong? (If you expose this number to external factors of any kind you get a special, interesting new breed of SQL injection.)

In practice you'd probably do something like:

my_special_superduper_safety_syntax_preprocessor("SELECT * FROM users WHERE username=$$$", "peter")

Which will yield something like:

"SELECT * FROM users WHERE username=[5]peter"

.. so you don't have to calculate the number. If we're doing this, why not just go for:

exec("SELECT * FROM users WHERE username=?", "peter")

.. and be done with it.

> I don't understand why people would ever use anything else.

Yes, I agree. Usually it's some interesting combination of laziness and ignorance.

Validark · 2025-03-18T18:06:16 1742321176

Yes, I almost certainly would write that wrapper to make it impossible for me to pair the wrong length with a string. And yes, it technically is still code, but at least the bytes are not being "looked at" in a way where they might get executed in some way. Again, a "Prepared Statement" is what I was hoping would exist (also for perf reasons) so I'm happy to know it does.

If I were writing it, I would exclusively use the "Prepared Statements" technique, and for people typing in SQL queries by hand, the sole string construct would be as I described.

I'm a Ziguana, so in my design, the number would be "bytes". You would have to "calculate" the number of bytes in languages like Swift or JavaScript. But still, overall it's a better idea to me than turning ' into \' and many other convoluted transformations that are often incorrect and are also just throwing away CPU cycles for no reason whatsoever.

In any case, I would still be semi-offended if I learned that "Prepared Statements" transformed the data in any way whatsoever. In the compromise solution where SQL has a string construct, I want the SQL tokenizer to do this:

  if (cur_token.kind == .data) {
    // copy data out
    @memcpy(
      some_buffer, 
      cur[0..cur_token.value]
    );
    // skip over the whole thing
    // before generating next token
    cur = cur[cur_token.value..];
  }

Even better if the data is not sent alongside code at all.

billpg · 2025-03-18T10:37:40 1742294260

Most (all?) SQL client libraries will allow you indicate a parameter placeholder and supply that parameter value separately.