Using LLMs to write SQL is a fascinating case because there are so many traps you could fall into that aren't really the fault of the LLM.
My favorite example: you ask the LLM for "most recent restaurant opened in California", give it a schema and it tries "select * from restaurants where state = 'California' order by open_date desc" - but that returns 0 results, because it turns out the state column uses two-letter state abbreviations like CA instead.
There are tricks that can help here - I've tried sending the LLM an example row from each table, or you can set up a proper loop where the LLM gets to see the results and iterate on them - but it reflects the fact that interacting with databases can easily go wrong no matter how "smart" the model you're using is.
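Here's roughly what that second trick looks like - a minimal sketch, assuming SQLite and a hypothetical `ask_llm(prompt)` callable standing in for whichever model API you're actually using:

```python
import sqlite3


def run_with_feedback(ask_llm, db_path, question, schema, max_attempts=3):
    """Let the model see its query results (or errors) and revise the SQL.

    `ask_llm` is a placeholder: a function that takes a prompt string and
    returns the model's SQL as text.
    """
    conn = sqlite3.connect(db_path)
    prompt = (f"Schema:\n{schema}\n\n"
              f"Write a SQLite query for: {question}\nReturn only the SQL.")
    sql, rows = "", []
    for _ in range(max_attempts):
        sql = ask_llm(prompt)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            # Syntax or schema mistakes go straight back to the model
            prompt += f"\n\nYour query:\n{sql}\nfailed with: {err}\nTry again."
            continue
        if rows:
            break
        # Zero rows is the interesting case: show the model what happened
        prompt += (f"\n\nYour query:\n{sql}\nreturned 0 rows. "
                   "Check your assumptions about how values are actually "
                   "stored (e.g. 'CA' vs 'California') and try a revised query.")
    return sql, rows
```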
> that returns 0 results, because it turns out the state column uses two-letter state abbreviations like CA instead.
As you’ve identified, rather than just giving it the schema, you give it the schema and some sample data when you tell it what you want.
A human might make exactly the same error - based on a mistaken assumption - and would then look at the data to see why the query was failing.
If we expect an LLM to magically realise that, when you ask it to find something based on an identifier you’ve told it is ‘California’, the query should actually use ‘CA’ rather than what you told it, then its failure to do so isn’t really the fault of the LLM.
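Getting that sample data into the prompt doesn’t take much - a rough sketch, assuming SQLite (the function name and prompt wording are mine, not anything standardised):

```python
import sqlite3


def schema_with_samples(db_path):
    """Build a prompt fragment with each table's schema plus one example row,
    so the model can see that `state` actually holds values like 'CA'."""
    conn = sqlite3.connect(db_path)
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")
    for name, ddl in tables:
        parts.append(ddl)
        cols = [c[1] for c in conn.execute(f'PRAGMA table_info("{name}")')]
        row = conn.execute(f'SELECT * FROM "{name}" LIMIT 1').fetchone()
        if row:
            parts.append("Example row: " + ", ".join(
                f"{col}={val!r}" for col, val in zip(cols, row)))
    return "\n".join(parts)
```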