I agree that forcing valid JSON output is super cool.
But it's unrelated to the problem of LLM hallucinations. A hallucination that's been validated as correct json is still a hallucination.
And if your problem space is simple enough that you can validate the output of an LLM well enough to prove it's free of hallucinations, then your problem space doesn't need an LLM to solve it.
> your problem space doesn’t need an LLM to solve it
Hmmm… kinda opinion right?
I’m saying: in specific situations, you can validate the output and aggregate solutions based on deterministic criteria to mitigate hallucinations.
You can use statistical methods (e.g. there’s a project out there that generates tests and uses “on average, the tests pass” as a validation criterion) to reduce the chance of a hallucinated output to below a probability threshold that you’re prepared to accept… for certain types of problems.
Whether the problem space is trivial or not … that’s your opinion, right?
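To make the "on average, the tests pass" idea concrete, here is a minimal sketch, not the project's actual code. The task ("double an integer"), the names `pass_rate` and `accepted`, and the 0.75 threshold are all made up for illustration. The checks stand in for generated tests, which may themselves be wrong, which is why acceptance is a pass rate rather than all-or-nothing.

```python
from typing import Callable, List, Sequence

Candidate = Callable[[int], int]
Check = Callable[[Candidate], bool]


def pass_rate(candidate: Candidate, checks: Sequence[Check]) -> float:
    """Fraction of checks the candidate passes; a crash counts as a failure."""
    if not checks:
        return 0.0
    passed = 0
    for check in checks:
        try:
            if check(candidate):
                passed += 1
        except Exception:
            pass  # a candidate that raises simply fails this check
    return passed / len(checks)


def accepted(candidates: Sequence[Candidate],
             checks: Sequence[Check],
             threshold: float) -> List[Candidate]:
    """Keep only the candidates whose pass rate meets the threshold."""
    return [c for c in candidates if pass_rate(c, checks) >= threshold]


# Hypothetical task: "double an integer". Two stand-ins for LLM outputs.
def good(x: int) -> int:
    return x * 2


def bad(x: int) -> int:  # plausible-looking but wrong
    return x * x


# Stand-ins for generated tests.
checks: List[Check] = [
    lambda f: f(0) == 0,
    lambda f: f(1) == 2,
    lambda f: f(3) == 6,
    lambda f: f(-2) == -4,
]

survivors = accepted([good, bad], checks, threshold=0.75)
```

Here `bad` passes only the `f(0) == 0` check, so it falls below the threshold and is dropped. How low the residual error rate gets depends entirely on how good the checks are.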
It has no bearing on the correctness of what I said.
You can validate output against a grammar to require that it is structurally correct. There’s no specific reason to expect that you can’t likewise validate output against some logical criteria (e.g. unit tests) to require that it is logically correct with respect to those criteria.
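A rough sketch of what I mean, assuming a toy task where the model is asked to sort a list and reply as `{"sorted": [...]}`. The task, the key name, and the function names are all hypothetical. The first check is the structural one, the second is the logical one.

```python
import json
from typing import List, Optional


def structurally_valid(raw: str) -> Optional[List[int]]:
    """Return the list if raw is JSON of the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    result = data.get("sorted")
    if not isinstance(result, list):
        return None
    if not all(isinstance(x, int) for x in result):
        return None
    return result


def logically_valid(original: List[int], result: List[int]) -> bool:
    """Deterministic criterion: the result must be the sorted input."""
    return result == sorted(original)


def validate(raw: str, original: List[int]) -> bool:
    """Accept only output that passes both the structural and logical checks."""
    result = structurally_valid(raw)
    return result is not None and logically_valid(original, result)
```

With input `[3, 1, 2]`, the reply `{"sorted": [1, 3, 2]}` is perfectly valid JSON and passes the structural check, but it fails the logical one and gets rejected. That is the distinction: the grammar catches malformed output, the second check catches the wrong answer.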
It’s not particularly controversial.
Maybe the output isn’t perfectly correct if you don’t have good verification steps for your task, and maybe the effort required to build those validators is high. I’m just saying: it is possible.