> Non-deterministic behavior will never be trusted by default, as it's simply no...

Sharlin · 2025-05-15T07:34:37 1747294477

Proving the correctness of the “improvements” is another thing entirely, though.

NitpickLawyer · 2025-05-15T07:55:41 1747295741

I agree. At first, the problems you try to solve need to be verifiable.

But there's progress on many fronts here. There's been increased interest in provers (natural language to Lean, for example). There's also been progress in LLM-as-a-judge on open-ish problems. And it seems that RL can help with extracting step rewards from sparse-reward domains.