You just sample from a grammar and you automatically get 100% schema conformance; we don't know for sure, but it seems like the most likely thing they are doing. llama.cpp has supported this for a while (using a BNF-style grammar -- https://github.com/ggerganov/llama.cpp/blob/master/grammars/... )

edit: oh actually, we do sort of know -- they call out jsonformer as an inspiration in the acknowledgements

https://github.com/1rgs/jsonformer
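
For reference, schema-constrained generation in the jsonformer style looks roughly like this sketch (the model name, schema, and prompt are illustrative placeholders taken from the style of its README, not anything OpenAI has confirmed about their internals):

  from transformers import AutoModelForCausalLM, AutoTokenizer
  from jsonformer import Jsonformer

  # Placeholder model; any HF causal LM works in principle.
  model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
  tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

  schema = {
      "type": "object",
      "properties": {
          "name": {"type": "string"},
          "age": {"type": "number"},
      },
  }

  # jsonformer emits the JSON scaffolding (braces, keys, quotes) itself and only
  # asks the model to fill in the values, so the result always parses and
  # matches the schema.
  jsonformer = Jsonformer(model, tokenizer, schema, "Generate a person's details:")
  print(jsonformer())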



Using this in a naive way can easily degenerate into the LLM outputting syntactically/grammatically valid tokens that make no sense, like in this example: https://community.openai.com/t/json-format-causes-infinite-n...

This might be even more pronounced when the output is restricted further by a JSON schema.

So the heavy lifting here was most likely in aligning the model to avoid/minimize such outcomes, not in tweaking the token sampler.


Isn't your example showing an issue w/ the opposite approach, where someone is getting bad output w/ an earlier OpenAI JSON mode that worked via training rather than mechanical restriction of the output to conform to a schema?

FWIW (not too much!) I have used llama.cpp grammars with fine-tuned phi-2 models to restrict output to specific formats (not JSON in particular, but an expected format), and I didn't hit any issues like this.
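
For anyone curious, that kind of setup looks roughly like this sketch using llama-cpp-python and a toy GBNF grammar (the model path and format are placeholders, not the actual fine-tune or grammar):

  from llama_cpp import Llama, LlamaGrammar

  # Toy grammar: restrict output to "KEY: value" lines.
  grammar = LlamaGrammar.from_string(r'''
  root  ::= line+
  line  ::= key ": " value "\n"
  key   ::= [A-Z]+
  value ::= [a-z ]+
  ''')

  llm = Llama(model_path="phi-2-finetune.Q4_K_M.gguf")  # placeholder path
  out = llm(
      "List a few colors, one per line, as KEY: value pairs.\n",
      grammar=grammar,
      max_tokens=64,
  )
  print(out["choices"][0]["text"])  # every line is forced to match the grammar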

I don't intuitively see why restricting sampling to tokens that match a schema would cause the LLM to converge on valid tokens that make no sense...

Are there examples of this happening w/ people using e.g. jsonformer?


You're basically taking the model "off policy" when you bias the decoder, which can definitely make weird things happen.
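
A tiny, generic illustration of what "off policy" means here (not any particular library's sampler): mask the logits down to the allowed tokens and renormalize, and if everything the model actually wanted to emit is disallowed, you end up sampling tokens it assigned almost no probability to, even though the output stays grammatically valid.

  import torch

  def constrained_next_token(logits: torch.Tensor, allowed_ids: list[int]) -> int:
      # Mask everything the grammar/schema disallows, then renormalize.
      mask = torch.full_like(logits, float("-inf"))
      mask[allowed_ids] = 0.0
      probs = torch.softmax(logits + mask, dim=-1)
      return int(torch.multinomial(probs, 1))

  # Example: the model puts nearly all its mass on token 7, but the schema only
  # allows tokens 2 and 3, so one of those low-probability tokens is emitted.
  logits = torch.full((10,), -10.0)
  logits[7] = 5.0  # the model's strongly preferred continuation
  print(constrained_next_token(logits, allowed_ids=[2, 3]))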


Oh, thanks for the links. Super interesting!



