You just constrain sampling with a grammar and you automatically get 100% schema-valid output; who knows for sure, but it seems like the most likely thing they are doing. llama.cpp has supported this for a while (using a BNF-style grammar -- https://github.com/ggerganov/llama.cpp/blob/master/grammars/... )
edit: oh actually, we do sort of know -- they call out jsonformer as an inspiration in the acknowledgements
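To make the mechanism concrete, here is a rough sketch of what grammar-constrained sampling amounts to (a toy stand-in, not llama.cpp's actual GBNF implementation): at every decoding step, tokens the grammar would reject are masked out before sampling, so whatever comes out is valid by construction.

    # Toy sketch of grammar-constrained sampling (illustrative only,
    # not llama.cpp's real implementation).
    import math
    import random

    def constrained_sample(logits, vocab, allowed_by_grammar):
        """Sample one token, restricted to tokens the grammar currently allows."""
        # Mask: -inf logit for anything the grammar would reject at this position.
        masked = [l if allowed_by_grammar(t) else float("-inf")
                  for t, l in zip(vocab, logits)]
        # Softmax over the surviving tokens only.
        m = max(masked)
        exps = [math.exp(l - m) for l in masked]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(vocab, weights=probs, k=1)[0]

    # Toy vocab and state: mid-way through a JSON string field, the grammar
    # (a stand-in predicate here) forbids closing the object early.
    vocab = ['"', '}', 'hello', '42']
    logits = [1.0, 3.0, 2.0, 0.5]            # the model "wants" '}' most
    allowed = lambda tok: tok != '}'          # grammar says: not yet
    print(constrained_sample(logits, vocab, allowed))  # never returns '}'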
Isn't your example showing an issue w/ the opposite approach, where someone is getting bad output from an earlier OpenAI JSON mode that worked via training rather than via mechanical restriction of the output to conform to a schema?
FWIW (not too much!), I have used llama.cpp grammars to restrict output to specific formats (not JSON in particular, but an expected structure) with fine-tuned phi-2 models, and I didn't hit any issues like this.
I'm not intuitively seeing why restricting sampling to schema-valid tokens would cause the LLM to converge on output that is valid but makes no sense...
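To spell out the worry with toy numbers (purely made up, not measured from any model): if the schema forbids the continuation the model most wants, masking forces a token the model considered unlikely, and everything generated afterwards is conditioned on that forced choice.

    # Toy illustration of the claimed failure mode: the schema demands a digit,
    # but the model mostly wants to start a string here.
    import math

    logits = {'"': 5.0, '1': 1.0, '3': 0.5, 'null': 2.0}
    valid = {'1', '3'}   # only digits are schema-valid at this position

    def softmax(d):
        m = max(d.values())
        exps = {k: math.exp(v - m) for k, v in d.items()}
        z = sum(exps.values())
        return {k: e / z for k, e in exps.items()}

    print(softmax(logits))                                   # '"' gets ~93%
    print(softmax({k: v for k, v in logits.items() if k in valid}))
    # After masking, we pick between '1' and '3' -- tokens the model gave ~3%
    # combined -- and later tokens are conditioned on that. Whether this
    # actually produces nonsense in practice is exactly the question above.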
Are there examples of this happening w/ people using e.g. jsonformer?
https://github.com/1rgs/jsonformer
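Going from memory of its README (check the repo above for the exact, current API), using it looks roughly like this:

    # Rough jsonformer usage, as I recall it from the project's README.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from jsonformer import Jsonformer

    model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
    tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "number"},
        },
    }

    prompt = "Generate a person's information based on the following schema:"
    result = Jsonformer(model, tokenizer, schema, prompt)()
    print(result)  # a dict matching the schema, e.g. {"name": "...", "age": 30}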