You just constrain sampling with a grammar and you automatically get 100% schema-valid output; who knows for sure, but it seems like the most likely thing they are doing. llama.cpp has supported this for a while (using a BNF-style grammar -- https://github.com/ggerganov/llama.cpp/blob/master/grammars/... )
edit: oh actually, we do sort of know -- they call out jsonformer as an inspiration in the acknowledgements
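To make the mechanism concrete, here is a rough sketch of what grammar-constrained sampling amounts to (a toy stand-in, not llama.cpp's actual GBNF implementation): at every decoding step, tokens the grammar would reject are masked out before sampling, so whatever comes out is valid by construction.

    # Toy sketch of grammar-constrained sampling (illustrative only,
    # not llama.cpp's real implementation).
    import math
    import random

    def constrained_sample(logits, vocab, allowed_by_grammar):
        """Sample one token, restricted to tokens the grammar currently allows."""
        # Mask: -inf logit for anything the grammar would reject at this position.
        masked = [l if allowed_by_grammar(t) else float("-inf")
                  for t, l in zip(vocab, logits)]
        # Softmax over the surviving tokens only.
        m = max(masked)
        exps = [math.exp(l - m) for l in masked]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(vocab, weights=probs, k=1)[0]

    # Toy vocab and state: mid-way through a JSON string field, the grammar
    # (a stand-in predicate here) forbids closing the object early.
    vocab = ['"', '}', 'hello', '42']
    logits = [1.0, 3.0, 2.0, 0.5]            # the model "wants" '}' most
    allowed = lambda tok: tok != '}'          # grammar says: not yet
    print(constrained_sample(logits, vocab, allowed))  # never returns '}'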
Isn't your example showing an issue w/ the opposite approach, where someone is getting bad output from an earlier OpenAI JSON mode that worked via training rather than via mechanical restriction of the output to conform to a schema?
FWIW (not too much!), I have used llama.cpp grammars to restrict output to specific formats (not JSON in particular, but an expected structure) with fine-tuned phi-2 models, and I didn't hit any issues like this.
I'm not intuitively seeing why restricting sampling to schema-valid tokens would cause the LLM to converge on output that is valid but makes no sense...
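To spell out the worry with toy numbers (purely made up, not measured from any model): if the schema forbids the continuation the model most wants, masking forces a token the model considered unlikely, and everything generated afterwards is conditioned on that forced choice.

    # Toy illustration of the claimed failure mode: the schema demands a digit,
    # but the model mostly wants to start a string here.
    import math

    logits = {'"': 5.0, '1': 1.0, '3': 0.5, 'null': 2.0}
    valid = {'1', '3'}   # only digits are schema-valid at this position

    def softmax(d):
        m = max(d.values())
        exps = {k: math.exp(v - m) for k, v in d.items()}
        z = sum(exps.values())
        return {k: e / z for k, e in exps.items()}

    print(softmax(logits))                                   # '"' gets ~93%
    print(softmax({k: v for k, v in logits.items() if k in valid}))
    # After masking, we pick between '1' and '3' -- tokens the model gave ~3%
    # combined -- and later tokens are conditioned on that. Whether this
    # actually produces nonsense in practice is exactly the question above.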
Are there examples of this happening w/ people using e.g. jsonformer?
https://github.com/1rgs/jsonformer
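Going from memory of its README (check the repo above for the exact, current API), using it looks roughly like this:

    # Rough jsonformer usage, as I recall it from the project's README.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from jsonformer import Jsonformer

    model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
    tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "number"},
        },
    }

    prompt = "Generate a person's information based on the following schema:"
    result = Jsonformer(model, tokenizer, schema, prompt)()
    print(result)  # a dict matching the schema, e.g. {"name": "...", "age": 30}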