Does this Python package control the LLMs using something other than text? Or is the end result still that the Python package wraps your prompt with additional text containing extra instructions that become part of the prompt itself?
Looks like it actually changes how you do token generation to conform to a given context-free grammar. It's a way to structure how you sample from the model rather than a tweak to the prompt, so it's more efficient and guarantees that the output matches the formal grammar.
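To make the idea concrete, here's a minimal toy sketch (not the package's actual implementation) of how a grammar can restrict which tokens are legal at each step. For simplicity the "grammar" is just a fixed set of terminal strings, and a token is allowed only if it keeps the output a prefix of some valid string — real CFG-constrained samplers apply the same prefix-validity idea incrementally against a full grammar:

```python
# Hypothetical toy grammar: output must end up matching one of these
# strings (think of the JSON literals a real grammar might allow here).
GRAMMAR = {"true", "false", "null"}

def allowed_tokens(prefix, vocab):
    """Return the vocab entries that can legally extend `prefix`,
    i.e. those keeping it a prefix of some string the grammar accepts."""
    return {tok for tok in vocab
            if any(s.startswith(prefix + tok) for s in GRAMMAR)}

vocab = ["t", "f", "n", "r", "u", "a", "l", "s", "e"]
print(allowed_tokens("", vocab))    # {'t', 'f', 'n'}
print(allowed_tokens("tr", vocab))  # {'u'}
```

At generation time the sampler computes this allowed set before picking each token, so the output can't wander off the grammar no matter what the model's raw probabilities say.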
The output of the LLM is not just one token, but a statistical distribution across all possible output tokens. The tool you use to generate output samples from this distribution with various techniques, and you can put constraints on it, like not being too repetitive. Some tools support getting very specific about the allowed output format, e.g. https://github.com/ggerganov/llama.cpp/blob/master/grammars/... So even if the LLM says that an invalid token is the most likely next token, the tool will never select it for output. It will only sample from valid tokens.
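A minimal sketch of that masking step, with made-up numbers (the function name and the toy three-token vocabulary are illustrative, not from any real library): disallowed tokens are excluded from the softmax entirely, which is equivalent to setting their logits to negative infinity, so they can never be sampled regardless of how much probability the model gave them:

```python
import math
import random

def sample_constrained(logits, allowed, temperature=1.0):
    """Sample a token id from `logits`, but only among `allowed` ids.

    Disallowed tokens are masked out before sampling, so even if the
    model assigns one of them the highest probability it is never picked.
    """
    # Softmax over the allowed tokens only (same as masking the rest
    # with -inf logits before a normal softmax).
    scaled = {t: logits[t] / temperature for t in allowed}
    m = max(scaled.values())
    weights = {t: math.exp(v - m) for t, v in scaled.items()}
    total = sum(weights.values())
    r = random.random() * total
    for t, w in weights.items():
        r -= w
        if r <= 0:
            return t
    return t  # guard against floating-point rounding at the boundary

# Toy vocabulary of 3 tokens: token 2 has by far the highest logit,
# but the grammar only allows tokens 0 and 1 at this position.
logits = [1.0, 0.5, 9.0]
token = sample_constrained(logits, allowed={0, 1})
assert token in {0, 1}  # token 2 can never be selected
```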