Does this Python package control the LLMs using something other than text? Or is the end result still that the Python package wraps your prompt with additional text containing extra instructions that become part of the prompt itself?
Looks like it actually changes how you do token generation to conform to a given context-free grammar. It's a way to structure how you sample from the model rather than a tweak to the prompt, so it's more efficient and guarantees that the output matches the formal grammar.
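To make the idea concrete, here's a minimal toy sketch (not the package's actual implementation) of how a grammar can restrict which tokens are legal at each step. For simplicity the "grammar" is just a fixed set of terminal strings, and a token is allowed only if it keeps the output a prefix of some valid string — real CFG-constrained samplers apply the same prefix-validity idea incrementally against a full grammar:

```python
# Hypothetical toy grammar: output must end up matching one of these
# strings (think of the JSON literals a real grammar might allow here).
GRAMMAR = {"true", "false", "null"}

def allowed_tokens(prefix, vocab):
    """Return the vocab entries that can legally extend `prefix`,
    i.e. those keeping it a prefix of some string the grammar accepts."""
    return {tok for tok in vocab
            if any(s.startswith(prefix + tok) for s in GRAMMAR)}

vocab = ["t", "f", "n", "r", "u", "a", "l", "s", "e"]
print(allowed_tokens("", vocab))    # {'t', 'f', 'n'}
print(allowed_tokens("tr", vocab))  # {'u'}
```

At generation time the sampler computes this allowed set before picking each token, so the output can't wander off the grammar no matter what the model's raw probabilities say.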
The output of the LLM is not just one token, but a statistical distribution across all possible output tokens. The tool you use to generate output samples from this distribution with various techniques, and you can put constraints on it, like not being too repetitive. Some tools support getting very specific about the allowed output format, e.g. https://github.com/ggerganov/llama.cpp/blob/master/grammars/... So even if the LLM says that an invalid token is the most likely next token, the tool will never select it for output. It will only sample from valid tokens.
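A minimal sketch of that masking step, with made-up numbers (the function name and the toy three-token vocabulary are illustrative, not from any real library): disallowed tokens are excluded from the softmax entirely, which is equivalent to setting their logits to negative infinity, so they can never be sampled regardless of how much probability the model gave them:

```python
import math
import random

def sample_constrained(logits, allowed, temperature=1.0):
    """Sample a token id from `logits`, but only among `allowed` ids.

    Disallowed tokens are masked out before sampling, so even if the
    model assigns one of them the highest probability it is never picked.
    """
    # Softmax over the allowed tokens only (same as masking the rest
    # with -inf logits before a normal softmax).
    scaled = {t: logits[t] / temperature for t in allowed}
    m = max(scaled.values())
    weights = {t: math.exp(v - m) for t, v in scaled.items()}
    total = sum(weights.values())
    r = random.random() * total
    for t, w in weights.items():
        r -= w
        if r <= 0:
            return t
    return t  # guard against floating-point rounding at the boundary

# Toy vocabulary of 3 tokens: token 2 has by far the highest logit,
# but the grammar only allows tokens 0 and 1 at this position.
logits = [1.0, 0.5, 9.0]
token = sample_constrained(logits, allowed={0, 1})
assert token in {0, 1}  # token 2 can never be selected
```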