We extensively use vLLM's support for Outlines Structured Output with small language models (llama3 8B, for example) in Zep[0][1]. OpenAI's Structured Output is a great improvement on JSON mode, but it is rather primitive compared to vLLM and Outlines.

# Very Limited Field Typing

OpenAI offers a very limited set of types[2] (String, Number, Boolean, Object, Array, Enum, anyOf), with no way to define patterns or min/max lengths. Outlines supports arbitrary RegEx patterns, which makes extracting currencies, phone numbers, zip codes, comma-separated lists, and more a trivial exercise.
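To make the pattern point concrete, here is a minimal sketch of the kind of field-level RegEx one might hand to a regex-constrained decoder. The patterns, field names, and the `conforms` helper are illustrative assumptions, not taken from Zep or Outlines, and the code only checks a string after the fact with Python's standard `re` module; a constrained decoder would enforce the same pattern during generation.

```python
import re

# Illustrative patterns only (US-centric); real formats vary by locale.
PATTERNS = {
    "usd_amount": r"\$\d{1,3}(,\d{3})*\.\d{2}",   # e.g. $1,234.56
    "us_phone": r"\(\d{3}\) \d{3}-\d{4}",         # e.g. (415) 555-0100
    "us_zip": r"\d{5}(-\d{4})?",                  # e.g. 94107 or 94107-1234
    "csv_words": r"[a-z]+(, [a-z]+)*",            # e.g. red, green, blue
}


def conforms(field: str, value: str) -> bool:
    """Return True if `value` matches the entire pattern for `field`."""
    return re.fullmatch(PATTERNS[field], value) is not None
```

With a type-only schema, all four fields would simply be "String", and any of these checks would have to happen after generation, with a retry on failure.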

# High Schema Setup Cost / Latency

vLLM and Outlines offer near-zero-cost schema setup: RegEx finite state machine construction is extremely cheap on the first inference call. OpenAI's context-free grammar generation, by contrast, carries a significant latency penalty of "under ten seconds to a minute". This may not affect "warmed-up" inference, but it could cause problems when schemas are more dynamic in nature.

Right now, this feels like a good first step, focusing on ensuring the right fields are present in schema-ed output. However, it doesn't yet offer the functionality to ensure the format of field contents beyond a primitive set of types. It will be interesting to watch where OpenAI takes this.

[0] https://help.getzep.com/structured-data-extraction

[1] https://help.getzep.com/dialog-classification

[2] https://platform.openai.com/docs/guides/structured-outputs/s...