[Other BAML creator here!] One time we told a customer to do this to fix small JSON mistakes, but it turns out their customers won't tolerate a 20-30s increase in latency from regenerating a long JSON structure.
We instead had to write a parser that catches small mistakes like missing commas, quotes, etc., and parses content even when there are things like reasoning in the response, like here: https://www.promptfiddle.com/Chain-of-Thought-KcSBh
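Our real parser handles much more than this, but a toy sketch of the idea (find the JSON-ish span amid the reasoning text, then try cheap repairs before giving up) would be:

import json
import re

def lenient_parse(text: str):
    # Pull the first {...} or [...] span out of a response that may
    # also contain reasoning prose before/after the JSON.
    match = re.search(r"[\[{].*[\]}]", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON-like span found")
    candidate = match.group(0)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Cheap repair: drop trailing commas before } or ]
        return json.loads(re.sub(r",\s*([}\]])", r"\1", candidate))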
I'm not sure I understand. In the docs for the Python client it says that BAML types get converted to Pydantic models; doesn't that step add the extra latency you mentioned?
My bad, I think I didn't explain it correctly. Basically you have two options when a "," is missing (among other issues) in an LLM output and causes a parsing failure:
- retry the request, which may take 30+ seconds if your LLM outputs are really long and you're using something like GPT-4 (see the toy loop after this list)
- fix the parsing issue
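Here's a toy version of option 1, with `generate` as a stand-in for whatever makes your LLM call; note that every failed parse pays for a full regeneration:

import json
import time

def parse_with_retry(generate, max_attempts=3):
    # Option 1: re-ask the model whenever parsing fails. Every retry
    # regenerates the entire completion, so a ~30s output costs ~30s
    # again per attempt.
    for attempt in range(max_attempts):
        raw = generate()  # full LLM round trip each time
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            time.sleep(2 ** attempt)  # brief backoff, then regenerate
    raise RuntimeError("all attempts produced unparseable output")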
In our library we do the latter. The conversion from BAML types to Pydantic ones is a compile-time step, unrelated to the problem above; it doesn't happen at runtime. (If you do need to shape types at runtime, that's what TypeBuilder is for:)
from baml_client.type_builder import TypeBuilder
from baml_client import b  # generated client (async calls shown below)

tb = TypeBuilder()
# Add properties to the Person output type at runtime
tb.Person.add_property("last_name", tb.string().list())
tb.Person.add_property("height", tb.float().optional()).description(
    "Height in meters"
)
# Extend the Hobby enum and alias each value to its lowercase form
tb.Hobby.add_value("chess")
for name, val in tb.Hobby.list_values():
    val.alias(name.lower())
tb.Person.add_property("hobbies", tb.Hobby.type().list()).description(
    "Some suggested hobbies they might be good at"
)
# no_tb_res = await b.ExtractPeople("My name is Harrison. My hair is black and I'm 6 feet tall.")
tb_res = await b.ExtractPeople(
    "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop.",
    {"tb": tb},
)
assert len(tb_res) > 0, "Expected non-empty result but got empty."
for r in tb_res:
    print(r.model_dump())
Neat, thanks! I'm still pondering whether I should be using this, since most of the retries I have to do are because the LLM itself doesn't understand the schema asked for (e.g. output with missing fields, or using a value not present in `Literal[]`), with certain models being especially bad with deeply nested schemas and outputting gibberish. Anything on your end that can help with that?
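For concreteness, the kind of failure I mean, using a made-up Pydantic model:

from typing import Literal
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    priority: Literal["low", "medium", "high"]
    summary: str

# Output that "almost" follows the schema fails validation outright:
# wrong literal casing plus a missing field means a full retry.
try:
    Ticket.model_validate({"priority": "HIGH"})
except ValidationError as e:
    print(e)  # reports both the bad literal and the missing "summary"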
Or, if you're open to sharing your prompt / data model, I can send over my best guess at a good prompt! We've found these models work decently well even with 50+ fields, nesting, and whatnot!
I might share it with you later on your Discord server.
> I can send over my best guess at a good prompt!
Now if you could automate the above process by "fitting" a first-draft prompt to a wanted schema, i.e. where your library makes a few adjustments when some assertions don't pass by having a chat of its own with the LLM, that would be super useful! Heck, I might just implement it myself.
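Roughly this loop, with `call_llm` as a stand-in for whatever chat client you use:

from pydantic import BaseModel, ValidationError

def fit_prompt(call_llm, draft_prompt: str, schema: type[BaseModel], max_rounds: int = 3) -> str:
    # "Fit" a draft prompt to a schema: run it, and when the output
    # fails validation, hand the errors back to the LLM and ask it
    # to revise the prompt itself.
    prompt = draft_prompt
    for _ in range(max_rounds):
        output = call_llm(prompt)
        try:
            schema.model_validate_json(output)
            return prompt  # prompt now yields schema-valid output
        except ValidationError as errors:
            prompt = call_llm(
                "This prompt produced output that failed schema validation.\n"
                f"Prompt:\n{prompt}\n\nErrors:\n{errors}\n\n"
                "Rewrite the prompt so the output validates. "
                "Return only the revised prompt."
            )
    raise RuntimeError("could not fit the prompt within the round budget")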
[Another BAML creator here.] I agree this is an interesting direction! We have a "chat" feature on our roadmap to do this right in the VSCode playground, where an AI agent will have context on your prompt, schema, BAML test results, etc., and help you iterate on the prompt automatically. We've done this before and have been surprised by how good the LLM's feedback can be.
We just need a slightly better testing flow within BAML first, since we don't support adding assertions just yet.