[Other BAML creator here!] One time we told a customer to do this to fix small JSON mistakes, but it turns out their customers won't tolerate a 20-30s increase in latency from regenerating a long JSON structure.
We instead had to write a parser that catches small mistakes like missing commas, quotes, etc., and parses content even when there are things like reasoning in the response, like here: https://www.promptfiddle.com/Chain-of-Thought-KcSBh
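Our real parser handles much more than this, but a toy sketch of the idea (find the JSON-ish span amid the reasoning text, then try cheap repairs before giving up) would be:

import json
import re

def lenient_parse(text: str):
    # Pull the first {...} or [...] span out of a response that may
    # also contain reasoning prose before/after the JSON.
    match = re.search(r"[\[{].*[\]}]", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON-like span found")
    candidate = match.group(0)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Cheap repair: drop trailing commas before } or ]
        return json.loads(re.sub(r",\s*([}\]])", r"\1", candidate))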
I'm not sure I understand. In the docs for the Python client it says that BAML types get converted to Pydantic models; doesn't that step add the extra latency you mentioned?
My bad, I think I didn't explain it correctly. Basically you have two options when a "," is missing (among other issues) in an LLM output and causes a parsing failure:
- retry the request, which may take 30+ seconds if your LLM outputs are really long and you're using something like GPT-4 (see the toy loop after this list)
- fix the parsing issue
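Here's a toy version of option 1, with `generate` as a stand-in for whatever makes your LLM call; note that every failed parse pays for a full regeneration:

import json
import time

def parse_with_retry(generate, max_attempts=3):
    # Option 1: re-ask the model whenever parsing fails. Every retry
    # regenerates the entire completion, so a ~30s output costs ~30s
    # again per attempt.
    for attempt in range(max_attempts):
        raw = generate()  # full LLM round trip each time
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            time.sleep(2 ** attempt)  # brief backoff, then regenerate
    raise RuntimeError("all attempts produced unparseable output")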
In our library we do the latter. The conversion from BAML types to Pydantic ones is a compile-time step, unrelated to the problem above; it doesn't happen at runtime. (If you do need to shape types at runtime, that's what TypeBuilder is for:)
from baml_client.type_builder import TypeBuilder
from baml_client import b  # generated client (async calls shown below)

tb = TypeBuilder()
# Add properties to the Person output type at runtime
tb.Person.add_property("last_name", tb.string().list())
tb.Person.add_property("height", tb.float().optional()).description(
    "Height in meters"
)
# Extend the Hobby enum and alias each value to its lowercase form
tb.Hobby.add_value("chess")
for name, val in tb.Hobby.list_values():
    val.alias(name.lower())
tb.Person.add_property("hobbies", tb.Hobby.type().list()).description(
    "Some suggested hobbies they might be good at"
)
# no_tb_res = await b.ExtractPeople("My name is Harrison. My hair is black and I'm 6 feet tall.")
tb_res = await b.ExtractPeople(
    "My name is Harrison. My hair is black and I'm 6 feet tall. I'm pretty good around the hoop.",
    {"tb": tb},
)
assert len(tb_res) > 0, "Expected non-empty result but got empty."
for r in tb_res:
    print(r.model_dump())
Neat, thanks! I'm still pondering whether I should be using this, since most of the retries I have to do are because the LLM itself doesn't understand the schema asked for (e.g. output with missing fields, or using a value not present in `Literal[]`), with certain models being especially bad with deeply nested schemas and outputting gibberish. Anything on your end that can help with that?
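For concreteness, the kind of failure I mean, using a made-up Pydantic model:

from typing import Literal
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    priority: Literal["low", "medium", "high"]
    summary: str

# Output that "almost" follows the schema fails validation outright:
# wrong literal casing plus a missing field means a full retry.
try:
    Ticket.model_validate({"priority": "HIGH"})
except ValidationError as e:
    print(e)  # reports both the bad literal and the missing "summary"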
Or, if you're open to sharing your prompt / data model, I can send over my best guess at a good prompt! We've found these models work decently well even with 50+ fields, nesting, and whatnot!
I might share it with you later on your Discord server.
> I can send over my best guess at a good prompt!
Now if you could automate the above process by "fitting" a first-draft prompt to a wanted schema, i.e. where your library makes a few adjustments when some assertions don't pass by having a chat of its own with the LLM, that would be super useful! Heck, I might just implement it myself.
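Roughly this loop, with `call_llm` as a stand-in for whatever chat client you use:

from pydantic import BaseModel, ValidationError

def fit_prompt(call_llm, draft_prompt: str, schema: type[BaseModel], max_rounds: int = 3) -> str:
    # "Fit" a draft prompt to a schema: run it, and when the output
    # fails validation, hand the errors back to the LLM and ask it
    # to revise the prompt itself.
    prompt = draft_prompt
    for _ in range(max_rounds):
        output = call_llm(prompt)
        try:
            schema.model_validate_json(output)
            return prompt  # prompt now yields schema-valid output
        except ValidationError as errors:
            prompt = call_llm(
                "This prompt produced output that failed schema validation.\n"
                f"Prompt:\n{prompt}\n\nErrors:\n{errors}\n\n"
                "Rewrite the prompt so the output validates. "
                "Return only the revised prompt."
            )
    raise RuntimeError("could not fit the prompt within the round budget")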
[Another BAML creator here.] I agree this is an interesting direction! We have a "chat" feature on our roadmap to do this right in the VSCode playground, where an AI agent will have context on your prompt, schema, BAML test results, etc., and help you iterate on the prompt automatically. We've done this before and have been surprised by how good the LLM's feedback can be.
We just need a slightly better testing flow within BAML first, since we don't support adding assertions just yet.