> I don't put a lot of stock on evals.

Same, although they do help me set expectations. I have some use cases (I'm hesitant to call them evals) based on how we use GPT in our product that make a good "real world" test. In the past, I've found that Claude models were the only ones on par with GPT.