Same, although they do help me set expectations. I have some use cases (I'm hesitant to call them evals) based on how we use GPT in our product, and they make a good "real world" test. In the past, Claude models are the only ones I've found to be on par with GPT.