They trumpet the exam results, but isn't it likely that the model has just memor...

qt31415926 · on March 14, 2023

It's trained on pre-2021 data. Looks like they tested on the most recent tests (i.e. 2022-2023) or practice exams. But yeah, standardized tests are heavily weighted towards pattern matching, which is what GPT-4 is good at, as shown by its failure at the hindsight neglect inverse-scaling problem.

allthatisreal · on March 14, 2023

I believe they showed that GPT-4 reversed the trend on the hindsight neglect problem. Search for "hindsight neglect" on the website and you can see that its accuracy on the problem shot up to 100%.

qt31415926 · on March 14, 2023

oh my bad, totally misread that

pphysch · on March 14, 2023

Well, yeah. It's an LLM, it's not reasoning about anything.