I have been taking a course in AI policy, and o1 and the FrontierMath dataset have been important markers for me in emphasizing the world we are moving toward. It is incredibly sad to learn about the conflict of interest here. For those more knowledgeable: can you explain in plain words whether this revelation compromises OAI's claims regarding o3's performance on FrontierMath problems?
It's worse than just an undeclared conflict of interest. They gave OpenAI all questions and solutions behind the scenes. It's hard to chalk this up to only naivete. This is a "sorry you caught me" moment.
They have an oral agreement that OpenAI won't use the benchmark in training. That means, first and foremost, you have to consider the possibility that they broke that agreement and actually included the problems in the training set. Even if they didn't, the fact that they had the problems means they could have selectively chosen training data to specialize in solving that class of problem, while still technically keeping the oral agreement.
So, yeah, the benchmark needs to be treated as essentially worthless at this point.
If OpenAI wanted the questions/solutions, there is going to be a reason for that. This data is not sitting in an unopened folder on Sam's computer.
There are a lot of ways you can use data to improve a model without directly training on it. You can use it as a validation set in a train/evaluate loop, for example, or as a wellspring for synthetic data generation. But all of these involve some level of data contamination; it's unavoidable.
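To make the validation-loop point concrete, here is a minimal toy sketch in Python. It is only an illustration under made-up assumptions, not a description of anything OpenAI is known to have done: the function name `simulate_selection` and all of its parameters are hypothetical. Every candidate "model" has the same true skill, yet picking the best one by its benchmark score yields a number well above that skill, and the gap vanishes on fresh questions.

```python
import random


def simulate_selection(num_candidates=200, benchmark_size=100,
                       true_skill=0.25, fresh_size=10_000, seed=0):
    """Toy model of selecting a checkpoint by its score on a fixed benchmark.

    All values are illustrative assumptions. Every candidate answers each
    question correctly with the same probability `true_skill`, so any spread
    in benchmark scores is pure noise.
    """
    rng = random.Random(seed)

    def score(num_questions):
        # Fraction of questions answered correctly by a candidate whose
        # real ability is `true_skill`.
        correct = sum(rng.random() < true_skill for _ in range(num_questions))
        return correct / num_questions

    # Evaluate every candidate on the same benchmark and keep the best score.
    benchmark_scores = [score(benchmark_size) for _ in range(num_candidates)]
    selected_score = max(benchmark_scores)

    # The selected candidate's ability is still `true_skill`, so on fresh,
    # unseen questions it scores close to that value.
    fresh_score = score(fresh_size)
    return selected_score, fresh_score


if __name__ == "__main__":
    selected, fresh = simulate_selection()
    print(f"reported benchmark score of selected candidate: {selected:.2f}")
    print(f"same candidate on fresh questions:              {fresh:.2f}")
```

No benchmark question is ever trained on here; the inflation comes purely from using the benchmark to choose among candidates.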
This is a great tool. Thank you. As an international student in the US with a 10-12 hour time difference from my home country, I will find this very helpful for scheduling calls.
No, I don't think so. The GRE is taken individually, and PISA is conducted, I believe, in cooperation with the respective governments on a sample of students.