The training data is so large that it incidentally includes basically anything t...

paulclinger · on March 14, 2023

They seem to be taking this into account: We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training; for each exam we run a variant with these questions removed and report the lower score of the two. We believe the results to be representative. (this is from the technical report itself: https://cdn.openai.com/papers/gpt-4.pdf, not the article).

int_19h · on March 14, 2023

By the same token, though, whatever test questions and answers it might have seen represent a tiny bit of the overall training data. It would be very surprising if it selectively "remembered" exact answers to all those questions, unless it was specifically trained repeatedly on them.