I'm not sure why this is surprising or newsworthy; it has been this way ever since o3. I guess few people noticed.

There are a few masters-level publishable research problems that I have tried with LLMs in thinking mode, and the model produced a nearly complete proof before we had a chance to publish our own. Like the problem stated here, these won't set the world on fire, but they do chip away at more meaningful things.

It often doesn't produce a completely correct proof (whether it nails a perfect one is a matter of luck), but it very often does enough that even a less competent student can fill in the blanks and fix up the errors. After all, the hardest part of a proof is knowing which tools to employ, especially when those tools are esoteric.
