
Serious theorem-proving AIs always write the proof in a formal syntax in which the proof can be mechanically checked for correctness. The most popular such formal language is Lean, but there are many others. It's just like a coding AI: it may write some function, and you check whether it compiles. If the AI writes a program/proof in Lean, it will only compile if the proof is correct. Checking the correctness of proofs is a much easier problem than coming up with the proof in the first place.
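For a rough sketch of what that looks like in Lean 4 (using only lemmas from Lean's core library; `two_plus_two` and `my_add_comm` are just illustrative names):

    -- If Lean accepts this file, the proofs are correct; a bogus
    -- proof simply fails to compile.
    theorem two_plus_two : 2 + 2 = 4 := rfl  -- checked by computation

    -- Reusing a core-library lemma:
    theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

    -- This would be rejected: `rfl` cannot prove a false equation.
    -- theorem wrong : 2 + 2 = 5 := rfl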



> Checking the correctness of proofs is a much easier problem than coming up with the proof in the first place.

Just so this isn't misunderstood: not much cutting-edge math can presently be coded in Lean. The famous exceptions (such as the results of Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.

What's true is that it's very easy to check whether a Lean-coded proof is correct. But it's hard and time-consuming to formulate most math as Lean code. That is something many AI research groups are working on.


> The famous exceptions (such as the results of Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.

"Special characteristics" is really overstating it. It's just a matter of getting all the prereqs formalized in Lean first. That's a bit of a grind to be sure, but the Mathlib effort for Lean has the bulk of the undergrad curriculum and some grad subjects formalized.

I don't think AI will be all that helpful with this kind of effort, but it might help in some limited ways.


Yes, I definitely concur; I have spent significant time with it.

The main bottleneck is having the libraries that define the theorems and objects you need to operate at those levels. Everything rests on axiomatic foundations, and you need to build all of maths on top of that. Projects like Mathlib are getting us there, but it is a massive undertaking.

It's not just that there is a lot of maths to go through; it's also that most maths has not really been proven to this degree of exactitude, and there is much gap-filling to do when trying to translate existing proofs, or the reasoning style may be quite distant from how things are expressed in Lean. Some fields of maths are also self-consistent islands that haven't yet been connected to the common axiomatic foundations, and linking them is a serious research endeavor.

Lean does, though, allow you to declare theorems as axioms. It is not common practice, but you can skip high up the abstraction ladder and set up a foundation there if you are confident those theorems are correct. Still, defining those mathematical objects can be quite hard on its own, even if you skip the proving.
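A small sketch of that mechanism (the statement and the name `flt` here are my own paraphrase, not something from Mathlib):

    -- Postulate a known-but-unformalized result and build on top of it.
    axiom flt : ∀ n : Nat, 2 < n →
      ∀ x y z : Nat, 0 < x → 0 < y → 0 < z → x ^ n + y ^ n ≠ z ^ n

    -- Anything proved from it now depends on the axiom; running
    -- `#print axioms` on a downstream theorem reports that dependency.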

Anyway, the complexity of the Lean language itself doesn't help either. The mode of thinking you need to operate it is much closer to programming than to maths, and for those who think the Rust borrow checker is a pain, this is an order of magnitude more complex.

Lean was a significant improvement in ergonomics over the previous generation (Coq, Isabelle, Agda…), but I still think there is a lot of work to be done to make it mathematician-friendly.

Most reinforcement-learning AI for maths right now is focused on olympiad problems: hard, but quite low on the maths abstraction ladder. Often these systems don't even produce a proof; they just solve problems that end with an exact result, and only that answer is checked. Perhaps the reasoning was incorrect, but if you do this over enough problems you can be confident the model is not just guessing.

At the other end of the spectrum you have mathematicians like Tao just using ChatGPT for brainstorming. It might not be great at complex reasoning, but it has a much wider memory than you do, and it can remind you of mathematical tools and techniques that could be useful.


> Anyway, the complexity of the Lean language itself doesn't help either. The mode of thinking you need to operate it is much closer to programming than to maths, and for those who think the Rust borrow checker is a pain, this is an order of magnitude more complex.

Could you elaborate on this? I'm interested to learn what the complexities are (beyond the mathematical concepts themselves).


Found something I wrote last year (see below), but off the top of my head:

Something like five different DSLs in the same language, most of it in a purist functional style that is familiar neither to most mathematicians nor to most programmers, with type-checking an order of magnitude stricter and more complex than in any programming language (that's the point of it), and rather obscure errors most of the time.

It's really tedious to translate any non-trivial proof into this model, so usually you end up proving it again almost from scratch within Lean, and then the result is really hard to understand as written. Much of the information needed to understand a proof is hidden away as runtime data that is usually displayed via a complex VSCode extension. It's quite difficult to understand a proof from the code itself, and it usually doesn't look anything like a traditional mathematical proof (even if they kind of try, by naming keywords with terminology similar to that of normal proofs and sprinkling in some Unicode symbols).

I never feel like I'm doing maths when I'm using Lean. I'm fighting with the syntax to figure out how to express mathematical concepts in the style it likes, with several different ways of achieving similar things (the opposite of the Zen of Python). And I'm fighting with the type-checker to transform this abstract expression into that other abstract expression (which is really what a proof is when it boils down to it), completely forgetting about the mathematical meaning, just moving puzzle pieces around with obscure tools.
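To make the puzzle-piece feeling concrete, here's a small sketch using rewrite lemmas from Lean's core Nat library: each `rw` step mechanically transforms the goal into a new expression, and the mathematical meaning mostly drops out of view:

    example (a b c : Nat) : a * (b + c) = a * c + a * b := by
      rw [Nat.mul_add]   -- goal becomes: a * b + a * c = a * c + a * b
      rw [Nat.add_comm]  -- flips the left-hand sum; rfl then closes the goal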

And even after all of this, it is so much more ergonomic than the previous generation of proof assistants :)

---

I think that the main reasons for Lean's complexity are:

- It has a very purist functional style and syntax, literally reflecting the Curry-Howard correspondence (function = proof), rather than trying to bridge the gap to more familiar maths notation.

- It aims to be a proof assistant, so it is fundamentally semi-automatic and interactive, which makes it a hard design challenge.

- A lot of the complexity goes into giving mathematicians the tools to express real research maths in it; on this it has been more successful than any alternative.

- Because of this, it has at least five different languages embedded in it: functional expressions of theorems, forward proofs with expression transformations, backward proofs with tactics, the tactic metaprogramming macro language, and the language for defining data types (with at least four kinds of data types, each with its own syntax). The first and third of these are sketched below.
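A rough sketch of two of those embedded languages proving the same trivial statement:

    -- Term mode: the proof is a functional expression (Curry-Howard directly).
    example (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩

    -- Tactic mode: imperative-looking steps that transform the goal backward.
    example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
      constructor
      · exact hp
      · exact hq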


Ah, thanks for the clarification. Then the whole thing makes a lot more sense - though I'd say the outlook also becomes more optimistic.

I thought the rhetoric sounded somewhat like the AGI/accelerationist folks who postulate some sort of eventual "godlike" AI whose thought processes are somehow fundamentally inaccessible to humans. So if you had a proof that was only understandable to that sort of AI, then mathematics as a discipline of understanding would be over for good.

But this sounds like it would at least theoretically let you tackle the proof? Like, it's imaginable that some AI generates a proof that is several TB (or EB) in size but still validates - which would of course be impossible for human readers to understand in the way you can understand a paper. But then "understanding" that proof would probably become a field of research of its own, sort of like the "BERTology" papers that try to understand the semantics of specific hidden states in BERT (or similar approaches for GPTs).

So I'd see an incomprehensible AI-generated proof not as the end of research into some conjecture, but more as a sort of guidance: unlike before, you now know that the treasure chest exists and you even have its coordinates; you just don't have the route to that location. The task then becomes figuring out that route.


In terms of having the proofs be understandable to humans, I'd say that's still a challenge. Not a fundamental challenge, but a real practical one.

Lean makes proofs verifiable and, to a degree, understandable, but normal Lean proofs written by humans are already a challenge to understand without additional documentation, and, as with all code generation, AI-generated proofs are probably even harder to follow. There are many paths to arrive at a proof, and the AI may take very messy routes that don't align with how humans create abstractions and assign meaning to break the process down into understandable steps.

In a sense all AI is understandable: nothing is hidden, and you can see all the numbers inside a Transformer. But just being able to see the code might not help mathematicians much in extracting deeper meaning and lessons from AI-generated proofs, at least not without significant additional work.


There's an argument to be made that incomprehensible proofs are still very useful.

I understand that mathematicians are all about understanding why a theorem is true, not just checking that it is, and then using that understanding to push other research forward - just as most scientists care more about understanding how the world works than about developing methods to control and manipulate it.

But proven theorems are mathematical tools that can be used to make progress regardless of your understanding of how they work. I think there is a path forward for maths as an engineering discipline, possibly with much faster progress, just as computer science expanded into software engineering. AI is just the next step of many; software libraries, interfaces and abstractions are also black boxes in a sense, doing stuff for you without your knowing how.

A good engineer learns to build on the shoulders of giants, mastering what tools do and how they behave without necessarily needing to know how they are built from scratch. It might be less satisfying to the curious, but you can get a lot more done.


Incomprehensible proofs are indeed still useful to some extent, and I don't think you'll find many mathematicians who would reject them as an answer to the binary question of whether the result is true.

But when you talk about "getting a lot more done," I want to ask: get a lot more done to what end? Despite what mathematicians sometimes write in their grant applications, resolving most of the big open problems in the field probably won't lead to new technologies or anything. To use the Riemann Hypothesis example again, most number theorists already think it's probably true, and there are a lot of papers being published already which prove things like "if the Generalized Riemann Hypothesis is true, then [my new result]".

No one is really waiting around just for the literal, one-bit answer to the question of whether RH is true; if we got that information and nothing else, I'm sure number theorists would be happy to know, but not a whole lot about the work being done in the field would change. It's not just about being "satisfying to the curious"; virtually the entire reason we want a proof is to use the new ideas it would presumably contain to do more mathematics. This is exactly what happened with the proof of the Poincaré conjecture, the only one of the Millennium Problems that has been resolved so far.

This is what I was lamenting in my earlier comment: the thing you're describing, where we set proof-finding models to work and they spit out verifiable but totally opaque proofs of big open problems in math, very well might happen someday. But it wouldn't actually be all that useful for anything, and it would also mean the end of the only thing about the whole enterprise that the people working in it actually care about.


Yeah, I imagine in those situations, an AI proof that the conjecture is false would probably be more interesting and useful than a proof that it is true.

A proof of the conjecture would essentially just move the situation from "we think there could be counterexamples, but so far we haven't found any" to "there really are no counterexamples anywhere, you can stop looking". The interesting thing would be the explanation of why there can be no counterexamples, which is exactly the thing the proof wouldn't give you.

On the other hand, a counterproof would either directly present the community with a counterexample - and maybe reveal some interesting class of objects that has been overlooked so far - or at least establish that there have to be counterexamples somewhere, which would probably give more motivation to the effort of finding one.

(Generally speaking here; I'm not a mathematician and can't really say anything about the hypotheses in question.)


Yeah, that's definitely right - an explicit counterexample to the Riemann Hypothesis would be very surprising and interesting, and I think that would be equally true whether it was found by a person or a computer! The situation that would be mostly unhelpful is a certificate that the result is true which communicates nothing about why.



