> Checking the correctness of proofs is a much easier problem than coming up with the proof in the first place.
Just so this isn't misunderstood: not much cutting-edge math can presently be coded in Lean. The famous exceptions (such as the results by Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.
What's true is that it's very easy to check whether a Lean-coded proof is correct. But it's hard and time-consuming to formulate most math as Lean code. It's something many AI research groups are working on.
> The famous exceptions (such as the results by Clausen-Scholze and Gowers-Green-Manners-Tao) have special characteristics which make them much more ground-level and easier to code in Lean.
"Special characteristics" is really overstating it. It's just a matter of getting all the prereqs formalized in Lean first. That's a bit of a grind to be sure, but the Mathlib effort for Lean has the bulk of the undergrad curriculum and some grad subjects formalized.
I don't think AI will be all that helpful wrt. this kind of effort, but it might help in some limited ways.
Yes, I definitely concur; I have spent significant time with it.
The main bottleneck is having the libraries that define the theorems and objects you need in order to operate at those levels. Everything rests on axiomatic foundations, and you need to build all of maths on top of them. Projects like Mathlib are getting us there, but it is a massive undertaking.
It's not just that there is a lot of maths to go through; it's also that most maths has not really been proven to this degree of exactitude, so there is much gap-filling to do when translating existing proofs, and the reasoning style can be quite distant from how things are expressed in Lean. Some maths fields are also self-consistent islands that haven't yet been connected to the common axiomatic foundations, and linking them is a serious research endeavor.
Lean does, however, allow you to declare theorems as axioms. It is not common practice, but you can skip high up the abstraction ladder and set up a foundation there if you are confident those theorems are correct. Still, defining the mathematical objects themselves can be quite hard, even if you skip the proving.
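As a minimal sketch of that axiom shortcut (my own illustration, assuming a Mathlib import for the ℕ notation and `norm_num`; `flt` is a hypothetical name, not a Mathlib declaration):

```lean
import Mathlib

-- Postulate the statement instead of formalizing its proof.
axiom flt : ∀ n : ℕ, 2 < n → ∀ a b c : ℕ, 0 < a → 0 < b → 0 < c →
    a ^ n + b ^ n ≠ c ^ n

-- Downstream results can use `flt` like any proved theorem; the price is that
-- everything built on it now also depends on the unproved axiom.
example (a b c : ℕ) (ha : 0 < a) (hb : 0 < b) (hc : 0 < c) :
    a ^ 3 + b ^ 3 ≠ c ^ 3 :=
  flt 3 (by norm_num) a b c ha hb hc
```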
Anyway, the complexity of the Lean language itself doesn't help either. The mode of thinking you need in order to operate it is much closer to programming than to maths, and for those who think the Rust borrow-checker is a pain, this is an order of magnitude more complex.
Lean was a significant improvement in ergonomics compared to the previous generation (Coq, Isabelle, Agda…), but I still think there is a lot of work to be done to make it mathematician-friendly.
Most reinforcement-learning AI for maths right now is focused on olympiad problems: hard, but quite low on the maths abstraction ladder. Often these systems don't even produce a proof; they just solve problems that end with an exact result, and you check that result. Perhaps the reasoning was incorrect, but if you do it for enough problems you can be confident that the model is not just guessing.
At the other end of the spectrum you have mathematicians like Tao just using ChatGPT for brainstorming. It might not be great at complex reasoning, but it has a much wider memory than you do, and it can remind you of mathematical tools and techniques that could be useful.
> Anyway, the complexity of the Lean language itself doesn't help either. The mode of thinking you need in order to operate it is much closer to programming than to maths, and for those who think the Rust borrow-checker is a pain, this is an order of magnitude more complex.
Could you elaborate on this? I'm interested to learn what the complexities are (beyond the mathematical concepts themselves).
Found something I wrote last year, see below, but off the top of my head:
Something like 5 different DSLs in the same language, most of it in a purist functional style that is familiar neither to most mathematicians nor to most programmers, with type-checking an order of magnitude stricter and more complex than in any programming language (that's the point of it), and with rather obscure errors most of the time.
It's really tedious to translate any non-trivial proof into this model, so usually you end up proving it again almost from scratch within Lean, and then the result is really hard to understand as written. Much of the information needed to understand a proof is hidden away as runtime data that is usually displayed via a complex VSCode extension. It's quite difficult to follow from the proof code itself, and it usually doesn't look anything like a traditional mathematical proof (even if they kind of try, by naming keywords with terminology similar to normal proofs and sprinkling in some unicode symbols).
I never, ever feel like I'm doing maths when I'm using Lean. I'm fighting with the syntax to figure out how to express mathematical concepts in the style that it likes, always with several different ways of achieving similar things (the opposite of the Zen of Python). And I'm fighting with the type-checker to transform one abstract expression into another (that's really what a proof is when it boils down to it), completely forgetting about the mathematical meaning, just moving puzzle pieces around with obscure tools.
And even after all of this, it is so much more ergonomic than the previous generation of proof-assistants :)
---
I think that the main reasons for Lean's complexity are:
- That it has a very purist functional style and syntax, literally reflecting the Curry-Howard Correspondence (function = proof), rather than trying to bridge the gap to more familiar maths notation.
- That it aims to be a proof assistant: it is fundamentally semi-automatic and interactive, which makes it a hard design challenge.
- A lot of the complexity is aimed at giving mathematicians the tools to express real research maths in it; on this it has been more successful than any alternative.
- Because of this it has at least 5 different languages embedded in it: functional expressions of theorems, forward proofs with expression transformations, backward proofs with tactics, the tactics metaprogramming macro language, and the language to define data-types (and at least 4 kinds of data-types with different syntax).
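To give a feel for two of those sublanguages, here is a toy statement (my own illustrative example, not from Mathlib) proved once as a term-mode expression and once as a tactic script:

```lean
-- Term mode: the proof is a functional expression whose type is the theorem
-- (Curry-Howard: a proof of P ∧ Q → Q ∧ P is literally a re-pairing function).
theorem and_swap_term (P Q : Prop) (h : P ∧ Q) : Q ∧ P :=
  ⟨h.2, h.1⟩

-- Tactic mode: the proof is a script of commands that transform the goal;
-- the intermediate goal states are normally only visible in the editor.
theorem and_swap_tactic (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
  constructor      -- splits the goal into ⊢ Q and ⊢ P
  · exact h.2      -- closes ⊢ Q
  · exact h.1      -- closes ⊢ P
```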
> Many AI researchers are mathematicians. Any theoretical AI research paper will typically be filled with eye-wateringly dense math. AI dissolves into math the closer you inspect it. It's math all the way down.
There is a major caveat here. Most 'serious math' in AI papers is wrong and/or irrelevant!
It's even the case for famous papers. Each lemma in Kingma and Ba's ADAM optimization paper is wrong, the geometry in McInnes and Healy's UMAP paper is mostly gibberish, etc...
I think it's pretty clear that AI researchers (albeit surely with some exceptions) just don't know how to construct or evaluate a mathematical argument. Moreover the AI community (at large, again surely with individual exceptions) seems to just have pretty much no interest in promoting high intellectual standards.
I'd be interested to read about the gibberish in UMAP. I know the paper "An improvement of the convergence proof of the ADAM-Optimizer" for the lemma problem in the original ADAM, but hadn't heard of the second one. Do you have any further info on it?
> Each lemma in Kingma and Ba's ADAM optimization paper is wrong
Wrong in the strict formal sense or do you mean even wrong in “spirit”?
Physicists are well-known for using “physicist math” that isn’t formally correct but can easily be made as such in a rigorous sense with the help of a mathematician. Are you saying the papers of the AI community aren’t even correct “in spirit”?
Much physicist math can't be made rigorous so easily! Which isn't to say that much of it doesn't still have great value.
However the math in AI papers is indeed different. For example, Kingma and Ba's paper self-presents as having a theorem with a rigorous proof via a couple of lemmas proved by a chain of inequalities. The key thing is that the mathematical details are purportedly all present, but are just wrong.
This isn't at all like what you see in physics papers, which might just openly lack detail, or might use mathematical objects whose existence or definition remain conjectural. There can be some legitimate problems with that, but at least in the best cases it can be very visionary. (Mirror symmetry is a standard example.) By contrast I'm not sure what 'spirit' is even possible in a detailed couple-page 'proof' that its authors probably don't even fully understand. In most cases, the 'theorem' isn't remotely interesting enough as pure mathematics and is also not of any serious relevance to the empirical problem at hand. It just adds an impressive-looking section to the paper.
I do think it's possible that in the future there will be very interesting pure mathematics inspired by AI. But it hasn't been found yet, and I'm very certain it won't come from reconsidering these kinds of badly-written theorems and proofs.
There's a related section about 'mathiness' in section 3.3 of the article "Troubling Trends in Machine Learning Scholarship" https://arxiv.org/abs/1807.03341. I would say the situation has only gotten worse since that paper was written (2018).
However the discussion there is more about math which is unnecessary to a paper, not so much about the problem of math which is unintelligible or, if intelligible, then incorrect. I don't have other papers off the top of my head, although by now it's my default expectation when I see a math-centric AI paper. If you have any such papers in mind, I could tell you my thoughts on them.
If this were the case, I don't see why we'd need to wait for an AI company to make a breakthrough in math research. The key issue instead is how to encode 'real-life' statements in a formal language - which to me seems like a ludicrous problem, just complete magical thinking.
For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?
Or have I totally misunderstood what people mean when they say that developments in automatic theorem proving will solve LLMs' hallucination problem?
You can't talk about formally verifiable truthiness until you solve epistemology. This can be achieved formally in mathematics, with known in-principle limitations. Here strict theorem-proving, Lean-style, is viable.
It can also be achieved informally and in a fragmentary way in barely-mathematical disciplines, like biology, linguistics, and even history. We have chains of logical conclusions that do not follow strictly, but with various probabilistic limitations, and under a modal logic of sorts. Several contradictory chains follow under the different (modal) assumptions / hypotheses, and often both should be considered. This is where probabilistic models like LLMs could work together with formal logic tools and huge databases of facts and observations, being the proverbial astute reader.
In some more relaxed semi-disciplines, like sociology, psychology, or philosophy, we have a hodgepodge of contradictory, poorly defined notions and hand-wavy reasoning (I don't mean Wittgenstein here, but rather Freud, Foucault, Derrida, etc.). Here, I think, the current crop of LLMs is applicable most directly, with few augmentations. Still, a much, much wider context window might be required to make it actually productive by the standards of the field.
Hmmm I think even in something very nominally nearby like theoretical physics, there's very little that's similar to theorem proving. I don't see how AlphaProof could be a stepping stone to anything like what you're describing.
Generally, I think many people who haven't studied mathematics don't realize how huge the gulf is between "being logical/reasonable" and applying mathematical logic as in a complicated proof. Neither is really of any help for the other. I think this is actually the orthodox position among mathematicians; it's mostly people who might have taken an undergraduate math class or two who might think of one as a gateway to the other. (However there are certainly some basic commonalities between the two. For example, the converse error is important to understand in both.)
I don't think that mathematicians are actually in the best position to judge how math/logic might help in practical applications, because they are usually not interested in practical applications at all (at least for the last 70 years or so). In particular, they are also not interested in logic at all.
But logic is very relevant to "being logical/reasonable", and seeing how mathematicians apply logic in their proofs is a natural starting point for more complex applications. I see mathematics as the simplest application of logic you can have if you use only your brain for thinking, and not also a computer.
"Being logical/reasonable" also contains a big chunk of intuition/experience, and that is where machine learning will make a big difference.
Probabilistic reasoning is possible in a formal setting; it produces a probability distribution over answers. As for grounding probabilistic logic itself, I'm not aware of much progress beyond the initial idea of logical induction[0].
> For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?
> This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.
A formal model of semantics would likely be a low-level physical representation of possible states augmented with sound definitions of higher-level concepts and objects. I don't think humans are capable of developing a formal semantics that would work for your sentences (it's taken us hundreds of years to approach formalization of particle physics), but I think that an automated prover with access to physical experiments and an LLM could probably start building a more comprehensive semantics.
People talk about how LLMs need "more data", but data is really a blunt instrument for trying to impart a lot of experience to the LLM. Unfortunately, data isn't actually experience; it's a record of an experience.
Now, what we've seen with e.g. Chess and Go is that when you can give a tensor model "real experience" at the speed of however many GPUs you have, the model quickly becomes capable of superhuman performance. Automatic theorem proving means you can give the model "real experience" without armies of humans writing down proofs: you can generate and validate proofs as fast as a computer will let you, and the LLM gets "real experience" from every new proof it generates, along with an objective cost function (was the proof correct?).
Now, this might not let us give the LLMs real experience with being a teacher or dealing with Marxist revolutionaries, but it would be a sea change in the ability of LLMs to do math, and it probably would let us give LLMs real experience in writing software.
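To sketch that loop concretely (my own illustration, not any lab's actual setup; `model`, `propose_proof`, `update`, and `check_proof` are hypothetical stand-ins for a policy network and a formal verifier such as the Lean kernel):

```python
# Hedged sketch of verification-as-reward; all names are hypothetical stand-ins.

def training_step(model, check_proof, theorem_statement):
    candidate = model.propose_proof(theorem_statement)    # generate a formal proof attempt
    verified = check_proof(theorem_statement, candidate)  # e.g. run a proof checker
    reward = 1.0 if verified else 0.0                     # objective, automatic signal
    model.update(theorem_statement, candidate, reward)    # reinforce verified proofs
    return reward
```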
This is a thing I'm working on, so I have some potentially useful thoughts. tl;dr: it doesn't have to be about encoding arbitrary real-life statements to be super duper useful today.
> how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language?
Totally out of scope for me in any near future, at least. But that doesn't prevent it from being super useful for a narrower scope. For example:
- How might we take a statement like "(x + 1) (x - 5) = 0" and encode it formally?
- Or "(X X^T)^-1 X Y = B"?
- Or "6 Fe_2 + 3 H_2 O -> ?"?
We can't really do this for a huge swath of pretty narrow applied problems. In the first, what kind of thing is x? Is it an element of an algebraically closed field? In the second, are those matrices of real numbers? In the third, is that 6 times F times e_2, or six 2-element iron molecules?
We can't formally interpret those as written, but you and I can easily tell what's meant. Meanwhile, current ways of formally writing those things up are a massive pain in the ass. Anything with a decent amount of common sense can tell you what is probably meant formally. We know that we can't have an algorithm that's right 100% of the time for a lot of relevant things, but 99.99% is pretty useful. If clippy says 'these look like matrices, right?' and is almost always right, then it's almost always saving you lots of time and letting lots more people benefit from formal methods with a much lower barrier to entry.
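For instance, here is a sketch (my own illustration, assuming Mathlib) of what the first example might look like once the guess "x is a real number" has been made:

```lean
import Mathlib

-- One possible formal reading of "(x + 1) (x - 5) = 0":
-- x is a real number, and the equation forces x = -1 or x = 5.
example (x : ℝ) (h : (x + 1) * (x - 5) = 0) : x = -1 ∨ x = 5 := by
  rcases mul_eq_zero.mp h with h1 | h2
  · left; linarith   -- from x + 1 = 0
  · right; linarith  -- from x - 5 = 0
```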
From there, it's easy to imagine coverage and accuracy of formalizable statements going up and up and up until so much is automated that it looks kinda like 'chatting about real-life statements' again. I doubt that's the path, but from a 'make existing useful formal methods super accessible' lens it doesn't have to be.
The quality of AI algorithms is not based on formal mathematics at all. (For example, I'm unaware of even one theorem relevant to going from GPT-1 to GPT-4.) Possibly in the future it'll be otherwise though.
There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.
Link to this prediction? The famous old prediction of Szegedy was IMO gold by 2026, and that one is basically confirmed, right? Personally, I think 2027/2028 is a break-even bet for a superhuman mathematician.
Side point: there is no existing AI which can prove, for example, the Poincaré conjecture, even though that has already been proved. The details of the proof are far too dense for any present chatbot like ChatGPT to handle, and nothing like AlphaProof is capable of it either, since the scope of the proof is well out of the reach of Lean or any other formal theorem-proving environment.
What does this even mean? Surely an existing AI could regurgitate all of Perelman's arXiv papers if we trained it to do that. Are you trying to make the case that the AI doesn't understand the proof it's giving? Because then I think there's no clear goal-line.
You don't even need AI to regurgitate Perelman's papers, you can do that in three lines of python.
What I meant is that there's no AI you can ask to explain the details of Perelman's proof. For example, if there's a lemma or a delicate point in a proof that you don't understand, you can't ask an AI to clarify it.
This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.
(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)
Very plausible, but that would also be noteworthy. As I've mentioned in some other comments here, we outside of DeepMind don't (as far as I know) know anything about the computing power required to run AlphaProof, and the tradeoff between the computing power required and the complexity of problems it can address is really key to understanding how useful it might be.
I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.
Three days per problem is, by many standards, a 'reasonable' amount of time. However there are still unanswered questions, notably that 'three days' is not really meaningful in and of itself. How parallelized was the computation; what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.
Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.
For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)
As a mathematician, of course I agree. But in a sentence like:
> A speedup in the movement of the maths frontier would be worth many power stations
who is it 'worth' it to? And to what end? I can say with some confidence that many (likely most, albeit certainly not all) mathematicians do not want data centers and power stations to guzzle energy and do their math for them. It's largely a vision imposed from without by Silicon Valley and Google research teams. What do they want it for and why is it (at least for now) "worth" it to them?
Personally, I don't believe for a second that they want it for the good of the mathematical community. Of course, a few of their individual researchers might have their own personal and altruistic motivations; however I don't think this is so relevant.
There is something called applied math, and there is a big gulf between applied math and pure math. This new technology has the potential to make much more "pure math" usable for applied math, unifying much of pure math, applied math, and programming. I don't really care about the RH; I care about that.
> Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.
No, in my experience the whole practical question is: can it be found automatically, or can it not? Because the search space is exponential and conventional automated methods will not be able to chew through it, it either works or it doesn't. AlphaProof shows that for some difficult IMO problems, it works.
More or less. Modern theorem provers, even fully automatic ones, can prove incredibly difficult problems if given enough time. With 3 days and terabytes of memory, perhaps they could? It would be interesting to compare AlphaProof with a standard theorem prover that is given similarly astronomical computing resources.
That is an interesting thought. But I doubt that standard automated theorem-proving techniques would have solved this in a hundred years, even with immense resources, as all they do is (basically) brute-force search, and it remains an exponential search space. Therefore 3 days does not really matter; indeed, 3 months would still be a result that 10 years ago I would have thought impossible.
Why have they still not released a paper, aside from a press release? I have to admit I still don't know how auspicious it is that running Google hardware for three days apiece was able to find half-page-long solutions, given that the promise has always been to solve the Riemann hypothesis with the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automatic theorem proving.
I don't know why so few people realize this, but by solving any of the problems, its performance is superhuman for most reasonable definitions of human.
Talking about things like solving the Riemann hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.
Well, multiplying two large numbers instantly is a superhuman feat that a calculator can do. I would hope we are going for a higher bar, like "useful". Let's see if this can provide proofs of novel results.
That superhuman capability of "multiplying two large numbers instantly" has transformed human society like not even plumbing has. I really can't see how you could not consider this useful.
It's worth emphasizing that it's been possible for years to use an automatic theorem prover to prove novel results. The whole problem is to get novel interesting results.
We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.
> We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs.
It's a direct consequence of the format of a proof. Proofs are finite-length sequences over a finite alphabet of symbols, so of course they're enumerable (the same algorithm you use to count works to enumerate them).
If you want a computer to be able to tell that it found a correct proof once it enumerates it, you need a bit more than that (really just the existence of automated proof checkers is enough for that).
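As a sketch of what "patiently enumerating all possible proofs" means in code (my own illustration; `is_valid_proof` stands in for a hypothetical automated proof checker):

```python
from itertools import count, product

def find_proof(statement, alphabet, is_valid_proof):
    """Enumerate every finite string over the alphabet, shortest first,
    and return the first one the checker accepts as a proof of `statement`."""
    for length in count(1):                         # lengths 1, 2, 3, ...
        for symbols in product(alphabet, repeat=length):
            candidate = "".join(symbols)
            if is_valid_proof(statement, candidate):
                return candidate                    # terminates iff a proof exists
```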
I guess it is tautological from the definition of "provable". A theorem is provable by definition if there is a finite derivation of well-formed formulas that has the theorem as its consequence (https://en.wikipedia.org/wiki/Theorem, section "Theorem in logic").
Not sure it’s a tautology. It’s not obvious that a recursively enumerable procedure exists for arbitrary formal systems that will eventually reach all theorems derivable via the axioms and transformation rules. For example, if you perform depth-first traversal, you will not reach all theorems.
Hilbert’s program was a (failed) attempt to determine, loosely speaking, whether there was a process or procedure that could discover all mathematical truths. Any theorem depends on the formal system you start with, but the deeper implicit question is: where do the axioms come from and can we discover all of them (answer: “unknown” and “no”)?
It's "obvious" in the sense that it's a trivial corollary of the completeness theorem (so it wouldn't be true for second order logic, for example).
Hilbert's program failed in no contradiction to what GP wrote because the language of FOL theorems is only recursively enumerable and not decidable. It's obvious that something is true if you've found a proof, but if you haven't found a proof yet, is the theorem wrong or do you simply have to wait a little longer?
The shortcut vs. enumeration gap is definitely enormous, right? Just taking the average Shannon entropy to the exponent of the proof length, for example, would probably take until past the heat death (or whatever death) of the universe on all of human compute. (I'm assuming; I didn't bother to check, but it seems intuitively true by a wide margin.)
> A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.
What does this have to do with a hypothetical automatic theorem prover?
Being logical in social life is pretty much completely different from being logical in a mathematical argument, especially in a formal theorem proving environment. (Just try to write any kind of cultural proposition in a formal language!)
That's the way things are now, but this regime came about when proving things took intense concentration and specialized skills that very few people had. Contrast going to look something up in a library with googling something during a conversation.
Google stopped publishing interesting AI work once OpenAI took their AI lead away, mostly with tech that Google had pioneered but not monetised, like transformers.
I imagine they are under pressure not to make this mistake again.
Hinton was never in the running for a Fields medal since he never made a single contribution to the field of mathematics. His work is about empirical discoveries in CS.
Tao is famous for being the world's most singular and inspiring genius. It's kind of a meme position* that most people accept because they think everyone else accepts it, but for people of a certain inclination, it makes anything he might say into a Pronouncement to read depths into. If you think someone is possibly the greatest mathematician of all time, you'd be interested in everything they think!
(* I think one could very legitimately view him as the top researcher in harmonic analysis in the world - he is a great mathematician - but it's not clear to me how people go from that to Epochal Genius and his extreme celebrity status across STEM)