> How does this even make sense? The whole point of specifying something abstractly is that you want to denote anything that satisfies the specification. And what proof are you talking about? Are you confusing “proof” with “typing derivation”? A single term may admit multiple typing derivations.
I am not confusing anything. Again, this is a matter of overloaded terms. When an expression of type Nat -> EvenNat is given in a typed language, there are two computations that must be carried out: first, a proof that the function yields an even natural is computed (you call that type checking); second, the actual computation is carried out and an element of the set of even numbers is produced (that's the evaluation step).
When you want to start counting operations is up to you. You can declare that the validation (type checking) phase is not counted. But the fact remains that if you know something -- i.e., that the result of the computation is a member of a certain set -- that knowledge must be gained at some computational cost. If you know it, then something must have paid that cost.
> You're assuming that you can write correct programs, without taking verification into consideration right from the beginning. This assumption is incorrect.
I am not assuming anything. When to do something is a practical, engineering concern. It is obviously of extreme importance. But from a theoretical point of view, the total computational effort required to prove a property of a program is the same (given the same conditions), no matter when it is carried out. Of course, engineering concerns move that complexity around between different machines -- the computer and the human programmer -- based on their respective strengths, but the total work is always the same. Just as sorting an array has a minimum complexity (Ω(n log n) comparisons) no matter how you do it, so too does writing a program and proving it correct have an inherent complexity that has to be paid no matter what.
> First, a proof that the function yields an even natural is computed (you call that type checking); second (that's the evaluation step), the actual computation is carried out, and an element in the set of even numbers is computed.
As stated above, not all types denote sets. For instance, in a language with general recursion, (simple) types denote domains, not sets.
> But from a theoretical point of view, the total computational effort required to prove a property of a program is the same (given the same conditions), no matter when it is carried out.
You are ignoring two things:
(0) There is no magical way to implement the correct program if you don't have verification in mind right from the start. You will implement the wrong program, then attempt to fix it, then have another wrong program, then attempt to fix it, etc. That too has a computational cost.
(1) When you design your program upfront with verification in mind, you actually have a different mathematical object than when you don't. (Just like 2 as a member of the naturals is a completely different mathematical object from 2 as a member of the reals.) If your program design is split into modules (as it should be, if your program is large enough), then enforcing abstraction barriers across module boundaries constrains the size of the design space, often dramatically. (e.g., there are too many non-parametric total functions of type `forall a. a -> a` to even form a set, but there is only one parametric such total function: the identity function.) You note that verified properties could be thrown away after the verification is done, but why would anyone in their right mind want to?
Again, you're talking about engineering. That's great, but I'm talking about theory. You're saying that it's easier to build a building from the bottom upwards -- and that's true -- but theoretically a building would be just as strong if it were somehow built from the top down. Of course, no one would think to build a skyscraper starting from the top, but that doesn't mean that the theory of structures must always incorporate the study of construction. I'm not talking about the process of construction, but about the properties of the structure. I'm not arguing with any of what you've said here, simply talking about different aspects. Keep in mind that not all computational processes -- certainly not all of those studied in TOC -- are programmed by humans at all. Computational processes in a living cell, for example, also have to ensure correctness -- and therefore spend computational resources on "verification" -- yet they don't use symbolic logic for that task. This means that there's value in studying the problem of verification even separately from the study of programming. Indeed, this is how it has mostly been done in academia. The software verification community uses very different concepts from the PLT community, concepts that are much closer to abstract machine models than to language models.
No, I'm not. I'm talking about math. (If you read my comment history, my distaste for “engineering” should be obvious, at least if by “engineering” we mean “software engineering”.)
> but theoretically a building would be just as strong if it were somehow built from the top down.
The amount of mental gymnastics required to produce this statement (not to mention expect someone else to believe it) completely baffles me.
> Computational processes in a living cell, for example, also have to ensure correctness -- and therefore spend computational resources on "verification" -- yet they don't use symbolic logic for that task.
Nature doesn't do “verification”, because there's no specification in the first place. Nature just evolves a variety of systems, and those that aren't fit enough to survive, well, just die. Also, the evolution of biological systems happens way too slowly to use it as inspiration for how to design computational systems for our own purposes.
Good. In that case, verifying that a program conforms to a spec has, in the general case, the same computational complexity no matter when it's carried out. This is proven and well known. For an overview, you can see my post here: http://blog.paralleluniverse.co/2016/07/23/correctness-and-c...
In short, the reason is that every computational problem with a computable upper bound on complexity can be efficiently reduced to verification. The complexity required to verify is a function of the spec, and can be made arbitrarily high.
> The amount of mental gymnastics required to produce this statement (not to mention expect someone else to believe it) completely baffles me.
OK, so let me use another analogy. That you can wind up a wind-up toy before I enter the room doesn't mean that, just because you hid the winding from me, the toy requires any less external energy to run than if you'd wound it at any other time. Discussions of ergonomic ways to wind the toy (with a larger lever, perhaps) may be interesting, but they're separate from this basic fact.
Verification is a computational problem, and one of a certain complexity. That complexity must be paid, no matter what, no matter when. The knowledge that a certain program has a certain property has a price-tag, and there are no discounts. Because all computational problems efficiently reduce to verification, if you believe you've found a loophole, I could use it to solve much bigger problems than ensuring that your recipe application is correct.
> Nature doesn't do “verification”, because there's no specification in the first place. Nature just evolves a variety of systems, and those that aren't fit enough to survive, well, just die.
You're wrong because, again, you're blind to computational complexity and to computational processes that aren't man-made programs. What you described is precisely how nature computes. That things that aren't fit enough to survive "just die" is a verification process by a genetic algorithm (that requires tremendous time and space complexity) to conform with the spec for genes -- i.e. life's "software" -- which is: reproduce.
Computational complexity results are used in the study of biological systems. Look at the work by Leslie Valiant (Turing award) or Stuart Kauffman (https://en.wikipedia.org/wiki/Stuart_Kauffman). Incidentally, TLA+ can also be used to model systems such as metabolic pathways, as those are very easily described as nondeterministic state machines: http://www.cosbi.eu/research/publications?pdf=5437
> Also, the evolution of biological systems happens way too slowly to use it as inspiration for how to design computational systems for our own purposes.
Maybe, but TOC doesn't study how to design computational systems for our own purposes. It's pure science. If its discoveries turn out to be useful, then that's a happy coincidence.
For that matter, neither does most of TOP for the past couple of decades or so. If you do math to study a system designed for use by a computationally-complex collaborator (the programmer in this case), you must study the collaborator as well. TOP shows no interest in such empirical study, and has therefore been utterly unable to prove that the approaches it advocates actually help write very large, complex software systems; indeed, it has been unwilling even to test this hypothesis, not even once.
> In that case, verifying that a program conforms to spec in the general case is of as much computational complexity no matter when it's carried out.
Then, as Dijkstra said, don't focus on the general case: “So, instead of trying to devise methods for establishing the correctness of arbitrary, given programs, we are now looking for the subclass of "intellectually manageable programs", which can be understood and for which we can justify without excessive amounts of reasoning our belief in their proper operation under all circumstances.” https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...
> The knowledge that a certain program has a certain property has a price-tag, and there are no discounts.
So what? If you're designing programs out of reusable components and you don't throw away their proofs of correctness, then much of the cost has already been paid by someone else before you even start.
> You're wrong because, again, you're blind to computational complexity and to computational processes that aren't man-made programs. What you described is precisely how nature computes. That things that aren't fit enough to survive "just die" is a verification process by a genetic algorithm (that requires tremendous time and space complexity) to conform with the spec for genes -- i.e. life's "software" -- which is: reproduce.
This still doesn't make sense. No biological system is “correct” in any meaningful sense. Correctness is always relative to a specification, and, even if your specification is as vague as “survive”, what's happening in nature is that some life forms are lucky enough to be fit for their current circumstances, not all possible ones.
> Then, as Dijkstra said, don't focus on the general case
Exactly, except that since Dijkstra gave that talk (early seventies) we have learned some important results in computational complexity. The bottom line is that we know it is impossible to create a generally useful programming language (even one that is not Turing-complete; even one restricted to finite-state machines) where every program is relatively cheap to verify (in 2000 it was proven that language abstractions can't significantly reduce verification cost, even though it would have been reasonable to believe that they could; that result is discussed in my linked post). Whatever language is chosen, only a small subset of programs written in that language will be verifiable. Identifying that subset requires extensive empirical research. This is done in the software verification community; unfortunately, the PL community is reluctant to conduct empirical research.
> much of the cost has already been paid by someone else before you even start.
Not true. See the Schnoebelen results quoted in the post I linked to. Composing proofs of program correctness requires proof effort that is intractable in the complexity of each component, not in the number of components. This was proven in 2005. In technical terms, verification isn't Fixed-Parameter Tractable, or FPT, a relatively modern[1] complexity class created to capture problems that are intractable in the general case but may be tractable in practice. In simple terms, even if each of your components has been proven correct, the difficulty of proving the composite's correctness is still an intractable function (exponential at least) of the complexity of each component. Of course, theoretically it's better than nothing, but not by enough.
> is that some life forms are lucky enough to be fit for their current circumstances, not all possible ones.
That is only if you view each life form as the computational system. However, that genetic algorithm I was referring to is conducted by all living systems as a single computational system.
> Whatever language is chosen, only a small subset of programs written in that language will be verifiable.
You're still focusing on arbitrary programs. Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.
> Composing proofs of program correctness requires proof effort that is intractable in the complexity of each component, not in the number of components.
So keep each component small.
> However, that genetic algorithm I was referring to is conducted by all living systems as a single computational system.
This is still not verification. It's just testing on a much larger scale.
> Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.
Of course, but easier said than done. In other words, we haven't figured out how to do this yet. So far, the cost of proof exceeds what we're willing to pay for what needs proving. OTOH, researchers often focus not on the programs we need to write, but on the properties they can prove (searching under the lamppost, so to speak); this is the best way of getting results. Some of them then try to convince themselves and others that the properties they can prove of the programs they can prove them on are really those that need proving, and that the cost is affordable. Sadly, this wish hasn't been shown to be true just yet.
Look, I suggest that you try to write a large-scale formally verified program. Forget large; small and non-trivial will do. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties. There is a reason why no one in the history of computing has ever come close to achieving such a feat. Interestingly, the people who actually do formal verification (you can watch talks by Xavier Leroy, who comes to verification from the PLT direction, or read statements by the people who worked on seL4) are usually the ones most restrained in discussing its possibilities. It can work, depending on what exactly you want to do, but it requires a great investment, and end-to-end verification is possible only on very small programs (like seL4 or CompCert) and only at a prohibitive cost that makes the process worth it only for a handful of very special projects.
Let me put it another way: so far, practice has shown to be very much in line with the difficulties predicted by the theory. Even Schnoebelen, when discussing the result that language abstractions don't help, says that, while proven for the worst case, it "has been observed in many instances, ranging from reachability problems to equivalence problems, and can now be called an empirical fact".
Personally, I do formal verification extensively on large projects (much larger than CompCert or seL4), but I don't even attempt end-to-end or manual deductive proof because the cost of those becomes very clear very fast once you start trying these things on big systems. These observations have been reported, AFAIK, by every single industry or research unit that carries out significant software verification efforts.
Formal verification can and does work. But in order for it to work, you need to approach it humbly and know the limitations. Also, you need to know where best to spend your efforts, which is a bit of an art. When you do, you discover that maybe you've not found a way to consistently churn out bug-free programs, but that the quality of your programs has improved considerably.
Just to give you a taste of program proof, this is a simple exercise by Leslie Lamport:
Consider N processes numbered from 0 through N-1, in which each process i executes
x[i] := 1;
y[i] := x[(i-1) mod N]
and stops, where each x[i] initially equals 0. (The reads and writes of each x[i] are assumed to be atomic.) This algorithm satisfies the following property: after every process has stopped, y[i] equals 1 for at least one process i. It is easy to see that the algorithm satisfies this property; the last process i to write y[i] must set it to 1… The algorithm satisfies this property because it maintains an inductive invariant. Do you know what that invariant is?
If you can produce a machine-checked proof of this 2-line program that has no loops or recursion in under one hour, hats off to you. The difficulty of the problem grows super-linearly with the size of the code. People writing real programs, even those intended for verification, write programs that are several orders of magnitude more complex than this toy example.
> This is still not verification. It's just testing on a much larger scale.
It's verification (by an algorithm that is not common in software). The cost of verification comes from the knowledge you wish to obtain, not how you obtain it. Computational complexity doesn't care whether you know something by proving it using FOL and natural deduction, dependent types, or by exhaustively trying each input. BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say, emit output y followed by output z).
> Look, I suggest that you simply try to write a large-scale formally verified program. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties.
This is exactly what I'm doing: a library of data structures and algorithms, where the verification effort is suitably split between the language and me. The language (in this case, Standard ML) ensures that abstract types can only be used abstractly, and I prove (using paper and pen) that the implementations of these abstract types are correct. A primary concern is simplicity, so if a module has too many invariants to manually prove, it is a natural candidate for splitting into smaller modules.
I'm not saying that Standard ML is the ideal language for this kind of thing, e.g., some limited form of dependent typing would be very helpful (as long as it doesn't compromise type inference), but it's much better than anything else I have tried (including Haskell).
> Do you know what that invariant is?
The invariant to maintain is “there exists some distinguished index i such that x[i] is written by process i before it is read by process (i+1) mod N”. If N = 1, then i = 0. If N = N' + 1, consider some valid interleaving of N' processes, with distinguished index i', and then consider the effect of interleaving one more process.
> The cost of verification comes from the knowledge you wish to obtain, not how you obtain it.
In this case, what I wish to ascertain is whether a living organism is fit to survive in a range of circumstances, most of which will never actually happen.
> BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say emit output y followed by output z).
Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.
That's excellent, but I didn't say "program" for nothing. Programs are usually interactive, with temporal components. But if data structures are your thing, try writing and verifying, say, a concurrent R-tree.
> The only invariant I can see is that “y[i] has been assigned to” implies “x[i] has been assigned to”. From this, we can see that “all y[i]s have been assigned to” implies “all x[i]s have been assigned to”. But this doesn't involve any induction on N so far.
Your reasoning is OK, but it shows how much you're thinking in terms of the language rather than more abstractly. It's also not enough to prove the property.
The crucial state here is the hidden state in the program counter. The induction is not on the number of processes, but on the steps of the algorithm: each step executes one line of one process, chosen nondeterministically. The invariant, expressed in TLA+, has two conjuncts, of which the second is the inductive bit; `pc` models the program counter.
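Roughly, here is how the spec and such an invariant might be sketched (the names are illustrative, and this is a reconstruction rather than necessarily the exact formula):

    ------------------------------ MODULE TwoLines ------------------------------
    EXTENDS Integers
    CONSTANT N
    ASSUME N \in Nat \ {0}
    VARIABLES x, y, pc
    vars == <<x, y, pc>>
    Procs == 0..(N - 1)

    Init == /\ x  = [i \in Procs |-> 0]
            /\ y  = [i \in Procs |-> 0]
            /\ pc = [i \in Procs |-> "a"]        \* every process is about to run its first line

    WriteX(i) == /\ pc[i] = "a"                  \* line a: x[i] := 1
                 /\ x'  = [x EXCEPT ![i] = 1]
                 /\ pc' = [pc EXCEPT ![i] = "b"]
                 /\ y'  = y

    WriteY(i) == /\ pc[i] = "b"                  \* line b: y[i] := x[(i-1) mod N]
                 /\ y'  = [y EXCEPT ![i] = x[(i - 1) % N]]
                 /\ pc' = [pc EXCEPT ![i] = "done"]
                 /\ x'  = x

    Next == \E i \in Procs : WriteX(i) \/ WriteY(i)
    Spec == Init /\ [][Next]_vars

    Correctness == (\A i \in Procs : pc[i] = "done") => (\E i \in Procs : y[i] = 1)

    \* Candidate inductive invariant: the property itself, strengthened by the
    \* observation that any process past its first line has already set its x to 1.
    Inv == /\ Correctness
           /\ \A i \in Procs : pc[i] \in {"b", "done"} => x[i] = 1
    ==============================================================================

For small N, TLC can check that Inv (and hence Correctness) holds in every reachable state.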
This is instantly verified with a model checker. A machine-checked proof is about 60 lines long (I try to avoid formal proofs as they are extremely laborious, but I guess a more experienced machine-prover than I am could make it shorter).
> Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.
Exactly, and each of those atomic propositions has a cost of verification. Moreover, we can prove (again, it's in my post) that we can't in general generalize from one input to another, so the cost of verification over an infinite number of inputs is infinite in the general case. Of course, it can be done in practice, but we can tell exactly how hard that would be: the cost of verification is proportional to the number of states in the algorithm -- the minimal algorithm that satisfies the full spec, not necessarily your actual program. Properties that are easily checked by simple type systems like those of Haskell and SML can always be satisfied by much simpler programs (in fact, an FSM or, if polymorphic recursion is used, a PDA).
Errr, yes, you're right. I noticed I messed up after I posted my comment, then edited it accordingly. I don't see why I need to perform induction on the program counter, though.
I posit that, no matter how you interleave your processes, there is some distinguished index i such that x[i] is written by process i before it is read by process `(i+1) mod N`. To prove this, I use induction on N:
(0) If N = 1, then i = 0.
(1) If N = M + 1, inductively assume that we have a collection of M processes with distinguished index i. Now consider the effect of adding one more process, respecting the interleaving of the original processes:
(1.a) If the new process is added at any index other than `i+1`, then the original processes i and `(i+1) mod M` still have contiguous indices in the new collection, and the former still writes to the common variable before the latter reads it.
(1.b) If the new process is added at index `i+1`, then at least one of the two following things must happen:
(1.b.i) The original process i writes to x[i] before the new process reads it.
(1.b.ii) The new process writes to x[i+1] before the original process `(i+1) mod M` (now `(i+2) mod N`) reads it.
No analysis of the program counter was required, and as a result we get a far more beautiful proof. You used a sledgehammer, I used a sculptor's chisel.
> Properties that are easily checked by simple type systems like those of Haskell and SML can always be satisfied by much simpler programs (in fact, a FSM or, if polymorphic recursion is used, then a PDA).
I never claimed that I could delegate the entirety of the formal verification task to a type system like that of Haskell or SML. And, to be honest, I don't even think using computers to verify everything is such a good idea. What I want is automation for the boring and repetitive parts (the vast majority), so that I can concentrate exclusively on the hard ones. (Which isn't to say that I'm happy with the little amount of work Damas-Milner does for me. It's possible to do much better, without sacrificing automation.)
> and the former still writes to the common variable before the latter reads it.
This is not true. At no point in your induction is it proven that it is always the last process that writes x before the first reads it, nor is it necessarily the case.
> then at least one of the two following things must happen:
This, too, is untrue. The events may occur in the precise opposite order. If this were true, then it would have been an invariant that either process N-1 or process 0 would necessarily write 1, but this is not the case.
> No analysis of the program counter was required, and as a result we get a far more beautiful proof. You used a sledgehammer, I used a sculptor's chisel.
Excellent, except that my "sledgehammer" is a proof technique that's been proven complete, i.e., if a program property is provable, it is provable via an inductive invariant, which is amenable to mechanical proof checking (and, indeed, I mechanically checked it in TLA+). Good luck getting your proof (which is called "behavioral") -- even if it were correct -- to verify formally.
Chord is a distributed algorithm that was published in 2001 and quickly became popular. It got thousands of citations and plenty of industry implementations. In 2011 it even won a "test of time award". Its authors said, "Three features that distinguish Chord from many peer-to-peer lookup protocols are its simplicity, provable correctness, and provable performance". They, of course, provided proofs. However, in 2012, Pamela Zave, using a model checker, showed that "of the seven properties claimed invariant of the original version, not one is actually an invariant". The point is that where algorithms are concerned, "proofs" are often not proofs.
> I never claimed that I could delegate the entirety of the formal verification task to a type system like that of Haskell or SML.
Just to be sure: proof with pen and paper isn't formal verification. It is pretty much the very definition of informal verification. Formal proof means using natural deduction or sequent calculus. This is too hard, so a machine-checked proof is acceptable. But any proof or any other kind of verification not carried out or checked by a computer is not formal.
> At no point in your induction is it proven that it is always the last process that writes x before the first reads it, nor is it necessarily the case.
I didn't say anything about “the last process”, though. I just took a collection of M processes, with some valid interleaving, and inserted one more process, at whatever index j, displacing all pre-existing processes j...M-1 one place to the right.
>> then at least one of the two following things must happen:
> The events may occur in the precise opposite order.
Let A and B be processes with adjacent indices (for my purposes, M-1 and 0 are considered “adjacent”), such that A's write happens before B's read. If we insert a new process C in between A and B, then A and B no longer manipulate a shared variable, but instead A writes to a variable C reads, and C writes to a variable B reads. Then at least one of the two following things must be true:
(0) A's write happens before C's read.
(1) C's write happens before B's read.
This follows from these facts:
(0) A's write must happen before B's read.
(1) C's write must happen before C's read.
Was that too hard to understand? If it was, you need to relearn to prove things by hand.
> Just to be sure: proof with pen and paper isn't formal verification. It is pretty the very definition of informal verification.
While the proof I gave above is most certainly informal, you can prove things formally using pen and paper. The formality comes from the use of explicitly given deduction rules, not from the use of a computer.
> I didn't say anything about “the last process”, though
Of course. Sorry. For some reason my mind read i as N. It was late at night... I'm convinced the proof follows from your invariant, but the inductive step is a bit tricky, as i is not the same for every interleaving. My proof is simpler, as the invariant holds for -- and, more importantly, is maintained by -- every step made by the system, and there's no need to consider various interleavings. It also doesn't require induction on N.
> The formality comes from the use of explicitly given deduction rules, not from the use of a computer.
That's true and I said as much, but doing so is extremely laborious. Do you actually use natural deduction?
> but the inductive step is a bit tricky, as i is not the same for every interleaving
Ah, the joys of nondeterminism! It gives “there exists” an, ahem, “interesting” meaning. This is why I only do imperative and concurrent programming when it's absolutely necessary (e.g., when no functional and deterministic program could be asymptotically as efficient).
> My proof is simpler, as the invariant holds for -- and more importantly, maintained by -- every step made by the system
My invariant is prettier. It doesn't need to be maintained “throughout the computation”, because it's a static assertion about the program: “regardless of the interleaving of operations, there exists some i such that...” Almost invariably, the program is a much simpler mathematical object than the computations it gives rise to. Dijkstra noted “that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed.” https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...
> It also doesn't require induction on N.
Induction on the program counter is even harder! The inductive case of my proof doesn't even look at the entire computation: it looks at most at three processes at a time. It's literally less work than what you propose.
> Do you actually use natural deduction?
In the design of type systems, yes. If I want to prove something by hand, hell no. Natural deduction is much nicer to prove things about than it is to prove things in.
In TLA+ nondeterminism is as natural and graceful as determinism (it's a matter of ∨ vs ∧), and the functional concepts of parameterization and function types are just special cases. Nondeterminism is what allows refinement relations. You have it whenever you have types, but the language you use just doesn't use the same words to describe it.
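A toy sketch of that point (everything here is made up for illustration): a counter that, at each step, may either increment or reset. The nondeterminism is literally the disjunction in Next; making the spec deterministic would just mean dropping one disjunct.

    ------------------------------ MODULE Counter ------------------------------
    EXTENDS Integers
    VARIABLE n

    Init  == n = 0
    Incr  == n' = n + 1
    Reset == n' = 0

    \* Nondeterministic choice is plain disjunction: either action may occur.
    Next  == Incr \/ Reset
    Spec  == Init /\ [][Next]_n
    =============================================================================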
> because it's a static assertion about the program
Mine is as static and as global as yours (and has the benefit of being easily verifiable mechanically)! You're just not trained to see it. You've been told that concurrency and imperative constructs are "dynamic" and are hard to reason about. That's just because FP is bad at reasoning about them, and prefers syntactic decomposition. There is nothing dynamic about a proper, pure description of what to you seems a messy dynamic process. The entire program is just a single, pure, semantically unambiguous first-order predicate.
I had a discussion about this with a PLT person once, and he kept asking how you prove static properties in TLA+. It took me a while to understand what he meant, as "static" vs. "dynamic" is a distinction that only matters when you want to reason about syntax. In simple logic, there's just a single formula.
> Almost invariably, the program is a much simpler mathematical object than the computations it gives rise to
You don't reason about each computation, but about what you'd call the "program", which in TLA+ is called the algorithm, so as not to confuse it unnecessarily with ideas that are related to syntax. The algorithm is just a boolean predicate, and my proof is that that predicate implies another (which implies the property).
Now, when I look at how FP people reason about code, I wonder why they have to complicate things so much. Sequential, concurrent, parallel, nondeterministic, deterministic, algorithms, properties -- they can all be reduced to a single first-order predicate, with the simple implication relation proving anything you want to know about any of them (that the program satisfies a property or a "type", that the deterministic program is a refinement of the nondeterministic one, that the parallel program computes the same result as the sequential one, that the sequential program has a worst-case complexity of such-and-such), all with simple, familiar logic and semantics that fit on a single page.
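To sketch what I mean by a single implication relation doing all the work (toy specs, names mine): refinement and property satisfaction are both just => between predicates on behaviors.

    ---------------------------- MODULE Implication ----------------------------
    EXTENDS Integers
    VARIABLE n

    Loose  == n = 0 /\ [][n' \in {n + 1, n + 2}]_n   \* nondeterministic spec
    Tight  == n = 0 /\ [][n' = n + 1]_n              \* deterministic spec
    NonNeg == n >= 0                                 \* a simple state property

    THEOREM Tight => Loose      \* the deterministic spec refines the nondeterministic one
    THEOREM Loose => []NonNeg   \* the spec satisfies the property
    =============================================================================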
> You've been told that concurrency and imperative constructs are "dynamic" and are hard to reason about. That's just because FP is bad at reasoning about them, and prefers syntactic decomposition.
The hard fact is that, while you can reason about purely functional programs using little more than high-school algebra, reasoning about imperative programs requires bringing in the heavy apparatus of logic right from the start, because imperative programs denote predicate transformers. Distinguishing a class of computations that can be expressed as functional programs is distinguishing a class of computations whose meaning can be established with simpler tools, and thus obviously a beneficial activity.
> There is nothing dynamic about a proper, pure description of what to you seems a messy dynamic process. The entire program is just a single, pure semantically unambiguous first-order predicate.
Last time I checked, imperative programs don't denote first-order predicates, but rather predicate transformers. Quite different things. A predicate transformer is a higher-order mathematical object that relates two (possibly first-order) predicates - a precondition and a postcondition.
> You don't reason about each computation, but about what you'd call the "program", which in TLA+ is called the algorithm, as not to unnecessarily confuse with ideas that are related to syntax.
Your use of induction on the program counter reveals that you're just using the program as a proxy for a class of computations.
> The algorithm is just a boolean predicate,
Um, no. Predicates don't express computation.
> and my proof is that that predicate implies another (which implies the property).
Would you have been able to produce your proof entirely by yourself? That is, unaided by a computer. I produced my proof using nothing other than my brain.
> The hard fact is that, while you can reason about purely functional programs using little more than high-school algebra, reasoning about imperative programs requires bringing in the heavy apparatus of logic right from the start, because imperative programs denote predicate transformers.
1. It is immediately obvious that TLA+ is simpler than SML, yet as powerful as Coq; it is TLA+ that requires little more than high school math. You can reason with it about any kind of algorithm you like, using the same simple, familiar math (FOL, ZF plus three temporal operators, of which only one is really used for safety properties).
2. Imperative/functional is a language concept. In TLA+ there is no such thing as an imperative or a functional program. There is an algorithm that, say, computes a factorial in some sequence of steps (a sketch of such a machine appears after this list). Is that sequence imperative or functional? Hard to say. TLA+ is pure, yet doesn't model an algorithm as a function.
3. Imperative programs do not denote predicate transformers. They denote whatever sound model you decide they do. The whole idea of "denotation" is a semantic one, i.e., tied to a specific language. E.g., in TLA+, all algorithms -- functional, imperative (although there's no difference), parallel, concurrent, quantum -- denote a set of behaviors, where each behavior is an infinite sequence of states.
Lamport's entire idea is that thinking of languages and semantics when reasoning about algorithms complicates things unnecessarily. Think of algorithms as abstract state machines, and reasoning becomes -- if not simple -- then as simple as possible (today, at least).
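Here is the factorial machine mentioned in point 2, as a sketch (the module and its names are illustrative, not canonical):

    -------------------------- MODULE FactorialMachine --------------------------
    EXTENDS Integers
    CONSTANT N
    ASSUME N \in Nat
    VARIABLES i, acc

    Init == i = 0 /\ acc = 1
    Step == /\ i < N
            /\ i'   = i + 1
            /\ acc' = acc * (i + 1)
    Spec == Init /\ [][Step]_<<i, acc>>

    \* Once i = N, no further Step is enabled and acc equals N factorial.
    ==============================================================================

There is no function being applied to an argument anywhere; there is just a predicate describing which state transitions are allowed, and asking whether that is "imperative" or "functional" doesn't really make sense.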
> Your use of induction on the program counter reveals that you're just using the program as a proxy for a class of computations.
That is what a program reasonably "denotes" (and what it, in fact, does in TLA+). But the proof works on the representation of the program, which is just a logical predicate.
> Um, no. Predicates don't express computation.
Um, yes, they do. In 1994 Leslie Lamport wrote a paper (2500 citations) that shows how they can, and that makes everything simpler. It is one of the best known and most widely used algorithm logics. It's nothing obscure.
> Would you have been able to produce your proof entirely by yourself?
Yes. First I thought of it, then I wrote it, then I added details until it checked. But let me tell you: if a proof assistant ever made proofs easier than doing them in your head, I'd be very happy. Usually they make things harder because they need lots of detail. Sometimes they're necessary, if the proof is shallow but very big and requires remembering lots of things. But, unfortunately, proof assistants don't help you uncover a proof. Model checkers can do that sometimes.
1. Why would that be astonishing? The "+" in TLA+ is FOL + ZFC, the same foundation used for virtually all of math (except some branches of logic).
2. I am not a logician, so I don't know the formal particulars of measure theory, of any limitation ZFC has in formulating it, or of any TLA+-specific limitations in that regard. However, TLA+ is not a general proof assistant (it can certainly be used to prove general math theorems, it just wasn't designed for that task), but a language for specifying and reasoning about algorithms and large software systems. In that, it is as powerful as Coq.
3. There are completeness proofs for TLA, i.e., that anything you can state about a computation, you can state in TLA, and anything you can prove about a computation, you can prove in TLA. Now, this isn't quite accurate: TLA is a linear-time temporal logic, and there are some things that you can state and prove in a branching-time temporal logic that you can't in a linear one; nevertheless, as Moshe Vardi says in Branching vs. Linear Time: Final Showdown[1], it makes more sense to prefer linear-time logic because empirical evidence suggests that it's far more useful. There is one glaring downside: TLA is unable to reason about probabilistic algorithms[2]. However, it is possible to embed reasoning about computation in the + part of TLA+ (i.e., ZFC) and reason about probabilistic algorithms in that way, and I assume that this is how it would be done in Coq, too. So you could do everything in TLA+ without the TLA bit, but that would be inconvenient, as the natural representation of algorithms in TLA is especially convenient and the main justification for the approach. So I would be very interested in seeing an extension of TLA meant to deal with probabilistic algorithms.
4. In the end, what matters isn't theoretical strength (by which Coq and TLA+ are equivalent in anything that concerns algorithms), but convenience. I am sure that there are some problems that are easier to formulate and reason about in Coq than in TLA+. But the fact that no one has ever used Coq for a large software system, and that the handful of use cases in industry were always done in cooperation with academic experts, while TLA+ is used in very large projects by "plain" developers, speaks volumes about the strength of the approach.
[2]: You can describe any probabilistic algorithm in TLA (i.e., state that an algorithm performs a certain step with a certain probability). What you can't do is then state a probabilistic theorem about that algorithm, such as "in 70% of the cases, the algorithm gives the right answer".
Because Coq was designed (as I understand it) to help people prove theorems about mathematics, not about algorithms. If TLA+ -- a language designed to prove theorems about algorithms, not about mathematics -- is actually better for that purpose then a lot of people have wasted a lot of time.
Sure, algorithms are mathematical and much of mathematics can be recast as algorithmic, but still they have quite a different flavour. If a tool designed to help with one is actually the best tool to help with the other, then that is astonishing.
> It is immediately obvious that TLA+ is simpler than SML, yet as powerful as Coq
What facilities for data abstraction does TLA+ have? How do I hide the representational choices of a module whose internal implementation I'm not currently concerned with?
> it is TLA+ that requires little more than high school math. You can reason with it about any kind of algorithm you like, using the same simple, familiar math (FOL, ZF
First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers. (Of course, a competent programmer should know, but that doesn't mean one should force themselves to use a more complicated tool when a simpler one is available.) And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”. Have you ever seen the crazy things that Harvey Friedman does with large cardinals?
> plus three temporal operators, of which only one is really used for safety properties).
That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside with your processes.
> In TLA+ there is no such thing as an imperative or a functional program. (...) E.g., in TLA+, all algorithms (...) denote a set of behaviors, where each behavior is an infinite sequence of states.
Then in TLA+ all algorithms are expressed as imperative programs.
> Think of algorithms as abstract state machines, and reasoning becomes -- if not simple -- then as simple as possible (today, at least).
By virtue of exploring large state spaces?
> But the proof works on the representation of the program, which is just a logical predicate.
Which is it?
(0) Two programs that compute the same result, but have different efficiency characteristics, are represented as logically (not just syntactically) different predicates.
(1) You don't distinguish between programs that compute the same result, but have different efficiency characteristics.
The former is unusable as a basis for reasoning about extensional equality. The latter is unusable as a basis for reasoning about complexity.
> Yes. First I thought of it, then I wrote it, then I added details until it checked.
Checked by whom?
> But let me tell you: if a proof assistant would ever make proofs easier than doing them in your head, then I'd be very happy. Usually they make things harder because they need lots of detail.
I don't disagree with this one.
> But, unfortunately, proof assistants don't help you uncover a proof. Model checkers can do that sometimes.
Type inference reconstructs type derivations all of the time. Reliably. Of course, those type derivations are all boring routine, but it means that human programmers themselves aren't burdened with it.
> What facilities for data abstraction does TLA+ have?
The full strengths of set theory (sets, functions), plus sugar for sequences and records.
> How do I hide the representational choices of a module I'm not currently concerned with?
TLA+ has a simple module system. Definitions can be local or public.
> First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers.
Depends on the country I guess. But OK, first year college.
> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.
None of that has any bearing on specifying algorithms, though.
> That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside with your processes.
That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.
> Then in TLA+ all algorithms are expressed as imperative programs.
No. They're expressed as nondeterministic abstract state machines, written as a first-order logic predicate. Each step of the machine can be arbitrarily complex. You can specify a program in TLA+ that in each step solves a difficult SAT problem, or even the halting problem.
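For instance, a sketch (Vars, F and Sat are hypothetical constants, deliberately left abstract): a "machine" whose single step nondeterministically picks a satisfying assignment of a propositional formula. Nothing here is executable in the programming-language sense, which is the point.

    ------------------------------ MODULE BigStep ------------------------------
    CONSTANTS Vars,        \* hypothetical: a set of propositional variables
              F,           \* hypothetical: a formula over Vars
              Sat(_, _)    \* hypothetical: Sat(a, G) holds iff assignment a satisfies G
    VARIABLE answer

    Init  == answer = <<>>
    \* One step of the machine does arbitrarily much work: it picks,
    \* nondeterministically, any satisfying assignment of F.
    Solve == \E a \in [Vars -> BOOLEAN] : Sat(a, F) /\ answer' = a
    Spec  == Init /\ [][Solve]_answer
    =============================================================================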
Imperative is an attribute of a programming language with programming language semantics; TLA+ is ordinary math and doesn't have any of those attributes. It's like saying that calculus is imperative. However, it certainly isn't functional because it doesn't identify the concept of a function with the concept of a computation.
I guess you could specify a functional program as a machine that computes its result in a single step (although I wouldn't). You could then model, I don't know, LLVM machine code, and show that the machine-code algorithm is a refinement of the functional program.
> By virtue of exploring large state spaces?
I don't understand the question. The reasoning -- as any other formal reasoning -- is done by manipulating logical formulas. The state space is a property of the model, not of the logic.
> Checked by whom?
TLAPS, the TLA+ proof system.
> Type inference reconstructs type derivations all of the time.
It's done reliably because the properties they represent are kept very computationally simple. TLAPS uses SMT solvers to reduce the effort, and where a type would be inferred, so would a proposition be. The problem is with more interesting propositions.
> The full strengths of set theory (sets, functions), plus sugar for sequences and records.
Set theory doesn't have any facilities for data abstraction. If anything, it's the antithesis of data abstraction: every mathematical object (internal to your set theory of choice) is represented as a set (or urelement, but those are even weirder) somehow, even if the encoding is irrelevant for your purposes.
> TLA+ has a simple module system.
Hierarchical modules? Parameterized module families? Multiple views of the same module, showing or hiding different aspects of it? I use this stuff everyday not only to keep my programs modular, but also to keep their proofs of correctness modular.
>> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.
> None of that has any bearing on specifying algorithms, though.
You were the one who originally brought up ZF, describing it as “high-school math”, so I was just refuting that.
> That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.
If elegance is your only criterion, it's very hard to beat Martin-Löf type theory. Of course, usability matters too, and there MLTT fails miserably, at least for programming purposes. Higher-order dependent types don't come for free.
> It's done reliably because the properties they represent are kept very computationally simple.
That's a feature. It helps you delimit the routine stuff (which can be reliably verified mechanically, and should be most of your program if it's designed sensibly) from what requires creativity to prove.
That being said, of course, existing type systems don't automatically verify everything that could be considered “routine”, so this is an area where further improvement is necessary.
> The problem is with more interesting propositions.
Prove them by hand. Good old-fashioned brain usage is never going away.
---
By the way, I'd like to see your full proof, for the exercise you posted a while back. (“Prove that there exists some i such that y[i] = 1 at the end of the whole computation.”) Is it as short and sweet as my proof?
It was an informal statement :) In any event I said deduction rules or a computer, by which I meant some tool that can reduce the rote effort, and that we can be sure is itself correct.
There is still one glaring issue with your thesis that remains unresolved.
Verification is hard. We all agree. But that is for the worst case. Alternatively we can say that it is hard to verify arbitrary programs. I am utterly unconvinced that it is hard to verify programs that are designed to be verified, and I am utterly unconvinced that it is hard to design programs to be verified.
How should I convince myself that it is hard either (1) to verify even programs that are specifically designed to be verified, or (2) to write such programs?
(I don't mean hard in practice, I mean in theory.)
> How should I convince myself that it is hard either (1) to verify even programs that are specifically designed to be verified, or (2) to write such programs?
Didn't we discuss that issue on Reddit? Anyway, the complexity of verification is proportional to the number of states. If you write the program from scratch, the number of states is dictated by your spec (as every computational problem -- every function -- has an inherent computational complexity). Your spec can be arbitrarily hard.
Now, it's not a matter of worst case. That writing and verifying arbitrary programs are the very same problem is easy to see: given a program A and a property P I wish to verify, I create a spec S = A ∧ P, i.e., the program you write must behave like this arbitrary program and satisfy P. So there is no difference between writing from scratch and verifying. Indeed, you can see that static analysis tools can often prove, on arbitrary programs, the very same properties that are enforced by type systems.
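Sketched in TLA+ terms, with placeholder definitions standing in for the given program and the property (both are stand-ins of mine, not anything concrete):

    ----------------------------- MODULE Reduction -----------------------------
    EXTENDS Integers
    VARIABLE state

    A == state = 0 /\ [][state' = state + 1]_state   \* stand-in for the given, arbitrary program
    P == state >= 0                                  \* stand-in for the property to verify

    \* The derived spec: behave exactly like A and always satisfy P. Writing any
    \* correct implementation of S requires knowing whether A satisfies P, so
    \* writing-to-spec is at least as hard as verifying.
    S == A /\ []P
    =============================================================================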
Of course, not every algorithm and property are worst-case, regardless of whether the program is being written or given. There are useful classes of programs/properties that are easier to verify, and the language can make useful restrictions. So far, examples that work well in practice are SPARK and SCADE. Each forbids memory allocation, including unrestricted stack growth.
> That writing and verifying arbitrary programs is the very same problem is easy to see: given a program A, and a property I wish to verify P, I then create a spec S = A ∧ P, i.e, the program you write must behave like this arbitrary program and satisfy P. So there is no difference between writing from scratch or verifying.
I don't follow. What has A got to do with it? I am given a spec P. You claim that it's hard to write a program that satisfies P, pretty much regardless of what P is. I don't see how A is relevant at all.
Yes, but P can be arbitrarily hard to implement. This is hard for you to see, so I'm incorporating A into P just as an example of how we can make the spec hard. If it were easy to write programs that conform to arbitrary specifications, it would have been just as easy to verify arbitrary programs, because I can reduce the latter to the former by making the arbitrary program the spec of the new one.
If you want another example of how a spec can be hard, consider this one: "write a program that for every input n, an even integer greater than two, outputs the smaller of two primes whose sum is n". Do you think writing that program and proving it correct is any easier than verifying the 10-liner in my post?
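Written down (the names are mine, and this is just one reading of the informal wording), the spec itself is only a few lines; the difficulty is entirely in showing that some program meets it for every even n greater than two, which entails Goldbach's conjecture:

    --------------------------- MODULE GoldbachSpec ---------------------------
    EXTENDS Integers

    IsPrime(p) == p > 1 /\ \A d \in 2..(p - 1) : p % d # 0

    \* out is the smaller member of a pair of primes summing to n
    CorrectOutput(n, out) == /\ IsPrime(out)
                             /\ IsPrime(n - out)
                             /\ out <= n - out
    ============================================================================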
Or even this: "write a program that sorts a list of integers". You know for a fact that that program has a minimal time complexity of Ω(n lg n) (at least for comparison-based sorting). That complexity, in turn, partly determines how hard it will be to prove the program correct. You know that it will be harder than "write a program that returns the first element of its input list", whose time complexity is O(1) (actually, the difficulty of verification is more closely related to the space complexity, because the space complexity is a better approximation of the number of states the algorithm must have in order to solve the problem, unless the algorithm can be a pushdown automaton).
Sure, I know that there are some specifications P that are hard to implement (and some that are impossible!). What's not clear to me is whether the specifications we care about implementing in practice are hard to implement.
In practice it's obvious that they're hard! Look at the bugs people make and look at the effort required for seL4/CompCert (both are fairly simple systems).
That's a good question, and I don't think anyone has a definitive answer to it. I certainly don't. I think we should do empirical research and try to find out. What would be your guess, and why?
I do have some ideas, though. First, I think that the effort required for end-to-end proofs -- even for heavily simplified, relatively small programs -- does indicate that the problems are inherently hard, because I don't think we're lacking some secret proof technique that would all of a sudden make things significantly easier. The difference between cases that are extremely hard (Goldbach's conjecture) and real-world code doesn't seem big. Second, I think that empirical research might uncover something that can have a bigger impact. End-to-end proof is not a very important practical problem, because very few software projects ever need it or anything close to it. Improving correctness or reducing development costs (two sides of the same coin) for non-high-assurance programs is much more important, and it's where the theory leaves much more wiggle room (because once you step away from actual watertight proof, basically anything is possible). I believe that empirical research can uncover certain common program patterns and bugs that can be analyzed/prevented, and that those may give a significant boost. But without empirical research, we won't even know which patterns/bugs to tackle.
I think that the emphasis on deductive proofs is misplaced (in light of theory and practice), as it's hard to fine-tune[1], and indeed, the software verification community has been moving further and further away from that direction for a long while; since the seventies, I believe. It's the PL community that's decided this is interesting to them. Software verification is interested in many other approaches, like static analysis (which can verify some properties -- hopefully important ones -- some of the time), probabilistic methods, combinations of formal methods and testing (like so-called "concolic" testing), etc.
------
[1]: I'm referring to actual proofs with dependent types. Simpler type systems can be invaluable, if researchers only tried to figure out which properties are actually important. For example, Rust's borrow checker is extremely interesting, and tackles a class of bugs that's known to be both very harmful and easy to prevent mechanically. We should be looking for more of those. An example of what not to do if you want real-world impact is the peculiar focus on effect systems. Other than shared state (which is a whole other matter), how did some PL researchers get it into their minds that effects are an important problem to tackle?
> That's a good question and I don't think anyone has a definitive to that. I certainly don't. I think we should do empirical research and try to find out. What would be your guess and why?
I have absolutely no idea. The entire focus of my software development philosophy for a decade or so has been to make programs easier for humans to understand, not easier to write proofs about (although there's some overlap). I do wonder how big the weak but extremely useful class of properties (like Rust's memory safety) is.
> almost all (92%) of the catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signaled in software.
The exception-handling code is there, it's just that no thought has been put into it. They say that their simple linting tool could have detected and prevented a good portion of those very costly bugs.