To those interested in the history of logic and computation, and the relationship between them (and algebra) -- which, BTW, was not at all accidental or surprising but completely by design -- I've compiled an anthology of mostly primary sources starting with Aristotle and going up to the 1950s: https://pron.github.io/computation-logic-algebra
Over the course of a few weeks, I've been going through your "History of Computation, Logic and Algebra". I just wanted to say, brilliant work, thank you! The way you present the history gives real insight into the current of philosophy and ideas that led up to our modern conception and use of computation.
I just want to thank you for all your work! I have enjoyed your answers and content in the past, and I will continue to do so, be it as a PhD student or as a hobbyist :)
I’ve taken some graduate courses in programming languages where I’ve learned about the lambda calculus and about type theory, and I really appreciate this post since this provides a nice timeline of how programming language theory has evolved. One topic I’m very interested in is how computation can be expressed in many different ways and the tradeoffs behind various expressions.
I apologize if this is off-topic, but I wonder if there has been work done on algorithm analysis using the lambda calculus? The texts that I’ve read on algorithms describe the algorithms in procedural code, and then the analysis counts the number of procedure calls. However, I would imagine that the description would be different for non-procedural styles of programming. I’ve heard of a book called “Purely Functional Data Structures” but I haven’t read it yet. I’m wondering if there has been work done on algorithm analysis at the lambda calculus level.
It's an interesting question that poses interesting problems.
The main problem is that beta reduction is an awkward operation from the perspective of computational complexity, which is a fairly fundamental concept in algorithm analysis. There's a discussion of that here:
P.S.: Purely Functional Data Structures is a great book. Its focus is more on the implementation details of algorithms in a functional language (ML), and the necessary abstractions.
It's not an analysis of algorithms within a formal model, like lambda calculus.
Right. The naive way to do complexity analysis of a lambda calculus algorithm is to count β-reductions; but it's immediately clear that the cost of a β-reduction (which is basically a string substitution) is proportional to the length of the program.
Here is a simple lambda calculus program:
((λ x . (x x)) (λ y . y))
If we apply the β-reduction rule once, we obtain:
((λ y . y) (λ y . y))
However, in order to perform this operation, we have to treat the program as a string and step through it symbol by symbol, either copying each symbol unchanged (if it's not "x") or replacing it with "(λ y . y)" (if it is "x"). This is an O(m) algorithm, where m is the symbol length of the program. That is how a Turing machine emulating the lambda calculus would have to do it, for example. Surely we should prefer to analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine, no?
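To make that concrete, here is a minimal sketch (my own toy representation, nothing standard) of the substitution step: the whole term gets walked, so a single β-reduction already costs time proportional to the size of the term.

def substitute(term, name, value):
    # Replace free occurrences of `name` in `term` with `value`.
    # Terms are nested tuples: ('var', x) | ('lam', x, body) | ('app', f, a).
    # Variable capture is ignored for simplicity; the point is the full traversal.
    if term[0] == 'var':
        return value if term[1] == name else term
    if term[0] == 'lam':
        _, param, body = term
        if param == name:                 # `name` is shadowed below this binder
            return term
        return ('lam', param, substitute(body, name, value))
    _, f, a = term                        # application node
    return ('app', substitute(f, name, value), substitute(a, name, value))

# ((λ x . (x x)) (λ y . y))  --β-->  ((λ y . y) (λ y . y))
identity = ('lam', 'y', ('var', 'y'))
redex = ('app', ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))), identity)
fun = redex[1]
print(substitute(fun[2], fun[1], redex[2]))
# ('app', ('lam', 'y', ('var', 'y')), ('lam', 'y', ('var', 'y')))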
Space complexity is also an issue; if our program expands to m symbols, surely space complexity is at least O(m) as well?
Worse yet, β-"reductions" don't necessarily make the program shorter; the program can expand to arbitrary size before it starts to get smaller. It's not usually obvious how m (the length of the program) is related to n (the number of elements acted on, in the usual O(n) sense).
An "awkward operation from the perspective of computational complexity" indeed!
A β-reduction is basically the same as a function call; we know how to implement those, and they take constant time. Your example could be directly interpreted as a program in e.g. Lisp or ML, and there the function call is definitely not implemented by stepping through the entire program symbol by symbol.
I think the issue in the stackoverflow post above is that the lambda calculus is nondeterministic, so if you try to define the cost as the smallest number of β-reductions to reduce a term to a value, then you're in trouble because it's in general very hard to figure out what the optimal way of reducing a program would be. But that also shows that it's a completely unrealistic model. Surely what we should do is pick an evaluation order, e.g. call-by-value, and then count the number of reductions there. This corresponds to what actually happens when you run your program on a real computer.
In fact, CBV evaluation is really well-behaved, and you can give an extremely intuitive cost semantics to such a lambda calculus: see section 7.4 in [1], and also see [2], which shows that this semantics is sound in the sense that you can compile it down to an abstract machine respecting the cost. (Incidentally, this is something that Bob Harper has been talking about for a long time; see e.g. his blog post [3] for some polemics.)
If anything, I think what this argument shows is that Turing machines are bad for complexity-theoretic purposes. We do not want to "analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine", we want to count atomic operations that can be executed in O(1) time on the actual computers we run stuff on. So Turing machines are ok if you just want to distinguish polynomial from non-polynomial time, but for finer-grained distinctions you need a more accurate model, like a RAM, or... like the lambda calculus!
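To make "pick an evaluation order and count reductions" concrete, here is a rough sketch (toy representation and names are mine; capture-naive, no error handling) of a call-by-value evaluator that simply counts β-steps, which is the cost measure being proposed:

def subst(t, name, v):
    # Capture-naive substitution over ('var', x) | ('lam', x, body) | ('app', f, a).
    if t[0] == 'var':
        return v if t[1] == name else t
    if t[0] == 'lam':
        return t if t[1] == name else ('lam', t[1], subst(t[2], name, v))
    return ('app', subst(t[1], name, v), subst(t[2], name, v))

def eval_cbv(t, steps=0):
    if t[0] != 'app':                      # variables and lambdas are values
        return t, steps
    f, steps = eval_cbv(t[1], steps)       # evaluate the function position first
    a, steps = eval_cbv(t[2], steps)       # then the argument (call-by-value)
    return eval_cbv(subst(f[2], f[1], a), steps + 1)   # one β-step

# ((λ x . x) (λ y . y)) reduces to (λ y . y) in exactly one β-step
term = ('app', ('lam', 'x', ('var', 'x')), ('lam', 'y', ('var', 'y')))
print(eval_cbv(term))                      # (('lam', 'y', ('var', 'y')), 1)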
Consider the following lambda calculus expression:
((λ x . ((x x) (x x))) (λ x . ((x x) (x x))))
Under β-reduction, this blows up: doubling with every reduction, never repeating itself, growing infinitely long. Let's name this expression "BlowsUp". Recall that the Church encoding for "false" is "(λ x . (λ y . y))", and that it acts like an "if/else" statement, taking two arguments and always returning the second, discarding the first. Then it's clear that this expression:
((false BlowsUp) (λ y . y))
or more explicitly,
(((λ x . (λ y . y)) ((λ x . ((x x) (x x))) (λ x . ((x x) (x x))))) (λ y . y))
should evaluate to "(λ y . y)", the identity function. However, under a call-by-value evaluation strategy, which is eager, the argument BlowsUp will need to be evaluated first. Only a lazy evaluation strategy, which applies the "false" expression first (thereby discarding the part of the program containing BlowsUp entirely, without evaluating it), will succeed.
This is an extraordinarily important case, because recursive functions are typically written with two branches: a base case that terminates, and a recursive case that, on its own, would go on forever. For example, consider everyone's favorite recursive function, factorial:
def factorial(n):
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)
Translated into the lambda calculus via the Y combinator, this (and pretty much every other non-trivial recursive function) will exhibit the same property as the above, and will never terminate if evaluated under a strictly eager, call-by-value strategy. Therefore, specifying the call-by-value evaluation strategy is not useful for the purposes of evaluating non-trivial algorithms, which almost always involve recursion. (In the lambda calculus even unbounded iteration must be implemented via recursion, so this excludes basically all algorithms of interest to complexity theorists.) Note that languages which use call-by-value for function calls usually have a separate, built-in construct for if/else or && (short-circuiting boolean AND) which allows the programmer to decide when they want to use the "lazy" strategy; the lambda calculus unfortunately is too elegant to admit of such a pragmatic distinction. :)
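To see this concretely in an eager language: the textbook Y combinator transliterated directly into Python (which is call-by-value) diverges before the wrapped function is ever applied, because the self-application x(x) is evaluated eagerly. (A sketch of mine, not from the thread; CPython reports the divergence as a RecursionError rather than looping forever.)

# Standard Y combinator; the inner x(x) is evaluated eagerly under
# call-by-value, so Y(f) recurses forever before f is ever called.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

fact_step = lambda rec: lambda n: 1 if n <= 1 else n * rec(n - 1)

try:
    Y(fact_step)(5)
except RecursionError:
    print("diverged: x(x) was evaluated before fact_step could pick a branch")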
There is of course a so-called "normal" evaluation strategy for the lambda calculus (which is guaranteed to find a normal form if one exists), but it is basically "call-by-name" (to contrast it with the call-by-value nomenclature), and it has the problems I describe above and is hard to analyze.
It's true that some constructions that work under normal/CBN order will go into an infinite loop under CBV order, but this is easy to work around: typically you just need to add an extra lambda to prevent things from being evaluated too eagerly. For example, instead of translating
if b then e1 else e2
as
b e1 e2
you can translate it to
(b (λx. e1) (λx. e2)) tt
where tt is any value (it doesn't matter what).
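Here is that translation sketched in Python (itself call-by-value), using Church booleans; the names are mine:

TRUE  = lambda t: lambda f: t
FALSE = lambda t: lambda f: f

def blows_up():                  # stands in for a diverging branch
    while True:
        pass

# Naive "b e1 e2" would evaluate both branches; "(b (λx. e1) (λx. e2)) tt"
# selects a thunk and only then forces it, so the untaken branch never runs.
tt = None                        # any value works
result = FALSE(lambda _: blows_up())(lambda _: 42)(tt)
print(result)                    # 42; blows_up() is never evaluated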
Similarly, you can make a CBV version of the Y combinator by eta-expanding some subexpressions to prevent them from being evaluated too eagerly [0].
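Concretely (again a sketch, names mine), the eta-expanded fixed-point combinator, often called the Z combinator, works fine under call-by-value; in Python:

# Wrapping x(x) as lambda v: x(x)(v) delays the self-application until the
# recursive call actually happens, so this terminates under call-by-value.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

factorial = Z(lambda rec: lambda n: 1 if n <= 1 else n * rec(n - 1))
print(factorial(5))              # 120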
The exact details of the encoding don't matter too much, because when you actually write programs you want to use some syntactic sugar like "if then else" or "let rec", so you just care that it can be translated into lambda calculus in some way which preserves the operational semantics.
What if there were a way to write lambda calculus programs that makes beta reduction "easier" in that sense? We are used to reading things like ((λ y . y) (λ y . y)), but there may be another notation with better properties. If there were, we wouldn't even have to use it, because the mere fact that it exists would justify using beta-reduction counts as a measure of complexity.
But this is clearly not a property of the particular notation; it's a property of the lambda calculus itself (and the way beta-reduction is defined). If you were to make it an operation whose cost is constant in the length of the term, it literally wouldn't be the lambda calculus.
Which actually makes me curious: what sort of change did you have in mind?
If you could quickly translate a lambda-calculus expression into a program written in some other model of computation (that was easier to simulate on a Turing machine) and back, then you could do the complete reduction in the other model, only using lambda-calculus as your input and output. To perform the entire reduction you would still have to scan the expression in O(n), but it wouldn't necessarily be O(n * number of reduction steps). There could potentially be a relationship between the number of beta reductions and the number of steps required to execute the reduction in the other model, in which case beta reduction count would be justified as a measure of time.
(On my phone walking, so a very quick and dirty post.) Purely functional data structures in general will cost you an extra O(log n) factor because of their underlying tree representation for immutability (immutability is achieved by copying paths in the tree to retain the old data). Also, some data structures are very difficult to express in a purely functional way.
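To make the path-copying point concrete, here is a rough sketch (representation and names are mine) of a persistent "array" stored as a balanced binary tree: an update copies only the O(log n) nodes on the root-to-leaf path, and the old version stays valid and shares everything else.

def build(xs):
    # Build a binary tree over the list; leaves hold elements, nodes hold sizes.
    if len(xs) == 1:
        return ('leaf', xs[0])
    mid = len(xs) // 2
    return ('node', build(xs[:mid]), build(xs[mid:]), len(xs))

def get(t, i):
    if t[0] == 'leaf':
        return t[1]
    _, left, right, size = t
    half = size // 2
    return get(left, i) if i < half else get(right, i - half)

def set_at(t, i, v):
    # Returns a new tree; only the nodes on the path to index i are copied.
    if t[0] == 'leaf':
        return ('leaf', v)
    _, left, right, size = t
    half = size // 2
    if i < half:
        return ('node', set_at(left, i, v), right, size)
    return ('node', left, set_at(right, i - half, v), size)

v0 = build([0, 1, 2, 3])
v1 = set_at(v0, 2, 99)           # new version; v0 is untouched
print(get(v0, 2), get(v1, 2))    # 2 99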
While true on the first part, these days many use cases don't have performance requirements tight enough to be affected by that log(n) factor. As always, though, you choose the tools based on the job at hand, not the other way around. In my experience, the functional philosophy is often about the mental-model and consistency advantages gained from the features it offers.
Don’t get me wrong, you are preaching to the choir (I write Haskell at work); I just thought it was worth mentioning this to the OP. In practice, sophisticated compilers like GHC produce insanely fast binaries and are getting better and better.
Homotopy Type Theory doesn't really contribute much to logic (it's basically just type theory with a few extra rules) or programming languages (computer-assisted proofs are easier in HoTT, but that's about it).
HoTT is a foundations of mathematics thing (think ZF/ZFC), so it's a bit weird to see it on the list. But it's kind of hot right now, so you see it just about everywhere.
> Homotopy Type Theory doesn't really contribute much to logic
Of all the things in this list, this is the one you think is odd?
The idea behind homotopy type theory is a groundbreaking connection between type theory and abstract homotopy theory (really, higher category theory) and has had far-reaching consequences in both areas.
For example, cubical type theory solves many problems with equality in type theory and it was only developed by understanding the homotopical models of type theory that were developed for HoTT. I could give more examples, but seriously, this alone is a major advance in logic.
Logic and type systems have been connected since Frege (he coined "functional" and "non-functional" types [1]). Being computation-friendly is interesting, but not really revolutionary. I'm not an expert in topology, so I can't comment on that side.
This seems like as good a thread as any to ask my perennial question:
Is there some sort of logic or formalism that would let me algebraically represent a program like bubble sort, and then algebraically simplify it to a quick sort?
Any timeline of logic that fails to include Charles Sanders Peirce[1] is woefully incomplete. He discovered the existential and universal quantifiers and did seminal work on the relational calculus, to name just a few items. If you haven't heard of him I strongly recommend at least reading the article I linked. He's a great American genius who is virtually forgotten.
There's A Path to Programming Language Theory, which is a list of resources for learning this material -- it seems pretty comprehensive, but as you can see it involves a lot of different resources. I'm not sure that any one book could give a decent overview of the field:
https://steshaw.org/plt/
If you produced a mind-map to show their inter-relatedness, it would be quite a dense network.
A quick sketch of some of the nodes and edges might start to look like:
Logic <--> Gödel's theorems
Gödel's theorems <--> Proof Theory
Proof Theory <--> Linear Logic
Computability <--> Turing Machines
Computability <--> Lambda Calculus
Combinatory Logic <--> Lambda Calculus
Type Theory <--> Lambda Calculus
Homotopy Type Theory <--> Type Theory
Pi Calculus <--> Process Calculi/Distributed Systems
Category Theory <--> Everything...
And that's far from complete. I imagine synthesising all the concepts into a single book with a cohesive narrative would be quite hard without some deeper unifying theory tying them all together. (Category theory and HoTT may be the best contenders.)
In this case, a wiki might be the best way to capture this kind of structure. Although I'd love to be proved wrong and find such a book!
The more PL-focused aspects of this timeline (prior to the 1990s) are covered in Part 1 of "Semantics Engineering with PLT Redex". We used it for my operational semantics class and I thought it was good, but I also had one of the authors as the professor haha.