To those interested in the history of logic and computation, and the relationship between them (and algebra) -- which, BTW, was not at all accidental or surprising but completely by design -- I've compiled an anthology of mostly primary sources starting with Aristotle and going up to the 1950s: https://pron.github.io/computation-logic-algebra
Over the course of a few weeks, I've been going through your "History of Computation, Logic and Algebra". I just wanted to say, brilliant work, thank you! The way you present the history gives real insight into the current of philosophy and ideas that led up to our modern conception and use of computation.
I just want to thank you for all your work! I have enjoyed your answers and content in the past, and I will continue to do so, be it as a PhD student or as a hobbyist :)
I’ve taken some graduate courses in programming languages where I’ve learned about the lambda calculus and about type theory, and I really appreciate this post since this provides a nice timeline of how programming language theory has evolved. One topic I’m very interested in is how computation can be expressed in many different ways and the tradeoffs behind various expressions.
I apologize if this is off-topic, but I wonder if there has been work done on algorithm analysis using the lambda calculus? The texts that I’ve read on algorithms describe the algorithms in procedural code, and then the analysis counts the number of procedure calls. However, I would imagine that the description would be different for non-procedural styles of programming. I’ve heard of a book called “Purely Functional Data Structures” but I haven’t read it yet. I’m wondering if there has been work done on algorithm analysis at the lambda calculus level.
It's an interesting question that poses interesting problems.
The main problem is that beta reduction is an awkward operation from the perspective of computational complexity, which is a fairly fundamental concept in algorithm analysis. There's a discussion of that here:
P.S.: Purely Functional Data Structures is a great book. Its focus is more on the implementation details of algorithms in a functional language (ML), and the necessary abstractions.
It's not an analysis of algorithms within a formal model, like lambda calculus.
Right. The naive way to do complexity analysis of a lambda calculus algorithm is to count β-reductions; but it's immediately clear that the cost of a β-reduction (which is basically a string substitution) is proportional to the length of the program.
Here is a simple lambda calculus program:
((λ x . (x x)) (λ y . y))
If we apply the β-reduction rule once, we obtain:
((λ y . y) (λ y . y))
However, in order to perform this operation, we have to treat the program as a string and step through it symbol by symbol, either copying each symbol unchanged (if it's not "x") or replacing it with "(λ y . y)" (if it is "x"). This is an O(m) algorithm, where m is the symbol length of the program. That is how a Turing machine emulating the lambda calculus would have to do it, for example. Surely we should prefer to analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine, no?
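To make that concrete, here is a minimal sketch (my own toy representation, nothing standard) of the substitution step: the whole term gets walked, so a single β-reduction already costs time proportional to the size of the term.

def substitute(term, name, value):
    # Replace free occurrences of `name` in `term` with `value`.
    # Terms are nested tuples: ('var', x) | ('lam', x, body) | ('app', f, a).
    # Variable capture is ignored for simplicity; the point is the full traversal.
    if term[0] == 'var':
        return value if term[1] == name else term
    if term[0] == 'lam':
        _, param, body = term
        if param == name:                 # `name` is shadowed below this binder
            return term
        return ('lam', param, substitute(body, name, value))
    _, f, a = term                        # application node
    return ('app', substitute(f, name, value), substitute(a, name, value))

# ((λ x . (x x)) (λ y . y))  --β-->  ((λ y . y) (λ y . y))
identity = ('lam', 'y', ('var', 'y'))
redex = ('app', ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))), identity)
fun = redex[1]
print(substitute(fun[2], fun[1], redex[2]))
# ('app', ('lam', 'y', ('var', 'y')), ('lam', 'y', ('var', 'y')))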
Space complexity is also an issue; if our program expands to m symbols, surely space complexity is at least O(m) as well?
Worse yet, β-"reductions" don't necessarily make the program shorter; the program can expand to arbitrary size before it starts to get smaller. It's not usually obvious how m (the length of the program) is related to n (the number of elements acted on, in the usual O(n) sense).
An "awkward operation from the perspective of computational complexity" indeed!
A β-reduction is basically the same as a function call; we know how to implement those, and they take constant time. Your example could be directly interpreted as a program in e.g. Lisp or ML, and there the function call is definitely not implemented by stepping through the entire program symbol by symbol.
I think the issue in the stackoverflow post above is that the lambda calculus is nondeterministic, so if you try to define the cost as the smallest number of β-reductions to reduce a term to a value, then you're in trouble because it's in general very hard to figure out what the optimal way of reducing a program would be. But that also shows that it's a completely unrealistic model. Surely what we should do is pick an evaluation order, e.g. call-by-value, and then count the number of reductions there. This corresponds to what actually happens when you run your program on a real computer.
In fact, CBV evaluation is really well-behaved, and you can give an extremely intuitive cost semantics to such a lambda calculus: see section 7.4 in [1], and also see [2], which shows that this semantics is sound in the sense that you can compile it down to an abstract machine respecting the cost. (Incidentally, this is something that Bob Harper has been talking about for a long time; see e.g. his blog post [3] for some polemics.)
If anything, I think what this argument shows is that Turing machines are bad for complexity-theoretic purposes. We do not want to "analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine", we want to count atomic operations that can be executed in O(1) time on the actual computers we run stuff on. So Turing machines are ok if you just want to distinguish polynomial from non-polynomial time, but for finer-grained distinctions you need a more accurate model, like a RAM, or... like the lambda calculus!
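To make "pick an evaluation order and count reductions" concrete, here is a rough sketch (toy representation and names are mine; capture-naive, no error handling) of a call-by-value evaluator that simply counts β-steps, which is the cost measure being proposed:

def subst(t, name, v):
    # Capture-naive substitution over ('var', x) | ('lam', x, body) | ('app', f, a).
    if t[0] == 'var':
        return v if t[1] == name else t
    if t[0] == 'lam':
        return t if t[1] == name else ('lam', t[1], subst(t[2], name, v))
    return ('app', subst(t[1], name, v), subst(t[2], name, v))

def eval_cbv(t, steps=0):
    if t[0] != 'app':                      # variables and lambdas are values
        return t, steps
    f, steps = eval_cbv(t[1], steps)       # evaluate the function position first
    a, steps = eval_cbv(t[2], steps)       # then the argument (call-by-value)
    return eval_cbv(subst(f[2], f[1], a), steps + 1)   # one β-step

# ((λ x . x) (λ y . y)) reduces to (λ y . y) in exactly one β-step
term = ('app', ('lam', 'x', ('var', 'x')), ('lam', 'y', ('var', 'y')))
print(eval_cbv(term))                      # (('lam', 'y', ('var', 'y')), 1)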
Consider the following lambda calculus expression:
((λ x . ((x x) (x x))) (λ x . ((x x) (x x))))
Under β-reduction, this blows up: doubling with every reduction, never repeating itself, growing infinitely long. Let's name this expression "BlowsUp". Recall that the Church encoding for "false" is "(λ x . (λ y . y))", and that it acts like an "if/else" statement, taking two arguments and always returning the second, discarding the first. Then it's clear that this expression:
((false BlowsUp) (λ y . y))
or more explicitly,
(((λ x . (λ y . y)) ((λ x . ((x x) (x x))) (λ x . ((x x) (x x))))) (λ y . y))
should evaluate to "(λ y . y)", the identity function. However, under a call-by-value evaluation strategy, which is eager, the argument BlowsUp will need to be evaluated first. Only a lazy evaluation strategy, which applies the "false" expression first (thereby discarding the part of the program containing BlowsUp entirely, without evaluating it), will succeed.
This is an extraordinarily important case, because recursive functions are typically written with two branches: a base case that terminates, and a recursive case that, on its own, would go on forever. For example, consider everyone's favorite recursive function, factorial:
def factorial(n):
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)
Translated into the lambda calculus via the Y combinator, this (and pretty much every other non-trivial recursive function) will exhibit the same property as the above, and will never terminate if evaluated under a strictly eager, call-by-value strategy. Therefore, specifying the call-by-value evaluation strategy is not useful for the purposes of evaluating non-trivial algorithms, which almost always involve recursion. (In the lambda calculus even unbounded iteration must be implemented via recursion, so this excludes basically all algorithms of interest to complexity theorists.) Note that languages which use call-by-value for function calls usually have a separate, built-in construct for if/else or && (short-circuiting boolean AND) which allows the programmer to decide when they want to use the "lazy" strategy; the lambda calculus unfortunately is too elegant to admit of such a pragmatic distinction. :)
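To see this concretely in an eager language: the textbook Y combinator transliterated directly into Python (which is call-by-value) diverges before the wrapped function is ever applied, because the self-application x(x) is evaluated eagerly. (A sketch of mine, not from the thread; CPython reports the divergence as a RecursionError rather than looping forever.)

# Standard Y combinator; the inner x(x) is evaluated eagerly under
# call-by-value, so Y(f) recurses forever before f is ever called.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

fact_step = lambda rec: lambda n: 1 if n <= 1 else n * rec(n - 1)

try:
    Y(fact_step)(5)
except RecursionError:
    print("diverged: x(x) was evaluated before fact_step could pick a branch")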
There is of course a so-called "normal" evaluation strategy for the lambda calculus (which is guaranteed to find a normal form if one exists), but it is basically "call-by-name" (to contrast it with the call-by-value nomenclature), and it has the problems I describe above and is hard to analyze.
It's true that some constructions that work under normal/CBN order will go into an infinite loop under CBV order, but this is easy to work around: typically you just need to add an extra lambda to prevent things from being evaluated too eagerly. For example, instead of translating
if b then e1 else e2
as
b e1 e2
you can translate it to
(b (λx. e1) (λx. e2)) tt
where tt is any value (it doesn't matter what).
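Here is that translation sketched in Python (itself call-by-value), using Church booleans; the names are mine:

TRUE  = lambda t: lambda f: t
FALSE = lambda t: lambda f: f

def blows_up():                  # stands in for a diverging branch
    while True:
        pass

# Naive "b e1 e2" would evaluate both branches; "(b (λx. e1) (λx. e2)) tt"
# selects a thunk and only then forces it, so the untaken branch never runs.
tt = None                        # any value works
result = FALSE(lambda _: blows_up())(lambda _: 42)(tt)
print(result)                    # 42; blows_up() is never evaluated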
Similarly, you can make a CBV version of the Y combinator by eta-expanding some subexpressions to prevent them from being evaluated too eagerly [0].
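Concretely (again a sketch, names mine), the eta-expanded fixed-point combinator, often called the Z combinator, works fine under call-by-value; in Python:

# Wrapping x(x) as lambda v: x(x)(v) delays the self-application until the
# recursive call actually happens, so this terminates under call-by-value.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

factorial = Z(lambda rec: lambda n: 1 if n <= 1 else n * rec(n - 1))
print(factorial(5))              # 120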
The exact details of the encoding don't matter too much, because when you actually write programs you want to use some syntactic sugar like "if then else" or "let rec", so you just care that it can be translated into lambda calculus in some way which preserves the operational semantics.
What if there were a way to write lambda calculus programs that makes beta reduction "easier" in that sense? We are used to reading things like ((λ y . y) (λ y . y)), but there may be another notation with better properties. If there were, we wouldn't even have to use it, because the mere fact that it exists would justify using beta-reduction counts as a measure of complexity.
But this is clearly not a property of the particular notation; it's a property of the lambda calculus itself (and the way beta-reduction is defined). If you were to make it an operation whose cost is constant in the length of the term, it literally wouldn't be the lambda calculus.
Which actually makes me curious: what sort of change did you have in mind?
If you could quickly translate a lambda-calculus expression into a program written in some other model of computation (that was easier to simulate on a Turing machine) and back, then you could do the complete reduction in the other model, only using lambda-calculus as your input and output. To perform the entire reduction you would still have to scan the expression in O(n), but it wouldn't necessarily be O(n * number of reduction steps). There could potentially be a relationship between the number of beta reductions and the number of steps required to execute the reduction in the other model, in which case beta reduction count would be justified as a measure of time.
(On my phone walking, so a very quick and dirty post.) Purely functional data structures in general will cost you an extra O(log n) factor because of their underlying tree representation for immutability (immutability is achieved by copying paths in the tree to retain the old data). Also, some data structures are very difficult to express in a purely functional way.
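To make the path-copying point concrete, here is a rough sketch (representation and names are mine) of a persistent "array" stored as a balanced binary tree: an update copies only the O(log n) nodes on the root-to-leaf path, and the old version stays valid and shares everything else.

def build(xs):
    # Build a binary tree over the list; leaves hold elements, nodes hold sizes.
    if len(xs) == 1:
        return ('leaf', xs[0])
    mid = len(xs) // 2
    return ('node', build(xs[:mid]), build(xs[mid:]), len(xs))

def get(t, i):
    if t[0] == 'leaf':
        return t[1]
    _, left, right, size = t
    half = size // 2
    return get(left, i) if i < half else get(right, i - half)

def set_at(t, i, v):
    # Returns a new tree; only the nodes on the path to index i are copied.
    if t[0] == 'leaf':
        return ('leaf', v)
    _, left, right, size = t
    half = size // 2
    if i < half:
        return ('node', set_at(left, i, v), right, size)
    return ('node', left, set_at(right, i - half, v), size)

v0 = build([0, 1, 2, 3])
v1 = set_at(v0, 2, 99)           # new version; v0 is untouched
print(get(v0, 2), get(v1, 2))    # 2 99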
While true on the first part, these days many use cases don't have performance requirements tight enough to be affected by that log(n) factor. As always, though, you choose the tools based on the job at hand, not the other way around. In my experience, the functional philosophy is often about the mental-model and consistency advantages gained from the features it offers.
Don’t get me wrong, you are preaching to the choir (I write Haskell at work); I just thought it was worth mentioning this to the OP. In practice, sophisticated compilers like GHC produce insanely fast binaries and are getting better and better.
Homotopy Type Theory doesn't really contribute much to logic (it's basically just type theory with a few extra rules) or programming languages (computer-assisted proofs are easier in HoTT, but that's about it).
HoTT is a foundations of mathematics thing (think ZF/ZFC), so it's a bit weird to see it on the list. But it's kind of hot right now, so you see it just about everywhere.
> Homotopy Type Theory doesn't really contribute much to logic
Of all the things in this list, this is the one you think is odd?
The idea behind homotopy type theory is a groundbreaking connection between type theory and abstract homotopy theory (really, higher category theory) and has had far-reaching consequences in both areas.
For example, cubical type theory solves many problems with equality in type theory and it was only developed by understanding the homotopical models of type theory that were developed for HoTT. I could give more examples, but seriously, this alone is a major advance in logic.
Logic and type systems have been connected since Frege (he coined "functional" and "non-functional" types [1]). Being computation-friendly is interesting, but not really revolutionary. I'm not an expert in topology, so I can't comment on that side.
This seems like as good a thread as any to ask my perennial question:
Is there some sort of logic or formalism that would let me algebraically represent a program like bubble sort, and then algebraically simplify it to a quick sort?
Any timeline of logic that fails to include Charles Sanders Peirce[1] is woefully incomplete. He discovered the existential and universal quantifiers and did seminal work on the relational calculus, to name just a few items. If you haven't heard of him I strongly recommend at least reading the article I linked. He's a great American genius who is virtually forgotten.
There's A Path to Programming Language Theory, which is a list of resources for learning this material -- it seems pretty comprehensive, but as you can see it involves a lot of different resources. I'm not sure that any one book could give a decent overview of the field:
https://steshaw.org/plt/
If you produced a mind-map to show their inter-relatedness, it would be quite a dense network.
A quick sketch of some of the nodes and edges might start to look like:
Logic <--> Gödel's theorems
Gödel's theorems <--> Proof Theory
Proof Theory <--> Linear Logic
Computability <--> Turing Machines
Computability <--> Lambda Calculus
Combinatory Logic <--> Lambda Calculus
Type Theory <--> Lambda Calculus
Homotopy Type Theory <--> Type Theory
Pi Calculus <--> Process Calculi/Distributed Systems
Category Theory <--> Everything...
And that's far from complete. I imagine synthesising all the concepts into a single book with a cohesive narrative would be quite hard without some deeper unifying theory tying them all together. (Category theory and HoTT may be the best contenders.)
In this case, a wiki might be the best way to capture this kind of structure. Although I'd love to be proved wrong and find such a book!
The more PL-focused aspects of this timeline (prior to the 1990s) are covered in Part 1 of "Semantics Engineering with PLT Redex". We used it for my operational semantics class and I thought it was good, but I also had one of the authors as the professor haha.