Pierce's Types and Programming Languages is a phenomenal textbook. I'm almost done with my implementation of System F-omega (the polymorphic lambda calculus with higher-kinded types and type operators), featuring a full handwritten lexer/parser with helpful diagnostics.
My end goal is to use it as one phase of IR for a functional language compiler.
Another similar book is Harper's Practical Foundations for Programming Languages[1]. I had the privilege of taking a course[2] about this kind of stuff with Prof. Harper at CMU.
Congrats!! I've been trying to study the basics of type theory, but without a more formal background in the mathematics/proofs I'm a little in over my head with those books. The red dragon book seems to have simpler, less mathematical explanations of basic type stuff so far.
I would recommend "Software Foundations", also by Pierce. It covers a lot of the same material, but in Coq. I found it much more approachable, not having much of a formal background. Having proofs machine-checked, with error messages, is a godsend for your sanity when you don't have a professor there to validate you.
I have no formal education in type theory (or computer science), but I found the textbook I mentioned to be perfect for self-learning. There are some bits that are over my head (especially the proofs), but I've found it to be exceptionally accessible otherwise.
While I cannot find the exact quote from Martin Odersky[0], I believe he once said something along the lines of, "it takes about ten years to make a complete typed language."
If anyone also recalls this and has a link to the quote, I would much appreciate the pointer to it.
Curious: from what I can see, a type system is really just an embedded declarative DSL for doing set algebra.
So is there a technical reason why education and implementations always intertwine it with a larger client language, rather than treating it as a complete, self-contained entity in its own right? Or is that lack of decomposition just an oversight?
A self-contained type system is called "a logic". There are plenty of logics, many of which are not related to set algebra. Furthermore, many type systems are not directly related to set algebra (for instance, Rust's ownership). There are also more profound reasons (what is the type of a type? cf. the notion of impredicativity and questions about decidability), but those are usually of little practical consequence for "mainstream" languages.
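To make the "a logic" point concrete, here's a small illustrative sketch (mine, in Haskell, not from the thread) of the Curry-Howard reading, under which a type on its own is a proposition and a program of that type is a proof of it:

```haskell
-- Under Curry-Howard, a type is a proposition and a program is a proof.
-- The type (a, b) -> a is the proposition "A and B implies A":
proj1 :: (a, b) -> a
proj1 (x, _) = x

-- (a -> b) -> (b -> c) -> (a -> c) is transitivity of implication,
-- and function composition is its proof:
trans :: (a -> b) -> (b -> c) -> (a -> c)
trans f g = g . f
```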
With only a logic, you can't show properties of the language. In particular, we like to show that well-typed programs avoid certain errors (like segfaults), or that if an expression (`log2(length(s))`) has a type (`int`), then when you evaluate it to a value, the value (`4`) has the same type. If you want to establish such (nice!) properties, you need to consider the type system and the language together.
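To make that concrete, here's a toy sketch (mine, in Haskell, not from the thread or the books) where the typechecker and evaluator are defined over the same terms, so the preservation property above can even be stated:

```haskell
-- A toy language with a typechecker and an evaluator. Preservation:
-- if typeOf e == Just t and eval e == v, then typeOf v == Just t.
data Ty = TInt | TBool deriving (Eq, Show)

data Expr
  = IntLit Int
  | BoolLit Bool
  | Add Expr Expr
  | If Expr Expr Expr
  deriving Show

typeOf :: Expr -> Maybe Ty
typeOf (IntLit _)  = Just TInt
typeOf (BoolLit _) = Just TBool
typeOf (Add a b)   = do
  TInt <- typeOf a
  TInt <- typeOf b
  Just TInt
typeOf (If c t e)  = do
  TBool <- typeOf c
  tt <- typeOf t
  te <- typeOf e
  if tt == te then Just tt else Nothing

-- On well-typed input, eval never reaches the "stuck" cases.
eval :: Expr -> Expr
eval e@(IntLit _)  = e
eval e@(BoolLit _) = e
eval (Add a b) = case (eval a, eval b) of
  (IntLit x, IntLit y) -> IntLit (x + y)
  _                    -> error "stuck: ruled out by typeOf"
eval (If c t e) = case eval c of
  BoolLit True  -> eval t
  BoolLit False -> eval e
  _             -> error "stuck: ruled out by typeOf"
```

For example, `typeOf (If (BoolLit True) (IntLit 2) (Add (IntLit 1) (IntLit 1)))` is `Just TInt`, evaluation yields `IntLit 2`, and `typeOf` of that value is again `Just TInt`.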
Some type systems have a set-theoretic semantics, but it does not usually correspond to the semantics of programs.
For instance, the type ∀ A. (A → A) → A would be empty in the set-theoretic semantics, but in many programming languages there are terms of this type: the fixed-point operator.
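In Haskell, for example (a sketch; this `fix` is the same as the one in `Data.Function`):

```haskell
-- fix inhabits forall a. (a -> a) -> a. Set-theoretically that type is
-- empty (take A to be the empty set), but general recursion provides it.
fix :: (a -> a) -> a
fix f = f (fix f)

-- Example: factorial defined as a fixed point.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))
```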
A better framework for the semantics of many programming languages is domain theory[0], but it also has its limitations.
The type system assigns types to the terms of the language, so it has to be intertwined with them. One of its purposes is to admit only well-typed terms, which then have a better-understood semantics.
You can count on one hand the number of type systems that are "logically OK": sound (they don't admit wrong programs, unlike e.g. Java's), complete (they admit all correct programs, for some definition of "correct", unlike e.g. Scala's), and decidable (checking always finishes). The rest involve hacks of varying severity. Common ones include specifying function parameter types to allow type inference to function, various escape hatches (like Object in Java and interface {} in Go), and generics that are unsound or limited to various degrees. Subtyping is tremendously hard. The specific hacks implemented in different programming languages are often implementation-specific (and arbitrary), and often result in weird interactions between different language features.
Basically, type systems = science (figuring out new things); programming languages = engineering (making new findings useful).
> You can count on one hand the number of type systems that are "logically OK": sound (they don't admit wrong programs, unlike e.g. Java's), complete (they admit all correct programs, for some definition of "correct", unlike e.g. Scala's)... The rest involve hacks of varying severity
Huh? A consequence of Rice's Theorem is that any decidable static analysis of a nontrivial semantic property must either reject some valid programs or accept some invalid ones. That says nothing about whether a type system is rigorous or hacky.
> common ones include specifying function parameter types to allow type inference to function
It's true that type inference becomes undecidable without annotations for things like dependent types, and there's nothing hacky about that.
Rice's theorem only applies to Turing-complete languages, so if your language is "weak enough" there's no problem being both sound and complete (the simply typed lambda calculus, for example, isn't Turing-complete). I think it's easy to see why people would think of such systems as "nicer" (even though the others are of course still rigorous).
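For illustration, here's a minimal checker for the simply typed lambda calculus (my own toy encoding in Haskell, using de Bruijn indices and a single base type); checking is structural recursion on the term, so it always terminates:

```haskell
-- Simply typed lambda calculus, de Bruijn indices, one base type.
data Ty = TBase | TArr Ty Ty deriving (Eq, Show)

data Term
  = Var Int        -- de Bruijn index into the context
  | Lam Ty Term    -- the binder carries its argument type
  | App Term Term

-- Structural recursion on the term: always terminates, i.e. decidable.
infer :: [Ty] -> Term -> Maybe Ty
infer ctx (Var i)
  | i >= 0 && i < length ctx = Just (ctx !! i)
  | otherwise                = Nothing
infer ctx (Lam t body) = TArr t <$> infer (t : ctx) body
infer ctx (App f x) = do
  TArr a b <- infer ctx f
  a'       <- infer ctx x
  if a == a' then Just b else Nothing
```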
> It's true that type inference becomes undecidable without annotations for things like dependent types, and there's nothing hacky about that.
But it's not just the more advanced type systems: type inference is undecidable for basically any system more expressive than Hindley-Milner. So in practice virtually all programming languages with type systems that go beyond the absolute basics can't fulfill the "promise" of type inference, which is that you don't have to explicitly annotate types. If you consider that even relatively "good" type systems like Haskell's occasionally require you to annotate not just functions, but 'random' terms, you might see why some people might consider it "hacky".
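One well-known case of such a term-level annotation in Haskell (an illustrative snippet, using only the standard Prelude): `show (read s)` is ambiguous, since nothing determines the intermediate type, so GHC makes you annotate the term itself:

```haskell
-- GHC rejects the unannotated "show (read s)" with an "ambiguous type
-- variable" error: nothing pins down what type 'read' should produce.
roundTrip :: String -> String
roundTrip s = show (read s :: Int)  -- the annotation sits on a term
```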
> a type system is really just an embedded declarative DSL for doing set algebra
I vaguely remember that there is some maths that tells us types aren't sets, but I can't recall the details. Something to do with Russell's paradox and function types that refer to themselves as an argument? (I'm not a mathematician.)
Well, isn’t that true of everything? Being a bear of very little brain, it’s why I’m always searching for ways to decompose the larger problem into smaller, simpler, self-contained chunks. Plus, a less common perspective sometimes yields insights that the mainstream may miss.
Thanks for the inputs, folks. Will cogitate further.
I started reading but stopped when I saw "succ n" and "pred n". Unary numbers are so academic and disconnected from reality that I lose interest in any paper that uses them. Lambda calculus makes me yawn too. Guess I'll be making a programming language on my own to see how far I get without reading TAPL or any CS papers :-)
It's not that the material is so hard; it's that it's hard to read because of the formalism. Instead of introducing lambda calculus and the fraction-like thingies with Greek letters (inference rules), computer scientists should just use Python and write an implementation on the fly.