
> "Oh, and good luck finding a book that explains how to write a type-checker, let alone one that covers more practical topics such as supporting sub-typing, generics, and so on"

I bought _Practical Foundations for Programming Languages_ by Harper and _Types and Programming Languages_ by Pierce, and I just can't get through the first few pages of either of them. I would love to see a book as gentle and fun as _Crafting Interpreters_ but about making a static ML-like language without starting with hardcore theory. (Bob, if you're listening, please make a sequel!)




I'm in the same boat as you -- here are the two best resources I found:

https://mukulrathi.com/create-your-own-programming-language/...

https://jaked.org/blog/2021-09-07-Reconstructing-TypeScript-...

I read through the first 10 chapters of TAPL and skimmed the rest. The first 10 chapters were good for reminding myself of the framing. But as far as I can tell, all the stuff I care about is stuffed into one chapter (chapter 11, I think), and the rest isn't that relevant (type inference stuff that isn't mainstream, AFAIK).

This is also good:

https://github.com/golang/example/blob/master/gotypes/README...

And yeah some of us had the same conversation on Reddit -- somebody needs to make a Crafting Interpreters for type checking :) Preferably one covering OOP and functional styles, and nominal/structural systems.

---

Also, it dawned on me that what makes TAPL incredibly difficult to read is that it lacks example PROGRAMS.

It has the type checkers for languages, but no programs that pass and fail the type checker. You are left to kind of imagine what the language looks like from the definition of its type checker! Look at chapter 10, for example.
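
For instance, here's the kind of thing I wanted chapter 10 to show (my own made-up examples, written as OCaml so they're mechanically checkable): one program the checker accepts and one it rejects.

  (* well-typed: a bool -> bool function applied to a bool *)
  let accepted = (fun (b : bool) -> b) true

  (* ill-typed: an int where a bool is expected; this line is exactly
     the kind of thing the checker should reject with a type error *)
  let rejected = (fun (b : bool) -> b) 0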

I mean I get that this is a math book, but there does seem to be a big hole in PL textbooks / literature.

Also I was kinda shocked that the Dragon Book doesn't contain a type checker. For some reason I thought it would -- doesn't everyone say it's the authoritative compiler textbook? And IIRC there are like 10 pages on type checking out of ~500 or more.


Are we looking at the same Dragon Book? The whole of chapter 6 is about type checking. Section 6.2 spells out a type checker for a sample language, admittedly in pseudocode.

It might not use the same lingo as modern type-checking literature, but most of the ideas are there. It talks about type systems, type expressions, type rules, constructed types like array/struct, type variables for unknown types, etc. It puts the type rules in the grammar productions. It stores the type info in the AST nodes. It uses chained symbol tables as the type environments.

It has sections on type conversion, operator/function overloading, polymorphic functions (parameterized types), and type unification.

It has all the pieces and steps to build a type checker.
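
For the curious, here's a minimal sketch of the "chained symbol tables as type environments" idea (my own OCaml rendering, not the book's pseudocode):

  (* type expressions attached to AST nodes *)
  type ty = TInt | TBool | TArrow of ty * ty

  (* each scope is a table chained to its enclosing scope *)
  type env = { table : (string, ty) Hashtbl.t; parent : env option }

  let rec lookup env name =
    match Hashtbl.find_opt env.table name with
    | Some t -> Some t
    | None ->
        (match env.parent with
         | Some p -> lookup p name
         | None -> None)

  (* entering a block chains a fresh table onto the enclosing scope *)
  let enter_scope parent = { table = Hashtbl.create 16; parent = Some parent }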


I have the second edition, copyright 2007, and chapter 6 is "intermediate code generation". The whole chapter is definitely not about type checking -- is yours?

https://www.amazon.com/Compilers-Principles-Techniques-Tools...

It has 11 sub-sections, 2 of which are about type checking:

6.3 Types and Declarations, pages 370-378

6.5 Type Checking, pages 386-398

So that's about 20 pages out of 993 pages. A fraction of a chapter!

---

What I was referring to is the appendix "A Complete Front End", which contains a lexer and parser, but not a type checker! Doesn't sound complete to me.

I guess they have a pass to create the three-address code, and type checking is only in service of that. But they don't give you the source code!

---

Also weirdly, when you look at the intro chapter "The structure of a compiler", it goes:

1.2.2 Syntax Analysis

1.2.3 Semantic Analysis (3 paragraphs that mention type checking and coercions)

1.2.4 Intermediate Code Generation

But when you look at the chapters, there's NO semantic analysis chapter! It goes from (4) Syntax analysis and (5) Syntax-directed translation straight to (6) Intermediate code generation.

I think there's a missing chapter! (or two)


I have the 1st edition, printed in 1988. Chapter 6 is Type Checking, chapter 7 is Runtime Environments, and chapter 8 is Intermediate Code Generation.

WTF. The 2nd edition is really butchered.


Wow yeah. So in the first edition, pages 343 to 388 are a full chapter on type checking. It looks much more coherent.

What the heck happened ...


Standard thing: older textbooks are generally better. They tend to get dumbed down in new revisions, and modified to fit stupid constraints about class syllabi.


This reminds me that I actually bought the first edition of the GC handbook on purpose, because it had more C++ material, which was relevant to my project.

The second edition cut a bunch of C++ material out.

And yeah I don't think I missed anything -- I think it even had better diagrams.

So yeah I believe that general rule! I didn't realize that at the time I bought the Dragon Book.

Pretty sad how you can lose knowledge over time!

---

The books also respond to fashion -- e.g. Java was a popular teaching language, so they had to add more Java. I think that is reasonable in some ways but can backfire in the long term.

Although I guess I am not too excited about MIX assembly in TAOCP and so forth, so it's a hard problem.


Obviously sometimes the updates are fixing real critiques of earlier editions. Those are good. But many times it is “fixing” critiques along the lines of “cut out advanced material because it is too much to cover in 1 semester.” Makes sense in an academic context, but not for professionals like you and me who are looking for that kind of stuff.

Books also tend to become sparser on the page, with more pointless “illustrative” stock photos that add nothing and take up page real estate (leading to more advanced topics being dropped to reclaim space) but make the book seem more approachable to undergraduates.

Generally applies more to physics, chemistry, and math books than computer science, but it is a general phenomenon.


Thank you so much! This looks like some good morning-with-coffee reading!


The bigger challenge is pattern matching, and especially the exhaustiveness checking of patterns that aren't constructors of a sum type -- for example, integer intervals.
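
To make the interval case concrete, here's a minimal sketch (my own toy, in OCaml): check whether integer interval patterns [lo, hi] jointly cover min_int..max_int, returning a witness for the error message if not.

  (* returns Some n where n is an uncovered integer, or None if the
     intervals are exhaustive; assumes lo <= hi in every interval *)
  let find_uncovered (ivs : (int * int) list) : int option =
    let sorted = List.sort compare ivs in
    let rec go next = function
      | [] -> Some next                        (* gap from next upward *)
      | (lo, hi) :: rest ->
          if lo > next then Some next          (* gap before this interval *)
          else if hi = max_int then None       (* covered to the end *)
          else go (max next (hi + 1)) rest
    in
    go min_int sorted

  (* find_uncovered [(min_int, -1); (1, max_int)] = Some 0 *)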

Simon Peyton Jones, together with others, wrote a relatively easy-to-read book about implementing Miranda (a precursor of Haskell), which even includes a simple explanation of lambda calculus, so it covers everything you need to know about the theory, more or less: "The Implementation of Functional Programming Languages" 1987, https://www.microsoft.com/en-us/research/wp-content/uploads/...


I highly recommend https://github.com/sdiehl/write-you-a-haskell as it is very developer friendly. It’s not complete, but it really gets the gears turning and will set you up for writing your own Hindley-Milner style type checker.


> I would love to see a book as gentle and fun as _Crafting Interpreters_ but about making a static ML-like language without starting with hardcore theory. (Bob, if you're listening, please make a sequel!)

I don't think an ML-like language is the best option. I would start with type-checking something simple like C or Pascal: no overloading, no generics, no type inference, no anonymous functions. Just plain old subroutines and declared types on everything, with arrays and structs as the only composite data structures.

Then you could add implicit conversions, type inference for locals (not that hard), subtyping, target type inference for anonymous functions, generics, and maybe overloads if you're brave enough.
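
To make that starting point concrete, here's a minimal sketch in OCaml (the toy language and names are my own): with declared types on everything, checking is a plain recursive walk that computes each node's type.

  type ty = TInt | TBool

  type expr =
    | Int of int
    | Bool of bool
    | Var of string
    | Add of expr * expr
    | Eq of expr * expr
    | If of expr * expr * expr

  (* env maps declared variable names to their declared types *)
  let rec check (env : (string * ty) list) (e : expr) : ty =
    match e with
    | Int _ -> TInt
    | Bool _ -> TBool
    | Var x -> List.assoc x env   (* everything is declared up front *)
    | Add (a, b) ->
        if check env a = TInt && check env b = TInt then TInt
        else failwith "Add expects ints"
    | Eq (a, b) ->
        if check env a = check env b then TBool
        else failwith "Eq expects operands of the same type"
    | If (c, t, f) ->
        if check env c <> TBool then failwith "condition must be bool"
        else
          let tt = check env t in
          if tt = check env f then tt
          else failwith "branches must have the same type"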


I had a similar experience this year as I’m working on an ML-cousin to Golang. I actually found that ChatGPT (GPT-4) was really good at breaking down the steps of implementing a Hindley-Milner type system in terms I could understand. It could also provide follow-up clarifications to my questions. I then implemented it a small bit at a time, and as the complexity grew I started to better understand the algorithm.

EDIT: ChatGPT could actually demonstrate the whole process of type checking small chunks of code: assigning them type variables, collecting constraints, and then unification. It would even point out if there was a type error in the code snippet!
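
For anyone curious, here's a toy version of the last step of that process (my own condensation in OCaml, not ChatGPT's output): unification over a list of collected constraints, with types that contain type variables.

  type ty = TInt | TVar of int | TArrow of ty * ty

  (* apply the substitution built so far, chasing variable bindings *)
  let rec apply subst = function
    | TInt -> TInt
    | TVar v ->
        (match List.assoc_opt v subst with
         | Some t -> apply subst t
         | None -> TVar v)
    | TArrow (a, b) -> TArrow (apply subst a, apply subst b)

  (* does variable v occur in a type? (prevents infinite types) *)
  let rec occurs v = function
    | TInt -> false
    | TVar w -> v = w
    | TArrow (a, b) -> occurs v a || occurs v b

  (* solve the constraint list, extending the substitution *)
  let rec unify subst = function
    | [] -> subst
    | (a, b) :: rest ->
        match apply subst a, apply subst b with
        | TInt, TInt -> unify subst rest
        | TVar v, t | t, TVar v ->
            if t = TVar v then unify subst rest
            else if occurs v t then failwith "infinite type"
            else unify ((v, t) :: subst) rest
        | TArrow (a1, b1), TArrow (a2, b2) ->
            unify subst ((a1, a2) :: (b1, b2) :: rest)
        | _ -> failwith "type error"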


I'm curious how you went with it. Here's my attempt: https://github.com/chewxy/hm , the core of which (i.e. the interface type) is what powers most of the languages I've written (different languages use different unification schemes).


Cool to see an implementation in Go, nice! Here's where I got to with my first attempt: https://github.com/cobbweb/flite. There's definitely stuff I'd do a bit differently; my second attempt is actually a project to learn Rust with, but I'm already considering switching to OCaml haha!


I've mentioned this in a different comment, but I've written several articles on type inference, trying to make the topic more approachable:

https://gilmi.me/blog/tags/type%20inference

If you find that helpful, I've made more content on compilers (including live coding a compiler, without narration) which should be easily reachable from my website.


I would love to know more about what the author found difficult regarding type checking. As long as you don't screw up your semantics to the point of needing a full-blown constraint solver, there should be no issues.

Edit: by "screwing up semantics" I mostly mean the combination of overloading and implicit conversions, which is known to cause issues in Java, C++, etc.


The problem for me was that I had a rough idea of how to implement one (essentially it's similar to a recursive-descent parser), but I had a difficult time finding appropriate resources to confirm or deny whether my idea was solid, as well as tips and whatnot.

Basically, the existing material is a bunch of incredibly complicated implementations, the odd blog post that just throws a bunch of code your way without really explaining the why/thought process behind it, and books that aren't worth the money.

The result is that you can of course piece things together (as I did), but it leaves you forever wondering whether you did it in a sensible way, or if you constructed some weird monstrosity.

To put it differently: you can probably build a garden by digging some holes and throwing a few plants around, but without the right resources it can be difficult to determine what the impact of your approach may be, and whether there are better ways of going about it. Oh and I'm aware there are resources on gardening, it's just a metaphor :)


As someone who has written many compilers and type checkers, I agree. There’s very little information online about the subject. I think part of the issue, for me, was psychological: I felt like I was missing some implementation theory like what's presented alongside other compiler subjects (e.g., LALR), when the reality is that you're mostly implementing very simple Boolean operations.


While it helps a bit, it's difficult to learn or reverse-engineer from existing type checking code, because a lot of it is mundane, repetitive code implementing some high-level theory and algorithms. The actual code is too far removed from the theory. You really want to start from the theory and the overall picture.

The Dragon Book has a chapter on type checking. It gives explanations of many topics. It has plenty of examples. It covers type treatment for different areas of a language, like expressions, arrays, structs, functions, pointers, etc.

Despite being really old, its ideas and explanations of the topic are still relevant.


I think the larger writings like books are generally going to be written by academics so they'll be rigorous and hard to read.

So that leaves blog posts for most developers actually implementing this stuff.

But the solution here is that when you finally figure out a good strategy for dealing with something muddy like this, you write that better blog post :)


I'm planning on doing something similar to what I did for pattern matching [1]: basically building something entirely standalone that fits in 2k LOC or so, and explains the basics (i.e. nominal typing plus basic sub-typing), hopefully such that people can then take that and extend it.

As for _when_ I'll do that, that depends on when I can convince my inner critic to actually commit to the idea :)

[1]: https://github.com/yorickpeterse/pattern-matching-in-rust
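
For a taste of how small the nominal plus basic sub-typing core can get, here's a sketch (my own toy in OCaml, not code from the repo above): subtype checking against a class table of declared parents.

  (* hypothetical class table: each named type's declared parent *)
  let parents = [ ("Cat", "Animal"); ("Animal", "Object") ]

  (* a <: b iff a = b or a's declared parent chain reaches b *)
  let rec subtype a b =
    a = b
    || (match List.assoc_opt a parents with
        | Some p -> subtype p b
        | None -> false)

  (* subtype "Cat" "Object" = true; subtype "Animal" "Cat" = false *)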


I've been implementing a language with a TypeScript-like type system, and while the core system is pretty intuitive (values combining with other values, determining whether basic value types fit into others), some of the fringes have been challenging:

- Generics, especially inferring generic type params

- Recursive types

- Deeply mutable vs immutable types

- Type refinement based on checks/usage (see the sketch after this list)

- Giving good error messages for deep type mismatches (though I'm not sure TypeScript itself has figured this one out yet, lol)

etc. And even the stuff I've figured out, I figured out almost all of it from scratch (sometimes having to bail and take a totally new approach midway through). I would love to read a book giving me a proper bottom-up walkthrough of the current landscape of type system implementation.
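
Here's a minimal sketch of the refinement bullet above (my own toy in OCaml, unrelated to my actual project): checking `if is_int x ...` re-binds x with a narrowed type in each branch.

  (* a toy union type and a helper that subtracts one member *)
  type ty = TInt | TStr | TUnion of ty * ty

  let rec remove t = function
    | TUnion (a, b) ->
        (match remove t a, remove t b with
         | None, x | x, None -> x
         | Some a', Some b' -> Some (TUnion (a', b')))
    | u -> if u = t then None else Some u

  (* When checking `if is_int x then e1 else e2`, check e1 with x
     narrowed to TInt and e2 with x's old type minus TInt.
     Assumes x is bound in env. *)
  let narrow env x =
    let then_env = (x, TInt) :: env in
    let else_env =
      match remove TInt (List.assoc x env) with
      | Some t -> (x, t) :: env
      | None -> env   (* x was exactly TInt: the else branch is dead *)
    in
    (then_env, else_env)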


I know nothing about how one would make a type checker, but it sort of feels like your comment is "As long as you don't encounter any issues, there won't be any issues", no?


TypeScript type-checking is quite complicated. I'm sure no human being on earth could pass a test of "does this compile".

But that's an exception, one that was deliberately designed to be complex.


Would type inference also "screw up the semantics"? My understanding is that it generally operates via some kind of constraint solving.


Depends on how much type inference you want.

If you want ML-style total type inference, where you expect the compiler to deduce the parameter and return types of your functions, then yes.

Local variable type inference isn't hard at all, unless you want to do stuff like Rust: declare a variable without an initial value and infer its type from the first assignment further down in the code.
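
A minimal sketch of the easy case (my own toy language in OCaml): the "inference" is literally just using the computed type of the initializer.

  type ty = TInt | TBool

  type expr =
    | Int of int
    | Bool of bool
    | Var of string
    | Let of string * expr * expr   (* let x = e1 in e2, no annotation *)

  let rec check env = function
    | Int _ -> TInt
    | Bool _ -> TBool
    | Var x -> List.assoc x env
    | Let (x, init, body) ->
        let t = check env init in   (* the entire "inference" step *)
        check ((x, t) :: env) body

  (* The Rust-style deferred case (`let x; ... x = e;`) is harder: x
     starts as an unknown type variable that the first assignment has
     to unify, i.e. a slice of full inference. *)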


I’ve written about this at some length, but long story short: type modeling and semantic checking are intimately connected to the language semantics, so aside from the very simple “look at the properties of the underlying nodes, derive the properties of the current node”, it is hard to present any generic approach. In fact, there are many possible ways to approach modeling types even with the exact same semantics, leading to different trade-offs (just compare different C compilers and you’ll see!).

So this is actually harder to do than it might seem. The best advice I have is to contribute to a compiler or two and understand their semantic checking models.


> about making a static ML-like language without starting with hardcore theory.

The book _Applicative Order Programming: The Standard ML Perspective_ has you implement an ML-like language at the end (the book is mostly about programming in Standard ML itself). Chapter 10 covers type checking and type inference. Chapter 11 covers interpretation via graph reduction. Chapter 12 covers compilation of the ML-like language to an abstract stack machine. The code is all in Standard ML. It's very direct and practical without getting bogged down in theory, although it does talk about theory, and it's not as gentle and fun as _Crafting Interpreters_.

It was published in 1991. It's on libgen; I'd recommend downloading a PDF and reading those chapters in order to judge for yourself.


I've written a type checker or two. My experience: https://lambdaland.org/posts/2022-07-27_how_to_write_a_type_...

I left some links/references to books that helped me out.


Might this help? I wrote it: https://azdavis.net/posts/define-pl-01/


“Modern Compiler Implementation” by Appel is definitely worth a look. I read the ML version, and the code is really clean/readable (not sure about the C and Java versions).



