What are some good resources/courses/books for learning compilers, especially in a hands-on way? "Crafting Interpreters" seems to be well-liked; is that a decent place to start? Appreciate any other tips.
This paper is my favorite introduction to compilers. It's short and hands-on, and it goes from compiling a primitive program that does nothing but return a single integer to a full-blown implementation of a real programming language in 24 small steps: http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
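To give a feel for how small that first step is, here's a rough Python sketch (not code from the paper): a "compiler" for a language whose only valid program is a single integer. It emits x86 assembly text that puts the integer in the return register and returns. The names `compile_program` and `entry_point` are placeholders I made up, and you'd still need an assembler plus a small driver to actually run the output.

```python
def compile_program(value: int) -> str:
    """Emit x86 (AT&T syntax) assembly for a function returning one integer."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError("this first step only compiles integer literals")
    # movl takes a 32-bit immediate, so keep the literal in that range.
    if not -2**31 <= value < 2**31:
        raise ValueError("integer does not fit in 32 bits")
    lines = [
        "    .text",
        "    .globl entry_point",      # hypothetical label name
        "entry_point:",
        f"    movl ${value}, %eax",    # the return value goes in eax
        "    ret",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compile_program(42))
```

Every later step is the same idea repeated: extend the input language a little, extend the emitter to match, and test.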
I wrote a tutorial series for my students: Let’s make a Teeny Tiny compiler. It’s much smaller than Crafting Interpreters, so I’d recommend it before you go there to test the waters.
Writing an Interpreter in Go / Writing a Compiler in Go, the two-part series by Thorsten Ball, was what I really cut my teeth on. It straightforwardly takes you through some parsing, ASTs, and a basic tree-walk interpreter, and then in part two refactors the whole thing to compile to bytecode and introduces a virtual machine.
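If the tree-walk vs. bytecode distinction is fuzzy, here's a toy illustration in Python (my own sketch, not code from either book). The same tiny arithmetic AST is first evaluated directly by walking the tree, then compiled to a flat list of stack-machine instructions and run by a small VM loop. The tuple-based AST shape and the opcode names `PUSH`, `ADD`, and `MUL` are assumptions I picked for the example.

```python
# AST nodes are tuples: ("num", n), ("add", left, right), ("mul", left, right)

def evaluate(node):
    """Tree-walk interpreter: recurse over the AST and compute directly."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "add":
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError(f"unknown node kind: {kind!r}")


def compile_expr(node, code=None):
    """Compiler: flatten the AST into postfix stack-machine instructions."""
    code = [] if code is None else code
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind in ("add", "mul"):
        compile_expr(node[1], code)   # operands first...
        compile_expr(node[2], code)
        code.append((kind.upper(), None))  # ...then the operator
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return code


def run(code):
    """Virtual machine: execute instructions against an operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return stack.pop()


if __name__ == "__main__":
    ast = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
    print(evaluate(ast))            # tree-walk result
    print(compile_expr(ast))        # the flat instruction list
    print(run(compile_expr(ast)))   # same result via the VM
```

The payoff of the second form is that the VM loop never recurses and never looks at the tree again, which is roughly why real implementations tend to move in that direction.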
As sibling comments mention about Crafting Interpreters, I recommend following the book but implementing in another language you're familiar with. I ran through "Writing an Interpreter" in Go once, but felt like it stuck a lot harder when I went back through and did it again in Swift. Though you're still following code with the hard parts figured out, the simple act of translating the syntax will mean you're looking harder at what's actually happening.
Someone on here a few weeks ago also mentioned chapter 8 of Rob Pike and Brian Kernighan's "The UNIX Programming Environment". I've had an old eBay-recovered copy of this book on my shelf for a few years. It's a neat intro to shell as well, but I've been working through chapter 8 this Memorial Day weekend to dip my toes back into the vintage C waters. The book builds up a simple calculator, adding variables, builtins, complex control flow, and functions. It starts as a pure yacc parser, and then shifts to a strategy reminiscent of bytecode. This time I'm using a process I'm familiar with (language implementation) to sharpen my C skills.
By doing either of these projects, you also end up with a testbed language where you can start developing more advanced concepts: GC, a type system, externalizable bytecode, linking, the list goes on forever. About four years ago I started working on the Apex language at Salesforce, and the problems encountered in these projects are absolutely still relevant at the scale we're compiling and executing code across our datacenter.
EDIT: I'll also throw in... it gets a lot more fun when the parser's done. If you're finding recursive descent to be a bit of a brain twister, fight through!
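In case it helps untwist things, here's a minimal recursive descent parser in Python for arithmetic expressions (my own sketch, not taken from any of the books above). The grammar I'm assuming is: expr := term (("+" | "-") term)*, term := factor (("*" | "/") factor)*, factor := NUMBER | "(" expr ")". Each grammar rule becomes one method, and operator precedence falls out of which method calls which.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | "(" expr ")"
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node


if __name__ == "__main__":
    print(parse("1 + 2 * 3"))
    print(parse("(1 + 2) * 3"))
```

The trick that made it click for me: `expr` only ever sees whole `term`s, so by the time a "+" is handled, any "*" inside its operands has already been grouped.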
Thanks for the thorough reply. I had seen that Crafting Interpreters used Java as the language, glad to hear that implementing in a different language is ideal.
Seconding reading Crafting Interpreters but writing with a different language. I took it as an excuse to hone my Rust skills, as well as learn the behind the scenes of the compiler. I don't think I would have learned nearly as much if I didn't have to read a paragraph, stop, and then really thoroughly examine the ideas/assumptions behind it to ensure an accurate translation.
Another strategy I used was to look at the title of the chapter, then work ahead as much as possible to implement that. It really helped my learning process to naturally explore the problem space myself, then read through and see how my naive attempts compared to a more seasoned implementation. But as the parent said, ymmv!
Keep in mind that Crafting Interpreters starts with Java, but switches to C later on. If you want to write your code in a different language (which I did), don't choose either of those.
I used C# for the first part. It's easy to translate from the Java code, but you can make things more concise with modern C# features while still having access to a GC, which is kinda necessary for the first part of the book.
I got rid of the code gen and visitor pattern stuff by using records in C# with pattern matching, which felt a lot simpler to me.
Depending on where you're coming from, you can pick C, Java or Standard ML as the implementation language to go along and implement the Tiger language in its variants across the book.
Here is my second attempt at interpreters. It is a 7-part series. It uses JSON as the AST and mostly covers what happens after parsing, such as how scope, variables, and functions can be represented.
It isn't perfect and it might not contain all the best practices out there; consider it notes from a student struggling to learn about the topic.
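To give a flavor of the JSON-as-AST idea, here's a small Python sketch (written for this comment, not lifted from the series). The node shapes, the `"type"` field, and the node kinds `number`, `var`, `let`, `add`, `function`, `call`, and `block` are all assumptions I made up. Scopes are dictionaries chained to a parent, and a function value remembers the scope it was defined in.

```python
import json


class Scope:
    """A variable table with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def define(self, name, value):
        self.vars[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:          # walk outward until found
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise NameError(f"undefined variable: {name}")


def evaluate(node, scope):
    kind = node["type"]
    if kind == "number":
        return node["value"]
    if kind == "var":
        return scope.lookup(node["name"])
    if kind == "let":
        scope.define(node["name"], evaluate(node["value"], scope))
        return None
    if kind == "add":
        return evaluate(node["left"], scope) + evaluate(node["right"], scope)
    if kind == "function":
        # Capture the defining scope so the body can see outer variables.
        return {"params": node["params"], "body": node["body"], "closure": scope}
    if kind == "call":
        fn = evaluate(node["callee"], scope)
        args = [evaluate(arg, scope) for arg in node["args"]]
        if len(args) != len(fn["params"]):
            raise TypeError("wrong number of arguments")
        local = Scope(parent=fn["closure"])   # fresh scope per call
        for name, value in zip(fn["params"], args):
            local.define(name, value)
        return evaluate(fn["body"], local)
    if kind == "block":
        result = None
        for statement in node["body"]:
            result = evaluate(statement, scope)
        return result
    raise ValueError(f"unknown node type: {kind!r}")


PROGRAM = json.loads("""
{"type": "block", "body": [
  {"type": "let", "name": "x", "value": {"type": "number", "value": 10}},
  {"type": "let", "name": "add_x", "value":
    {"type": "function", "params": ["n"],
     "body": {"type": "add",
              "left": {"type": "var", "name": "n"},
              "right": {"type": "var", "name": "x"}}}},
  {"type": "call", "callee": {"type": "var", "name": "add_x"},
   "args": [{"type": "number", "value": 5}]}
]}
""")

if __name__ == "__main__":
    print(evaluate(PROGRAM, Scope()))
```

Skipping the parser this way lets you get to the scoping and function-call questions right away, and you can always bolt a real syntax on later.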