It's well written. It's clever. The comments style does a great job of explaining the concepts.
My only gripe is, although translating one high level language (lisp) to another (c) is technically a compiler, in practice most people think of something producing output at a lower level of abstraction than the input.
This would be even neater if it could emit assembly.
I get your point and agree about assembly, but I also think most people would agree that C is a lower level of abstraction than LISP.
One reason to "stop" at C might be that assembly itself tends to be a bit verbose, so not very suitable when aiming for shortness of code as a major feature. Generating assembly would perhaps obscure the things being taught with a bunch of details that are of course interesting, but sort of beside the point. Just guessing, I'm not the original author of course.
I guess I'm coming at this as a novice with a recent interest in compilers. I was working through a tutorial series for writing a compiler in Go, but it got frustrating when I realized the author's example language wasn't going to output anything lower than C.
I've since found a series that does show how to output to assembly (68000 no less) and I'm finding it much more rewarding.
I see the educational value in building a compiler to C if you want to introduce people to parsing, AST transformations, visitor pattern etc.
But a lot of interesting parts about a compiler are lost when you compile to a high-level language instead of machine code: code selection (to a large extent), register allocation, operation scheduling.
Doing a compiler backend right is much more challenging than doing a compiler frontend right, which is why tutorials usually skip it, and also why people are so grateful for LLVM, which gives a common compiler backend for about every platform imaginable.
"High level" is in the eye of the beholder, but I don't know that C is the best example to pick on: Before LLVM it was pretty common to target C as your portable assembly language.
Just to give an example, I think GHC targets C-- as one of
the steps in its transformations, and still outputs C when you're porting it to a new platform or building a cross-compiler. I don't know that it's useful to say that GHC ceases to be a compiler when you use it to target weird platforms.
And the Chicken Scheme compiler outputs C code[1] that doesn't look like a human wrote it: The C code uses a trampoline to organize function calls, and it periodically garbage-collects your stack cleaning up old frames as you call functions.
> Just to give an example, I think GHC targets C-- as one of the steps in its transformations, and still outputs C when you're porting it to a new platform or building a cross-compiler. I don't know that it's useful to say that GHC ceases to be a compiler when you use it to target weird platforms.
Cmm is GHC's final intermediate language, that's then passed to the machine code generator backend: asm, llvm or C (which is now only available un unregisterised mode, which as you note is mostly for getting started on new platforms)
Funfact, going from LISP to C (and back) is the whole of chapter 5 in SICP [1]. The last few exercizes in the book ask you to write a C compiler in Lisp, and then write a C compiler in the Lisp you just created [2].
That brings back fond memories. Back in the day on BIX¹ (is anyone else here old enough to remember BIX?) we were talking about different languages and compilers, and the topic of C++ came up. I said, "C++? That's a preprocessor, isn't it?"
Much to my embarrassment, Bjarne Stroustrup was hanging out in the same BIX conference, and he chewed me out for this - but very politely. Bjarne explained that Cfront was definitely not a preprocessor (which would imply something like the C preprocesssor), but an actual honest-to-goodness compiler that just happened to have C as its compilation target.
Now if we could only get the younger generation to abandon that horrible "transpiler" word and call things by their right name: a compiler!
This should not be a surprise. Consider what happens on a modern CPU after the preprocessors do their job, after the compilers do their job, when you're right down to machine code: the bare metal!
But really, how bare is that metal? Not at all.
Your so-called "machine language" just turns out to be another high level language that the CPU itself compiles down to the actual internal instructions that it executes, instructions that look nothing like what you thought was the machine language.
If you want a binary executable, yes. But that's sort of a tautology: If a program produces some output (like C source code or a binary), you usually run it because you intend to do something with its output later (like compile it or interpret it, or execute it on the machine if it's a binary).
Yes, either another compiler to translate it to machine code, or a C interpreter. Years ago, for example, there was a C interpreter that went by the name "Saber C," if memory serves.
>translating one high level language (lisp) to another (c) is technically a compiler
Is it not a transpiler? And a compiler from a programming language to a computer readable language like ASM?
Just asking, we can say compiler for everything I guess (I hear you Stratoscope), in fact transpiler has been new to me only recently with an article about Typescript.
My only gripe is, although translating one high level language (lisp) to another (c) is technically a compiler, in practice most people think of something producing output at a lower level of abstraction than the input.
This would be even neater if it could emit assembly.