I think a Lisp compiler isn't quite as interesting for beginners as something that handles the more usual syntax with precedence etc., since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
The AST stuff can come later; with expressions, the parser can perform evaluation as it parses, which gives the opportunity to write a calculator. Generating an AST is then just a matter of replacing evaluation with the creation of tree nodes.
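To make that concrete, here is a rough sketch in Python (my choice of language, purely illustrative) of a recursive-descent calculator that evaluates as it parses; swapping each arithmetic step for the construction of a node like ('+', lhs, rhs) is all it takes to get an AST instead:

    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> NUMBER | '(' expr ')'
    import re

    def tokenize(s):
        return re.findall(r'\d+|[-+*/()]', s)

    class Parser:
        def __init__(self, tokens):
            self.toks, self.pos = tokens, 0

        def peek(self):
            return self.toks[self.pos] if self.pos < len(self.toks) else None

        def next(self):
            tok = self.peek()
            self.pos += 1
            return tok

        def expr(self):                       # lowest precedence: + and -
            val = self.term()
            while self.peek() in ('+', '-'):
                op, rhs = self.next(), self.term()
                val = val + rhs if op == '+' else val - rhs
            return val

        def term(self):                       # higher precedence: * and /
            val = self.factor()
            while self.peek() in ('*', '/'):
                op, rhs = self.next(), self.factor()
                val = val * rhs if op == '*' else val / rhs
            return val

        def factor(self):                     # numbers and parenthesised sub-expressions
            if self.peek() == '(':
                self.next()                   # consume '('
                val = self.expr()
                self.next()                   # consume ')'
                return val
            return int(self.next())

    print(Parser(tokenize("1 + 2 * (3 + 4)")).expr())   # prints 15

Note that each precedence level gets its own function (expr, term, factor) - that's exactly the boilerplate the table-driven approach removes.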
IMHO "it can compile itself" (as well as other non-trivial programs) is a good goal, and really makes people pay attention because it dispels the notion that compilers are somehow "magic".
> In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
I'd suggest starting with a Top Down Operator Precedence ("Pratt") parser. They're beautiful magic for the working man: highly declarative, and slightly mind-bending in that they seem so simple they shouldn't work. They make expressions with infix precedence a joy to handle.
TDOP, Pratt, precedence climbing - I think they're all referring to the same, or at least extremely similar, method: a loop in the main recursive function that compares precedences and decides whether to recurse or keep looping.
They're definitely very simple, and can be derived by refactoring a recursive descent parser and observing that there's no need to make a chain of nested recursive calls to the right level when the function can be called directly to "jump" to the desired level. The simplicity is probably why it's been rediscovered several times and has different names.
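A minimal sketch of that refactoring, in Python (the function name and the operator table are mine, just to show the shape): the whole expr/term/factor cascade collapses into one function that takes the minimum precedence it's allowed to consume:

    # Table-driven precedence climbing: one recursive function plus a loop that
    # compares the next operator's precedence with the level it was called at.
    import re

    PREC = {'+': 1, '-': 1, '*': 2, '/': 2}       # adding an operator is one table entry

    def parse(tokens, min_prec=0):
        lhs = tokens.pop(0)                       # assume a plain number, for brevity
        while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
            op = tokens.pop(0)
            rhs = parse(tokens, PREC[op] + 1)     # +1 makes the operator left-associative
            lhs = (op, lhs, rhs)                  # build an AST node (or evaluate instead)
        return lhs

    print(parse(re.findall(r'\d+|[-+*/]', "1 + 2 * 3 - 4")))
    # ('-', ('+', '1', ('*', '2', '3')), '4')

Right-associative operators just recurse with PREC[op] instead of PREC[op] + 1.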
TDOP and Pratt are exactly the same thing: Vaughan Pratt is the inventor, and Top Down Operator Precedence is the name he gave to the algorithm. And TDOP never decides whether to "recurse or loop" - the precedence just tells it when to stop looping. Part of the interest is how object-oriented it is: the main parsing loop ("expression" in the Crockford article) just tells a token "parse yourself as prefix" or "parse yourself as infix, preceded by (AST)".
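For comparison, a rough Python sketch in that object-oriented style - "nud", "led" and "binding power" are the terms from Pratt and Crockford, the rest is my own stripped-down framing:

    # The driver only ever says "parse yourself as prefix" (nud) or "parse yourself
    # as infix, given the AST to your left" (led); precedence (lbp, "left binding
    # power") alone decides when the loop stops.
    class Literal:
        lbp = 0                                   # literals never bind to the left
        def __init__(self, value): self.value = value
        def nud(self, parser): return self.value  # prefix: a literal is just itself

    class InfixOp:
        def __init__(self, op, lbp): self.op, self.lbp = op, lbp
        def led(self, parser, left):              # infix: given the AST on my left...
            return (self.op, left, parser.expression(self.lbp))

    class End:
        lbp = 0                                   # end of input stops every loop

    class PrattParser:
        def __init__(self, tokens):
            self.tokens = iter(tokens)
            self.token = next(self.tokens)
        def advance(self):
            self.token = next(self.tokens, End())
        def expression(self, rbp=0):              # "expression" as in the Crockford article
            t = self.token
            self.advance()
            left = t.nud(self)
            while rbp < self.token.lbp:
                t = self.token
                self.advance()
                left = t.led(self, left)
            return left

    toks = [Literal(1), InfixOp('+', 10), Literal(2), InfixOp('*', 20), Literal(3), End()]
    print(PrattParser(toks).expression())         # ('+', 1, ('*', 2, 3))

New token types (prefix minus, parentheses, ternaries) are just more classes with a nud and/or led, which is where the "highly declarative" feel comes from.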
> since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
I'm sure it is tempting to bake everything into this project. After implementing a "real" parser, you might want to implement a "real" transform, an optimizing step perhaps, you could keep going until it's totally unrecognizable (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...)... the "give a mouse a cookie" effect applies here.
However, trying to teach everything at once isn't an effective way to teach, and I think that's largely why compilers are so unapproachable to most people.
I don't expect someone who has taken even just a simple Compilers 101 class to get anything out of this project. I'm hoping to take someone previously unaware of how compilers work and leave them feeling somewhat comfortable with them. For a project that you can read through in 20-30 minutes, I think that's already a pretty big goal.
Self-hosted compilers are cool, but that would contradict the goal of this project.
Shameless plug: I wrote a very compact and generic lexer, parser and interpreter for a small C-like language. I believe it can be of great educational value for anyone trying to understand the basic mechanics of parsing an imperative curly-brace language:
http://homepage.ntlworld.com/edmund.grimley-evans/cc500/
This one is fun too, since it contains both a bytecode compiler and its interpreter VM:
https://github.com/rswier/c4
(Previously discussed on HN at https://news.ycombinator.com/item?id=8576068 and https://news.ycombinator.com/item?id=8558822 )