I think a Lisp compiler isn't quite as interesting for beginners as something that handles the more usual syntax with precedence etc., since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
The AST stuff can come later; with expressions, the parser can perform evaluation as it parses, which gives the opportunity to write a calculator. Generating an AST is then just a matter of replacing evaluation with the creation of tree nodes.
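To make that concrete, here is a rough sketch in Python (my choice of language, purely illustrative) of a recursive-descent calculator that evaluates as it parses; swapping each arithmetic step for the construction of a node like ('+', lhs, rhs) is all it takes to get an AST instead:

    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> NUMBER | '(' expr ')'
    import re

    def tokenize(s):
        return re.findall(r'\d+|[-+*/()]', s)

    class Parser:
        def __init__(self, tokens):
            self.toks, self.pos = tokens, 0

        def peek(self):
            return self.toks[self.pos] if self.pos < len(self.toks) else None

        def next(self):
            tok = self.peek()
            self.pos += 1
            return tok

        def expr(self):                       # lowest precedence: + and -
            val = self.term()
            while self.peek() in ('+', '-'):
                op, rhs = self.next(), self.term()
                val = val + rhs if op == '+' else val - rhs
            return val

        def term(self):                       # higher precedence: * and /
            val = self.factor()
            while self.peek() in ('*', '/'):
                op, rhs = self.next(), self.factor()
                val = val * rhs if op == '*' else val / rhs
            return val

        def factor(self):                     # numbers and parenthesised sub-expressions
            if self.peek() == '(':
                self.next()                   # consume '('
                val = self.expr()
                self.next()                   # consume ')'
                return val
            return int(self.next())

    print(Parser(tokenize("1 + 2 * (3 + 4)")).expr())   # prints 15

Note that each precedence level gets its own function (expr, term, factor) - that's exactly the boilerplate the table-driven approach removes.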
IMHO "it can compile itself" (as well as other non-trivial programs) is a good goal, and really makes people pay attention because it dispels the notion that compilers are somehow "magic".
> In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
I'd suggest starting with a Top Down Operator Precedence ("Pratt") parser. They're beautiful magic for the working man: highly declarative, and slightly mind-bending in that they seem so simple they shouldn't work. They make expressions with infix precedence a joy to handle.
TDOP, Pratt, precedence climbing - I think they're all referring to the same, or at least extremely similar, method: a loop in the main recursive function that compares precedences and decides whether to recurse or keep looping.
They're definitely very simple, and can be derived by refactoring a recursive descent parser and observing that there's no need to make a chain of nested recursive calls to the right level when the function can be called directly to "jump" to the desired level. The simplicity is probably why it's been rediscovered several times and has different names.
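A minimal sketch of that refactoring, in Python (the function name and the operator table are mine, just to show the shape): the whole expr/term/factor cascade collapses into one function that takes the minimum precedence it's allowed to consume:

    # Table-driven precedence climbing: one recursive function plus a loop that
    # compares the next operator's precedence with the level it was called at.
    import re

    PREC = {'+': 1, '-': 1, '*': 2, '/': 2}       # adding an operator is one table entry

    def parse(tokens, min_prec=0):
        lhs = tokens.pop(0)                       # assume a plain number, for brevity
        while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
            op = tokens.pop(0)
            rhs = parse(tokens, PREC[op] + 1)     # +1 makes the operator left-associative
            lhs = (op, lhs, rhs)                  # build an AST node (or evaluate instead)
        return lhs

    print(parse(re.findall(r'\d+|[-+*/]', "1 + 2 * 3 - 4")))
    # ('-', ('+', '1', ('*', '2', '3')), '4')

Right-associative operators just recurse with PREC[op] instead of PREC[op] + 1.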
TDOP and Pratt are exactly the same thing: Vaughan Pratt is the inventor, and Top Down Operator Precedence is the name he gave to the algorithm. And TDOP never decides whether to "recurse or loop" - the precedence just tells it when to stop looping. Part of the interest is how object-oriented it is: the main parsing loop ("expression" in the Crockford article) just tells a token "parse yourself as prefix" or "parse yourself as infix, preceded by (AST)".
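For comparison, a rough Python sketch in that object-oriented style - "nud", "led" and "binding power" are the terms from Pratt and Crockford, the rest is my own stripped-down framing:

    # The driver only ever says "parse yourself as prefix" (nud) or "parse yourself
    # as infix, given the AST to your left" (led); precedence (lbp, "left binding
    # power") alone decides when the loop stops.
    class Literal:
        lbp = 0                                   # literals never bind to the left
        def __init__(self, value): self.value = value
        def nud(self, parser): return self.value  # prefix: a literal is just itself

    class InfixOp:
        def __init__(self, op, lbp): self.op, self.lbp = op, lbp
        def led(self, parser, left):              # infix: given the AST on my left...
            return (self.op, left, parser.expression(self.lbp))

    class End:
        lbp = 0                                   # end of input stops every loop

    class PrattParser:
        def __init__(self, tokens):
            self.tokens = iter(tokens)
            self.token = next(self.tokens)
        def advance(self):
            self.token = next(self.tokens, End())
        def expression(self, rbp=0):              # "expression" as in the Crockford article
            t = self.token
            self.advance()
            left = t.nud(self)
            while rbp < self.token.lbp:
                t = self.token
                self.advance()
                left = t.led(self, left)
            return left

    toks = [Literal(1), InfixOp('+', 10), Literal(2), InfixOp('*', 20), Literal(3), End()]
    print(PrattParser(toks).expression())         # ('+', 1, ('*', 2, 3))

New token types (prefix minus, parentheses, ternaries) are just more classes with a nud and/or led, which is where the "highly declarative" feel comes from.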
> since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
I'm sure it is tempting to bake everything into this project. After implementing a "real" parser, you might want to implement a "real" transform, an optimizing step perhaps, you could keep going until it's totally unrecognizable (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...)... the "give a mouse a cookie" effect applies here.
However, trying to teach everything at once isn't an effective way to teach, and I think that's largely why compilers are so unapproachable to most people.
I don't expect someone who has taken even just a simple Compilers 101 class to get anything out of this project. I'm hoping to take someone previously unaware of how compilers work and leave them feeling somewhat comfortable with them. For a project that you can read through in 20-30 minutes, I think that's already a pretty big goal.
Self-hosted compilers are cool, but that would contradict the goal of this project.
Shameless plug: I wrote a very compact and generic lexer, parser and interpreter for a small C-like language. I believe it can be of great educational value for anyone trying to understand the basic mechanics of parsing an imperative curly-brace language:
http://homepage.ntlworld.com/edmund.grimley-evans/cc500/
This one is fun too, since it contains both a bytecode compiler and its interpreter VM:
https://github.com/rswier/c4
(Previously discussed on HN at https://news.ycombinator.com/item?id=8576068 and https://news.ycombinator.com/item?id=8558822 )