For someone like me who always goes, "Comments? Meh. The code should be the comments!" it's an eye-opener.
The code has a lot of comments. But not this kind:
i++; // Increment i
It's about 200 lines of actual code plus another 500 lines of comments explaining what is going on. And 75 lines of old-school ASCII header art like you used to see on line printers back in the '60s, just to appeal to old-timers like me.
It takes you through tokenizing raw code, parsing the tokens into an AST, transforming the AST into a form the code generator can use, and then generating code from it.
A great example of how to teach a concept with a small amount of code and informative comments.
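To give a flavour of what those four stages hand to each other, here's a rough sketch for one tiny input (my own sketch; the field names are from memory and may differ slightly from the repo):
// A rough sketch of the hand-offs between the four stages for one tiny
// input. Field names are approximate, not copied from the repo.
var source = '(add 2 2)';
// 1. tokenizer(source): characters -> a flat array of tokens
var tokens = [
  { type: 'paren',  value: '(' },
  { type: 'name',   value: 'add' },
  { type: 'number', value: '2' },
  { type: 'number', value: '2' },
  { type: 'paren',  value: ')' }
];
// 2. parser(tokens): tokens -> a Lisp-flavoured AST of nested
//    CallExpression / NumberLiteral nodes
// 3. transformer(ast): rewrites that AST into a C/JS-flavoured one
//    (the call gets an explicit callee and an ExpressionStatement wrapper)
// 4. codeGenerator(newAst): walks the new AST and returns 'add(2, 2);'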
I think a Lisp compiler isn't quite as interesting for beginners as something that handles the more usual syntax with precedence etc., since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
The AST stuff can come later; with expressions, the parser can perform evaluation as it parses, which gives the opportunity to write a calculator. Generating an AST is just replacement of evaluation with creation of tree nodes.
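For example, a bare-bones recursive-descent calculator for + - * / and parentheses might look something like this (my own sketch, not code from the article):
// Minimal recursive-descent expression evaluator.
// Grammar:  expr   -> term (('+'|'-') term)*
//           term   -> factor (('*'|'/') factor)*
//           factor -> NUMBER | '(' expr ')'
// 'tokens' is an array of strings like ['(', '1', '+', '2', ')', '*', '3'].
function evaluate(tokens) {
  var pos = 0;
  function peek() { return tokens[pos]; }
  function next() { return tokens[pos++]; }
  function factor() {
    if (peek() === '(') { next(); var v = expr(); next(); /* skip ')' */ return v; }
    return parseFloat(next());
  }
  function term() {
    var v = factor();
    while (peek() === '*' || peek() === '/') {
      v = next() === '*' ? v * factor() : v / factor();
    }
    return v;
  }
  function expr() {
    var v = term();
    while (peek() === '+' || peek() === '-') {
      v = next() === '+' ? v + term() : v - term();
    }
    return v;
  }
  return expr();
}
// evaluate(['(', '1', '+', '2', ')', '*', '3'])  ->  9
Swapping the arithmetic in factor/term/expr for construction of tree nodes is exactly the "replace evaluation with creation of tree nodes" step described above.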
IMHO "it can compile itself" (as well as other non-trivial programs) is a good goal, and really makes people pay attention because it dispels the notion that compilers are somehow "magic".
> In that respect, I would suggest beginning with a simple recursive-descent parser that can handle the usual maths expressions - then extending it to e.g. all the operators in a C-like language is not a huge leap (table-driven precedence-climbing can be derived from recursive descent and makes this much easier), and it also leads to the opportunity for a compiler that can parse and compile itself - unless you are also writing the compiler in Lisp.
I'd suggest starting with a Top Down Operator Precedence ("Pratt") parser. They're beautiful magic for the working man: highly declarative, and slightly mind-bending in that they seem so simple they shouldn't work. They make expressions with infix priority a joy to handle.
TDOP, Pratt, precedence-climbing - I think they're all referring to the same, or at least extremely similar, method of using a loop in the main recursive function that compares precedences and decides whether to recurse or loop:
They're definitely very simple, and can be derived by refactoring a recursive descent parser and observing that there's no need to make a chain of nested recursive calls to the right level when the function can be called directly to "jump" to the desired level. The simplicity is probably why it's been rediscovered several times and has different names.
TDOP and Pratt are exactly the same thing: Vaughan Pratt is the inventor, and Top Down Operator Precedence is the name he gave to the algorithm. And TDOP never "recurses or loops" (the precedence tells it whether it should stop looping). Part of the interest is how object-oriented it is: the main parsing loop ("expression" in the Crockford article) just tells a token "parse yourself as prefix" or "parse yourself as infix preceded by (AST)".
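For anyone curious, here's a stripped-down sketch of the idea for binary operators (my own code, not Crockford's):
// Tiny Pratt-style expression parser: every binary operator has a
// binding power, and parseExpression(rbp) keeps looping for as long as
// the next operator binds tighter than rbp.
var BINDING_POWER = { '+': 10, '-': 10, '*': 20, '/': 20 };
function parse(tokens) {
  var pos = 0;
  function peek() { return tokens[pos]; }
  function next() { return tokens[pos++]; }
  // "null denotation": how a token parses itself with nothing to its left
  function nud(token) {
    if (token === '(') { var e = parseExpression(0); next(); /* skip ')' */ return e; }
    return { type: 'NumberLiteral', value: token };
  }
  // "left denotation": how an operator parses itself with an AST to its left
  function led(op, left) {
    return { type: 'BinaryExpression', operator: op,
             left: left, right: parseExpression(BINDING_POWER[op]) };
  }
  function parseExpression(rbp) {
    var left = nud(next());
    while (pos < tokens.length && BINDING_POWER[peek()] > rbp) {
      left = led(next(), left);
    }
    return left;
  }
  return parseExpression(0);
}
// parse(['1', '+', '2', '*', '3']) groups as 1 + (2 * 3)
The "should I stop looping?" decision is just the comparison of the next operator's binding power against rbp.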
> since it's so trivial that it leaves the "how does a 'real' compiler parse" question unanswered.
I'm sure it is tempting to bake everything into this project. After implementing a "real" parser, you might want to implement a "real" transform, then perhaps an optimizing step, and you could keep going until it's totally unrecognizable (https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...)... the "give a mouse a cookie" effect applies here.
However, trying to teach everything at once isn't an effective way to teach people. I think this is largely why compilers are so unapproachable to most people.
I don't expect someone who has taken even just a simple Compilers 101 class to get anything out of this project. I'm hoping that I can take someone previously unaware of how compilers work and leave them feeling somewhat comfortable with them. For a project that you can read through in 20-30 minutes, I think that's already a pretty big goal.
Self-hosted compilers are cool, but that would contradict the goal of this project.
Shameless plug: I wrote a very compact and generic lexer, parser and interpreter for a small C-like language. I believe it can be of great educational value for anyone who tries to understand the basic mechanics of parsing an imperative curly-brace language:
Maybe at the edges, when the code doesn't mean anything straightforward - like a 'car++' in a metaclass - where a better name won't help or would be too long. But yeah.
// And a `tokens` array for pushing our tokens to.
var tokens = [];
// We're going to create a `value` string that we are going to push
// characters to.
var value = '';
Sure, OP tried to be super descriptive, but there are a few comments that you simply don't need at all if you have even a little understanding.
In a case like this, I actually like the dialogue/conversation nature of the comments. I think the opinion of "not needing them" seems to miss what they're accomplishing.
Yes, an experienced dev can infer intent from context (name, project, 'culture'), and even a smart beginner can read ahead to infer intent. But the comment resolves ambiguity in the advanced dev's mind, and saves the beginner's mind from the difficulty of read-ahead inferences (of which he may have several going at once).
I am someone who doesn't know much about compilers, and I found them helpful. Setting up each variable's purpose builds up context in my mind, lessening the struggle with the 'real' code.
That was my first thought too. Comments can sometimes be seen as an afterthought, something that you feel obligated to leave for future programmers or managers craving documentation. In cases like this, comments become an active part of the program, documenting and teaching parts of the code to anyone who wants or needs to understand.
It inspires me to leave more thorough, detailed comments, not for my own benefit, but for that of others. This almost reads like a really well-done technical paper in parts. Very neat.
I only glanced at it because I'm at work. Yet I found myself glancing longer than I should've because of the very things you mentioned. It was an amazing first impression.
I'll be looking again despite not knowing JavaScript. I bet I'll understand most of it anyway, given the attention to readability.
If you're interested, this is called "literate programming" [1], a form of programming where instead of having a few comments in a large program, you have a few lines of program in a large comment. It's particularly well suited to tutorials like this one.
Nitpick: this is not literate programming; it's good commenting.
Literate programming means the flow of the code follows the flow of the thought process: you write the story, and insert code where relevant. It includes other bits of code as macros.
From that Wikipedia page:
The literate programming paradigm, as conceived by Knuth, represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts.[2]
...
Order of human logic, not that of the compiler
...
Instead of comments provided as side notes to source code a literate program contains the explanation of concepts on each level, with lower level concepts deferred to their appropriate place, which allows for better communication of thought.
Example (same page):
The purpose of wc is to count lines, words, and/or characters in a list of files. The
number of lines in a file is ......../more explanations/
Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
<<*>>=
<<Header files to include>>
<<Definitions>>
<<Global variables>>
<<Functions>>
<<The main program>>
@
We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
<<Header files to include>>=
#include <stdio.h>
@
Not to say these comments aren't good; they're great stuff, they make it very accessible and understandable! But it's not Literate Programming. Just Good Programming :)
If you don't need something quite so super-tiny, but don't want to go all-out Dragon book, Niklaus Wirth has released an updated copy of his classic "Compiler Construction" for free, now targeting a subset of his Oberon language. All in 130 pages. No fancy functional methods aimed at a language high in the Chomsky hierarchy, but all the basics are there.
(I consider it one of IT's biggest losses that we don't follow Wirth more, Go notwithstanding)
As an experienced programmer who only recently got into compilers, I have a few, open ended, somewhat advanced, questions about the transformer phase (for anyone to comment, thanks in advance).
It appears to me that the transformer is the guts of the compiler; everything else is more like a support system. And more often than not, the purpose of transformation ends up being 'optimization', though of course you could do transformation for other purposes. However, this phase can end up as very hairy logic if the programmer is not careful.
Anyway, my questions are:
- is there any consensus on creating a DSL for providing transformation rules instead of hand coding them in the programming language?
- There is the nanopass compiler framework in scheme [0]. Is that one of the superior ways of doing things? or are there any serious criticisms about that technique?
- Is there a relationship between compiler transformations and 'term-rewriting systems'?
- OMeta/OMeta2 pattern matching on context-free-languages?
Whether it's the nanopass framework, OMeta2, a term-rewriting system, or some other DSL, I guess what I'm trying to get at is how that phase can be made more 'formal' - so I guess at this point we get into the area of 'formal systems', 'formal semantics', etc.
I would appreciate if there is recommended reading (books, survey papers, etc) on this topic.
There are a wide variety of techniques. Some compilers use a single IR (intermediate representation) as the input and output for each optimization pass. Others use multiple IRs depending on the phase.
If you have a bunch of optimization passes and each pass visits the entire program, the compiler can become slow pretty quickly. Also, there's nothing preventing passes from running in the wrong order, and they may have implicit dependencies that aren't well documented.
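As a toy illustration of what one such pass over a tree-shaped IR can look like (my own sketch, not tied to any particular compiler):
// Toy constant-folding pass: one full walk of the tree, rewriting
// BinaryExpression nodes whose operands are both numeric literals.
function foldConstants(node) {
  if (node.type === 'BinaryExpression') {
    var left = foldConstants(node.left);
    var right = foldConstants(node.right);
    if (left.type === 'NumberLiteral' && right.type === 'NumberLiteral') {
      var result = node.operator === '+' ? left.value + right.value
                 : node.operator === '-' ? left.value - right.value
                 : node.operator === '*' ? left.value * right.value
                 :                         left.value / right.value;
      return { type: 'NumberLiteral', value: result };
    }
    return { type: 'BinaryExpression', operator: node.operator,
             left: left, right: right };
  }
  return node; // literals, identifiers, etc. pass through untouched
}
// foldConstants({ type: 'BinaryExpression', operator: '+',
//                 left:  { type: 'NumberLiteral', value: 2 },
//                 right: { type: 'NumberLiteral', value: 3 } })
// -> { type: 'NumberLiteral', value: 5 }
A real pipeline chains many passes like this over a much bigger IR, which is exactly where the ordering and compile-time concerns above come from.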
I wrote a shift-reduce parser for a minimalistic C-like language where the grammar is entirely specified as a static table. The precedence rules are resolved through a "should_shift" function where the parser asks whether it should perform the reduction for a matched rule, or shift one token more:
Jon Harrop had an article in the F# Journal a while back where he wrote an x86 JIT compiler for a small functional language in about 200 lines of code. It could express Fibonacci programs, and the compiler's performance beat the equivalent F# program on that toy benchmark.
As an aside, I've noticed that some modern compilers (Go and Clang are the ones I've studied recently) bundle the tokenization phase and the lexing phase into a single higher-level lexer. Which is to say that instead of, say, emitting a token pair (PUNCTUATION, "&&"), they produce (LOGICAL_AND, "&&"). It makes sense, but surprised me, since if you look at classical compiler books, they generally promote the more formal two-level pipeline of tokenization before lexical analysis.
It's well written. It's clever. The comment style does a great job of explaining the concepts.
My only gripe is, although translating one high level language (lisp) to another (c) is technically a compiler, in practice most people think of something producing output at a lower level of abstraction than the input.
This would be even neater if it could emit assembly.
I get your point and agree about assembly, but I also think most people would agree that C is a lower level of abstraction than LISP.
One reason to "stop" at C might be that assembly itself tends to be a bit verbose, so not very suitable when aiming for shortness of code as a major feature. Generating assembly would perhaps obscure the things being taught with a bunch of details that are of course interesting, but sort of beside the point. Just guessing, I'm not the original author of course.
I guess I'm coming at this as a novice with a recent interest in compilers. I was working through a tutorial series for writing a compiler in Go, but it got frustrating when I realized the author's example language wasn't going to output anything lower than C.
I've since found a series that does show how to output to assembly (68000 no less) and I'm finding it much more rewarding.
I see the educational value in building a compiler to C if you want to introduce people to parsing, AST transformations, visitor pattern etc.
But a lot of interesting parts about a compiler are lost when you compile to a high-level language instead of machine code: code selection (to a large extent), register allocation, operation scheduling.
Doing a compiler backend right is much more challenging than doing a compiler frontend right, which is why tutorials usually skip it, and also why people are so grateful for LLVM, which gives a common compiler backend for about every platform imaginable.
"High level" is in the eye of the beholder, but I don't know that C is the best example to pick on: Before LLVM it was pretty common to target C as your portable assembly language.
Just to give an example, I think GHC targets C-- as one of the steps in its transformations, and still outputs C when you're porting it to a new platform or building a cross-compiler. I don't know that it's useful to say that GHC ceases to be a compiler when you use it to target weird platforms.
And the Chicken Scheme compiler outputs C code[1] that doesn't look like a human wrote it: The C code uses a trampoline to organize function calls, and it periodically garbage-collects your stack cleaning up old frames as you call functions.
> Just to give an example, I think GHC targets C-- as one of the steps in its transformations, and still outputs C when you're porting it to a new platform or building a cross-compiler. I don't know that it's useful to say that GHC ceases to be a compiler when you use it to target weird platforms.
Cmm is GHC's final intermediate language, which is then passed to the machine code generator backend: asm, llvm, or C (which is now only available in unregisterised mode, which as you note is mostly for getting started on new platforms)
Fun fact: going from LISP to C (and back) is the whole of chapter 5 in SICP [1]. The last few exercises in the book ask you to write a C compiler in Lisp, and then write a C compiler in the Lisp you just created [2].
That brings back fond memories. Back in the day on BIX¹ (is anyone else here old enough to remember BIX?) we were talking about different languages and compilers, and the topic of C++ came up. I said, "C++? That's a preprocessor, isn't it?"
Much to my embarrassment, Bjarne Stroustrup was hanging out in the same BIX conference, and he chewed me out for this - but very politely. Bjarne explained that Cfront was definitely not a preprocessor (which would imply something like the C preprocessor), but an actual honest-to-goodness compiler that just happened to have C as its compilation target.
Now if we could only get the younger generation to abandon that horrible "transpiler" word and call things by their right name: a compiler!
This should not be a surprise. Consider what happens on a modern CPU after the preprocessors do their job, after the compilers do their job, when you're right down to machine code: the bare metal!
But really, how bare is that metal? Not at all.
Your so-called "machine language" just turns out to be another high level language that the CPU itself compiles down to the actual internal instructions that it executes, instructions that look nothing like what you thought was the machine language.
If you want a binary executable, yes. But that's sort of a tautology: If a program produces some output (like C source code or a binary), you usually run it because you intend to do something with its output later (like compile it or interpret it, or execute it on the machine if it's a binary).
Yes, either another compiler to translate it to machine code, or a C interpreter. Years ago, for example, there was a C interpreter that went by the name "Saber C," if memory serves.
>translating one high level language (lisp) to another (c) is technically a compiler
Is it not a transpiler? And isn't a compiler something that goes from a programming language to a computer-readable language like ASM?
Just asking - we can say compiler for everything, I guess (I hear you Stratoscope). In fact, "transpiler" was new to me until recently, from an article about TypeScript.
The answer to your question is "Not really that big": 3.5kloc with comments/whitespace and 1.6kloc without.
Your point is valid that much of this is the library's doing, but I think more of the difference than you're letting on is just from how well-suited functional languages are to this sort of work compared to dynamic scripting languages. Parsec is practically a Haskell standard library, anyway; I wouldn't discount it any more than I would discount using Node's buffer or stream module.
In my opinion, the core function, "compiler", is absolutely beautifully written - the crowning gem of this well-written compiler.
It's abstract without being too abstract (exactly the right level of abstraction), and it's a model of simplicity, comprehensibility (especially for people new to compilers), and elegance:
function compiler(input) {
var tokens = tokenizer(input);
var ast = parser(tokens);
var newAst = transformer(ast);
var output = codeGenerator(newAst);
// and simply return the output!
return output;
}
In short, a great starting point (especially for beginners) to conceptualize and subsequently delve into the depths of what goes on in a compiler.
An A+ and Two Thumbs Up for your efforts. The teaching potential of this compiler (which is always what I'm looking for in compilers) is huge!
Nice, but it needs to say up front what it does (i.e., compile a Lisp-ish PL to a C-ish PL).
EDIT. After a bit of analysis:
The lexer is basically a state machine. Each kind of token consists entirely of characters in a single class, so the usual state machine is modified to give some states a while loop that goes through characters until the token ends.
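The pattern, paraphrased (my rewording, not the repo's exact code):
// Look at one character to pick the token kind, then loop while the
// following characters stay in the same class.
function tokenize(input) {
  var tokens = [];
  var i = 0;
  while (i < input.length) {
    var ch = input[i];
    if (ch === '(' || ch === ')') {
      tokens.push({ type: 'paren', value: ch });
      i++;
    } else if (/[0-9]/.test(ch)) {
      var value = '';
      while (i < input.length && /[0-9]/.test(input[i])) value += input[i++];
      tokens.push({ type: 'number', value: value });
    } else if (/[a-z]/i.test(ch)) {
      var name = '';
      while (i < input.length && /[a-z]/i.test(input[i])) name += input[i++];
      tokens.push({ type: 'name', value: name });
    } else {
      i++; // skip whitespace and anything unrecognised
    }
  }
  return tokens;
}
// tokenize('(add 2 22)') ->
// [ paren '(', name 'add', number '2', number '22', paren ')' ]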
The parser is basically recursive-descent. The grammar is awfully simple, so only one parsing function (walk) is needed.
The transformer traverses the AST, identifies function calls, and produces a new AST with function calls marked. I'm not sure why the transformer is written as a separate piece of code. It seems to me that the functionality it adds could easily be integrated into either the parser or the code generator. And then the compiler would be even tinier.
And the code generator traverses the transformed AST and spits out C-like code.
Thanks! I was just telling people yesterday that I have decided to learn more about either compilers or operating systems. You made my choice easier; I am going with compilers by studying the Super Tiny Compiler :)
For a more advanced but still very approachable JS example check out the lexer and parser in the GraphQL code base (https://github.com/graphql/graphql-js)
I have been working slowly through Sestoft's Programming Language Concepts, and through that I wrote a tiny interpreter in F# of a mini Forth-like language: http://ideone.com/24I5y0
Inspired by this though, I decided I'd try re-writing it into a compiler for it instead, targeting the MicroMini VM[0], and here it is: http://ideone.com/czXhoM
This is a great read and motivates me to create DSLs for everything. That being said, can anyone recommend one or more of their favorites references for creating DSLs? If it is relevant, my target audience are researchers in healthcare.
Thank you. This was a very informative read indeed. I'm currently in the process of rewriting a hobby compiler[1] that I hacked together over a year ago. This time I wanted to do it properly, and hence have been scouting the web for a short, concise, to-the-point tutorial that briefly explains most of the important concepts I need to know before I begin. This was exactly that tutorial.
Having used Bison as well as my own hand-written compilers, I personally would suggest that Bison is not worth it. With recursive descent, you end up writing the same BNF structure anyway, but you don't have to fight the Bison file structure just to decorate the tree, and you can much more easily work around syntax errors.
Whether one uses a parser generator's output as part of a compiler is one thing, but one should at least pass the grammar of the language through a parser generator to let it check the grammar for ambiguity.
As someone just starting out on their first pet language, this is an amazing resource. I've already read a book on the subject and I think I learned more reading this than the book.
It makes me second guess my choice to use a parser generator (PEG.js) which I don't fully understand yet, this makes it seem simple enough that I could write my own.
Interestingly enough, today I was thinking that if we have an AST of a program, we could have tools that apply semantic analyses like ownership in Rust and achieve the same benefits in any language.
Does anyone know of an existing tool like this?
LLVM. Easiest way to write your own compiler pass.
However, it's going to be tricky to handle ownership between threads, as threads are not intrinsic to most languages and therefore not native to the compiler; instead they are just a call to a linked-in function (fork/clone/pthread_create/std::thread). Compilers are unaware that the function actually invokes a system call which performs behaviour that is 'odd' for a function (spawning a new thread). So in order to have some notion of 'ownership' by a thread, you'd need to recognize all the different ways of spawning a new thread for the different operating systems and platforms your compiler should support.
I wrote a tokenizer the other day without really knowing what I was doing and it's reassuring to see that it doesn't work that differently from this one.
No stupid questions. I keep getting similar feedback. I'll do a better job of explaining it.
To answer your question, it's taking a Lisp-style syntax and turning it into a C-style syntax (specifically JavaScript - the output AST is actually a valid Babel AST)
(add 2 (subtract 4 2))
Into
add(2, subtract(4, 2))
If your goal is to demystify a haute-programming concept for people who think of themselves as regular programmers, why not use the most popular language?
It's not like they're saying javascript should be used for all compilers. Even so, there are several javascript-based compilers and transpilers in widespread use today, so even javascript hackers should be able to apply the concepts of compilers in their work.
It's worth pointing out that Lisp is homoiconic, meaning that the code you see has the same structure as its AST, so writing a parser for a Lisp is pretty trivial and is part of the reason why this compiler looks super short. Other languages like C, however...
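For example, (add 2 (subtract 4 2)) already reads like the tree the parser has to build; the resulting AST is roughly this (field names approximate):
// The s-expression mirrors its own parse tree; the parser only has to
// spell the nesting out as objects.
var ast = {
  type: 'CallExpression',
  name: 'add',
  params: [
    { type: 'NumberLiteral', value: '2' },
    {
      type: 'CallExpression',
      name: 'subtract',
      params: [
        { type: 'NumberLiteral', value: '4' },
        { type: 'NumberLiteral', value: '2' }
      ]
    }
  ]
};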
The "code generator" part of this compiler doesn't do much, since it only translates from one high level language to another, not machine code which would understandably be far more complicated, as other commenters have pointed out.
This is a compiler from an altitude of 50km in the stratosphere, and won't actually teach you much about real world compilers that do so much more under the hood. Sorry to be Mr. Cranky HN commenter (ok, I'm not sorry).
https://github.com/thejameskyle/the-super-tiny-compiler/blob...