I'm writing a compiler in Haskell for a university class this semester. I don't ...

jfoutz · on Oct 25, 2016

Sure :) First, i have to say i feel your pain. Writing a compiler is hard. If there's anything left you don't understand, this project will be punishing. Just know that at the end of the semester, there is absolutely no magic left. Your mental model of computation will be spectacular. Hang in there.

There are a couple of things. First, haskell lets you build up the language incrementally. So if your parse tree is like

    data AST = Prim Op |
               Immediate Int

you can just throw in

               Variable String

And the compiler will tell you all the functions with non-exhaustive matches. It's tough to do this in C because you have to remember all of your switch/case statements. Also, in C those switches are going to be based on a tag in a data structure rather than a direct type. The compiler won't really help very much.

Second, the thing that implements the semantics for the parse tree can be a typeclass.

    class Semantics where
        eval (Prim op) :: Prim -> a
        eval (Immediate i) :: Immediate -> a
        eval (Var name) :: Variable -> a

In haskell, you are free to implement an interpreter which is way way easier than a full compiler. It's nice to have an interpreter for your language, because you can run the same program both ways, interpreted and compiled.

so...

    instance Semantics Interpreter where
        eval (Prim op) = case op of
            Plus l r = eval l + eval r
            Minus l r = eval l - eval r
            ...
        eval (Immediate n) = n
            ...
        -- variables need an environment, but i think you get the idea, just look up the string in the env.

Then, you can write a compiler.

    instance Semantics Compiler where
        eval (Prim op) = case op of
            (Plus l r) = eval l ++ "MOV r1 r2 \n" 
                      ++ eval r "ADD r1 r2 -> r1\n"
            ...
        eval (Immediate a) = "STORE " ++ a ++ " r1\n"
            ...
        -- again, variables require an env to be passed around, it's just another arg to eval
        -- this time around though, instead of a simple lookup
        -- you need the memory address to load into r1
        -- you have to encode what lookup would do. 
        -- [2] talks about this a bit.

This is a pretty crappy evaluation strategy, just sticking the last result in r1, but i'll work. You can do much much fancier things.

Another super cute trick with haskell, you can use LLVM to generate your machine code at haskell run time, and then dynamically link it [1]. This lets you mix and match evaluation, interpret parts, compile parts, and use them interchangeably. Much much easier debugging. That link should get you generating machine code for a toy language in a day. Basically, instead of loading up the dynamic library, you just make an executable instead.

Anyway, there's lots of good stuff out there. I'd suggest out 3imp [2] (warning postscript file) That's Kent Dybvig's dissertation on writing scheme compilers. The second implementation is unbelievably elegant. full support for call with continuation, so much cool stuff. It compiles to a VM rather than assembly, but that makes a lot of complex things clear.

An obvious typeclass is a pretty printer for your parse tree, so you can make sense of what a program is supposed to be doing and you don't have to swear quite as much when your resulting assembly doesn't work.

A less obvious typeclass pass is a typechecker, if someone is adding a string to an integer, you can return a new AST that injects the Int -> String conversion. Constant folding is a good one to do as well.

An even less obvious typeclass will ensure that every malloc has a free. I don't think it's possible for any C program, but you can get a lot of it, and spit out a warning when your algorithm isn't sure.

If you can handle academic papers, Oleg's finally tagless is the coolest way to swap out interpreters, compilers, and partial compilation [3] (pdf)

C won't help you with that, like at all. you can play a game with a table of function pointers, that you swap out for the various eval cases. Debugging that, imho, is hard. With haskell the compiler just tells you you're being dumb.

I guess the main thing is, a compiler has a ton of special cases. Half the battle is just mananging those special cases. Haskell eases a lot of those burdens. The other half of the battle is coming up with good algorithms for getting stuff done (like register selection above!). Haskell lets you slice things up, so you can give your algorithm everything it needs to do its job efficiently. It's not that you can't do that in other languages (clearly, there are a ton of compilers in a ton of languages) it's more that it really encourages and supports the kinds of things you need to do when writing a compiler.

Again, writing compilers is hard. you will feel stupid. You are not stupid, you're doing something very few of the 7 billion people on the planet have even attempted. It takes years of practice to get good enough to even try. You're a badass. You'll solve it.

[1] http://augustss.blogspot.com/2009/01/llvm-llvm-low-level-vir...

[2] ftp://www.cs.indiana.edu/pub/scheme-repository/doc/pubs/3imp.ps.gz

[3] http://okmij.org/ftp/tagless-final/JFP.pdf

edit

In retrospect that register allocator blows. any right associative operations will get stomped on. It's what i get for slapping something together at 1 am. let's just pretend this is for fourth, and you always work on the top of the stack.

Athas · on Oct 25, 2016

Have you tried writing a compiler in another language? In my experience, functional languages are a very nice fit.

yyhhsj0521 · on Oct 25, 2016

So which part do you find difficult? The parser? Haskell has the best library for parsing, parsec. The interpreter/code generator? I can't see how Haskell is different from any other languages on this part.

bbcbasic · on Oct 25, 2016

You're writing a compiler! It won't be painless in any language.