I actually work on LLVM-proper during my day job. This was just a fun exercise to demonstrate that it was possible. I also have plans to write a tutorial based on it.
Also an example of how to implement a direct-threaded interpreter.
Some performance data from a Brainfuck Mandelbrot benchmark:
Interpreter: 37.787s
Tracing JIT: 11.716s
Static Compiler: 2.402s
The tracing JIT loses out to the static compiler largely because there's no dynamic dispatch in Brainfuck for the tracer to optimize out. There's probably some performance to be recovered by tuning the tracer thresholds and making minor optimizations, but I would be shocked if it ever beat the static compiler, at least for Brainfuck.
Thanks for writing this -- it's awesome to see JIT principles boiled down to the point where you can easily understand the whole system. Please let us know if you publish the tutorial; I'd love to see more detail on the JIT. In particular, it would be the perfect template to demonstrate feedback-directed optimization opportunities and to measure the overhead of tracing; it would be incredibly interesting to see what has to be done to make the JIT outperform the AOT compiler.
The performance problems originate in the design of your trace compiler, not in static vs. dynamic dispatch. Some suggestions:
* The interpreter should have a fast profiling mode (hashed counting of loop backedges) and a slower recording mode (for every instruction call the recorder first, then execute the instruction). Either implement it twice (it's small enough), use a modifiable dispatch table and intercept the dispatch in recording mode (indirect threading), or compute branch offsets relative to a base (change the base to switch modes).
* Don't record traces for a long time and then compile everything together. Do it incrementally:
- Detect a hot loop, switch into recording mode, record a trace, compile it, attach it to the bytecode, switch to profiling mode (which may call your compiled trace right away).
- Make the side exits branch to external stubs which do more profiling (one counter per exit). Start recording hot side traces and continue until the trace hits an existing trace, or abort if it hits an uncompiled loop.
- If you completely control the machine code generation (i.e. not with LLVM), you can attach the side traces to their branch points by patching the machine code. Otherwise you may need to use indirections or recompile clusters of the graph after a certain threshold is reached.
- Region selection has a major impact on performance, so be prepared to carefully tune the heuristics.
* Sink all stores, especially updates of the virtual PC, data pointers etc. Don't count on the optimizer to do this for you.
* Due to the nature of the source language you may need to preprocess the IR or you need to teach the optimizer some additional tricks.
- E.g. the loop [-] should really be turned into 'data[0] = 0'.
- Or the loop [->>>+<<<] should be turned into 'data[3] += data[0]; data[0] = 0'.
- It's unlikely any optimizer handles all of these cases, since no sane programmer would write such code ... oh, wait. :-)
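For readers following along, those loop idioms can be caught with a small peephole pass over the source before (or instead of) relying on the optimizer. A rough Python sketch — the superinstruction names CLEAR and MOVE are made up for illustration:

```python
import re

def optimize(code):
    """Peephole pass over Brainfuck source. Rewrites the idioms
    '[-]'          -> ('CLEAR',)    i.e. data[p] = 0
    '[->..>+<..<]' -> ('MOVE', n)   i.e. data[p+n] += data[p]; data[p] = 0
    Everything else passes through as single-character ops."""
    out, i = [], 0
    while i < len(code):
        if code.startswith('[-]', i):
            out.append(('CLEAR',))
            i += 3
            continue
        # [- >^n + <^n ] : move the current cell n cells to the right
        m = re.match(r'\[-(>+)\+(<+)\]', code[i:])
        if m and len(m.group(1)) == len(m.group(2)):
            out.append(('MOVE', len(m.group(1))))
            i += m.end()
            continue
        out.append((code[i],))
        i += 1
    return out
```

The interpreter (or trace recorder) then executes CLEAR and MOVE as single superinstructions instead of spinning through the loop cell by cell.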
> * The interpreter should have a fast profiling mode (hashed counting of loop backedges) and a slower recording mode (for every instruction call the recorder first, then execute the instruction).
It already does this. The recording method is specialized for '[' (since loop headers can only be '['). All other opcodes go through a fast path that simply checks if we're in recording mode and stores to the trace buffer.
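For readers following along, the scheme looks roughly like this (a Python sketch; the Profiler class, counter layout, and threshold are all invented for illustration, not the actual implementation):

```python
HOT_THRESHOLD = 50  # arbitrary; real tracers tune this carefully

class Profiler:
    """Counts executions of loop headers ('[') and flips into
    recording mode once one of them gets hot."""
    def __init__(self):
        self.counts = {}        # pc of each '[' -> execution count
        self.recording = False
        self.trace = []

    def on_loop_header(self, pc):
        # Slow-ish path, only taken for '[': bump a hashed counter.
        self.counts[pc] = self.counts.get(pc, 0) + 1
        if self.counts[pc] >= HOT_THRESHOLD:
            self.recording = True
            self.trace = []

    def on_op(self, pc, op):
        # Fast path for every other opcode: a single flag check.
        if self.recording:
            self.trace.append((pc, op))
```

Until a loop gets hot, the non-'[' opcodes pay only the `if self.recording` check.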
> * Don't record traces for a long time and then compile everything together.
The tricky part with this is knowing how to start up the profiler when we hit a side-exit. PC 123 may occur at multiple places in the trace tree. If we want to extend the tree on side-exit, we need to be able to recreate the path through the trace tree that led to that point. In essence, we need the compiled trace to continue updating the trace buffer. Certainly possible, but doesn't seem like a great idea offhand.
> * Sink all stores, especially updates of the virtual PC, data pointers etc. Don't count on the optimizer to do this for you.
Because I'm using tail-call based direct threading, there are no stores to the virtual PC or the data pointer. They're passed in registers to the tail-callee.
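To illustrate the idea (Python has no tail calls, so this sketch uses a trampoline as a stand-in; in the real interpreter the handlers genuinely tail-call each other and pc/dp live in registers): the virtual PC and data pointer are threaded through arguments and return values, never stored into an interpreter struct.

```python
def run(code, tape_size=30):
    """Trampoline sketch of tail-call threading for a loop-free subset
    of Brainfuck: pc and dp travel only through arguments and return
    values, so there are no stores to a VM structure."""
    tape = [0] * tape_size

    def op_inc(pc, dp):
        tape[dp] += 1
        return dispatch, pc + 1, dp

    def op_dec(pc, dp):
        tape[dp] -= 1
        return dispatch, pc + 1, dp

    def op_right(pc, dp):
        return dispatch, pc + 1, dp + 1

    def op_left(pc, dp):
        return dispatch, pc + 1, dp - 1

    table = {'+': op_inc, '-': op_dec, '>': op_right, '<': op_left}

    def dispatch(pc, dp):
        if pc == len(code):
            return None, pc, dp           # halt
        return table[code[pc]], pc, dp    # "tail call" the handler

    handler, pc, dp = dispatch, 0, 0
    while handler is not None:  # the trampoline stands in for tail calls
        handler, pc, dp = handler(pc, dp)
    return tape
```

With genuine tail calls the trampoline loop disappears and each handler jumps straight to the next, with pc and dp passed in argument registers.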
> * Due to the nature of the source language you may need to preprocess the IR or you need to teach the optimizer some additional tricks.
Yes, there's a whole range of pre-processing tricks that could be used to accelerate both the interpreter and the traces. I haven't even scratched the surface of that.
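The most obvious such trick is run-length encoding repeated ops before interpretation. A minimal Python sketch (not the actual implementation):

```python
def rle(code, runnable='+-<>'):
    """Collapse runs of +, -, <, > into (op, count) pairs so the
    interpreter executes one fused instruction per run instead of
    one dispatch per character."""
    out, i = [], 0
    while i < len(code):
        c, j = code[i], i
        if c in runnable:
            while j < len(code) and code[j] == c:
                j += 1
            out.append((c, j - i))
            i = j
        else:
            out.append((c, 1))
            i += 1
    return out
```

`++++` then becomes a single "add 4" instead of four dispatches.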
Hefty! As an exercise, any sense in inlining the tracing into compiled code? That way you sacrifice the startup time benefits of the JIT (by AOT compiling at start) but still retain trace specialization.
You could certainly do that, though the interface between statically JIT'd code and trace JIT'd code would get hairy. Plus, as I said above, there's simply not a whole lot of trace specialization going on in the first place, since Brainfuck lacks dynamic dispatch.
I don't know... "neat hack", but it seems there is so much out there that could actually have some kind of practical application that it's a bit of a waste to work on "silly" projects. I love to hack on things that don't have any immediately evident business model or real world application, but I think purposefully working on something that never will is perhaps a bit unfortunate. Yeah, he learned something for sure, but that's pretty much all it can be.
To expand on that: if he'd written his own toy language, say, odds are it would never go anywhere, but, who knows... maybe it will find a niche. Using "brainfuck" pretty much guarantees that the code will never find a practical use.
That's not the point. These are the kind of projects programmers do for fun. Your comment is like saying "if this guy hadn't gone to the zoo, but rather spent his day at work, he at least might have been productive". Now, I can imagine you wouldn't enjoy watching somebody else's pictures of zoo animals, unless you were very interested in zoo animals, like, say, some people are interested in compiler technology and/or esoteric languages. Because Brainfuck is an insanely simple language to compile/interpret, it actually makes for a nice small example!
> I am not saying fun projects need to be useful (I'd be the last one to say that), I'm saying "why ensure they'll never be useful?".
Because some things are done for learning exercises and nothing more.
Just because the code isn't going to do anything useful when executed does not mean there is no value in the exercise of producing or sharing it.
Not that it needs that much justification, fun is a good enough reason.
Ok, you people are seriously failing to comprehend that which I am attempting to communicate to you.
* I have no problems with 'fun projects'. I do them myself.
* I have no problem with 'useless code'. Many fun projects are never useful. I have done these myself.
* I have no problem with learning for learning's sake. I have engaged in lots of that.
What I don't agree with is artificially limiting your project's potential. By choosing to implement brainfuck, instead of Tcl or Lua or Forth or Lisp or any number of other languages (even if it's just a toy version, and not the full spec), you've consigned your project to the dustbin before you've even finished it. To my way of thinking, that just doesn't make any sense. If you want to be artistic, go out and make your own language.
Not sure why you see implementing a simple language to learn the underlying techniques as "artificially limiting" the project's potential. Maybe this is just a step to implementing a more complicated language.
> Not sure why you see implementing a simple language to learn the underlying techniques as "artificially limiting" the project's potential.
Even a simple language like Hecl is used for real things in the real world, something I wouldn't have imagined when I started writing it. An implementation of Brainfuck will never be used for anything more than learning.
Which continues to be my point: learning's great, but why not learn and do something that has a tiny chance of developing into something big?
You're assuming that everything that people work on actually needs the potential to become "big".
What about the people that study klingon? Or Esperanto? Or Elvish?
You assume that a person cannot gain something by simply learning something that has no practical purpose. This isn't the case. By working on BF, someone might make some neuron connections that help them with logical thought on other projects, by breaking down logical thinking into very simplistic, basic symbols--completely removing "language" from the equation.
The Klingon Dictionary is very good at explaining grammatical concepts. It made me interested in how language worked. After reading it I was able to understand a French grammar book and go from getting an E in my French GCSE mock exam to a B in the final a couple of months later. I went on to get a degree in French (which I never used but that's beside the point).
Specifically, some of the guttural sounds used in Klingon are sounds that are used in real languages. If you're a native American-English speaker, these sounds do not exist in your language. If someone here is so dedicated to learning Klingon and Star Trek, it might open their ability to A) learn real languages more easily, B) pronounce many of the sounds used in those languages.
> You assume that a person cannot gain something by simply learning something that has no practical purpose
Actually, what I wrote, if you'd bothered to read it, is: "Yeah, he learned something for sure, but that's pretty much all it can be." so don't put words into my mouth.
The point boils down to this:
* It'll probably never become big.
* That doesn't really matter, it's mostly for learning and for fun.
* But even so, why preclude that? <- this is all that is causing people to get their knickers in a knot over what I wrote.
> You're assuming that everything that people work on actually needs the potential to become "big".
If you get that essentially for free, then why the hell not?
The effort you've expended to debate this will soon meet & then exceed the effort that went into the tracing JIT in question. When you reach that point, you'll have some soul searching to do.
Writing a fast Brainfuck interpreter is one of the best optimization exercises I've ever encountered. Trivial to start, easy to improve, and then it gets progressively harder with every iteration.
This is a great project for learning how to use LLVM because there is nothing to add complexity. There is no need for a runtime library. There is no need for a complicated parser. With those parts gone, all that matters is building the compiler, JIT core, tracing algorithm, and so on.
Now that the author knows how to do all that, he's mentally prepared to be productive when taking on a more complex task, like writing his own language or improving an existing one.
You seem to think that because it's on Github you're supposed to use it. Incorrect. Github is just a public backup and a way of sharing something with your friends. Kind of like posting your drunken photos on Facebook. Productive? No. Fun? Yes.
> You seem to think that because it's on Github you're supposed to use it.
Actually, if you read my post, you would have understood that that is not the case :-/
Your point about it being a "minimum viable language" is a good one though, but I find it difficult to think there isn't something else he could have chosen.
I don't give a shit about karma, but I do intensely dislike the idea of "piling on" downvotes, for myself or others, as I have expressed many times in the past with regards to others' postings.
It's a great way of discouraging the expression of dissenting ideas. You're correct that I have the 'reserves' to express any number of actual crackpot ideas, but not everyone does.
> This is a great project for learning how to use LLVM because there is nothing to add complexity.
Speaking as someone who's written a very simple Brainfuck interpreter, this is absolutely spot-on. As someone who'd never written an interpreter before, I didn't want to have to think about the language itself, just focus on getting the absolute fundamentals down.
What's more, because it's Brainfuck you don't have to worry about people mistaking it for a serious project, so you can concentrate on the educational aspect rather than fine-tuning the performance, fixing every single bug, etc.
It's like carpentry and sculpting: the same skills, but one is a craft and the other is an art. This guy just took time off from making furniture to build a sculpture.
Sure, he could have made a new kind of furniture, but making sculptures also has its merits (or not, depending on your viewpoint, I guess).
I'm assuming you've been downvoted because slightly more than 50% of HNers think of this project as an artistic/fun project.
But the fact is, even a purely artistic/fun project will have some creativity or originality in it. I would consider a toy language or Brainf__k written for the first time as artistic.
But this project is just a JIT for Brainf__k, there's no creativity in it, and all it did was give the author some experience writing JITs. In that sense this is an exercise project, and IMO exercise projects do not belong on HN.
Generally, as long as it's part of something constructive and adds rather than detracts from the message, the community won't downvote profanity. http://news.ycombinator.com/item?id=1636262