I really do wish the community as a whole would embrace TruffleRuby rather than leaving it on the sidelines. GraalVM, which TruffleRuby is based on, together with TruffleRuby, represents literally a few hundred million dollars of investment in an implementation over the years.
Much of the problem has to do with C extensions: despite the team's effort to get C code compiling under TruffleRuby, they also had to go as far as fixing the C libraries that didn't work with TruffleRuby due to undefined behaviour.
Nothing touched by Oracle will be adopted, nor should it be. It would be reckless and stupid. The risk of basing your technology stack on such a plainly evil actor is just not worth it. It doesn't matter how many millions the investment was.
> "what you think of Oracle is even truer than you think it is. There has been no entity in human history with less complexity or nuance to it than Oracle"
> "this company is about one man and his alter ego and what he wants to inflict upon humanity"
> You need to think of Larry Ellison the way you think of a lawnmower. You don't anthropomorphize your lawnmower, the lawnmower just mows the lawn, you stick your hand in there and it'll chop it off, the end. You don't think 'oh, the lawnmower hates me' -- lawnmower doesn't give a shit about you, lawnmower can't hate you. Don't anthropomorphize the lawnmower. Don't fall into that trap about Oracle.
> Unfortunately, there is one huge problem with Ruby’s current MJIT. At the time I write this in mid-to-late 2019, MJIT will slow Rails down instead of speeding it up.
> That’s a pretty significant footnote.
Oops.
> In addition to memory usage, there’s warmup time. With JIT, the interpreter has to recognize that a method is called a lot and then take time to compile it. That means there’s a delay between when the program starts and when it gets to full speed.
Why not cache the compiled methods to make warmup a once-per-version delay? It would be a JIT/precompiled hybrid. Call it gradually compiled.
Matz (Ruby lead maintainer/original author) has said that Ruby 2.7 (coming out this Christmas) has JIT improvements that make the JIT performance-neutral for Rails. The goal is to have it be a net positive by Ruby 3.0 (Christmas 2020).
It's a dynamic language. Typical JIT compilation involves generating specialized versions of the code for specific types. You need run-time profiling to know what to generate.
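To make that concrete, here's a toy example (a hypothetical method, not anything MJIT actually emits): one call site can see many types, and only run-time profiling tells the JIT which specialized versions are worth generating.

    # `add` is a single method, but each argument-type combination would
    # need its own specialized native version; the JIT only learns which
    # ones matter by profiling the running program.
    def add(a, b)
      a + b
    end

    add(1, 2)        # Integer + Integer -> worth an integer fast path
    add(1.0, 2.5)    # Float + Float    -> a different specialization
    add("foo", "!")  # String + String  -> different again
    add([1], [2])    # Array + Array    -> or stay on the generic path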
Some experiments are getting underway; the current implementation uses a cache that is tightly scoped to a specific run, including things such as class serial numbers that cannot safely be reused between runs.
>> MJIT will slow Rails down instead of speeding it up.
>> That’s a pretty significant footnote.
> Oops.
FWIW / from what I remember, generating high-performance native code for Rails was hard (or at least significantly different from optimizing standard benchmark type code). The OMR JIT for Ruby had similar challenges, with at least a few 'at least we didn't make it worse' moments.
I was only tangentially involved, but I vaguely recall Rails needing (or at least appearing to need) a lot more inlining and string manipulation optimizations to fly.
Nope, that's not what Bootsnap does. Bootsnap caches the pre-parsed .rb files as ISEQs (bytecode buffers). It's conceptually a bit similar, but not the same.
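The building block Bootsnap uses for that is Ruby's ISEQ serialization API. A minimal sketch of the idea (file names invented here, and this omits all of Bootsnap's cache-invalidation logic):

    # First boot: parse and compile once, cache the bytecode on disk.
    unless File.exist?("app.rb.bin")
      iseq = RubyVM::InstructionSequence.compile_file("app.rb")
      File.binwrite("app.rb.bin", iseq.to_binary)
    end

    # Later boots: load the cached bytecode and skip parsing entirely.
    RubyVM::InstructionSequence
      .load_from_binary(File.binread("app.rb.bin"))
      .eval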
I’m excited about this work. Having a faster Ruby would be some wonderful icing on the cake. But, for me, using Ruby (and Rails) has always been about optimizing for developer hours over system performance. IMO Ruby is not a race horse, but it’s plenty “fast”. The real value is how quickly my team can iterate, and how enjoyable the process is.
Languages like Smalltalk or Lisp can have it both ways; the problem with Ruby is the manpower needed to build a comparable JIT, although MJIT isn't the only one already available.
...Cincom is the largest commercial provider of Smalltalk in the world, with twice as many partners and customers as all other commercial providers combined."
>When a method has been called a certain number of times (10,000 times in current prerelease Ruby 2.7), MJIT will mark it to be compiled into native code and put it on a “to compile” queue. MJIT’s background thread will pull methods from the queue and compile them one at a time into native code.
I would have thought that you wouldn't wait for 10,000 iterations, but would instead start from the beginning: compile the methods with the most calls and keep the compiler thread busy. Flush compiled methods that are used less frequently or no longer needed, and limit the total to some defined resource-usage cap. You'd probably win overall.
For what it's worth, the JVM defaults to 10,000 calls to a method before deciding it is hot, so that threshold is not without precedent (it can be adjusted via -XX:CompileThreshold).
Interesting. In .NET Core 2.1, they decided just 30 calls was enough to recompile from the fast “tier 0” JIT code to the fully optimized “tier 1” JITed code.
10,000 calls really isn't that many, when you consider the sorts of operating environments that the JVM is targeted at.
30 seems crazy low to me. That seems like you'd spend a bunch of compute time early on compiling stuff that may only be used during the start-up stages of your code.
> I would have thought that you wouldn't wait for 10,000 iterations
I haven't kept up with Ruby JITs, so I don't know if / how mjit handles inlining, but what I remember from my J9 / OMR days is that choosing what to inline made a massive difference in compiled code performance. Inlining too much was a great way to hobble performance.
> Flush less frequently or no-longer needed compiled calls...
The technical complexity of this process should not be underestimated. Given the minimalism of mjit's approach, I could easily see such a strategy being unviable without infrastructure investment that would (at least appear to) be significantly further along the effort / reward curve.
Because Rails is extremely eager to redefine classes and monkey patch instances, by design. This is hostile to almost all JIT models. V8 spends a very large amount of engineering time handling the common cases of this, but it's non-trivial.
Aren't most of those shenanigans done early in the VM's lifespan (as Rails boots, actions are called the first time,...), after which the world would be stable, and JIT-related caches wouldn't have to be lost?
Generally yes, and I won't get into the weeds about how inlined methods can have their inlinedness invalidated if you look too hard at or breathe on them, but this isn't a deterministic process. The first time you touch an ActiveRecord::Model, the database is queried and your classes ARE redefined. It's not even close to the same ballpark as loading a new class on the JVM, and the stabilizing assumption you're hoping exists often doesn't. Death by 1,000 cuts.
Long time Rails dev here. I believe not. The Rails core devs have long known that monkey patching is not thread safe, and blows Ruby method caches, so they already reserve monkey patching for initialization time.
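You can actually watch a monkey patch invalidate things on 2.x-era Rubies: RubyVM.stat exposes the class-serial counter mentioned elsewhere in the thread, and defining a method bumps it. A quick illustration (the counter names changed in Ruby 3.0):

    # Ruby 2.x tags each class with a serial number; cached method lookups
    # are only valid while the serial is unchanged.
    before = RubyVM.stat(:class_serial)

    class String            # monkey patch a core class at runtime
      def shout
        upcase + "!"
      end
    end

    after = RubyVM.stat(:class_serial)
    puts after > before     # => true: lookups through String must be redone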
I feel that the definition of "monkey patching" doesn't (or shouldn't) cover what Rails does in AR. Monkey patching suggests a change of behaviour over what was initially intended; dynamic method generation in AR, however, is very much part of the intended API.
That said, there are plenty of Rails apps that do monkey patch Rails internals, for better or worse.
Do you have something specific in mind? From my understanding of ActiveRecord, most of the method generation should happen the first time the class is defined.
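(For anyone unfamiliar, the generation in question looks roughly like this hand-rolled sketch, which is not ActiveRecord's real implementation: attribute readers come from the schema, the first time the class is actually used.)

    # Toy ActiveRecord-style model: reader methods are defined dynamically,
    # from a pretend schema, the first time an instance is created.
    class ToyModel
      COLUMNS = %w[id name email].freeze # stands in for a schema query

      def self.define_attribute_methods!
        COLUMNS.each do |col|
          define_method(col) { @attributes[col] }
        end
      end

      def initialize(attributes)
        self.class.define_attribute_methods! unless self.class.method_defined?(:id)
        @attributes = attributes
      end
    end

    user = ToyModel.new("id" => 1, "name" => "ada", "email" => "ada@example.com")
    puts user.name # => "ada" -- a method that didn't exist at parse time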
They are the notable exception, yes. Note that aside from V8, all have multiple JITs (and V8 is gaining a mid-tier compiler, so they can move away from the interpreter sooner), which adds a fair amount of extra complexity and maintenance cost. All have pretty advanced interpreters, which also helps make warmup smoother.
Did I get this right: the JVM also uses a JIT, so basically you first compile your Java program into Java bytecode, and then once it's running there's also a JIT that further optimises the bytecode?
First of all, there are many JVM implementations; the commercial ones used to offer AOT compilation as well, going back to the early 2000s.
Then the ones that have JITs offer multiple flavours.
One way is to initially interpret the bytecodes; after enough information is gathered, the first-level JIT gets into action and compiles that block into native code. Here a block is usually a function, but it can be something else.
This first-level compiler is rather simple and does only basic optimizations.
The VM keeps profiling execution and eventually notices that the already-compiled block keeps being used heavily; now it is time to bring in the big-brother JIT, which is roughly equivalent to -O3 on gcc, and recompile the block to native code using all the major optimizations.
Other JVMs (like JRockit) never interpret: when they start, the first level is already the dumb level-one compiler straight to native code.
Then all of them now support JIT caches, meaning that after a run the JITed methods get saved and reused by the next execution, so the profiler gets to learn from previous runs and the system starts from a much better performance state.
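As a toy model of that tier-up policy (thresholds invented; no real VM is this simple), the core of it is per-method counters feeding a compile queue:

    # Tiered promotion: interpret first (tier 0), cheap compile when warm
    # (tier 1), expensive -O3-style recompile when truly hot (tier 2).
    TIER1_CALLS = 1_000
    TIER2_CALLS = 100_000

    HotMethod = Struct.new(:name, :calls, :tier)

    def on_call(m, compile_queue)
      m.calls += 1
      if m.tier == 0 && m.calls >= TIER1_CALLS
        m.tier = 1
        compile_queue << [m.name, :tier1]
      elsif m.tier == 1 && m.calls >= TIER2_CALLS
        m.tier = 2
        compile_queue << [m.name, :tier2]
      end
    end

    queue = []
    m = HotMethod.new("render", 0, 0)
    200_000.times { on_call(m, queue) }
    p queue # => [["render", :tier1], ["render", :tier2]]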
Yes, although HotSpot has multiple layers, not just two.
It initially interprets, and when a specific threshold is reached (you can configure it), the C1 compiler gets called into action doing basic optimizations.
After a while, if that generated native code keeps getting even hotter, the C2 compiler (the one with -O3-like capabilities) gets called into action.
In both cases the optimized code gets safety guards to validate that the assumptions made by the JIT still hold. For example, if a dynamic dispatch always lands on the same method, it gets replaced by a direct call instead. If that assumption is later proven wrong, the JIT throws the optimized code away and starts over with the new assumptions.
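The guard idea maps to Ruby surprisingly directly. Here's a toy monomorphic inline cache (just the shape of the technique, nothing like HotSpot's actual machinery): the fast path assumes the receiver class seen earlier, and a failed guard falls back and rebinds.

    # Toy inline cache: remember the last receiver class and its method;
    # a guard check picks between the fast path and "deoptimizing".
    class InlineCache
      def initialize(name)
        @name = name
        @expected_class = nil
        @cached_method = nil
      end

      def call(receiver, *args)
        if receiver.class.equal?(@expected_class)
          @cached_method.bind(receiver).call(*args) # guard passed: fast path
        else
          @expected_class = receiver.class          # guard failed: rebind
          @cached_method = @expected_class.instance_method(@name)
          @cached_method.bind(receiver).call(*args)
        end
      end
    end

    cache = InlineCache.new(:to_s)
    puts cache.call(42)   # miss: binds Integer#to_s
    puts cache.call(43)   # hit: direct call through the cached method
    puts cache.call(:ok)  # miss again: rebinds to Symbol#to_s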
Then, in what concerns OpenJDK, you actually have two C2-level JIT compilers available: HotSpot's C2, written in C++ and still the default, and Graal, written in Java and taken from the GraalVM (née Maxine VM) project. Currently Graal is much better than C2 at escape analysis, for example, but worse in other scenarios.
In both cases, OpenJDK has inherited the JIT cache infrastructure from JRockit, so you also get to save the native code between runs and start much faster on subsequent runs.
As a note: even though it is usually not a good idea, if you set the interpreter threshold to zero, C1 kicks in right at the beginning, but it won't have any profiling information available, so the generated code is most likely going to be worse than just interpreting.
It does. If you're asking why not AOT-compile a whole Ruby app: well, Ruby is a really dynamic language, and the generated C might not be valid after the app begins execution. I guess theoretically you could enumerate all of the different possible class states, or the final class states, but that sounds really hard.
If you actually need a compiled Ruby binary, mRuby is the best option, but it has its own drawbacks, of course.
It may work a bit better now, but it's still very much not tuned for that. The result would probably still be slower than not JITting, for the same reasons mentioned for 2.6 in that link.