I really do wish the community as a whole would embrace TruffleRuby rather than leaving it on the sidelines. GraalVM, which TruffleRuby is based on, together with TruffleRuby, represents literally a few hundred million dollars of investment in an implementation over the years.
Much of the problem has to do with C extensions: despite the team's effort to get C code compiling under TruffleRuby, they also had to go as far as fixing the C libraries that didn't work with TruffleRuby due to undefined behaviour.
Nothing touched by Oracle will be adopted, nor should it be. It would be reckless and stupid. The risk of basing your technology stack on such a plainly evil actor is just not worth it. It doesn't matter how many millions the investment was.
> "what you think of Oracle is even truer than you think it is. There has been no entity in human history with less complexity or nuance to it than Oracle"
> "this company is about one man and his alter ego and what he wants to inflict upon humanity"
> You need to think of Larry Ellison the way you think of a lawnmower. You don't anthropomorphize your lawnmower, the lawnmower just mows the lawn, you stick your hand in there and it'll chop it off, the end. You don't think 'oh, the lawnmower hates me' -- lawnmower doesn't give a shit about you, lawnmower can't hate you. Don't anthropomorphize the lawnmower. Don't fall into that trap about Oracle.
> Unfortunately, there is one huge problem with Ruby’s current MJIT. At the time I write this in mid-to-late 2019, MJIT will slow Rails down instead of speeding it up.
> That’s a pretty significant footnote.
Oops.
> In addition to memory usage, there’s warmup time. With JIT, the interpreter has to recognize that a method is called a lot and then take time to compile it. That means there’s a delay between when the program starts and when it gets to full speed.
Why not cache the compiled methods to make warmup a once-per-version delay? It would be a JIT/precompiled hybrid. Call it gradually compiled.
Matz (Ruby lead maintainer/original author) has said that Ruby 2.7 (coming out this Christmas) has JIT improvements that make the JIT performance-neutral for Rails. The goal is to have it be a net positive by Ruby 3.0 (Christmas 2020).
It's a dynamic language. Typical JIT compilation involves generating specialized versions of the code for specific types. You need run-time profiling to know what to generate.
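To make that concrete, here's a toy example (a hypothetical method, not anything MJIT actually emits): one call site can see many types, and only run-time profiling tells the JIT which specialized versions are worth generating.

    # `add` is a single method, but each argument-type combination would
    # need its own specialized native version; the JIT only learns which
    # ones matter by profiling the running program.
    def add(a, b)
      a + b
    end

    add(1, 2)        # Integer + Integer -> worth an integer fast path
    add(1.0, 2.5)    # Float + Float    -> a different specialization
    add("foo", "!")  # String + String  -> different again
    add([1], [2])    # Array + Array    -> or stay on the generic path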
Some experiments are getting underway; the current implementation uses a cache that is tightly scoped to a specific run, including things such as class serial numbers that cannot safely be reused between runs.
>> MJIT will slow Rails down instead of speeding it up.
>> That’s a pretty significant footnote.
> Oops.
FWIW / from what I remember, generating high-performance native code for Rails was hard (or at least significantly different from optimizing standard benchmark type code). The OMR JIT for Ruby had similar challenges, with at least a few 'at least we didn't make it worse' moments.
I was only tangentially involved, but I vaguely recall Rails needing (or at least appearing to need) a lot more inlining and string manipulation optimizations to fly.
Nope, that's not what Bootsnap does. Bootsnap caches the pre-parsed .rb files as ISEQs (bytecode buffers). It's conceptually a bit similar, but not the same.
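The building block Bootsnap uses for that is Ruby's ISEQ serialization API. A minimal sketch of the idea (file names invented here, and this omits all of Bootsnap's cache-invalidation logic):

    # First boot: parse and compile once, cache the bytecode on disk.
    unless File.exist?("app.rb.bin")
      iseq = RubyVM::InstructionSequence.compile_file("app.rb")
      File.binwrite("app.rb.bin", iseq.to_binary)
    end

    # Later boots: load the cached bytecode and skip parsing entirely.
    RubyVM::InstructionSequence
      .load_from_binary(File.binread("app.rb.bin"))
      .eval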
I’m excited about this work. Having a faster Ruby would be some wonderful icing on the cake. But, for me, using Ruby (and Rails) has always been about optimizing for developer hours over system performance. IMO Ruby is not a race horse, but it’s plenty “fast”. The real value is how quickly my team can iterate, and how enjoyable the process is.
Languages like Smalltalk or Lisp can have it both ways; the problem with Ruby is the manpower needed to build a comparable JIT, although MJIT isn't the only one already available.
...Cincom is the largest commercial provider of Smalltalk in the world, with twice as many partners and customers as all other commercial providers combined."
>When a method has been called a certain number of times (10,000 times in current prerelease Ruby 2.7), MJIT will mark it to be compiled into native code and put it on a “to compile” queue. MJIT’s background thread will pull methods from the queue and compile them one at a time into native code.
I would have thought that you wouldn't wait for 10,000 iterations, but would instead start from the beginning: compile the methods with the most calls and keep the compiler thread busy. Flush compiled methods that are used less frequently or no longer needed, and limit the total to some defined resource-usage cap. You'd probably win overall.
For what it's worth, the JVM defaults to 10,000 calls to a method before deciding it is hot, so that threshold is not without precedent (it can be adjusted via -XX:CompileThreshold).
Interesting. In .NET Core 2.1, they decided just 30 calls was enough to recompile from the fast “tier 0” JIT code to the fully optimized “tier 1” JITed code.
10,000 calls really isn't that many, when you consider the sorts of operating environments that the JVM is targeted at.
30 seems crazy low to me. That seems like you'd spend a bunch of compute time early on compiling stuff that may only be used during the start-up stages of your code.
> I would have thought that you wouldn't wait for 10,000 iterations
I haven't kept up with Ruby JITs, so I don't know if / how mjit handles inlining, but what I remember from my J9 / OMR days is that choosing what to inline made a massive difference in compiled code performance. Inlining too much was a great way to hobble performance.
> Flush less frequently or no-longer needed compiled calls...
The technical complexity of this process should not be underestimated. Given the minimalism of mjit's approach, I could easily see such a strategy being unviable without infrastructure investment that would (at least appear to) be significantly further along the effort / reward curve.
Because Rails is extremely eager to redefine classes and monkey patch instances, by design. This is hostile to almost all JIT models. V8 spends a very large amount of engineering time handling the common cases of this, but it's non-trivial.
Aren't most of those shenanigans done early in the VM's lifespan (as Rails boots, actions are called the first time,...), after which the world would be stable, and JIT-related caches wouldn't have to be lost?
Generally yes, and I won't get into the weeds about how inlined methods can have their inlinedness invalidated if you look too hard at or breathe on them, but this isn't a deterministic process. The first time you touch an ActiveRecord::Model, the database is queried and your classes ARE redefined. It's not even close to the same ballpark as loading a new class on the JVM, and the stabilizing assumption you're hoping exists often doesn't. Death by 1,000 cuts.
Long time Rails dev here. I believe not. The Rails core devs have long known that monkey patching is not thread safe, and blows Ruby method caches, so they already reserve monkey patching for initialization time.
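You can actually watch a monkey patch invalidate things on 2.x-era Rubies: RubyVM.stat exposes the class-serial counter mentioned elsewhere in the thread, and defining a method bumps it. A quick illustration (the counter names changed in Ruby 3.0):

    # Ruby 2.x tags each class with a serial number; cached method lookups
    # are only valid while the serial is unchanged.
    before = RubyVM.stat(:class_serial)

    class String            # monkey patch a core class at runtime
      def shout
        upcase + "!"
      end
    end

    after = RubyVM.stat(:class_serial)
    puts after > before     # => true: lookups through String must be redone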
I feel that the definition of "monkey patching" doesn't (or shouldn't) cover what Rails does in AR. Monkey patching suggests a change of behaviour over what was initially intended; dynamic method generation in AR, however, is very much part of the intended API.
That said, there are plenty of Rails apps that do monkey patch Rails internals, for better or worse.
Do you have something specific in mind? From my understanding of ActiveRecord, most of the method generation should happen the first time the class is defined.
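(For anyone unfamiliar, the generation in question looks roughly like this hand-rolled sketch, which is not ActiveRecord's real implementation: attribute readers come from the schema, the first time the class is actually used.)

    # Toy ActiveRecord-style model: reader methods are defined dynamically,
    # from a pretend schema, the first time an instance is created.
    class ToyModel
      COLUMNS = %w[id name email].freeze # stands in for a schema query

      def self.define_attribute_methods!
        COLUMNS.each do |col|
          define_method(col) { @attributes[col] }
        end
      end

      def initialize(attributes)
        self.class.define_attribute_methods! unless self.class.method_defined?(:id)
        @attributes = attributes
      end
    end

    user = ToyModel.new("id" => 1, "name" => "ada", "email" => "ada@example.com")
    puts user.name # => "ada" -- a method that didn't exist at parse time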
They are the notable exception, yes. Note that aside from V8, all have multiple JITs (and V8 is gaining a mid-tier compiler, so they can move away from the interpreter sooner), which adds a fair amount of extra complexity and maintenance cost. All have pretty advanced interpreters, which also helps make warmup smoother.
Did I get this right: the JVM also uses a JIT, so basically you first compile your Java program into Java bytecode, and then once it's running there's also a JIT that further optimises the bytecode?
First of all, there are many JVM implementations; the commercial ones used to offer AOT compilation as well, going back to the early 2000s.
Then the ones that have JITs offer multiple flavours.
One way is to initially interpret the bytecodes; after enough information is gathered, the first-level JIT gets into action and compiles that block into native code. Here a block is usually a function, but it can be something else.
This first-level compiler is rather simple and does only basic optimizations.
The VM keeps profiling execution and eventually notices that the already-compiled block keeps being used heavily; now it is time to bring in the big-brother JIT, which is roughly equivalent to -O3 on gcc, and recompile the block to native code using all the major optimizations.
Other JVMs (like JRockit) never interpret: when they start, the first level is already the dumb level-one compiler straight to native code.
Then all of them now support JIT caches, meaning that after a run the JITed methods get saved and reused by the next execution, so the profiler gets to learn from previous runs and the system starts from a much better performance state.
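As a toy model of that tier-up policy (thresholds invented; no real VM is this simple), the core of it is per-method counters feeding a compile queue:

    # Tiered promotion: interpret first (tier 0), cheap compile when warm
    # (tier 1), expensive -O3-style recompile when truly hot (tier 2).
    TIER1_CALLS = 1_000
    TIER2_CALLS = 100_000

    HotMethod = Struct.new(:name, :calls, :tier)

    def on_call(m, compile_queue)
      m.calls += 1
      if m.tier == 0 && m.calls >= TIER1_CALLS
        m.tier = 1
        compile_queue << [m.name, :tier1]
      elsif m.tier == 1 && m.calls >= TIER2_CALLS
        m.tier = 2
        compile_queue << [m.name, :tier2]
      end
    end

    queue = []
    m = HotMethod.new("render", 0, 0)
    200_000.times { on_call(m, queue) }
    p queue # => [["render", :tier1], ["render", :tier2]]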
Yes, although HotSpot has multiple layers, not just two.
It initially interprets, and when a specific threshold is reached (you can configure it), the C1 compiler gets called into action doing basic optimizations.
After a while, if that generated native code keeps getting even hotter, the C2 compiler (the one with -O3-like capabilities) gets called into action.
In both cases the optimized code gets safety guards to validate that the assumptions made by the JIT still hold. For example, if a dynamic dispatch always lands on the same method, it gets replaced by a direct call instead. If that assumption is later proven wrong, the JIT throws the optimized code away and starts over with the new assumptions.
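The guard idea maps to Ruby surprisingly directly. Here's a toy monomorphic inline cache (just the shape of the technique, nothing like HotSpot's actual machinery): the fast path assumes the receiver class seen earlier, and a failed guard falls back and rebinds.

    # Toy inline cache: remember the last receiver class and its method;
    # a guard check picks between the fast path and "deoptimizing".
    class InlineCache
      def initialize(name)
        @name = name
        @expected_class = nil
        @cached_method = nil
      end

      def call(receiver, *args)
        if receiver.class.equal?(@expected_class)
          @cached_method.bind(receiver).call(*args) # guard passed: fast path
        else
          @expected_class = receiver.class          # guard failed: rebind
          @cached_method = @expected_class.instance_method(@name)
          @cached_method.bind(receiver).call(*args)
        end
      end
    end

    cache = InlineCache.new(:to_s)
    puts cache.call(42)   # miss: binds Integer#to_s
    puts cache.call(43)   # hit: direct call through the cached method
    puts cache.call(:ok)  # miss again: rebinds to Symbol#to_s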
Then, in what concerns OpenJDK, you actually have two C2-level JIT compilers available: HotSpot's C2, written in C++ and still the default, and Graal, written in Java and taken from the GraalVM (née Maxine VM) project. Currently Graal is much better than C2 at escape analysis, for example, but worse in other scenarios.
In both cases, OpenJDK has inherited the JIT cache infrastructure from JRockit, so you also get to save the native code between runs and start much faster on subsequent runs.
As a note: even though it is usually not a good idea, if you set the interpreter threshold to zero, C1 kicks in right at the beginning, but it won't have any profiling information available, so the generated code is most likely going to be worse than just interpreting.
It does. If you're asking why not AOT-compile a whole Ruby app: well, Ruby is a really dynamic language, and the generated C might not be valid after the app begins execution. I guess theoretically you could enumerate all of the different possible class states, or the final class states, but that sounds really hard.
If you actually need a compiled Ruby binary, mRuby is the best option, but it has its own drawbacks, of course.
It may work a bit better now, but it's still very much not tuned for that. The result would probably still be slower than not JITting, for the same reasons mentioned for 2.6 in that link.