Ruby might be faster than you think (johnhawthorn.com)
104 points by todsacerdoti on April 25, 2024 | hide | past | favorite | 34 comments


>The Ruby implementation has a subtle mistake which causes significantly more work than it needs to.

To be fair, I do not think that is a "mistake" as such. I have written Ruby professionally for about 6 years and have contributed to several Ruby open source projects, and I haven't seen an innocuous `nil` sitting at the end of a loop to prevent array allocation.

The argument would be fair, if it wasn't idiomatic Ruby.

More like: knowing the internals of a language will allow one to gain more performance out of it. That has been true for almost every programming language, but generally speaking the goal of a VM-based language is to not require that _specialized_ knowledge.
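For anyone who hasn't read the article: the "mistake" is that a parallel assignment as the last expression of a block makes CRuby allocate an Array for the block's return value, and a trailing `nil` marks that value as unused so the allocation can be skipped. A minimal sketch of the idea (the method names are mine, not the article's):

```ruby
# Idiomatic version: the block's last expression is a parallel
# assignment, so CRuby builds [b, a + b] as the block's return value
# on every iteration.
def fib_idiomatic(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

# Tuned version: the trailing nil discards the assignment's value,
# letting CRuby skip the per-iteration Array allocation.
def fib_tuned(n)
  a, b = 0, 1
  n.times do
    a, b = b, a + b
    nil
  end
  a
end
```

Both return the same values; the second simply avoids one Array allocation per iteration in CRuby.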


> if it wasn't idiomatic Ruby

It's idiomatic Ruby in a very particular case that likely was explicitly chosen to demonstrate such a dramatic effect.

You're _usually_ not returning implicit arrays from loops in production code. Parallel assignments, when they're used, are almost always in the first line of an initialize method, not the final line of an enumerable block.


I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance. I mean, that would be nearly impossible.

Ruby code, perhaps more than most code, is written for readability and “beauty”. It’s a part of Ruby culture that I greatly appreciate. But if you care about performance, you will act differently, regardless of language. And the whole point of this code is to show that if you care about performance above all else, there’s plenty of room to maneuver in interpreted Ruby.


That raises an interesting question: how does YJIT perform on the original code? Does it find the optimization itself, producing the same gain, so that you don't actually need to know about it personally?


On my machine, the YJIT version of the original code is only ~30% faster than the non-YJIT version:

    ~/scripts > ruby fib.rb
    2.3346780000720173
    ~/scripts > ruby --yjit fib.rb
    1.5913339999970049

So it looks like YJIT doesn't "know" about this optimization.
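One way to check this without reading bytecode is to count object allocations around the loop with `GC.stat(:total_allocated_objects)`. A rough sketch (the modulo just keeps the numbers small so no Bignum allocations muddy the count; exact totals vary by Ruby version):

```ruby
# Returns how many objects were allocated while the block ran.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

a, b = 0, 1
with_array = allocated_objects do
  10_000.times { a, b = b, (a + b) % 1_000_000 }
end

a, b = 0, 1
no_array = allocated_objects do
  10_000.times { a, b = b, (a + b) % 1_000_000; nil }
end

puts "with implicit array: #{with_array}, with trailing nil: #{no_array}"
```

On CRuby the first count should be roughly 10,000 higher than the second, one Array per iteration.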


Ah, thanks so much for trying it out. Interesting that it couldn't figure out that code path.


I think it's just a matter of time. YJIT is still fairly young and doesn't do extensive inlining at the moment. If it did inline the block it could see the array is unused and avoid the allocation.

Running the original fib benchmark (i.e., without the author's technique to eliminate the array allocation) on an M1 Pro, I see:

  CRuby 3.3.1:
  2.058589000022039

  CRuby 3.3.1 w/ YJIT:
  1.4314430000958964

  TruffleRuby 24.0.1 (Native):
  0.20155820800573565

  TruffleRuby 24.0.1 (JVM):
  0.1336908749944996
I took the best time out of three for each implementation, but there wasn't that much variance overall. Standard caveats about benchmarking on an actively used laptop apply.
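For reference, the kind of harness I mean is just `Benchmark.realtime` with the minimum of a few runs. A sketch (the workload here is a stand-in, not the article's exact benchmark):

```ruby
require "benchmark"

# Naive recursive Fibonacci as a stand-in CPU-bound workload.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Run three times and keep the best wall-clock time.
times = Array.new(3) { Benchmark.realtime { fib(25) } }
puts "best of three: #{times.min.round(6)}s"
```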

Running the new prime_counter benchmark that the crystalruby author mentions in another thread¹, I see:

  Crystal 1.12.1 (LLVM 18.1.4) w/ crystalruby 0.2.0 in CRuby 3.3.1:
  0.34096299996599555

  CRuby 3.3.1:
  2.9615250000497326

  CRuby 3.3.1 w/ YJIT:
  1.640430000028573

  TruffleRuby 24.0.1 (Native):
  0.2504862080095336

  TruffleRuby 24.0.1 (JVM):
  0.25282600001082756
YJIT and TruffleRuby make different trade-offs, so I'm not trying to say the latter is necessarily better. But I think the TruffleRuby numbers show what's possible in terms of Ruby optimization. Unfortunately, there's currently an incompatibility in TruffleRuby with one of the crystalruby gem's dependencies², so I had to extract the Ruby benchmark out to a separate file.

¹ -- https://news.ycombinator.com/item?id=40153218

² -- The method_source gem used by crystalruby catches exceptions and matches against the exception message³ for some conditional handling. TruffleRuby 24.0 now uses Prism as its parser, and Prism has an exception message with slightly different wording from CRuby's. Consequently, method_source's handling doesn't work with Prism. It's hard to say where the compatibility issue lies, since exception messages aren't stable APIs. We'll get it sorted out.

³ -- https://github.com/banister/method_source/blob/06f21c66380c6...


> TruffleRuby 24.0.1 (JVM):

> 0.1336908749944996

Those are impressive numbers for running the unoptimized code. I might give TruffleRuby a shot!


>I don’t think there’s any language, interpreted or otherwise, with the goal that knowing its internals won’t help you gain more performance.

It is, indeed, a fundamental goal of ruby that there are multiple ways to write the same thing, and that the programmer should not need to understand nuances of the compiler.

"I need to guess how the compiler works. If I'm right, and I'm smart enough, it's no problem. But if I'm not smart enough, and I'm really not, it causes confusion. The result will be unexpected for an ordinary person. This is an example of how orthogonality is bad." -matz 2003.


That's about semantics, not performance. I don't think Matz would say that a goal of Ruby is to prevent you from improving performance using whatever knowledge you do have about the compiler.

In this particular example, the fact that assignments return the value of the right-hand side is well-known and used frequently in Ruby code. The fact that arrays have to be allocated is obvious. The fact that allocations have a runtime cost is obvious. The only thing that isn't obvious is that the return-value allocation of an assignment whose value isn't used is optimized away. If you know that, you'll think of appending the nil to activate that optimization. Characterizing the lack of that step as a "mistake" only makes sense if the goal for your code is to maximize performance -- which in this case, most unusually for Ruby, it was.
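Concretely, the well-known part is that an assignment evaluates to its right-hand side, and a multiple assignment evaluates to an Array of the right-hand side values, which is exactly what a block ends up returning:

```ruby
# A plain assignment evaluates to its right-hand side.
x = (y = 5)
p x # prints 5

# A multiple assignment evaluates to an Array of the RHS values, so
# when it's the last expression in a block, the block returns (and
# must allocate) that Array.
pair = [0].map { a, b = 1, 2 }.first
p pair # prints [1, 2]
```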


+1. I love golang, because for the most part, there is only 1 way to do something. With ruby, there are a billion ways to do the same thing, with some being slower than others.


I've just started learning Go as a very long time Rubyist. I really enjoy both languages for very different reasons. In Ruby, I can write code that really makes me happy to read. Enumerable is just wonderful. You can go a long way in Ruby without writing a single if statement. It's great. If I'm working on a solo project, it's the language I'd choose every time. But working with inexperienced people, or people who "know" Ruby but never adopted "the Ruby way", is a nightmare. Ruby code, written poorly, can be extremely brutal to follow. When the great deal of freedom Ruby offers isn't handled responsibly, a hot mess can ensue.

Go is the opposite. It's great, as you say, because it's dirt simple. It's a brutalist get-the-job-done kind of language, and I think if I were to start a company working with other engineers, I'd absolutely choose Go for that reason. It's easy to read. It's easy to reason about. And there's very little implicitness in it.


    with some being slower than others.
Do we all agree that in practice, these sort of micro-optimizations almost never matter?

It's certainly easy to think of situations where they do matter, but unless your project is FaaS (Fibonacci As A Service) probably not.


I've seen more than a few cases where people used the wrong data structure: an array, with O(N) lookups, instead of a hash.

All the inefficiencies add up at scale. A 3% inefficiency means you're spending ~3% more on compute. CI takes longer. Dev velocity decreases.
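The classic version of that mistake, sketched (the collection sizes here are arbitrary):

```ruby
require "benchmark"
require "set"

ids = (1..20_000).to_a
id_set = ids.to_set # hash-based membership: O(1) average per lookup

probes = Array.new(2_000) { rand(1..20_000) }

# Array#include? scans the whole array on every miss: O(N) per lookup.
array_time = Benchmark.realtime { probes.each { |id| ids.include?(id) } }

# Set#include? is a hash lookup: O(1) per lookup.
set_time = Benchmark.realtime { probes.each { |id| id_set.include?(id) } }

puts format("array: %.4fs, set: %.4fs", array_time, set_time)
```

Same results, wildly different costs, and the gap only grows with the data.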


> +1. I love golang, because for the most part, there is only 1 way to do something.

Are you aware that you're referencing the Python mantra with that? Feel free to Google it, it's from 2004.

There should be one-- and preferably only one --obvious way to do it.


    the Python mantra with that?
Offtopic but, I switched from Ruby to Python for a new job about six months ago.

While I love Ruby, I was looking forward to some of that "one way to do things" simplicity I'd been promised by Python.

Boy... were my hopes crushed. There are a lot of possible ways to do any given thing, from iteration to package/import structures, etc.

For the most part it seems like the proliferation of options in Python is pretty sane; I can generally see why each choice was made to address existing pain points. So kudos to Python for that. But man, did they leave "one way to do it" behind a loooong time ago.


That they did, indeed.

And they honestly never really adhered to it either. It was just a knee-jerk reaction to a presentation (I believe about Haskell) that said something like "we've got n ways to do x", where n was in the double digits and x was something extremely uninteresting, like looping over an array.

Can't really recall the details though; it was old knowledge by the time I heard about it around 2010, and a quick Google didn't help me dig it up.


How are there companies with Ruby source code making enough money to hire full time Ruby devs?


Ask GitHub, Shopify and Stripe.


Presumably by streamlining development so they can quickly deliver functionality to paying customers.


I found something similar with Java's JIT.

I had a very hot loop (runtime of 15 minutes) that I wanted to speed up. I profiled it over and over again in IntelliJ, identifying every possible allocation I could eliminate. When I was done there were zero allocations in the hot loop and the thing ran something like 4x faster than it had previously.

At that point, looking at the code, I realized that what I had written was very similar to how I'd have implemented it in Rust—I allocated a bunch of structs upfront and then "borrowed" them into the various parts of the algorithm. Since it was so close in style to Rust anyway, I decided to port it over and see if I could get any more performance out of it by being closer to the metal.

It turned out that the difference in performance between the Rust version and the Java version was statistically insignificant. I tried a few different optimization settings but didn't manage to get Rust to be any faster than Java's JIT.

This was eye opening to me. Now that most runtimes have JIT compilers, I suspect that far more important than choosing the right language is deeply understanding how the language you're working with works under the hood so you can eliminate hot spots and unnecessary allocations.
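The same technique translates directly to Ruby. An illustrative sketch (not the parent's actual Java/Rust code): hoist allocations out of the hot loop and "borrow" a scratch buffer instead.

```ruby
# Sums every window of `width` consecutive elements.

# Naive version: allocates a fresh Array on every iteration.
def window_sums_allocating(data, width)
  total = 0
  (0..data.size - width).each do |i|
    window = data[i, width] # new Array each time through the loop
    total += window.sum
  end
  total
end

# Tuned version: one buffer allocated up front and reused, so the
# hot loop itself performs zero allocations.
def window_sums_reusing(data, width)
  total = 0
  buffer = Array.new(width)
  (0..data.size - width).each do |i|
    width.times { |j| buffer[j] = data[i + j] }
    total += buffer.sum
  end
  total
end
```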


When you say Java's JIT, I presume you mean C2? I'd be curious to hear how it performs with Graal. If what you ran into was truly side effect free, either the OpenJDK or GraalVM teams would be interested in seeing your use case for further optimization. You're right, this isn't something an application developer should have to think about with a good JIT compiler in place, but JITs are complicated and having real world code samples that aren't performing well are incredibly useful to the compiler devs.


Note that there are plenty of Java JITs to choose from, depending on the JVM implementation.

To pick on your example: back when Android used Dalvik as a plain interpreter, followed by a basic method-tracing JIT, they implemented floating point math in native code.

Nowadays the JIT takes care of it:

https://developer.android.com/reference/android/util/FloatMa...

Unfortunately, it has taken us several decades to catch up to what Lisp and BASIC (the original Dartmouth BASIC) were already offering in the 1970s.


YJIT and Ruby 3.3 have really impressed me as well. Their VM engineers are clearly doing something right.

Related to Ruby perf, I still hear folks worried about rails “not being able to scale”. Let me say something controversial (and clearly wrong): Rails is the only framework that has proven it _can_ scale. GitHub, Shopify, AirBnb, Stripe all use rails and have scaled successfully. Very few other frameworks have that track record.

There’s plenty of reasons to not use rails, but scaling issues doesn’t feel like a strong one to me.


I'm with you on this claim :)

But, for the sake of truth:

- AirBnB migrated from Rails to a micro-services architecture (which I believe they regretted doing too early; I read that somewhere).

- Stripe never used Rails: they use Ruby (and Sinatra for the web part, i.e. the dashboard).

But it's true that GitHub and Shopify both use and have scaled Rails monoliths. They are showing the way :)


gitlab too.


>Rails is the only framework that has proven it _can_ scale.

Citation? Seems like a pretty extraordinary claim.


It was mentioned in the OP but: GitHub, Shopify, AirBnb, Stripe


There are enough businesses of similar scale being powered by J2EE (or whatever you want to call it today), Spring, and ASP.NET.

And a well known fruity brand used to have all their online services on Web Objects for several years.


As the author of the library referenced in the linked post, I'd like to add a small clarification, in that the example in the README.md was not one specifically cherry-picked to demonstrate an unrealistic advantage, but was chosen simply because it's a widely recognized example of a CPU bound algorithm. In reality, this example actually does more to demonstrate one of the major weaknesses of this library, which is the significant overhead exhibited by the FFI interface between Ruby and Crystal for trivial operations.

This particular example crossed this painful divide 1 million times. I found it interesting that despite this disadvantage, the Crystal implementation was still able to take the lead over my identical, naively written Ruby implementation (warts and all). As the author of this post points out, for trivial operations crossing the interface at high frequency, finely tuned Ruby will easily take the lead!

That said, I still believe there are times where having the ability to write and interface with a performant, precompiled language (that is somewhat familiar to the average Rubyist) in an ergonomic way that avoids the need to context switch can be beneficial. Sure, performance is unlikely to match a finely tuned (but arguably more difficult to maintain) C or Rust extension and ergonomics are unlikely to match an approach that sticks to pure Ruby, but it exposes a new middle ground, which at times, may just hit the right spot!

I'd imagine realistic examples of where this type of library could be useful might include:

- Providing an easy way to expose and use high-quality Crystal shards from within your Ruby program.

- Allowing you to easily write performant CPU or memory-intensive procedures for which reusable native libraries do not exist, and where the majority of the overall execution time can be spent within Crystal.

- As a way to glue several different smaller Crystal shared objects together into a single application using Ruby glue code, allowing you to avoid some of the high compile times you might typically see with a large monolithic binary.

I would definitely not suggest this library has any business:

- Blindly replacing swaths of Ruby methods, without any tangible performance metrics to back this decision.

- Replacing code that is already highly performant in pure Ruby (whether that's code that lends itself well to being JIT'd, is backed by an existing native library etc.)

Funnily enough, if you take a look at the commit history of the project, you'll notice that last week I actually replaced the referenced example with one that better demonstrates a performance difference (even compared against YJIT) and crosses the FFI divide only once. This came as a result of having to introduce a Reactor to get the library to play nice in multi-threaded Ruby applications, which regrettably added even more overhead to the FFI interface and further hammers home the point that this library is not going to perform well in cases where you need to jump between Crystal and Ruby at high frequency.


No, it's not. Maybe it's only faster than when I bash it jokingly, using hyperbole (it won't take an eternity for Ruby to do things that C# computes in a moment, only half of one).

On the content of the post: "Now it’s Ruby that’s 5 times faster than Crystal!!! And 20x faster than our original version. Though most likely that’s some cost from the FFI, or something similar, though that does seem like a surprising amount of overhead."

There are tools that can provide a definitive answer to this, and no, FFI is not a silver-bullet solution to the slowness of interpreted (or JIT-compiled but dynamically typed) languages.


Honestly it’s been my experience with Ruby that it’s Rails that can potentially be slow. Ruby is quite fast and even has a JIT option. Rails is by design opinionated, and for some cases I’ve found that I’ve had to work extra hard to ensure performance. That means refactoring code in slightly non traditional ways and having a deeper understanding of how Rails works under the hood (esp in the ORM). So if you think you can just use Ruby+Rails out of the box in its simplest form without experience and depth of understanding: yes it might be slow. But like with all things, you can go quite far with care and experience.


It’s not even that Rails is necessarily slow; it’s more the way you use it that is slow. If you tie all your business logic to the database and commit complicated changes in transactions, sure, it will be terribly slow.


What’s wrong with complicated (I’m not sure what that means - large numbers of rows updated? Disparate rows updated?) transactions? Depending on your RDBMS (and what it’s running on, config options, etc) this may or may not be slow.



