It appears that Ruby 3 might come short of their 3x speedup goal [1][2] ... has ...

owens99 · on Aug 31, 2020

Look for tenderlove’s comments on TruffleRuby. The performance is at the expense of memory.

hinkley · on Aug 31, 2020

In Ruby, as in NodeJS, the GIL pushes you to scale horizontally. The memory footprint of Hello World becomes a big problem, because the number of copies you run will be proportional to the number of cores you have, not the number of machines. You get no benefit from moving from an 8 core box to 16 or 20 cores.

I suspect if they do manage to pull off more concurrency in Ruby 3, that vertically scaling machines will make more sense. If 8 cores benefit from a shared footprint, instead of one core per process, then the budget looks more attractive.

So now might not be the right time to cherry-pick some of these features, but it may not be far off.

schneems · on Sept 1, 2020

> GIL

FWIW the GIL has been the GVL since YARV was merged in and it became based on a virtual machine rather than purely interpreted. I believe this was 2.0.

> because the number of copies you run will be proportional to the number of cores you have, not the number of machines

While this is true, Ruby is also very CoW optimized so while forks grow linerally in size (with count), usually the first fork is drastically smaller than the process it was forked from.

I work at Heroku and recommend perf settings to customers. 5 years ago people were mostly hitting memory limits. Now it's pretty common to see apps that are maxing out the CPU well before coming close to ram limits.

Especially when compared to javascript, Ruby is extremely memory efficient.

I agree with your larger statement but wanted to chime in and expand on those two points.

jashmatthews · on Sept 1, 2020

CRuby could still be much better at CoW. In theory, a forked process only needs a similar memory allocation to a pthread. In practice the runtime writes in a bunch of these inherited pages and fucks it up. malloc-ed memory is usually bigger than the "Ruby heap" so that kind of limits the impact you can have by trying to not write/re-write.

wjossey · on Sept 1, 2020

Just want to reaffirm this post. I scaled ruby for a living for almost 8 years to millions of request per minute and this post is 100% accurate.

throwdbaaway · on Sept 2, 2020

The high memory usage of ruby still causes problem if the app is single-threaded. I scaled databases for ruby apps for a living for almost 8 years, and sadly single-threaded legacy ruby app is still a thing.

Anyway, in the single-threaded scenario, the app may appear to be CPU bound under the steady state. However, when some hiccup happens in a database or in another microservice, all the ruby processes could soon be blocked waiting for network responses. In this case, ideally there should be plenty of idling ruby processes to absorb the load, but it will be rather costly to do so due to the high memory usage.

There are potential fixes of course, but with trade-offs:

- Aggressive timeout: May cause requests to fail under the steady state

- Circuit breaker: Difficult to tune the parameters, may not get triggered, or may prolong the degraded state longer than necessary. Also not a good fit when the process is single-threaded, as it can only get one data point at a time.

- Burning money: Can only do this until we hit the CPU : memory ratio limit imposed by the cloud vendors.

- Multi-threading: Too late to do this with years of monkey-patching that expects the app to run single-threaded.

jashmatthews · on Sept 2, 2020

Dealing with latency variability is a Hard Problem™ and really not much to do with Ruby or process vs thread parallelism.

https://dl.acm.org/doi/pdf/10.1145/2408776.2408794

throwdbaaway · on Sept 2, 2020

Well, having more spare ruby processes / threads would make the app more resistant to latency variability, and could have made some incidents into nonevents.

Also, while I don't disagree that it is indeed a hard problem, I do have very good experience with an async java stack, where I didn't have to worry about things like this. As long as a sane queue limit is defined on let's say the jetty http client, if something bad happens at the other end, the back pressure would kick in by failing immediately the requests that couldn't make it into the queue. Other parts of the app would then continue to be functional.

So, I would contend that it has a lot to do with ruby high memory usage, made much worse when single-threaded, and it looks like ruby 3.0 still won't have a complete async story yet?

EDIT: I checked the link again, and it looks Jeff Dean was talking about latency at p999 or above? By "hiccup", I actually mean something that would increase avg latency by perhaps 5~10x times, e.g. avg latency of 100ms under steady state + timeout of 1 second + the remote being down. Sorry for the confusion. Here, I am lucky if people start caring about p95.

jashmatthews · on Sept 2, 2020

That's not an inherent property of a particular language or concurrency model, though. That's having logic to track request queue depth for a particular service or endpoint and fail fast/load shed. You can do the same in Ruby! Some would probably say this is what a service mesh is for.

Maybe you're thinking of the new Actor based model for compute parallelism? Async IO in production Ruby has been a thing for easily more than a decade.

throwdbaaway · on Sept 2, 2020

Of course it is not an inherent property of a particular language or concurrency model, but it is a property of a particular language ecosystem. As a turing complete language, everything is doable in ruby, but at what cost? Now we are back to trade-offs I listed above.

As for async IO in production, looking at the client library, https://github.com/socketry/async-http is barely 3 years old, and probably reached the production-ready state a few months ago, if we are being generous.

But good point about service mesh. Moving the circuit breaker responsibility to the service mesh would definitely help in my case, as the sidecar would have all the data points from the 10+ single-threaded ruby processes running in the same pod, and thus could make a much quicker decision.

jashmatthews · on Sept 3, 2020

https://github.com/socketry/async-http is just a new client library. PostRank, Zynga, CloudFoundry were all running async IO in production on Ruby ~10 years ago. CRuby support for non-blocking IO dates back to the mid 2000s. https://github.com/eventmachine/eventmachine actually dates back even further

If you're using Unicorn then you've already got Raindrops which gives you a really simple way to do shared metrics across forked processes like in-flight requests to another service or how many of your Unicorns are busy.

throwdbaaway · on Sept 3, 2020

EventMachine has been losing steam for awhile now, which is why I brought up Async as the new hotness. I don't think it is fair to classify async-http as "just a new client library". As of now, in the ruby ecosystem, the Async framework is the only player in town. From my perspective, it still looks pretty much unproven, but perhaps we just live inside different bubbles.

It kinda feel like we are talking past each other here. I would just like to clarify that I inherited all these different ruby apps, and I don't have the magical ability to go back in time and say "Hey, perhaps we should use an async framework from the beginning" or "Dude, enough with the monkey-patching". And even if I do, those could be bad advice, as the ruby apps are making money in production.

Anyway, thanks for the suggestion to share metrics across processes. That will definitely help with the circuit breaker decision making in my case.

jashmatthews · on Sept 1, 2020

There are some important differences here between NodeJS and Ruby. NodeJS child processes are completely independent and created with spawn. https://github.com/nodejs/node-v0.x-archive/issues/2334

CRuby forks using fork() and Copy-on-Write shares memory from parent to child.

JRuby doesn't have a GIL so you only need a single process. Same with TruffleRuby.

With CRuby, you're much better to run a bigger container with multiple processes than one process per container.

With either NodeJS or CRuby you're still better to run less containers on bigger hosts. Each host has to duplicate the host OS and container infrastructure. Each container of a real production app also duplicates a bunch of stuff despite Docker's best attempts at sharing.

schneems · on Sept 1, 2020

> JRuby doesn't have a GIL

Neither does CRuby! It's been the GVL since YARV was merged ;)

eregon · on Sept 1, 2020

It's exactly the same thing though, isn't it? I use both terms interchangeably. And it seems the term GIL is better known than GVL.

chrisseaton · on Aug 31, 2020

TruffleRuby doesn't have a GIL though.

3np · on Sept 1, 2020

NodeJS is single-threaded while Ruby has native threads and "fibers" - what makes you say you wouldn't be able to utilize additional cores in Ruby?

schneems · on Sept 1, 2020

Fibers are still restricted by the GVL. Threads are also restricted by the GVL. The only "true" concurrency in MRI is to fork a process.

winrid · on Sept 1, 2020

You can. You can with Node too.

With Node you can just use workers. I have tools I wrote in Node that can max out my 16 core MacBook.

3np · on Sept 1, 2020

Some major differences here are how they interface with I/O and the mechanisms around memory sharing.

Nodejs workers are more like webworkers and mostly suitable for proper CPU-intensive parallelization whereas in Ruby it's not uncommon to run e.g. multithreaded web server in the same process and namespace.

eregon · on Aug 31, 2020

Which comments?

loic-sharma · on Aug 31, 2020

https://blog.heroku.com/ruby-3-by-3/

norswap · on Aug 31, 2020

Fair warning: this is from 4 years ago.

eregon · on Sept 1, 2020

That's rather vague. But yes, no matter which JIT you always need some extra memory to run the JIT, and it creates a more optimized version while also needing the unoptimized version of the code, so it needs more memory.