
"We end up being able to do many more stack allocations, with less memory fragmentation and more predictable performance, while never having to worry about segfaults."

But isn't this stuff the job of the VM? There's no reason why a program written in Ruby can't get the same stack allocations, reduced memory fragmentation and predictable performance automatically - if the Ruby VM were better designed.

If Ruby had a better VM, would you choose to use it over Rust? In Rust, are you doing things that the VM could be doing for you?




> There's no reason why a program written in Ruby can't get the same stack allocations, reduced memory fragmentation and predictable performance automatically - if the Ruby VM were better designed.

Escape analysis falls down pretty regularly. No escape analysis I know of can dynamically infer the kinds of properties that Rust lets you state to the compiler (for example, returning structures on the heap that point to structures on the stack in an interprocedural fashion).
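For a rough idea of what that looks like in Rust (a minimal sketch; the function name and types are just for illustration): the returned Vec lives on the heap, but its elements borrow from an array in the caller's stack frame, and the lifetime in the signature is what lets the compiler check that relationship.

    // A heap-allocated Vec whose elements point into data owned by the
    // caller's stack frame. The lifetime 'a in the signature is what lets
    // the compiler verify this across the function boundary.
    fn evens<'a>(xs: &'a [i32; 4]) -> Vec<&'a i32> {
        xs.iter().filter(|x| **x % 2 == 0).collect()
    }

    fn main() {
        let xs = [1, 2, 3, 4];    // array on the stack
        let picked = evens(&xs);  // Vec on the heap, borrowing from `xs`
        println!("{:?}", picked);
        // Letting `picked` outlive `xs` would be rejected at compile time.
    }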

This is not in any way meant to imply that garbage collection or escape analysis are bad, only that they have limitations.


Rust must statically prove, at compile time, that the lifetime of a reference to a stack-allocated variable does not exceed the lifetime of the variable it points to, in order to be 100% memory safe. How is that different from just an advanced escape analysis? Theoretically a VM could do much more, because it could do speculative escape analysis (I heard the Azul guys were working on an experimental feature like this, called escape detection) and even move variables from the stack to the heap once it turns out they do escape.


You could do escape analysis equivalent to Rust's only if you inlined all the functions. Sometimes, that's simply not possible, e.g. with separate compilation.

On the other hand, Rust is still able to perform its kind of escape analysis (via its lifetime-tracking type system), because the relevant facts are embedded in the type signatures of Rust functions, and as such remain available even under separate compilation (even when the actual implementation of the function is unknown).
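
As a concrete sketch of how that plays out (the function below is made up for illustration): the signature alone says the result borrows from the first argument and never from the second, so callers can be checked without the body ever being visible.

    // The lifetime facts live in the signature: the result borrows from
    // `haystack`, never from `needle`. Callers can be checked against this
    // without seeing the body - which is what separate compilation requires.
    fn find_line<'h, 'n>(haystack: &'h str, needle: &'n str) -> Option<&'h str> {
        haystack.lines().find(|line| line.contains(needle))
    }

    fn main() {
        let log = String::from("boot ok\nerror: disk full\nshutdown");
        let hit;
        {
            let pattern = String::from("error"); // short-lived, inner scope
            hit = find_line(&log, &pattern);     // ok: result borrows `log`, not `pattern`
        }                                        // `pattern` dropped here
        println!("{:?}", hit);                   // still valid, per the signature alone
    }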


A VM could see all the functions and do whole-program analysis. Inlining is an orthogonal concept.


You could in theory, but that analysis would likely be really slow.

In any case, without the type system and compiler to enforce the discipline the programmer is going to lose a lot of control and predictability.


Not necessarily. As you said, Rust encodes that information in type signatures. A VM could use exactly the same information and then do escape analysis one method at a time.


That's true. Maybe it would work, but I wonder if anyone has attempted it before...


For one thing, Rust can reject programs at compile time if it's not satisfied with its analysis. For Ruby to get that, you might have to restrict the language in a similar way.


The JVM, which is several orders of magnitude better than the Ruby VM, constantly gets smashed in terms of performance and memory usage by a well-designed C++ program.

Why? In theory, after all, a VM should be able to do as well as, if not better than, a programmer.

The limit of a VM is that you are extremely limited in the amount of resources you can let the VM spend on deciding which resources should be freed. See what I mean?

In the end the VM is nothing more than a program. With RAII, for example, there is zero overhead associated with the decision to release a memory block, because that decision is compiled into the program.
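
In Rust terms (a small sketch, not from the article), that "decision compiled into the program" looks like this:

    fn process() {
        let buffer = vec![0u8; 4096]; // heap allocation, owned by this frame
        // ... use buffer ...
        println!("{} bytes", buffer.len());
    }   // `buffer` is freed right here, by code the compiler inserted at this
        // exact point - no collector spends cycles deciding when to release it.

    fn main() {
        process();
    }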

On top of that, if you start to use some of the nasty memory optimization tricks that manual management allows, you start to see why manually managing memory is still very interesting when performance and/or memory usage is important.


Both the JVM and the CLR are regularly beaten by C/C++, despite claims that they are "as fast". What gives?

In my view the whole problem comes down to whether or not you are using an idiomatic approach. In idiomatic C#/Java you use a lot of heap allocation and garbage collection, you may be using dynamic dispatch, and so on.

If you write a C# program that uses stack allocation only (no classes, only structs and primitives), no inheritance/polymorphism, no exceptions, you should find that the CLR stands up pretty well to a C++ program. Sadly, what you have done then is essentially code in a very small subset of C#, OR you have achieved something that is so hard and so prone to foot-shooting you could just as well have used C++ to begin with.

To reverse the argument: if you use C++ with loads of garbage-collected objects etc. you will end up with performance similar to a Java/C# program. But in idiomatic C++, you usually don't.


C# lacks the language features to make such a coding style sane, and the libraries use the "idiomatic" approach - if you're willing to put on a straitjacket and throw away the libraries, then why bother with C#?

That's not to say that C# can't be made more efficient by avoiding allocation in performance-critical sections, but overall it's going to perform worse than C++, both the idiomatic and non-idiomatic versions. It's just that for most of the use cases C# is put to, the performance difference isn't relevant anyway.

Java doesn't even have value types, so it can't even come close in terms of memory layout efficiency and avoiding allocations without perverse levels of manual data structure expansion - e.g. consider how you would do the equivalent of: struct small { int foo; float bar; char baz; } with std::vector<small>.
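
For comparison with that C++ snippet, here is the same contiguous value layout in Rust, the language the thread started from (a sketch; the struct just mirrors the fields above):

    // One allocation for the whole vector, elements stored inline - the same
    // idea as std::vector<small>. In Java (without value types) this would be
    // an array of references to separately heap-allocated objects.
    struct Small {
        foo: i32,
        bar: f32,
        baz: u8,
    }

    fn main() {
        let mut smalls: Vec<Small> = Vec::with_capacity(1000);
        smalls.push(Small { foo: 1, bar: 2.0, baz: b'x' });
        println!("{} bytes per element, {} elements",
                 std::mem::size_of::<Small>(), smalls.len());
    }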


I agree completely. For an "inner loop" scenario in C# you have to use the straitjacket version of C#. Luckily you can use idiomatic C# for the rest of the program.

The alternative to "straitjacket C#" is using C/C++ interop for a part of the program. If that part is large enough, such as in most high-end games, it's usually worth biting the bullet and going C++ throughout. Luckily again, those programs are rare.

The point is still that C-style C# is almost as fast as C++ (modulo compiler-specific optimizations), but for the reasons above, that fact isn't very interesting.


Java will have value types soon. Also, C++ does some heap allocations behind the scenes quite often (e.g. with vector or string) which are hard to get rid of, and they are much more expensive than in the JVM/CLR. So idiomatic C++ doesn't always smash the JVM/CLR on performance. YMMV, and the typical differences are small enough not to matter for most server-side software like web apps, web servers or databases.


> What gives?

What gives is this: the people who write these fast C/C++ programs that beat Java/C# are usually far more skilled and trained.

Any programmer who knows neither language well and has to write a big application will probably find Java/C# far easier.

I remember a long blog post where somebody set out to test this on himself, and that guy was a very good C++ programmer. He found that his C++ program was slower, but he then set about improving the speed and in the end beat Java by quite a bit. However, the amount of effort was completely unreasonable for most programmers.

So "What gives" is this, and this has been true for a long time. If you are a expert in a low level language and spend time optimizing you will probebly beat Java/C#.

I would suggest that you look into what a JIT or GC can and cannot do. Some of the performance problems you identify are really almost never a bottleneck anymore.


(I think shin_lao was actually talking along the same lines, saying that even the hyper-optimised JVM often isn't as good as a good C++ program.)


I agree; my point is merely that the bulk of the difference between idiomatic Java programs and idiomatic C++ programs is due to the wildly different ways of programming idiomatically in Java vs. C++.

The small (steady state) performance difference remaining when doing the "exact same thing" in both programs is just down to how good the C++ compiler is at optimizing vs. the VM's JIT (usually better, sadly).

What intrigues me about Rust is that hopefully we won't have to choose between readability and elegance vs. performance and safety. Keep up the good work.


It is very hard to have all this done reliably, especially in a rather dynamic language like Ruby. Maybe the Ruby VM could be better designed to perform these sorts of optimisations, but it will never give the same control as a language like Rust, C or C++; i.e. a small code change could cause the optimiser to fail, leading to slow-down and bloat.

Furthermore, the hypothetical "sufficiently smart VM" isn't much value for code written now.


[Chris has done some fantastic work on a Truffle / Graal backend for JRuby; for my part I'm (slowly) working on a "mostly-ahead-of-time" Ruby compiler]

I'm not sure I'd agree with "never", though I do agree Ruby is a hard language to optimize.

There are two challenges with optimizing Ruby: the things people do without knowing the potential cost, and the things people pretty much never do but that the compiler / VM must still be prepared for.

The former includes things like "accidentally" doing things that trigger lots of copying (e.g. String#+ vs String#<< - the former creates a new String object every time); the latter includes things like overriding Fixnum#+, which breaks all attempts at inlining maths.

The former is a matter of education, but a lot of the benefits are masked by today's slow VMs, which in many cases make it pointless to care about specific methods, and by an expectation not to think much about performance (incidentally, it's not that many years ago that C++ programmers tended to make the same mistakes en masse).

The latter can be solved (there are other alternatives too), or at least substantially alleviated, in the general case by providing a means of making a few small annotations in an app. E.g. we could provide a gem with methods that are no-ops for MRI but that signal to compilers that you guarantee not to ever do certain things, allowing many safeguards and fallbacks to be dropped.

Ruby's dynamic nature is an asset here, in that there are many things where we can easily build a gem that provides assertions which, on e.g. MRI, throw an error if violated or turn into no-ops, but that on specific implementations "toggle" additional optimizations that will break if the assertions are not met. E.g. optional type assertions to help guide the optimizer for subsets of the app.

In other words: how optimized the Ruby implementations we get will be depends entirely on whether people start trying to use Ruby for things where they badly need those optimizations.


Meta note: I saw that this post was being pretty heavily downvoted and upvoted it to keep it from falling out of the discussion. I disagree with the assumptions in the post, but they are handily refuted by replies. I'd rather have a robust discussion with every reasonable side well represented than a boring echo chamber.

I urge other folks here not to merely blindly downvote posts they disagree with; reserve downvoting for posts that don't deserve to be seen at all because they don't contribute to the discussion.





