Node.js and io.js – Very different in performance

bnoordhuis · on Jan 19, 2015

Interesting results, thanks for sharing. I can perhaps shed some light on the performance differences.

> Buffer 4.259 5.006

In v0.10, buffers are sliced off from big chunks of pre-allocated memory. It makes allocating buffers a little cheaper but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer is garbage collected.

Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces peak memory (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
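The back-pointer effect can still be observed with views on a buffer. This is a minimal sketch, assuming a recent Node.js release where `Buffer` is a `Uint8Array` subclass; it does not reproduce the v0.10 internals, and the variable names are illustrative only. A 10-byte view keeps the whole 1 MiB parent allocation reachable.

```javascript
// Assumption: a recent Node.js release; this is not the v0.10 slab allocator.
const parent = Buffer.alloc(1024 * 1024); // one 1 MiB allocation
const view = parent.subarray(0, 10);      // 10-byte view, no copy is made

// The small view references the parent's full backing memory, so that
// memory cannot be reclaimed while `view` is still reachable.
const sharesMemory = view.buffer === parent.buffer;
const backingBytes = view.buffer.byteLength;

// Writes through the parent are visible through the view (same memory).
parent[0] = 42;
const seenThroughView = view[0];

// Copying the bytes gives a buffer that no longer pins the parent.
const copy = Buffer.from(view);
const copyIsIndependent = copy.buffer !== parent.buffer;

console.log({ sharesMemory, backingBytes, seenThroughView, copyIsIndependent });
```

If only a few bytes of a large buffer are needed long term, copying them out is one way to let the large allocation be collected.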

That said, the fact that it's sometimes slower is definitely something to look into.

> Typed-Array 4.944 11.555

Typed arrays in v0.10 are a homegrown and non-conforming implementation.

Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them, it's probably just a matter of time - although more eyeballs certainly won't hurt.

> Regular Array 40.416 7.359

Full credit goes to the V8 team for that one. :-)

mraleph · on Jan 19, 2015

I can explain what happened in the Array case. 100000 used to be the threshold at which `new Array(N)` or `arr.length = N` started to return a dictionary-backed array. Not anymore: this was changed by https://codereview.chromium.org/397593008 - now `new Array(100001)` returns a fast-elements array.
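For reference, the two allocation patterns in question look roughly like this. It is a sketch with made-up function names, not the benchmark's code, and it cannot show which backing store V8 picks; it only shows the shapes of code that hit (or avoided) the old threshold.

```javascript
const N = 100001; // just above the old threshold described above

// Pattern 1: preallocate with a length. This is the case that, per the
// explanation above, used to yield a dictionary-backed array on older V8.
function preallocated(n) {
  const arr = new Array(n);
  for (let i = 0; i < n; i++) arr[i] = i;
  return arr;
}

// Pattern 2: grow from empty, so the array is never created with a
// large length up front.
function grown(n) {
  const arr = [];
  for (let i = 0; i < n; i++) arr.push(i);
  return arr;
}

const a = preallocated(N);
const b = grown(N);
console.log(a.length === b.length);
```

Which pattern is faster depends on the V8 version, so it is worth measuring on the runtime you actually deploy.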

I will check out what happened to Buffer/TypedArray. Should not degrade that much unless something really goes south here.

mraleph · on Jan 19, 2015

Ok reporting back. There are two issues here.

The first major one is related to the mortality of TypedArray's maps (aka hidden classes). When the typed array stored in the Data variable is GCed and there are no other Uint8Arrays in the heap, its hidden class is GCed too. This also causes the GC to find and discard all optimized code that is specialized for Uint8Arrays and to clear all type feedback related to Uint8Arrays from inline caches. When we later come and reoptimize, the optimizing compiler thinks that cleared type feedback means we need to emit a generic access through the IC (there is reasoning behind that) because this is potentially going to be a polymorphic access anyway. I have filed an issue[1] for the root cause (mortality of the typed array's hidden class).

Now there is a second, much smaller issue (which also explains the performance of the Buffer case): apparently there were some changes in the optimization thresholds and OSR (on-stack replacement) heuristics. After these changes we hit OSR at a different moment: e.g. I can see that we hit the inner loop, the one that loops over `j`, instead of hitting the outer loop, which would lead to better code. In V8, OSR is implemented in a way that tries to produce optimized code that is suitable both for OSR and as normal function code - this is done by adding a special OSR entry block that jumps to the preheader of the loop we are targeting with the OSR. This allows V8 to reuse the same optimized code without optimizing it again for the normal entry - but it also leads to code quality issues if OSR does not hit the outer loop, because the OSR entry block inhibits code motion. This is a known problem and there are plans to fix it. The hit is usually quite small unless you have very tight nested loops (like in this case).

Disabling OSR (--nouse-osr) not only "solves" the second issue but also partially fixes (hides) the first issue: 1) we no longer optimize with partial type feedback - so we never emit a generic keyed access but always specialize it for the typed array; 2) we no longer emit an OSR entry - hence no code quality issues related to it.

[1] https://code.google.com/p/v8/issues/detail?id=3824
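To make the loop structure concrete, here is a hypothetical sieve with the same shape: an outer loop and a very tight inner loop over `j`. It is a sketch written for illustration, not the code from the benchmarked article.

```javascript
// Hypothetical sieve; function and variable names are illustrative only.
function sieve(limit) {
  const data = new Uint8Array(limit + 1); // 0 = not crossed out, 1 = composite
  let count = 0;
  for (let i = 2; i <= limit; i++) {            // outer loop
    if (data[i] === 0) {
      count++;                                  // i is prime
      for (let j = i * 2; j <= limit; j += i) { // tight inner loop over j
        data[j] = 1;
      }
    }
  }
  return count;
}

console.log(sieve(100));
```

Saving this under a name of your choice and running it once normally and once with the `--nouse-osr` flag mentioned above is one way to see whether OSR entry placement matters on a given build.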

mschoebel · on Jan 19, 2015

Very interesting. After reading your comment, I tried allocating another Uint8Array and keeping it allocated throughout the entire test as a workaround for the issue you mentioned. Time for Node.js was unchanged, but io.js was down to about 5.5s now. Almost the same time as Node. Only about 10% slower.

The same happens when I use the --nouse-osr parameter that you mentioned.
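The keep-alive workaround described above might look like the sketch below. The names `keepAlive` and `runOnce` are made up for illustration, and the loop body is a stand-in for the real test.

```javascript
// Assumption: a module-level reference keeps at least one Uint8Array
// reachable for the whole run, so its hidden class is never collected.
const keepAlive = new Uint8Array(1);

function runOnce(size) {
  const data = new Uint8Array(size); // becomes garbage after each run
  let sum = 0;
  for (let i = 0; i < size; i++) {
    data[i] = i & 0xff;
    sum += data[i];
  }
  return sum;
}

let total = 0;
for (let run = 0; run < 5; run++) {
  total += runOnce(1000);
}
console.log(total, keepAlive.length);
```

This only hides the underlying issue for this kind of test; it is not a general fix.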

mraleph · on Jan 19, 2015

Is it 10% slower even if you keep array alive and apply --nouse-osr (to both node.js and io.js)?

On my machine results are fluctuating within the same ballpark (though I am on Linux and benchmarking 64-bit builds).

mschoebel · on Jan 19, 2015

Ok, I hadn't tested with both before. Keeping the array alive and using --nouse-osr makes io.js only 2.3% slower than my original measurement for Node 0.10.35. Median of 5058ms.

And Node 0.10.35 shows basically the same results as before. I see less than 1% difference. Maybe just random fluctuation. Even if not. 1% is irrelevant.
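A median like the one quoted above can be collected with a small harness. This is a sketch assuming a Node.js release that provides `process.hrtime.bigint()`; `median` and `timeRuns` are illustrative names, not the harness used for the numbers in this thread.

```javascript
// Median of a list of numbers (does not modify the input array).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run fn `runs` times and return each duration in milliseconds.
function timeRuns(fn, runs) {
  const times = [];
  for (let r = 0; r < runs; r++) {
    const start = process.hrtime.bigint();
    fn();
    const elapsedNs = process.hrtime.bigint() - start;
    times.push(Number(elapsedNs) / 1e6);
  }
  return times;
}

// Example workload: a placeholder loop standing in for a real test.
const times = timeRuns(() => {
  let x = 0;
  for (let i = 0; i < 1e6; i++) x += i;
  return x;
}, 7);
console.log('median ms:', median(times));
```

Reporting the median rather than a single run makes differences of a percent or two easier to tell apart from noise.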

mschoebel · on Jan 22, 2015

I just posted a follow-up blogpost, comparing Node 0.11.15 and io.js 1.0.3 which were both released yesterday.

In that post I also benchmarked the various fixes for the typed-array slowdown you mentioned. BTW --nouse-osr makes all three tests run faster.

http://geekregator.com/2015-01-21-node_js_0_11_15_and_io_js_...

mraleph · on Jan 23, 2015

Thanks for the update.

I posted this reply on your site, but I will duplicate it here for the sake of HN readers:

> BTW --nouse-osr makes all three tests run faster.

As I tried to explain above: OSR as it is implemented now impacts code quality depending on which loop OSR hits, which in turn depends on the heuristics that V8 uses. These heuristics are slightly different in newer V8. As a result of these changes V8 hits the inner loop instead of the outer loop. This leads to worse code.

Code that benefits from OSR is code that contains a loop which a) can be well optimized, b) runs long, and c) is run only a few times in total. The Sieve benchmark is the opposite of this and as a result it doesn't benefit from OSR - you get a bigger penalty from producing worse code and no benefit from optimizing slightly earlier.

Not using OSR for Sieve also hides the other issue with the mortality of typed arrays' hidden classes. I say "hides", not "fixes", because one can easily construct a benchmark where the mortality would still be an observable performance issue even if the benchmark itself is run without OSR: https://gist.github.com/mraleph/2942a14ef2a480e2a7a9

fulafel · on Jan 19, 2015

Does the dramatic speed difference between the "non-conforming" implementation and V8 mean that current Node typed arrays are not memory-safe and you may get C-style buffer overflow vulnerabilities when using them?

trevnorris · on Jan 20, 2015

"Non-conforming" only means they didn't completely adhere to the ES specification. There should be no possibility of buffer overflow.

mschoebel · on Jan 19, 2015

FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

Interesting background about typed-arrays. I didn't know that. Thanks!

mikkom · on Jan 19, 2015

> FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

It would have been a good thing to include that comment in the article as well.

mschoebel · on Jan 19, 2015

I thought about that. But it would have diminished the point I was trying to make: Always test with different versions as performance may differ by a LOT.

jacquesm · on Jan 19, 2015

Well, if the point is that they are different then I see what you're saying but in actual fact the point seems to be they're almost equal.

barrkel · on Jan 19, 2015

He was talking about differences between point versions. 0.10 has dramatically different performance to 0.11. io.js is using 0.11, as is node:master, but older node was using 0.10.

I.e. the difference isn't necessarily node vs io, it's one point release of V8 to the next as used by node and io.

jacquesm · on Jan 19, 2015

Supremely bad title then.

ixtli · on Jan 19, 2015

Yes I think it needs to be noted that V8 in node 0.10 is very very far behind when you take into account how quick the pace of development is. I would be interested to see these comparisons with bleeding edge node vs stable node.

q_no · on Jan 19, 2015

Thanks for the details! It makes a lot of sense.

Now I wonder how node 0.11.x compares to iojs :)

explorigin · on Jan 19, 2015

There are two very good comments at the bottom of the article. Here for your consumption:

Author: (Unknown) 2015-01-19 12:54 UTC

io.js is based on node v0.11, so you need to compare

- v0.10 (nodejs) - v0.11 (nodejs)
- v0.11 (nodejs) - v1.0 (iojs)

Author: Michael Schöbel 2015-01-19 13:01 UTC

I also downloaded sources and compiled the latest master branch of Node yesterday evening. Performance was within 2% of io.js for all three tests.

But most people won't compile themselves. Most will use the latest stable release.

longlivegnu · on Jan 19, 2015

>But most people won't compile themselves. Most will use the latest stable release

I mean there is a good reason that

cdnsteve · on Jan 19, 2015

Reporting benchmark results on a single OS, on a single CPU type isn't really benchmarking. It's an isolated case of results.

I'd recommend to perform an accurate suite of performance tests, use different OS (CoreOS, Ubuntu) that are actually used in server environments. Also different machine hardware will play a role.

There's not enough data at this point to come to any conclusion imo.

This result set is like saying that 95% of the people on the web use the safari browser on the Apple website.

morenoh149 · on Jan 21, 2015

... in the apple store

Kiro · on Jan 19, 2015

> This can be extremely important if you have a project with heavy CPU-use

Would you recommend using something different than JavaScript when writing CPU heavy apps? I was under the impression that it's better suited when dealing with high I/O.

richmarr · on Jan 19, 2015

A few folks have replied with the usual "use C/C++/Java instead", but in the real world it's often either impractical (or rather commercially indefensible) to fork out a different environment with its own training, testing, environment, automation, documentation and maintenance overheads. A blanket rejection of Node for CPU-heavy tasks is naive.

On the issue of performance, V8 lets Javascript run pretty quickly. Yes, there are languages that broadly offer faster execution, but that's far from the only factor in choosing a solution.

The main issue from my perspective is that the event loop can easily get blocked by CPU-bound tasks, preventing it from doing other things, e.g. responding to HTTP requests. You hit a similar problem with a Java servlet runner, eg. if a couple of your threads are bogged down on CPU-heavy tasks then they can't be responding to requests.

My personal preference would be to split CPU-heavy operations out so that they happen elsewhere, regardless of language, e.g having large PDFs generated by an internal microservice rather than by the webserver, or maybe via a queue in some cases. But that's just a personal preference.
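The blocking described above is easy to observe. In this sketch (illustrative names, not taken from any of the projects discussed) a timer scheduled for 0 ms cannot fire until a synchronous CPU-bound loop has finished.

```javascript
// Synchronous, CPU-bound stand-in for real work: spins for about `ms` milliseconds.
function busyLoop(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on the main thread
  }
}

// Schedules a 0 ms timer, then blocks the event loop; resolves with the
// number of milliseconds the timer callback was actually delayed.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyLoop(blockMs); // the timer cannot run until this returns
  });
}

measureTimerDelay(50).then((delay) => {
  console.log('0 ms timer fired after', delay, 'ms');
});
```

An HTTP request arriving during the busy loop would wait in the same way, which is why moving such work off the main thread or into another process helps.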

pixelglow · on Jan 19, 2015

It's relatively straightforward (but moderately involved) to split out CPU-heavy operations in node.js so they don't block the event loop. A rough sketch would look like this:

* Write the CPU-heavy code in C++ and bridge it back to node.js as an add-on: http://nodejs.org/api/addons.html

* node.js is built atop libuv, so use libuv's work queues to offload the CPU-heavy code to a worker thread: http://nikhilm.github.io/uvbook/threads.html

k__ · on Jan 19, 2015

Really?

Companies I worked for always had two environments, one for new features and one for performance.

Like PHP and C: new features were implemented in PHP and if they caught on, they got reimplemented in C if they needed better performance.

richmarr · on Jan 19, 2015

Sure, if the companies you've worked for are in a field that needs incremental performance gains and are willing to pay for it then that's totally rational.

Typically I see client-side performance concerns outweighing server-side performance in a ratio of 70/30 or so, with the remaining server-side performance biased towards I/O concerns like waiting for data or file system reads, with a ratio of 90/10 or more. That puts the actual saving available from language or algorithm changes in the app layer at less than 3% (roughly 30% × 10%) for the kinds of apps I've worked on.

I usually work at companies who are starting out, looking for Product Market Fit, where those marginal gains aren't worth the cost of reimplementing.

k__ · on Jan 19, 2015

Fair enough. The companies I talked about didn't do this right from the start. Most of them after a few years.

morenoh149 · on Jan 21, 2015

sounds expensive $$$

aikah · on Jan 19, 2015

> can easily get blocked by CPU-bound tasks,

That's not a problem with node, where you can easily implement a distributed queue system. The job queue processes will block but not your web server.

richmarr · on Jan 20, 2015

Yep, you could use a queue. On the plus side that isolates the work from the fragility of a Node process. On the downside it comes with a specialist infrastructure requirement, often complex configuration rules and can be awkward when you need to return the result in a single HTTP cycle.

gdrulia · on Jan 20, 2015

What about web workers? I believe the whole idea behind them was to allow heavy tasks to run in background threads. Haven't used them personally, but a quick search indicates that they are available on node.js via npm [0]. Unless the whole concept is misunderstood by me and they would still be able to block your server's response to http requests.

[0] - https://www.npmjs.com/package/webworker-threads
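For comparison, here is what offloading to a background thread can look like. This sketch assumes a current Node.js release with the built-in `worker_threads` module, which postdates this thread; it does not use the webworker-threads package linked above, and `countPrimesInWorker` is an illustrative name.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline so the example is a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // CPU-heavy part: count primes below workerData.limit with a sieve.
  const limit = workerData.limit;
  const composite = new Uint8Array(limit);
  let count = 0;
  for (let i = 2; i < limit; i++) {
    if (composite[i]) continue;
    count++;
    for (let j = i * i; j < limit; j += i) composite[j] = 1;
  }
  parentPort.postMessage(count);
`;

// Runs the sieve on a separate thread; the main event loop stays free.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { limit },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error('worker exited with code ' + code));
    });
  });
}

countPrimesInWorker(1000000).then((count) => {
  console.log('primes below 1e6:', count);
});
```

While the worker runs, timers and incoming requests on the main thread are still serviced; only the result is passed back as a message.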

richmarr · on Jan 20, 2015

Yep. Web workers, child processes, whatever works. My preference is to use microservices to keep things isolated. It comes with an overhead of a millisecond or two but for most purposes that's fine.

askmike · on Jan 19, 2015

If you decide to offload some CPU heavy parts I would personally use a messaging protocol like zeroMQ to keep a loose coupling.

dagw · on Jan 19, 2015

Sure, if your project is almost exclusively CPU intensive then use something else. But sometimes you have a project that already is in JavaScript and where JavaScript mostly makes sense, but you have one or two tasks that are small but fairly CPU intensive. Then knowing how to write fast JavaScript is a pretty good idea, since JavaScript can be pretty fast these days and the overhead (both in terms of runtime and developer time) of calling out to a second language isn't always trivial, and it's nice if you can avoid it.

Secondly (and not entirely relevant to this) people are doing more and more 'clever' CPU-intensive things client side these days where JavaScript is your only choice. So understanding the performance characteristics of different JavaScript implementations can definitely come in handy there.

mschoebel · on Jan 19, 2015

IMO if you really need to do heavy-duty processing, C/C++ will be fastest. On the other hand Node will make development easier.

You have to judge for yourself which will be a better use of resources. Having more costs upfront for programming, or more costs in the long-run for servers.

FooBarWidget · on Jan 19, 2015

Technically you can offload CPU-heavy things to a cluster of extra Node processes that handle work in a serial fashion. Nothing wrong with this; if you're comfortable with Javascript then this is probably better than rewriting those parts in different languages.

lsiebert · on Jan 19, 2015

You could, and that might get you the performance you need, but if not I would likely use https://github.com/node-ffi/node-ffi or maybe http://www.swig.org/ to call out to C++ or C code.

There's probably also a way to call Java, ( a quick search suggests https://github.com/joeferner/node-java perhaps) but I can't speak to that in particular.

bananaoomarang · on Jan 19, 2015

Generally speaking for CPU bound tasks you should use something like edge.js (http://tjanczuk.github.io/edge/#/) if you're using Node for dev speed/ease's sake.

romanovcode · on Jan 19, 2015

I would not recommend using Javascript at all when using CPU heavy apps. C++, Java, C# would be much better.

Kiro · on Jan 19, 2015

Sorry, that's what I meant. Updated my question. Thanks.

SixSigma · on Jan 19, 2015

On a slight tangent, there's an article using the Sieve of Eratosthenes to demonstrate the use of Communicating Sequential Processes (CSP) on the website of Russ Cox (one of the developers of Go):

http://swtch.com/~rsc/thread/

wolframhempel · on Jan 19, 2015

These are very interesting findings. On a higher level though: Are there any significant performance differences between the APIs of node and io? E.g. tcp package processing, file system access etc? I know that a lot of them are effectively C, so independent of the V8 version.

richmarr · on Jan 19, 2015

> Depending on what your Node application does, my findings may or may not apply to your use-case.

I'm going to go out on a limb and say that the proportion of real world Node apps that will be noticeably affected by this is less than 1%.

daphneokeefe · on Jan 19, 2015

The competition between these teams is going to make both of them better. They will not only be competing on speed, but also on features. A huge win for developers.