Interesting results, thanks for sharing. I can perhaps shed some light on the performance differences.
> Buffer 4.259 5.006
In v0.10, buffers are sliced off from big chunks of pre-allocated memory. It makes allocating buffers a little cheaper but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer is garbage collected.
Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces peak memory (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
That said, the fact that it's sometimes slower is definitely something to look into.
> Typed-Array 4.944 11.555
Typed arrays in v0.10 are a homegrown and non-conforming implementation.
Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them, it's probably just a matter of time - although more eyeballs certainly won't hurt.
I can explain what happened to Array case. 100000 used to be the threshold at which new Array(N) or arr.length = N started to return a dictionary backed array. Not anymore: this was changed by https://codereview.chromium.org/397593008 - now new Array(100001) returns fast elements array.
I will check out what happened to Buffer/TypedArray. Should not degrade that much unless something really goes south here.
The first major one is related to mortality of TypedArray's maps (aka hidden classes). When typed array stored in the Data variable is GCed and there are no other Uint8Array in the heap then its hidden class is GCed too. This also causes GC to find and discard all optimized code that is specialized for Uint8Array's and clear all type feedback related to Uint8Array's from inline caches. When we later come and reoptimize - optimizing compiler thinks that cleared type feedback means we need to emit a generic access through the IC (there is reasoning behind that) because this is potentially going to be a polymorphic access anyways. I have filed the issue[1] for the root cause (mortality of typed array's hidden class).
Now there is a second much smaller issue (which also explains performance of the Buffer case) - apparently there were some changes in the optimization thresholds and OSR heuristics. After these changes we hit OSR at a different moment: e.g. I can see that we hit inner loop one that loops over `j` instead of hitting outer loop which leads to better code. In V8 OSR is implemented in a way that tries to produce optimized code that is both suitable for OSR and as a normal function code - this is done by adding a special OSR entry block that jumps to the preheader of the selected loop we are targeting with the OSR. This allows V8 to reuse the same optimized code without optimizing it again for the normal entry - but this also leads to code quality issues if OSR does not hit the outer loop because OSR entry block inhibits code motion. This is a know problem and there are plans to fix it. The hit usually is quite small unless you have very tight nested loops (like in this case).
Disabling OSR (--nouse-osr) not only "solves" the second issue but it also partially fixes (hides)the first issue: 1) we no longer optimize with partial type feedback - so we never emit generic keyed access but always specialize it for the typed array 2) we no longer emit OSR entry - hence no code quality issues related to it.
Very interesting. After reading your comment, I tried allocating another Uint8Array and keeping it allocated throughout the entire test as a workaround for the issue you mentioned. Time for Node.js was unchanged, but io.js was down to about 5.5s now. Almost the same time as Node. Only about 10% slower.
The same happens when I use the --nouse-osr parameter that you mentioned.
Ok, I hadn't tested with both before. Keeping the array alive and using --nouse-osr makes io.js only 2.3% slower than my original measurement for Node 0.10.35. Median of 5058ms.
And Node 0.10.35 shows basically the same results as before. I see less than 1% difference. Maybe just random fluctuation. Even if not. 1% is irrelevant.
I posted this reply on your site, but I will duplicate it here for the sake of HN readers:
> BTW --nouse-osr makes all three tests run faster.
As I tried to explain above: OSR at it is implemented now impacts code quality depending on which loop OSR hits. Which in turn depends on heuristics that V8 uses. These heuristics are slightly different in newer V8. As a result of these changes V8 hits inner loop instead of outer loop. This leads to worse code.
Code that benefits from OSR is the code that contains a loop which a) can be well optimized b) runs long b) is run only few times in total. The Sieve benchmark is opposite of this and as a result it doesn't benefit from OSR - you get bigger penalty from producing worse code and no benefit from optimizing slightly earlier.
Not using OSR for Sieve also hides the other issue with mortality of typed array's hidden classes. I say "hides" not "fixes" because one can easily construct a benchmark where the mortality would still be an observable performance issue even if benchmark itself is run without an OSR: https://gist.github.com/mraleph/2942a14ef2a480e2a7a9
Does the dramatic speed difference between the "non-conforming" implementation and V8 mean that current Node typed arrays are not memory-safe and you may get C-style buffer overflow vulnerabilities when using them?
I thought about that. But it would have diminished the point I was trying to make: Always test with different versions as performance may differ by a LOT.
He was talking about differences between point versions. 0.10 has dramatically different performance to 0.11. io.js is using 0.11, as is node:master, but older node was using 0.10.
I.e. the difference isn't necessarily node vs io, it's one point release of V8 to the next as used by node and io.
Yes I think it needs to be noted that V8 in node 0.10 is very very far behind when you take into account how quick the pace of development is. I would be interested to see these comparisons with bleeding edge node vs stable node.
Reporting benchmark results on a single OS, on a single CPU type isn't really benchmarking. It's an isolated case of results.
I'd recommend to perform an accurate suite of performance tests, use different OS (CoreOS, Ubuntu) that are actually used in server environments. Also different machine hardware will play a role.
There's not enough data at this point to come to any conclusion at this point imo.
This result set is like saying that 95% of the people on the web use the safari browser on the Apple website.
> This can be extremely important if you have a project with heavy CPU-use
Would you recommend using something different than JavaScript when writing CPU heavy apps? I was under the impression that it's better suited when dealing with high I/O.
A few folks have replied with the usual "use C/C++/Java instead", but in the real world it's often either impractical (or rather commercially indefensible) to fork out a different environment with its own training, testing, environment, automation, documentation and maintenance overheads. A blanket rejection of Node for CPU-heavy tasks is naive.
On the issue of performance, V8 lets Javascript run pretty quickly. Yes, there are languages that broadly offer faster execution, but that's far from the only factor in choosing a solution.
The main issue from my perspective is that the event loop can easily get blocked by CPU-bound tasks, preventing it from doing other things, e.g. responding to HTTP requests. You hit a similar problem with a Java servlet runner, eg. if a couple of your threads are bogged down on CPU-heavy tasks then they can't be responding to requests.
My personal preference would be to split CPU-heavy operations out so that they happen elsewhere, regardless of language, e.g having large PDFs generated by an internal microservice rather than by the webserver, or maybe via a queue in some cases. But that's just a personal preference.
It's relatively straightforward (but moderately involved) to split out CPU-heavy operations in node.js so they don't block the event loop. A rough sketch would look like this:
Sure, if the companies you've worked for are in a field that needs incremental performance gains and are willing to pay for it then that's totally rational.
Typically I see client-side performance concerns outweighing server-side performance in a ratio of 70/30 or so, with the remaining server-side performance biased towards I/O concerns like waiting for data, or file system reads with a ratio of 90/10 or more. That puts the actual saving available to language or algorithm changes in the app layer to be less than 3% for the kinds of apps I've worked on.
I usually work at companies who are starting out, looking for Product Market Fit, where those marginal gains aren't worth the cost of reimplementing.
Yep, you could use a queue. On the plus side that isolates the work from the fragility of a Node process. On the downside it comes with a specialist infrastructure requirement, often complex configuration rules and can be awkward when you need to return the result in a single HTTP cycle.
What about web workers? I believe the whole idea behind them was to allow to run heavy tasks in the background threads. Havent used them personally, but quick search indicates that they are available on node.js via npm [0]. Unless the whole concept is misunderstood by me and they would still be able to block your servers response to http requests.
Yep. Web workers, child processes, whatever works. My preference is to use microservices to keep things isolated. It comes with an overhead of a millisecond or two but for most purposes that's fine.
Sure if your project is almost exclusively CPU intensive then use something else. But sometimes you have a project that already is in JavaScript and where JavaScript mostly makes sense, but you have one or two task that are small but fairly CPU intensive. Then knowing how to write fast JavaScript is a pretty good idea since JavaScript can be pretty fast these days and the overhead (both in terms of runtime and developer time) of calling out to second language isn't always trivial and it's nice if you can avoid it.
Secondly (and not entirely relevant to this) people are doing more and more 'clever' CPU-intensive things client side these days where JavaScript is your only choice. So understanding the performance characteristics of different JavaScript implementations can definitely come in handy there.
IMO if you really need to do heavy-duty processing, C/C++ will be fastest. On the other hand Node will make development easier.
You have to judge for yourself which will be a better use of resources. Having more costs upfront for programming, or more costs in the long-run for servers.
Technically you can offload CPU-heavy things to a cluster of extra Node processes that handle work in a serial fashion. Nothing wrong with this; if you're comfortable with Javascript then this is probably better than rewriting those parts in different languages.
There's probably also a way to call Java, ( a quick search suggests https://github.com/joeferner/node-java perhaps) but I can't speak to that in particular.
Generally speaking for CPU bound tasks you should use something like edge.js (http://tjanczuk.github.io/edge/#/) if you're using Node for dev speed/ease's sake.
On a slight tangent, there's an article using the Sieve of Eratosthenes demonstrating the use of Communicating Sequential Threads (CSP) on Russ Cox' website (one of the developers of Go)
These are very interesting findings. On a higher level though: Are there any significant performance differences between the APIs of node and io? E.g. tcp package processing, file system access etc? I know that a lot of them are effectively C, so independent of the V8 version.
The competition between these teams is going to make both of them better. They will not only be competing on speed, but also on features. A huge win for developers.
> Buffer 4.259 5.006
In v0.10, buffers are sliced off from big chunks of pre-allocated memory. It makes allocating buffers a little cheaper but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer is garbage collected.
Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces peak memory (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
That said, the fact that it's sometimes slower is definitely something to look into.
> Typed-Array 4.944 11.555
Typed arrays in v0.10 are a homegrown and non-conforming implementation.
Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them, it's probably just a matter of time - although more eyeballs certainly won't hurt.
> Regular Array 40.416 7.359
Full credit goes to the V8 team for that one. :-)