It is rather sad that none of the 49 comments (at the time of posting) mention the underlying math problem, which is super interesting; instead they focus on dubious speed metrics (dubious in the sense that these metrics change with the chip/cache/RAM/compiler/language/coding style, and aren't that relevant anyway compared to the underlying problem).
Lemme rephrase this rather interesting problem: There are 2055 startups, 2006 bigcorps & 2017 zombies. If the startup gets bought out by the bigcorp, the bigcorp has wasted its well-earned money and soon becomes a zombie. If the zombie folks instead join hands with the startup, they suddenly make pots of money & the startup becomes a bigcorp. Finally, if the bigcorp uses its cash prudently and buys the zombie, it starts innovating & becomes a startup.
So the claim is that if you let this economy play out in all its glory, there will be 1.448 billion buyouts. To arrive at this giant figure of 1,448,575,636 takes anywhere from 335 seconds for the C++ hacker to 7000 seconds for the JS guys.
1) In any of the transformations [(-,-,+),(-,+,-),(+,-,-)], the difference between any two of the counts (s-b, s-z, b-z) changes by 0 or ±2. What this tells us is that if a difference starts out odd it will always be odd, and if it starts out even it will always be even.
2) Since the final state will involve only one type left standing (or else the stronger could eat the weaker), we know that the final counts will be two at 0, one at something >0. Since our losers end with an even delta (0), they must always have had an even delta. In the scenario you proposed, that's s and z (2055 - 2017 = 38). So b, bigcorps, is going to be our winner.
3) Now that we know who will win, we need to find the maximum possible number of bs remaining. Since each move reduces the total population by one, the maximum bs will be produced by the shortest sequence of moves that gets rid of all zs and ss.
4) As long as z>0 and s>0, our best move is to have z eat s, as it drops the population of z and s by 2. So our first sequence is to have all zs eat ss, leaving us with s = 38, b = 4023, z = 0.
5) When z = 0, our only possible move is b eats s, creating a z, which can then be eaten by an s to produce a b, thereby restoring b to its original count and lowering s by 2. After repeating this sequence 19 times to get rid of the 38 ss, we'll have s = 0, b = 4023, z = 0.
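To make steps 1-5 concrete, here is a minimal C++ sketch (my own illustration, not code from the article) that turns the argument into a constant-time computation. It assumes exactly one pair of counts has an even difference, which is the case for 2055/2006/2017: that pair dies out, and the third species survives with (its count) + min(the other two) members, since the greedy schedule above uses max(the other two) moves and every move removes exactly one member.

#include <algorithm>
#include <cstdio>
#include <cstdlib>

// Constant-time version of the parity + greedy argument above.
int main(int argc, char** argv) {
    long s = argc > 3 ? atol(argv[1]) : 2055;  // startups
    long b = argc > 3 ? atol(argv[2]) : 2006;  // bigcorps
    long z = argc > 3 ? atol(argv[3]) : 2017;  // zombies

    if ((s - z) % 2 == 0)        // s and z both reach 0, b survives
        printf("bigcorps win with %ld survivors\n", b + std::min(s, z));
    else if ((s - b) % 2 == 0)   // s and b both reach 0, z survives
        printf("zombies win with %ld survivors\n", z + std::min(s, b));
    else                         // b and z both reach 0, s survives
        printf("startups win with %ld survivors\n", s + std::min(b, z));
    return 0;
}

For the numbers above it prints "bigcorps win with 4023 survivors", matching the 4023 figure quoted elsewhere in the thread.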
This is indeed an interesting problem. I don't know what the runtime breakdown is, most likely it's in the sorting routine, but if that's the case the problem can be solved much faster by not resorting to the generic stable_sort().
I summarize the author's algorithm as follows: take an array of R^3 vectors, and for each element in the array, apply each of the [+,-,-], [-,+,-], and [-,-,+] transforms. So N elements become 3N elements, which is followed by a duplicate-removal pass facilitated by the sorting.
The improvement is that the duplicate removal can be accomplished without sorting.
Let's say you have an array of R^3 vectors that is already sorted. Then say you create 3 arrays, each of which is created by "shifting" the array in one of the 3 directions. Call the original A, and the derived B0, B1, B2. Note that B0, B1, and B2 are each sorted, since every element within each array is adjusted in the same direction. Then all you need to do is a 3-way merge, which takes linear time.
I bet by doing this, the runtime could be significantly reduced. My guess is that it runs in the range of 10-30 seconds (instead of 335 seconds).
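For what it's worth, here is a rough sketch of that merge-based meal step (my own, untested; the Forest type and the function name are assumptions, not the article's actual code). The input must already be sorted and deduplicated, and the output comes out sorted and deduplicated too, so no stable_sort pass is needed between generations:

#include <array>
#include <cstddef>
#include <vector>

using Forest = std::array<int, 3>;

std::vector<Forest> meal(const std::vector<Forest>& forests) {
    static const std::array<Forest, 3> deltas = {{
        {-1, -1, +1}, {-1, +1, -1}, {+1, -1, -1}}};

    // B0, B1, B2: shift every forest by one delta; each stays sorted because
    // the same offset is applied to every element (and filtering out invalid
    // forests preserves the order).
    std::array<std::vector<Forest>, 3> shifted;
    for (int d = 0; d < 3; ++d)
        for (const Forest& f : forests) {
            Forest g = {f[0] + deltas[d][0], f[1] + deltas[d][1],
                        f[2] + deltas[d][2]};
            if (g[0] >= 0 && g[1] >= 0 && g[2] >= 0)
                shifted[d].push_back(g);
        }

    // Linear-time 3-way merge that drops duplicates on the fly: the merge
    // emits forests in non-decreasing order, so any duplicate is adjacent
    // to the previously emitted forest.
    std::vector<Forest> out;
    std::array<std::size_t, 3> idx = {{0, 0, 0}};
    for (;;) {
        int best = -1;
        for (int d = 0; d < 3; ++d)
            if (idx[d] < shifted[d].size() &&
                (best < 0 || shifted[d][idx[d]] < shifted[best][idx[best]]))
                best = d;
        if (best < 0) break;
        const Forest& f = shifted[best][idx[best]++];
        if (out.empty() || out.back() != f)
            out.push_back(f);
    }
    return out;
}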
It is a really cool math problem, and we should be using math to solve it! This problem took me 30 minutes to solve without programming, so I'm only a few minutes off Fortran. (I scale well too, since I can now do it in constant time - fine, log time if we count reading the input.)
A minor nit to pick: the 1,448,575,636 number is not the number of buyouts. (It looks like the number of nodes in the evaluation graph.)
Spoiler: the answer is 111110110111_2. (edit: had the wrong answer)
But the C++11 solution is not functional. It uses state variables: look at the while loop, it updates variables. Look at next_forests.reserve and std::back_inserter. These are not clearly functional constructs. Of course it is possible to model memory state with monads :) but... The C++ code is quite different from the other solutions. So I was surprised by Java, which is only 3 times slower than C++ while running much less efficient code.
Glad someone pointed that out. It is cheating. It is doing a lot of state transformations like in-place sorting / filtering. I'd like to see some real C++ functional code here, using immutable data structures, not an imperative program using lambdas. I guess it would be both much harder to write (C++ is not really a functional language) and much slower, because dynamic memory allocation in C++ is more costly than in Java.
Isn't this just evidence of how broad a definition "functional" programming has? It is the no-true-Scotsman of programming debates. (Though I suspect any "paradigm" example will fall into this trap.)
It's pretty astonishing how easily C++ trounces everything else including Java. I guess there's still something to be said for compiling directly to optimised binaries.
Supposedly, Java is going to be faster than native code any day now. It's been said for years. The case was somewhat credible at one time, because the opportunity exists to optimize using runtime information. I think the reason it didn't go that way is:
1. CPUs have gotten very good at doing runtime optimization kinds of things on their own, like predicting branches and reducing the cost of virtual function calls.
2. Java only does optimizations that can be done quickly, since the optimizer has to compete with the executing program itself.
3. The claim was overblown to begin with, and Java is trying to do too many other things, like be secure, that interfere with performance.
They use Java because it's fast enough, easy, safe, reliable, easy to instrument, easy to debug, has wonderful tool support, is mature, has many tested and heavily used frameworks, has a free chunk of code to do anything you can imagine you can just snag via maven, integrates with everything, works on all major platforms and costs nothing.
No, what happened is mainly that memory access has become a big bottleneck for most programs, and small footprint and cache-friendliness can easily mean a speed difference of 20x. For a number of reasons (such as Object overhead, UTF-16 strings, lack of true object arrays), it is very hard to write small-footprint cache-friendly code in Java.
The mistake is using small objects, since Java doesn't have structs and cannot really peel objects apart under normal circumstances. If I had to write Java code for this benchmark I'd just use a long array and represent the state in a long (not a java.lang.Long).
Writing really cache-friendly code in Java requires looking at the problem differently and using int[]/long[] instead of a small object with 3 small ints.
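For illustration, this is roughly what that packed representation looks like (sketched in C++ for consistency with the other snippets in this thread; in Java the same bit-twiddling works on a plain long or a long[]). Each count fits comfortably in 21 bits, so one 64-bit word holds the whole forest with no object header and no pointer chasing:

#include <cstdint>

typedef std::int64_t Packed;

// Pack (goats, wolves, lions) into one 64-bit word, 21 bits per count.
inline Packed pack(std::int64_t goats, std::int64_t wolves, std::int64_t lions) {
    return (goats << 42) | (wolves << 21) | lions;
}

inline std::int64_t goats(Packed p)  { return p >> 42; }
inline std::int64_t wolves(Packed p) { return (p >> 21) & 0x1FFFFF; }
inline std::int64_t lions(Packed p)  { return p & 0x1FFFFF; }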
The Java version simply features some quite horrific code but can easily be run in parallel - no one mentions that. Using streams in a purely functional way without parallelism in mind is ritual suicide.
Finally, the CPU running the benchmark has only 2 cores, which is a disadvantage with tons of allocation and garbage.
Object inlining might be the next big thing for the JVM. It can already inline functions, but it can't inline member objects in containing objects, or convert your Integer members to int members, or convert an array of Objects into an array of structs.
Converting Object[] arrays into single chunks of packed members when they're all the same type would be a pretty big win by itself.
Only computation-heavy code will show a difference. And the disparity is going to vary a lot. Some code will be easy for Java's optimizer and won't give C++ any advantage.
Of course there are, and it only takes a few seconds with Google to find them. Here's one: the fannkuch-redux benchmark at http://benchmarksgame.alioth.debian.org/u64q/java.php gives Java a >20% CPU efficiency advantage over C++.
That compares a Java program written to use multiple cores with a C++ program not written to use multiple cores; because the C++ programs written to use multiple cores did not compile.
It is very good. But it's not quite fair to compare it to, say, Ruby or PHP, because it still requires a compilation step, and it's still pretty demanding on the programmer.
I can't see it happening even in a far future except maybe for some artificial microbenchmarks. Java programs use a lot of memory, and since everything is allocated on the heap garbage collection is an issue, impacting real-world programs.
In terms of raw CPU speed, direct access to vector instructions can be a game changer in various applications. Also, C/C++ compilers are getting better each day.
> Java programs use a lot of memory, and since everything is allocated on the heap garbage collection is an issue, impacting real-world programs.
That is a very good observation. Looking back, at some point memory speed wasn't that much slower than CPU speed (mostly because CPU speeds were not that fast then).
So throwing more memory at something seemed like a very good way to improve performance. Say, following long chains of pointers through some nested structure or a long linked list was OK. At some point CPU speed went through the roof and left memory access speeds behind.
So then caches became very important. Cache aware programming was a "thing".
That, coupled with having lots of virtualized/cloud machines everywhere with limited memory, kind of turned that initial thinking on its head. A small memory footprint became a desirable trait, just like in the old MS-DOS & Turbo Pascal days.
Java sort of flourished and grew in that time period where "throw more RAM at it to get performance" was a very obvious thing to do. Now I think it is less obvious that that is the best way.
(And perhaps Java's current or future GC and JIT strategies will start to take into account caches and memory frugality better).
Now that said I am still amazed at how it can achieve such great performance given all the stuff it does behind the scenes. It is not faster than C but heck, it is very fast still.
Java still lets you make a giant array and operate on that. For high-speed numerics programming you do that in every language: FORTRAN, C, Java, etc. At that point it comes down to how much information you and the compiler can share: restricted pointers, assembler kernels, etc.
Also profile-guided optimization is ever more widespread with C/C++ compilers, which certainly wins some of the gains you'd normally only expect from the JITing VM.
C++ PGO can do less than JVM JIT does. E.g. JVM JIT takes into account actual configuration options chosen by the user and actual, not predicted workload. This can help e.g. with inlining, because JIT can see only one particular implementation of something was loaded by the user, and optimize for just that case. C++ PGO profiles for a single workload and config chosen by the program creator, which may or may not match what user is doing.
In practice, PGO with C/C++ gets most of the really valuable low-hanging fruit, provided the test cases aren't too far off from a typical workload.
Also, the JVM JIT needs a steady-state workload to make good guesses about its optimizations - if it JITs a bunch of methods and then the load changes such that the optimizations no longer apply (loops iterate differently, inlined methods don't get called as much), the JVM's runtime optimization can be foiled. JITting a very heterogeneous program can trick the JVM into finding a local minimum in native performance (which I guess is still a lot better than actually interpreting Java bytecode).
The type of code makes a big difference in how it will perform relative to C++. You also have to be somewhat expert in each language. You can write arbitrarily slow code in any language. It's not easy to be sure that you're giving each language its best shot.
I have never seen Sun or Oracle make that claim. You seem to simply be ranting on about a strawman argument, with a rather strange java-hate obsession.
I mean, Fortran did worse than Java. Where is your writeup for that?
There was certainly a lot of talk within the Java development community about how Java was going to meet or overtake native code as the HotSpot VM matured. See, for example (from 1998):
"According to Sun Microsystems, the Hotspot virtual machine, Sun's next-generation Java virtual machine (JVM), promises to make Java "as fast as C++.""
I'd say that statements like that are subject to some degree of interpretation. It's hard for one runtime to be definitively faster than another runtime. It is, however, quite possible for one runtime to have cases where it is better, cases where it is worse, and cases where it is equivalent such that it is reasonable to say that it is "as fast as" the other. Java tends to be a bit slower than C++ still, but the differences between it and "as fast as" are trivial enough that a number of HFT systems, for example, are written in Java.
While I think the comment here leaves that open as a possibility, the second sentence of the article it's from makes it pretty clear. "Specifically, Sun says that a platform-independent Java program delivered as bytecodes in class files will run on Hotspot at speeds on par with an equivalent C++ program compiled to a native executable."
Considering that I can get different performance for the same C++ program simply by using different compiler, or even different compiler options, I'd say that there's a lot of wiggle room.
Actually, there is a country mile of wiggle room there. It doesn't have to be true of all programs for one, and "on par with" gives you plenty of wiggle room.
Cliff Click, a smart man in whom I see many admirable qualities, than whom there are few more complete experts on the JVM, gave a tech talk at Facebook about how Java is faster than C++ just a few months ago. Tons of perfectly smart engineers sat and took it seriously. Much of the content was a rehash of this older discussion: http://www.azulsystems.com/blog/cliff/2009-09-06-java-vs-c-p...
There are plenty of serious, technically deep people trying to convince their colleagues that Java is as fast as C.
I know a professor who worked on the HotSpot Java VM in the early 2000s. He is a smart man, but he keeps making these false claims about how Java will be as fast as C++ within the next few years. He was the boss of a local Oracle Labs office, though he left recently and is now a C# advocate making similar claims in favor of MS.
Such persons are quite annoying as they try to influence a lot of students.
It's worth remembering that native code isn't fast automatically. An unskilled programmer, for instance, could easily write C++ that's slower than Java.
There are a few possible reasons Fortran wasn't faster:
1. Fortran's compilers haven't advanced at the same rate as other languages due to its lack of popularity.
2. Fortran is inherently harder to optimize.
3. People have forgotten how to write fast Fortran, since no one uses it anymore.
Fortran is actually easier to optimize than C or C++, as procedure arguments and variables cannot alias each other. Since C99 you can use the restrict keyword to make the same no-aliasing promise, but many C programmers don't do this, and the restrict keyword isn't even in standard C++.
Um, no. It's a huge deal for heavy-duty computational code, so much so that most performance-oriented numeric libraries have extra code to work around the case where restrict is not available.
Whatever the claims about Fortran, I would say that 3) is definitely not applicable here (two implementations were provided), 2) is laughable (some people used to argue Fortran was the easiest to optimize), and 1) is true, but not in a way that would have a significant impact on this benchmark. Fortran compilers have been optimized pretty extensively over the years, and of late really the only reason to have a Fortran compiler is efficient numerical computation, so that is something Fortran compiler writers have focused on.
The main issue is that the C++ version is hardly functional programming: stuff like array<int,3> for the forest and then counting, instead of massive branching like Java's isStable().
Reserving the entire next_forests like that:
I don't see how the use of std::array<int,3> and std::count makes the C++ version less functional. Also, the next_forests.reserve doesn't help as much as you state; commenting it out made barely any measurable difference in my experiments (try it out!).
Having dealt with trying to make Java run within some reasonable margin of C++11, I'm not surprised at these results at all. It's really hard to make Java go that fast.
I'm instead shocked that the C++ version beat Fortran.
The Intel and GNU C++ compilers usually produce even faster code than the compiler used (Clang). Also, if it's business critical, you can do heavy calculations through GPUs or other custom hardware.
./magic_forest_gcc 617 655 606 7.22s user 0.36s system 100% cpu 7.580 total
./magic_forest_icc 617 655 606 7.00s user 0.34s system 100% cpu 7.340 total
./magic_forest_clang 617 655 606 5.73s user 0.25s system 100% cpu 5.980 total
gcc version 4.8.2
icc version 14.0.1
clang version 3.4
(On 3.14 powered arch linux)
It might be interesting to see what kind of optimisation clang does, but I currently don't have the time to look into it.
And then there's the question of what compiler flags you used. Furthermore, one of the compilers could be really good at profile-guided optimisation (PGO), beating the others.
How big are the binaries? ICC inlines a lot more heavily; it's possible (I've seen it before) that the code size has bloated and no longer fits in the instruction cache...
Having said that, most of ICC's general gains over GCC and LLVM are due to the much faster maths libraries (even more so on Linux), and that doesn't look like that would be useful for these tests...
One thing that I've yet to see mentioned is inlining. Almost all the functions/lambdas passed to STL functions in the code are likely inlined. This gives a huge performance boost and is one of the few cases where C++ is often faster than even C as calls through function pointers can be eliminated.
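A tiny illustration of the point (my own example, not from the article): with C's qsort the comparator is an opaque function pointer, so every comparison is an indirect call, while std::sort sees the comparator's exact type as a template parameter and a decent compiler can inline the lambda body straight into the sorting loop.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator passed through an opaque function pointer (cannot be inlined
// across the qsort call boundary).
static int cmp_int(const void* a, const void* b) {
    return *static_cast<const int*>(a) - *static_cast<const int*>(b);
}

void sort_both(std::vector<int>& a, std::vector<int>& b) {
    std::qsort(a.data(), a.size(), sizeof(int), cmp_int);  // indirect calls
    std::sort(b.begin(), b.end(),
              [](int x, int y) { return x < y; });         // inlinable
}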
Using sort to filter duplicates is horribly worse compared to hashing (javascript for instance).
The Java version has a rather poor implementation; it would be interesting to see the GC+allocation cost and the GC type used.
The C++ version does not really use functional programming in find_stable_forests and meal...
> Using sort to filter duplicates is horribly worse compared to hashing (javascript for instance).
Is this really true? A sort plus a linear scan has very low constant time factors, good cache locality (depending on the sorting algorithm used), and no need to allocate if you're sorting in place. I've seen good results using sort to filter duplicates in my own performance-sensitive code. Are you saying this technique is "horribly worse" based on your experience or intuition?
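For reference, the pattern in question is just the usual sort-and-scan idiom, something along these lines (a generic sketch, not the article's code):

#include <algorithm>
#include <vector>

// Sort in place, then drop adjacent duplicates with one linear pass:
// no hashing and no allocation beyond what the vector already owns.
template <typename T>
void dedup(std::vector<T>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}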
It will likely get faster the more duplicates you have, as it allows the CPU to predict a branch more reliably several times before being wrong and having to re-do some of its prediction stuff. (note that I'm not experienced with optimizing stuff, dealing with cache optimizations, etc., but I have read a decent amount, and I did take a class about CPU architecture) I would also suspect that hashing might end up messing with memory all over the place, causing the cache to be partly useless.
That question (and answer) is probably not relevant here. The important thing is likely cache locality, not branch prediction (as the test case is very different in that question, amplifying branch misprediction issues, which won't be the issue here).
In the case of the JS solution, it runs horribly and is highly inefficient.[0]
For non-hot code it is perfectly fine, but in this case, inside a main loop like this, it is a waste of CPU cycles. Compare the number of forests here:
$ node orig-magicForest.js 117 155 106
total forests: 1522899
{ goats: 0, wolves: 0, lions: 223 }
total time: 816ms
$ node new-magicForest.js 117 155 106
total forests: 428
{ goats: 0, wolves: 0, lions: 223 }
total time: 6ms
A message to the blog owner: get rid of that swipe to go forward/backward between posts when viewing on mobile. It looks good, but it is functionally frustrating. Why would a user expect different swipe behaviour in one direction to another?
I'm not sure if I got the problem right, because it solves the hardest case almost instantly (2006 lions, 2055 wolves, 2017 goats -> 4023 lions), in 0.8s on my macbook air.
I used a general search algo that I've also used in the past for the missionaries and cannibals and snake cube puzzles.
It uses a set to store the past states seen, instead of deduping a list.
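Presumably something along these lines (my own C++ reconstruction of the idea, not the commenter's code): keep one global set of every state seen so far and only enqueue states that are new, so nothing is ever expanded twice and there is no per-generation sort/dedup pass.

#include <cstdio>
#include <set>
#include <tuple>
#include <vector>

typedef std::tuple<int, int, int> State;  // goats, wolves, lions

int main() {
    State start(2017, 2055, 2006);
    std::set<State> seen;
    std::vector<State> frontier(1, start);
    seen.insert(start);

    while (!frontier.empty()) {
        std::vector<State> next;
        for (const State& cur : frontier) {
            int g = std::get<0>(cur);
            int w = std::get<1>(cur);
            int l = std::get<2>(cur);
            if ((g == 0) + (w == 0) + (l == 0) >= 2) {  // only one species left
                std::printf("stable: %d goats, %d wolves, %d lions\n", g, w, l);
                return 0;
            }
            const State moves[3] = {State(g - 1, w - 1, l + 1),
                                    State(g - 1, w + 1, l - 1),
                                    State(g + 1, w - 1, l - 1)};
            for (int m = 0; m < 3; ++m) {
                if (std::get<0>(moves[m]) < 0 || std::get<1>(moves[m]) < 0 ||
                    std::get<2>(moves[m]) < 0)
                    continue;
                if (seen.insert(moves[m]).second)  // true only for unseen states
                    next.push_back(moves[m]);
            }
        }
        frontier.swap(next);
    }
    return 0;
}

For the hardest case this should end up at the same 4023-lion forest mentioned above, since the shallowest stable state in the search is the one reached with the fewest meals.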
It would be interesting to see how LuaJIT, PyPy and HHVM (the PHP JIT) score against JS v8 and Java 8 in that test environment.
Also, the clang C++ compiler is (a lot) slower than the GNU C++ compiler. (We had a benchmark on HN that showed that clang C++ is 5% slower and asm.js is 10% slower than GNU g++.) The comparison should be executed on Linux or Windows, as OS X is known for shipping with older versions of Unix tools/applications.
Here's a Haskell translation of the C++ version. Runtime is around 10x that of the original (probably vector vs list cache thrashing and vector sort/unique vs list-to-set/set-to-list); code size is reduced 3x.
import qualified Data.Set as S

data Forest = F Int Int Int
    deriving (Eq, Ord, Show)

meal forests = (S.toList . S.fromList)
    [nextForest |
        forest <- forests,
        meal <- possibleMeals,
        let nextForest = forest <+> meal,
        valid nextForest]
  where
    possibleMeals = [
        F (-1) (-1) 1,
        F (-1) 1 (-1),
        F 1 (-1) (-1)]
    F x y z <+> F x' y' z' = F (x+x') (y+y') (z+z')
    valid (F x y z) = x >= 0 && y >= 0 && z >= 0

findStable forest = iter [forest]
  where
    iter forests | not (done forests) = iter (meal forests)
                 | otherwise = filter stable forests
    done forests = null forests || any stable forests

stable (F _ 0 0) = True
stable (F 0 _ 0) = True
stable (F 0 0 _) = True
stable _ = False

main = print $ findStable (F 117 155 106)
Strangely, if the hashCode() method in the Java version is replaced with one generated by Eclipse, the java program runs much slower.
In fact, the execution time is more than doubled.
Why is this so?
The Eclipse generated version of hashCode():
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + goats;
    result = prime * result + lions;
    result = prime * result + wolves;
    return result;
}
If anyone is still reading this thread, I optimized the javascript solution and algorithm, speeding it up by almost 20,000x and beating the C++11 one by almost 1,000x. Post is here: https://news.ycombinator.com/item?id=7858485 .
Hmm... it doesn't mention anything about the VM settings used.
The system has 8 GB of RAM. But as I remember, the Java VM is restricted in its use of RAM, as is Node.js (I guess this test was performed in Node). To use the full 8 GB of RAM you would have to run multiple Node.js instances? Am I wrong?
This post seems to focus heavily on using map/filter type commands, and Golang does not have functional programming features like this. The Go way of solving this problem would not look like the other solutions here.
Go has all the 'functional' features that Javascript, Java and C++ have demonstrated here. The difference is in the library implementation of collections/containers; the default ones in Go don't provide the filter/map methods (or any methods, for that matter), but nothing prevents you from using that style with a collection that implements them.
That is, there's nothing inherent to Go, as a programming language, that makes it less functional than Java or C++. Go has first class support for functions and methods as objects, with proper closures, anonymous funcs and such.
Do the Java benchmarks include the start-up time for the JVM? If so, I wouldn't say it's a completely fair comparison. (At least for the times that are small).
I am almost certain it wasn't included for some reason like that. I also suspect that they were using an outmoded version of fortran. Clicking the extra links would tend to validate that suspicion.
Also, the performance of Fortran code is quite dependent on the compiler and compiler options - something that was not well described.
Neat results, and a neat comparison of the languages.
I'm a bit curious how an asm.js port (via whatever means, maybe C++ -> emscripten?) ends up performing.
edit: heh.
>Also note that FORTRAN is not included in the list [of relative speedups], because no sane person would switch to FORTRAN from another programming language voluntarily.
It's embarrassing that Fortran got beaten by very unoptimized Java code. But this is probably not because Fortran is slow and Java is fast; rather, the whole benchmark result is pretty much a coincidence of bad coding, and if someone else wrote it, the results could just as well be completely reversed. E.g. the Java code uses some of the worst performance antipatterns possible in Java, like creating small objects everywhere (Optional), while the C++ version uses primitives like ints and bools.
That is, if we accept the assumption that this was well-written modern Fortran code. Since they don't actually show the Fortran, it is hard to say. I looked through the referenced posts and all of those use Fortran 77. Furthermore, it depends very, very heavily on compiler optimizations - yet another thing they made no mention of.
Although all these languages have some functional characteristics, I don't think they should be called functional languages. It seems like the OP was more concerned with getting the results he wanted than with answering the "Can a functional language do this as well?" question.
Actually surprised Java lagged so far behind... It's usually the case that "Java is 95% as fast as CPP, but can be written with a fraction of the violence."
I've never heard anybody say that Java is that fast. It can be in very specific situations. Actually, being only 2-3 times slower than C++ is pretty awesome, especially since that is hardly low-level Java, rather quite abstract. I suspect low-level Java could get it down to a flat 2x in this case. For a huge range of uses, being within 2x is effectively as fast as C, especially since taking advantage of multicore / parallelized operations is significantly easier (in a cross-platform manner) than it is in C/C++.
In my experience Java is usually at most 50% worse than C++. This is only the case though when I write Java with minimal use of objects (so no generics for instance). Also this is in comparison to C++ code that's not too micro-optimized (so mostly idiomatic) and that is not compiled via profile-guided optimization for instance.
Now give it your best shot!