It is rather sad that none of the 49 comments (at the time of posting) mention the underlying math problem, which is super interesting; instead they focus on dubious speed metrics (dubious in the sense that these metrics change with the chip/cache/RAM/compiler/language/coding style, and aren't that relevant anyway compared to the underlying problem).
Lemme rephrase this rather interesting problem: There are 2055 startups, 2006 bigcorps & 2017 zombies. If the startup gets bought out by the bigcorp, the bigcorp has wasted its well-earned money and soon becomes a zombie. If the zombie folks instead join hands with the startup, they suddenly make pots of money & the startup becomes a bigcorp. Finally, if the bigcorp uses its cash prudently and buys the zombie, it starts innovating & becomes a startup.
So the claim is that if you let this economy play out in all its glory, there will be 1.448 billion buyouts. To arrive at this giant figure of 1,448,575,636 takes anywhere from 335 seconds for the C++ hacker to 7000 seconds for the JS guys.
1) In any of the transformations [(-,-,+),(-,+,-),(+,-,-)], the difference between any two of the counts (s-b, s-z, b-z) changes by 0 or ±2. What this tells us is that if a difference starts out odd it will always be odd, and if it starts out even it will always be even.
2) Since the final state will involve only one type left standing (or else the stronger could eat the weaker), we know that the final counts will be two at 0, one at something >0. Since our losers end with an even delta (0), they must always have had an even delta. In the scenario you proposed, that's s and z (2055 - 2017 = 38). So b, bigcorps, is going to be our winner.
3) Now that we know who will win, we need to find the maximum possible number of bs remaining. Since each move reduces the total population by one, the maximum bs will be produced by the shortest sequence of moves that gets rid of all zs and ss.
4) As long as z>0 and s>0, our best move is to have z eat s, as it drops the population of z and s by 2. So our first sequence is to have all zs eat ss, leaving us with s = 38, b = 4023, z = 0.
5) When z = 0, our only possible move is b eats s, creating a z, which can then be eaten by an s to produce a b, thereby restoring b to its original count and lowering s by 2. After repeating this sequence 19 times to get rid of the 38 ss, we'll have s = 0, b = 4023, z = 0.
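To make steps 1-5 concrete, here is a minimal C++ sketch (my own illustration, not code from the article) that turns the argument into a constant-time computation. It assumes exactly one pair of counts has an even difference, which is the case for 2055/2006/2017: that pair dies out, and the third species survives with (its count) + min(the other two) members, since the greedy schedule above uses max(the other two) moves and every move removes exactly one member.

#include <algorithm>
#include <cstdio>
#include <cstdlib>

// Constant-time version of the parity + greedy argument above.
int main(int argc, char** argv) {
    long s = argc > 3 ? atol(argv[1]) : 2055;  // startups
    long b = argc > 3 ? atol(argv[2]) : 2006;  // bigcorps
    long z = argc > 3 ? atol(argv[3]) : 2017;  // zombies

    if ((s - z) % 2 == 0)        // s and z both reach 0, b survives
        printf("bigcorps win with %ld survivors\n", b + std::min(s, z));
    else if ((s - b) % 2 == 0)   // s and b both reach 0, z survives
        printf("zombies win with %ld survivors\n", z + std::min(s, b));
    else                         // b and z both reach 0, s survives
        printf("startups win with %ld survivors\n", s + std::min(b, z));
    return 0;
}

For the numbers above it prints "bigcorps win with 4023 survivors", matching the 4023 figure quoted elsewhere in the thread.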
This is indeed an interesting problem. I don't know what the runtime breakdown is, most likely it's in the sorting routine, but if that's the case the problem can be solved much faster by not resorting to the generic stable_sort().
I summarize the author's algorithm as follows: take an array of R^3 vectors, and for each element in the array, apply each of the [+,-,-], [-,+,-], and [-,-,+] transforms. So N elements become 3N elements, which is followed by a duplicate-removal pass facilitated by the sorting.
The improvement is that the duplicate removal can be accomplished without sorting.
Let's say you have an array of R^3 vectors that is already sorted. Then say you create 3 arrays, each of which is created by "shifting" the array in one of the 3 directions. Call the original A, and the derived B0, B1, B2. Note that B0, B1, and B2 are each sorted, since every element within each array is adjusted in the same direction. Then all you need to do is a 3-way merge, which takes linear time.
I bet by doing this, the runtime could be significantly reduced. My guess is that it runs in the range of 10-30 seconds (instead of 335 seconds).
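For what it's worth, here is a rough sketch of that merge-based meal step (my own, untested; the Forest type and the function name are assumptions, not the article's actual code). The input must already be sorted and deduplicated, and the output comes out sorted and deduplicated too, so no stable_sort pass is needed between generations:

#include <array>
#include <cstddef>
#include <vector>

using Forest = std::array<int, 3>;

std::vector<Forest> meal(const std::vector<Forest>& forests) {
    static const std::array<Forest, 3> deltas = {{
        {-1, -1, +1}, {-1, +1, -1}, {+1, -1, -1}}};

    // B0, B1, B2: shift every forest by one delta; each stays sorted because
    // the same offset is applied to every element (and filtering out invalid
    // forests preserves the order).
    std::array<std::vector<Forest>, 3> shifted;
    for (int d = 0; d < 3; ++d)
        for (const Forest& f : forests) {
            Forest g = {f[0] + deltas[d][0], f[1] + deltas[d][1],
                        f[2] + deltas[d][2]};
            if (g[0] >= 0 && g[1] >= 0 && g[2] >= 0)
                shifted[d].push_back(g);
        }

    // Linear-time 3-way merge that drops duplicates on the fly: the merge
    // emits forests in non-decreasing order, so any duplicate is adjacent
    // to the previously emitted forest.
    std::vector<Forest> out;
    std::array<std::size_t, 3> idx = {{0, 0, 0}};
    for (;;) {
        int best = -1;
        for (int d = 0; d < 3; ++d)
            if (idx[d] < shifted[d].size() &&
                (best < 0 || shifted[d][idx[d]] < shifted[best][idx[best]]))
                best = d;
        if (best < 0) break;
        const Forest& f = shifted[best][idx[best]++];
        if (out.empty() || out.back() != f)
            out.push_back(f);
    }
    return out;
}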
It is a really cool math problem, and we should be using math to solve it! This problem took me 30 minutes to solve without programming, so I'm only a few minutes off Fortran. (I scale well too, since I can now do it in constant time - fine, log time if we count reading the input.)
A minor nit to pick: the 1,448,575,636 number is not the number of buyouts. (It looks like the number of nodes in the evaluation graph.)
Spoiler: the answer is 111110110111_2. (edit: had the wrong answer)
But the C++11 solution is not functional. It uses state variables: look at the while loop, it updates variables. Look at next_forests.reserve and std::back_inserter. These are not clearly functional constructs. Of course it is possible to model memory state with monads :) but... The C++ code is quite different from the other solutions. So I was surprised by Java, which is only 3 times slower than C++ while running much less efficient code.
Glad someone pointed that out. It is cheating. It is doing a lot of state transformations like in-place sorting / filtering. I'd like to see some real C++ functional code here, using immutable data structures, not an imperative program using lambdas. I guess it would be both much harder to write (C++ is not really a functional language) and much slower, because dynamic memory allocation in C++ is more costly than in Java.
Isn't this just evidence of how broad a definition "functional" programming has? It is the no-true-Scotsman of programming debates. (Though I suspect any "paradigm" example will fall into this trap.)
It's pretty astonishing how easily C++ trounces everything else including Java. I guess there's still something to be said for compiling directly to optimised binaries.
Supposedly, Java is going to be faster than native code any day now. It's been said for years. The case was somewhat credible at one time, because the opportunity exists to optimize using runtime information. I think the reason it didn't go that way is:
1. CPUs have gotten very good at doing runtime optimization kinds of things on their own, like predicting branches and reducing the cost of virtual function calls.
2. Java only does optimizations that can be done quickly, since the optimizer has to compete with the executing program itself.
3. The claim was overblown to begin with, and Java is trying to do too many other things, like be secure, that interfere with performance.
They use Java because it's fast enough, easy, safe, reliable, easy to instrument, easy to debug, has wonderful tool support, is mature, has many tested and heavily used frameworks, has a free chunk of code to do anything you can imagine you can just snag via maven, integrates with everything, works on all major platforms and costs nothing.
No, what happened is mainly that memory access has become a big bottleneck for most programs, and small footprint and cache-friendliness can easily mean a speed difference of 20x. For a number of reasons (such as Object overhead, UTF-16 strings, lack of true object arrays), it is very hard to write small-footprint cache-friendly code in Java.
The mistake is using small objects, since Java doesn't have structs and cannot really peel objects apart under normal circumstances. If I had to write Java code for this benchmark I'd just use a long array and represent the state in a long (not a java.lang.Long).
Writing really cache-friendly code in Java requires looking at the problem differently and using int[]/long[] instead of a small object with 3 small ints.
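For illustration, this is roughly what that packed representation looks like (sketched in C++ for consistency with the other snippets in this thread; in Java the same bit-twiddling works on a plain long or a long[]). Each count fits comfortably in 21 bits, so one 64-bit word holds the whole forest with no object header and no pointer chasing:

#include <cstdint>

typedef std::int64_t Packed;

// Pack (goats, wolves, lions) into one 64-bit word, 21 bits per count.
inline Packed pack(std::int64_t goats, std::int64_t wolves, std::int64_t lions) {
    return (goats << 42) | (wolves << 21) | lions;
}

inline std::int64_t goats(Packed p)  { return p >> 42; }
inline std::int64_t wolves(Packed p) { return (p >> 21) & 0x1FFFFF; }
inline std::int64_t lions(Packed p)  { return p & 0x1FFFFF; }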
The Java version simply features some quite horrific code but can easily be run in parallel - no one mentions that. Using streams in a purely functional way without parallelism in mind is ritual suicide.
Finally, the CPU running the benchmark has only 2 cores, which is a disadvantage with tons of allocation and garbage.
Object inlining might be the next big thing for the JVM. It can already inline functions, but it can't inline member objects in containing objects, or convert your Integer members to int members, or convert an array of Objects into an array of structs.
Converting Object[] arrays into single chunks of packed members when they're all the same type would be a pretty big win by itself.
Only computation-heavy code will show a difference. And the disparity is going to vary a lot. Some code will be easy for Java's optimizer and won't give C++ any advantage.
Of course there are, and it only takes a few seconds with Google to find them. Here's one: the fannkuch-redux benchmark at http://benchmarksgame.alioth.debian.org/u64q/java.php gives Java a >20% CPU efficiency advantage over C++.
That compares a Java program written to use multiple cores with a C++ program not written to use multiple cores; because the C++ programs written to use multiple cores did not compile.
It is very good. But it's not quite fair to compare it to, say, Ruby or PHP, because it still requires a compilation step, and it's still pretty demanding on the programmer.
I can't see it happening even in a far future except maybe for some artificial microbenchmarks. Java programs use a lot of memory, and since everything is allocated on the heap garbage collection is an issue, impacting real-world programs.
In terms of raw CPU speed, direct access to vector instructions can be a game changer in various applications. Also, C/C++ compilers are getting better each day.
> Java programs use a lot of memory, and since everything is allocated on the heap garbage collection is an issue, impacting real-world programs.
That is a very good observation. Looking back, at some point memory speed wasn't that much slower than CPU speed (mostly because CPU speeds were not that fast then).
So throwing more memory at something seemed like a very good way to improve performance. Say, following long chains of pointers through some nested structure or a long linked list was OK. At some point CPU speed went through the roof and left memory access speeds behind.
So then caches became very important. Cache aware programming was a "thing".
That, coupled with having lots of virtualized/cloud machines everywhere with limited memory, kind of turned that initial thinking on its head. A small memory footprint became a desirable trait, just like in the old MS-DOS & Turbo Pascal days.
Java sort of flourished and grew in that time period where "throw more RAM at it to get performance" was a very obvious thing to do. Now I think it is less obvious that that is the best way.
(And perhaps Java's current or future GC and JIT strategies will start to take into account caches and memory frugality better).
Now that said I am still amazed at how it can achieve such great performance given all the stuff it does behind the scenes. It is not faster than C but heck, it is very fast still.
Java still lets you make a giant array and operate on that. For high-speed numerics programming you do that in every language: FORTRAN, C, Java, etc. At that point it comes down to how much information you and the compiler can share: restricted pointers, assembler kernels, etc.
Also profile-guided optimization is ever more widespread with C/C++ compilers, which certainly wins some of the gains you'd normally only expect from the JITing VM.
C++ PGO can do less than JVM JIT does. E.g. JVM JIT takes into account actual configuration options chosen by the user and actual, not predicted workload. This can help e.g. with inlining, because JIT can see only one particular implementation of something was loaded by the user, and optimize for just that case. C++ PGO profiles for a single workload and config chosen by the program creator, which may or may not match what user is doing.
In practice, PGO with C/C++ gets most of the really valuable low-hanging fruit, provided the test cases aren't too far off from a typical workload.
Also, the JVM JIT needs a steady-state workload to make good guesses about its optimizations - if it JITs a bunch of methods and then the load changes such that the optimizations no longer apply (loops iterate differently, inlined methods don't get called as much), the JVM's runtime optimization can be foiled. JITting a very heterogeneous program can trick the JVM into finding a local minimum in native performance (which I guess is still a lot better than actually interpreting Java bytecode).
The type of code makes a big difference in how it will perform relative to C++. You also have to be somewhat expert in each language. You can write arbitrarily slow code in any language. It's not easy to be sure that you're giving each language its best shot.
I have never seen Sun or Oracle make that claim. You seem to simply be ranting on about a strawman argument, with a rather strange java-hate obsession.
I mean, Fortran did worse than Java. Where is your writeup for that?
There was certainly a lot of talk within the Java development community about how Java was going to meet or overtake native code as the HotSpot VM matured. See, for example (from 1998):
"According to Sun Microsystems, the Hotspot virtual machine, Sun's next-generation Java virtual machine (JVM), promises to make Java "as fast as C++.""
I'd say that statements like that are subject to some degree of interpretation. It's hard for one runtime to be definitively faster than another runtime. It is, however, quite possible for one runtime to have cases where it is better, cases where it is worse, and cases where it is equivalent such that it is reasonable to say that it is "as fast as" the other. Java tends to be a bit slower than C++ still, but the differences between it and "as fast as" are trivial enough that a number of HFT systems, for example, are written in Java.
While I think the comment here leaves that open as a possibility, the second sentence of the article it's from makes it pretty clear. "Specifically, Sun says that a platform-independent Java program delivered as bytecodes in class files will run on Hotspot at speeds on par with an equivalent C++ program compiled to a native executable."
Considering that I can get different performance for the same C++ program simply by using different compiler, or even different compiler options, I'd say that there's a lot of wiggle room.
Actually, there is a country mile of wiggle room there. It doesn't have to be true of all programs for one, and "on par with" gives you plenty of wiggle room.
Cliff Click, a smart man in whom I see many admirable qualities, than whom there are few more complete experts on the JVM, gave a tech talk at Facebook about how Java is faster than C++ just a few months ago. Tons of perfectly smart engineers sat and took it seriously. Much of the content was a rehash of this older discussion: http://www.azulsystems.com/blog/cliff/2009-09-06-java-vs-c-p...
There are plenty of serious, technically deep people trying to convince their colleagues that Java is as fast as C.
I know a professor who worked on the HotSpot Java VM in the early 2000s. He is a smart man, but he keeps making these false claims about how Java will be as fast as C++ within the next few years. He was the boss of a local Oracle Labs office, though he left recently and is now a C# advocate making similar claims in favor of MS.
Such persons are quite annoying as they try to influence a lot of students.
It's worth remembering that native code isn't fast automatically. An unskilled programmer, for instance, could easily write C++ that's slower than Java.
There are a few possible reasons Fortran wasn't faster:
1. Fortran's compilers haven't advanced at the same rate as other languages due to its lack of popularity.
2. Fortran is inherently harder to optimize.
3. People have forgotten how to write fast Fortran, since no one uses it anymore.
Fortran is actually easier to optimize than C or C++, as procedure arguments and variables cannot alias each other. Since C99 you can use the restrict keyword to make the same no-aliasing promise, but many C programmers don't do this, and the restrict keyword isn't even in standard C++.
Um, no. It's a huge deal for heavy-duty computational code, so much so that most performance-oriented numeric libraries have extra code to work around the case where restrict is not available.
Whatever the claims about Fortran, I would say that 3) is definitely not applicable here (two implementations were provided), 2) is laughable (some people used to argue Fortran was the easiest to optimize), and 1) is true, but not in a way that would have a significant impact on this benchmark. Fortran compilers have been optimized pretty extensively over the years, and of late really the only reason to have a Fortran compiler is efficient numerical computation, so that is something Fortran compiler writers have focused on.
The main issue is that the C++ version is hardly functional programming: stuff like array<int,3> for the forest and then counting, instead of massive branching like Java's isStable().
Reserving the entire next_forests like that:
I don't see how the use of std::array<int,3> and std::count makes the C++ version less functional. Also, the next_forests.reserve doesn't help as much as you state; commenting it out made barely any measurable difference in my experiments (try it out!).
Having dealt with trying to make Java run within some reasonable margin of C++11, I'm not surprised at these results at all. It's really hard to make Java go that fast.
I'm instead shocked that the C++ version beat Fortran.
The Intel and GNU C++ compilers usually produce even faster code than the compiler used (Clang). Also, if it's business critical, you can do heavy calculations through GPUs or other custom hardware.
./magic_forest_gcc 617 655 606 7.22s user 0.36s system 100% cpu 7.580 total
./magic_forest_icc 617 655 606 7.00s user 0.34s system 100% cpu 7.340 total
./magic_forest_clang 617 655 606 5.73s user 0.25s system 100% cpu 5.980 total
gcc version 4.8.2
icc version 14.0.1
clang version 3.4
(On 3.14 powered arch linux)
It might be interesting to see what kind of optimisation clang does, but I currently don't have the time to look into it.
And then there's the question of what compiler flags you used. Furthermore, one of the compilers could be really good at profile-guided optimisation (PGO), beating the others.
How big are the binaries? ICC inlines a lot more heavily; it's possible (I've seen it before) that the code size has bloated and no longer fits in the instruction cache...
Having said that, most of ICC's general gains over GCC and LLVM are due to the much faster maths libraries (even more so on Linux), and that doesn't look like that would be useful for these tests...
One thing that I've yet to see mentioned is inlining. Almost all the functions/lambdas passed to STL functions in the code are likely inlined. This gives a huge performance boost and is one of the few cases where C++ is often faster than even C as calls through function pointers can be eliminated.
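A tiny illustration of the point (my own example, not from the article): with C's qsort the comparator is an opaque function pointer, so every comparison is an indirect call, while std::sort sees the comparator's exact type as a template parameter and a decent compiler can inline the lambda body straight into the sorting loop.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator passed through an opaque function pointer (cannot be inlined
// across the qsort call boundary).
static int cmp_int(const void* a, const void* b) {
    return *static_cast<const int*>(a) - *static_cast<const int*>(b);
}

void sort_both(std::vector<int>& a, std::vector<int>& b) {
    std::qsort(a.data(), a.size(), sizeof(int), cmp_int);  // indirect calls
    std::sort(b.begin(), b.end(),
              [](int x, int y) { return x < y; });         // inlinable
}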
Using sort to filter duplicates is horribly worse compared to hashing (javascript for instance).
The Java version has a rather poor implementation; it would be interesting to see the GC+allocation cost and the GC type used.
The C++ version does not really use functional programming in find_stable_forests and meal...
> Using sort to filter duplicates is horribly worse compared to hashing (javascript for instance).
Is this really true? A sort plus a linear scan has very low constant time factors, good cache locality (depending on the sorting algorithm used), and no need to allocate if you're sorting in place. I've seen good results using sort to filter duplicates in my own performance-sensitive code. Are you saying this technique is "horribly worse" based on your experience or intuition?
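For reference, the pattern in question is just the usual sort-and-scan idiom, something along these lines (a generic sketch, not the article's code):

#include <algorithm>
#include <vector>

// Sort in place, then drop adjacent duplicates with one linear pass:
// no hashing and no allocation beyond what the vector already owns.
template <typename T>
void dedup(std::vector<T>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}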
It will likely get faster the more duplicates you have, as it allows the CPU to predict a branch more reliably several times before being wrong and having to re-do some of its prediction stuff. (note that I'm not experienced with optimizing stuff, dealing with cache optimizations, etc., but I have read a decent amount, and I did take a class about CPU architecture) I would also suspect that hashing might end up messing with memory all over the place, causing the cache to be partly useless.
That question (and answer) is probably not relevant here. The important thing is likely cache locality, not branch prediction (as the test case is very different in that question, amplifying branch misprediction issues, which won't be the issue here).
In the case of the JS solution, it runs horribly and is highly inefficient.[0]
For non-hot code it is perfectly fine, but in this case, inside a main loop like this, it is a waste of CPU cycles. Compare the number of forests here:
$ node orig-magicForest.js 117 155 106
total forests: 1522899
{ goats: 0, wolves: 0, lions: 223 }
total time: 816ms
$ node new-magicForest.js 117 155 106
total forests: 428
{ goats: 0, wolves: 0, lions: 223 }
total time: 6ms
A message to the blog owner: get rid of that swipe to go forward/backward between posts when viewing on mobile. It looks good, but it is functionally frustrating. Why would a user expect different swipe behaviour in one direction to another?
I'm not sure if I got the problem right, because it solves the hardest case almost instantly (2006 lions, 2055 wolves, 2017 goats -> 4023 lions), in 0.8s on my macbook air.
I used a general search algo that I've also used in the past for the missionaries and cannibals and snake cube puzzles.
It uses a set to store the past states seen, instead of deduping a list.
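Presumably something along these lines (my own C++ reconstruction of the idea, not the commenter's code): keep one global set of every state seen so far and only enqueue states that are new, so nothing is ever expanded twice and there is no per-generation sort/dedup pass.

#include <cstdio>
#include <set>
#include <tuple>
#include <vector>

typedef std::tuple<int, int, int> State;  // goats, wolves, lions

int main() {
    State start(2017, 2055, 2006);
    std::set<State> seen;
    std::vector<State> frontier(1, start);
    seen.insert(start);

    while (!frontier.empty()) {
        std::vector<State> next;
        for (const State& cur : frontier) {
            int g = std::get<0>(cur);
            int w = std::get<1>(cur);
            int l = std::get<2>(cur);
            if ((g == 0) + (w == 0) + (l == 0) >= 2) {  // only one species left
                std::printf("stable: %d goats, %d wolves, %d lions\n", g, w, l);
                return 0;
            }
            const State moves[3] = {State(g - 1, w - 1, l + 1),
                                    State(g - 1, w + 1, l - 1),
                                    State(g + 1, w - 1, l - 1)};
            for (int m = 0; m < 3; ++m) {
                if (std::get<0>(moves[m]) < 0 || std::get<1>(moves[m]) < 0 ||
                    std::get<2>(moves[m]) < 0)
                    continue;
                if (seen.insert(moves[m]).second)  // true only for unseen states
                    next.push_back(moves[m]);
            }
        }
        frontier.swap(next);
    }
    return 0;
}

For the hardest case this should end up at the same 4023-lion forest mentioned above, since the shallowest stable state in the search is the one reached with the fewest meals.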
It would be interesting to see how LuaJIT, PyPy and HHVM (the PHP JIT) score against JS v8 and Java 8 in that test environment.
Also, the clang C++ compiler is (a lot) slower than the GNU C++ compiler. (We had a benchmark on HN that showed that clang C++ is 5% slower and asm.js is 10% slower than GNU g++.) The comparison should be executed on Linux or Windows, as OS X is known for shipping with older versions of Unix tools/applications.
Here's a Haskell translation of the C++ version. Runtime is around 10x that of the original (probably vector vs list cache thrashing and vector sort/unique vs list-to-set/set-to-list); code size is reduced 3x.
import qualified Data.Set as S

data Forest = F Int Int Int
    deriving (Eq, Ord, Show)

meal forests = (S.toList . S.fromList)
    [nextForest |
        forest <- forests,
        meal <- possibleMeals,
        let nextForest = forest <+> meal,
        valid nextForest]
  where
    possibleMeals = [
        F (-1) (-1) 1,
        F (-1) 1 (-1),
        F 1 (-1) (-1)]
    F x y z <+> F x' y' z' = F (x+x') (y+y') (z+z')
    valid (F x y z) = x >= 0 && y >= 0 && z >= 0

findStable forest = iter [forest]
  where
    iter forests | not (done forests) = iter (meal forests)
                 | otherwise = filter stable forests
    done forests = null forests || any stable forests

stable (F _ 0 0) = True
stable (F 0 _ 0) = True
stable (F 0 0 _) = True
stable _ = False

main = print $ findStable (F 117 155 106)
Strangely, if the hashCode() method in the Java version is replaced with one generated by Eclipse, the java program runs much slower.
In fact, the execution time is more than doubled.
Why is this so?
The Eclipse generated version of hashCode():
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + goats;
    result = prime * result + lions;
    result = prime * result + wolves;
    return result;
}
If anyone is still reading this thread, I optimized the javascript solution and algorithm, speeding it up by almost 20,000x and beating the C++11 one by almost 1,000x. Post is here: https://news.ycombinator.com/item?id=7858485 .
Hmm... it doesn't mention anything about the VM settings used.
The system has 8 GB of RAM. But as I remember, the Java VM is restricted in its use of RAM, as is Node.js (I guess this test was performed in Node). To use the full 8 GB of RAM you would have to run multiple Node.js instances? Am I wrong?
This post seems to focus heavily on using map/filter type commands, and Golang does not have functional programming features like this. The Go way of solving this problem would not look like the other solutions here.
Go has all the 'functional' features that Javascript, Java and C++ have demonstrated here. The difference is in the library implementation of collections/containers; the default ones in Go don't provide the filter/map methods (or any methods, for that matter), but nothing prevents you from using that style with a collection that implements them.
That is, there's nothing inherent to Go, as a programming language, that makes it less functional than Java or C++. Go has first class support for functions and methods as objects, with proper closures, anonymous funcs and such.
Do the Java benchmarks include the start-up time for the JVM? If so, I wouldn't say it's a completely fair comparison. (At least for the times that are small).
I am almost certain it wasn't included for some reason like that. I also suspect that they were using an outmoded version of fortran. Clicking the extra links would tend to validate that suspicion.
Also, the performance of Fortran code is quite dependent on the compiler and compiler options - something that was not well described.
Neat results, and a neat comparison of the languages.
I'm a bit curious how an asm.js port (via whatever means, maybe C++ -> emscripten?) ends up performing.
edit: heh.
>Also note that FORTRAN is not included in the list [of relative speedups], because no sane person would switch to FORTRAN from another programming language voluntarily.
It's embarrassing that Fortran got beaten by very unoptimized Java code. But this is probably not because Fortran is slow and Java is fast; rather, the whole benchmark result is pretty much a coincidence of bad coding, and if someone else wrote it, the results could just as well be completely reversed. E.g. the Java code uses some of the worst performance antipatterns possible in Java, like creating small objects everywhere (Optional), while the C++ version uses primitives like ints and bools.
That is, if we accept the assumption that this was well-written modern Fortran code. Since they don't actually show the Fortran, it is hard to say. I looked through the referenced posts and all of those use Fortran 77. Furthermore, it depends very, very heavily on compiler optimizations - yet another thing they made no mention of.
Although all these languages have some functional characteristics, I don't think they should be called functional languages. It seems like the OP was more concerned with getting the results he wanted than with answering the "Can a functional language do this as well?" question.
Actually surprised Java lagged so far behind... It's usually the case that "Java is 95% as fast as CPP, but can be written with a fraction of the violence."
I've never heard anybody say that Java is that fast. It can be in very specific situations. Actually, being only 2-3 times slower than C++ is pretty awesome, especially since that is hardly low-level Java, rather quite abstract. I suspect low-level Java could get it down to a flat 2x in this case. For a huge range of uses, being within 2x is effectively as fast as C, especially since taking advantage of multicore / parallelized operations is significantly easier (in a cross-platform manner) than it is in C/C++.
In my experience Java is usually at most 50% worse than C++. This is only the case though when I write Java with minimal use of objects (so no generics for instance). Also this is in comparison to C++ code that's not too micro-optimized (so mostly idiomatic) and that is not compiled via profile-guided optimization for instance.
Now give it your best shot!