The Python code is terrible. That's not how you code in Python. Also, this implementation seems to take more than 32 GB on my machine. Here is a refactor: https://gist.github.com/lqdc/9149772
Anyway, this problem shouldn't be solved this way in Python in the first place. I would have stored everything in a Pandas table and computed stuff from there.
Thanks! I'm rerunning it and updating the tables progressively every few hours as optimisations are suggested. Feel free to make a pull request. Ideally within a day or so the implementations will have all been optimised enough to make for fairer comparison.
I understand that a Pandas table would be a better idea, but the purpose of this benchmark is comparing the speed of the raw languages.
Great, I'll include that in the next run of the benchmark. Is it Python 3 or 2? (Does it work with PyPy?) Also, if you're comparing it to C you should compare it to C3.c, from the final tables, which uses bignums like Python does automatically.
Edit: I tried your faster_py.py, but it didn't seem any faster than Pypy2.py in the repo (running both with PyPy). I haven't yet got the Pandas version to work due to library compatibility issues (I've got about five versions of Python installed; still working on it).
If you wanted to, you could set GCSettings.LatencyMode [1] to LowLatency [2]
> Enables garbage collection that is more conservative in reclaiming objects. Full collections occur only if the system is under memory pressure, whereas generation 0 and generation 1 collections might occur more frequently ... This mode is not available for the server garbage collector.
Just for fun, though: you wouldn't want to run in LowLatency mode for too long.
Interesting. I suspect for this particular program it'd have to be in low latency mode for the entire run. It doesn't seem to make too much of a difference to the C#2 implementation, however (the one using sensible allocation), so I suspect that one isn't generating too much garbage.
Yeah, that's where I had some problems understanding the intention of the test.
In my edited one, the trade object is immutable, so you could just reuse the same object without editing it. In FS.fs, a whole new set of objects is created/allocated on the heap for each test iteration; whereas in FS2.fs, pre-allocated memory is just being edited in place. The first approach seems much more likely to create GC pressure than the second approach.
Similarly, there are a few differences between the implementations in the different languages. For example, in CS.cs, the entire trades array is re-allocated in each iteration of the test, whereas in the F# tests, it is allocated up front. Changing the C# implementation to allocate the array only once gave a large speedup (from about 5 secs down to about 1.5 secs on my machine).
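To make the two allocation styles concrete, here is a rough Python sketch (not the code from the repo). The `Trade` class, its fields, and the values written into them are made up for illustration; only the pattern matters: `run_fresh` builds a whole new set of objects on every iteration, while `run_in_place` allocates once and overwrites the fields.

```python
class Trade:
    """Hypothetical stand-in for the benchmark's trade record."""
    __slots__ = ("trade_id", "price", "quantity")

    def __init__(self, trade_id=0, price=0, quantity=0):
        self.trade_id = trade_id
        self.price = price
        self.quantity = quantity


def run_fresh(num_trades, iterations):
    """Allocate a new list of new Trade objects on every iteration."""
    checksum = 0
    for _ in range(iterations):
        # Every pass creates num_trades new objects; the old ones become garbage.
        trades = [Trade(i, i * 2, i + 1) for i in range(num_trades)]
        checksum += sum(t.price * t.quantity for t in trades)
    return checksum


def run_in_place(num_trades, iterations):
    """Allocate the Trade objects once, then overwrite their fields each pass."""
    trades = [Trade() for _ in range(num_trades)]
    checksum = 0
    for _ in range(iterations):
        for i, t in enumerate(trades):
            t.trade_id = i
            t.price = i * 2
            t.quantity = i + 1
        checksum += sum(t.price * t.quantity for t in trades)
    return checksum
```

Both functions compute the same checksum, so any timing difference between them comes from allocation and collection rather than from the arithmetic. How large that difference is will depend on the runtime; the numbers quoted above are for the C# version, not this sketch.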
As a side note, I changed it from using DateTime.Now to using the Stopwatch class. The reason for this is that DateTime.Now has limited timer resolution (about 10ms), which you can see in the remarks section of the MSDN docs (http://msdn.microsoft.com/en-us/library/system.datetime.now(...)
Blah, I looked at the other C# implementations and saw you had already changed the array initialisation; please ignore that part :D
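The same wall-clock versus high-resolution-timer distinction exists in other runtimes. As a rough Python analogue (this is not the C# code from the benchmark, and `workload` is a made-up placeholder), `datetime.now()` plays the role of DateTime.Now and `time.perf_counter()` plays the role of Stopwatch:

```python
import time
from datetime import datetime


def time_with_wall_clock(func):
    """Time func using the wall clock, which may have coarse resolution."""
    start = datetime.now()
    func()
    return (datetime.now() - start).total_seconds()


def time_with_perf_counter(func):
    """Time func using a monotonic, high-resolution performance counter."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def workload():
    # Placeholder for the code being benchmarked.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print("wall clock:  ", time_with_wall_clock(workload))
    print("perf counter:", time_with_perf_counter(workload))
```

For runs that take seconds, the two will usually agree closely; the timer choice matters most when individual measurements are short.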
Edit:
Yeah, pretty much ignore me. The array initialisation for implementation 1 of F# and C# is barely comparable to that of the C implementation, which is why the first implementations are so much slower.
If you wanted them to be more directly comparable then, for C, you should have an array of pointers which you then fill with pointers to malloc'd addresses.
I should have noticed that sooner... I guess this is why you don't read code at 1am.
No problem. The style of implementations 1 of F# and C# is meant to mimic the style of Java1, the implementation from the original post that inspired the benchmark. F#2 and C#2 use a faster form of allocation. JavaUnsafe uses an implementation more directly comparable to C, but it's actually slower than the Java2 implementation.
It ran at about the same speed as my FS.fs, so I've updated FS.fs in the repo to use your code. Interestingly, I had to change "let main() =" to "let main =" to make it run.