Memory cache optimizations (libtorrent.org)
138 points by dbaupp on Jan 2, 2014 | 11 comments



A dead comment (from a hellbanned account) by an infamous person here, whose comments don't usually make sense (because of schizophrenia), is actually quite good this time.

You can turn on dead comments in your HN account settings.

...Has anyone actually tested the performance improvements? Are there any?


I wish he weren't hellbanned. I enjoy the presence of his comments. They usually aren't very insightful, are sometimes not PC, and are usually a little frightening, but he is without a doubt part of the character of this site. I always keep dead comments on.


The cache-miss metric in OProfile may be a better solution than this.


Do you know if the new "perf" tool can also be used for the same purpose?


Yes, you can use perf to profile cache misses (perf record -e cache-misses).


Not only that, but you can also annotate the source and see which function and which instruction (C or asm) caused the misses. That way you know exactly which field is the one causing them.

Ditto for branch misprediction.
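
Not from the thread, but here is a minimal self-contained C++ toy (the file and symbol names are made up) showing the workflow the parent describes: a shuffled linked-list walk generates the misses, and perf annotate will pin them on the loads inside sum_list.

    // Build:    g++ -O2 -g list_walk.cpp -o list_walk
    // Profile:  perf record -e cache-misses ./list_walk && perf report
    //           perf annotate sum_list                    # per-instruction miss counts
    //           perf record -e branch-misses ./list_walk  # same workflow for mispredictions
    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <random>
    #include <vector>

    struct node { node* next; long payload; };

    // Pointer chasing through shuffled nodes defeats the prefetcher, so most
    // iterations take a cache miss on the n->next / n->payload loads.
    long sum_list(const node* n) {
        long s = 0;
        for (; n != nullptr; n = n->next)
            s += n->payload;
        return s;
    }

    int main() {
        const std::size_t count = 1 << 22;          // ~64 MB of nodes, larger than the LLC
        std::vector<node> pool(count);
        std::vector<std::size_t> order(count);
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::shuffle(order.begin(), order.end(), std::mt19937{42});
        for (std::size_t i = 0; i + 1 < count; ++i) {
            pool[order[i]].next = &pool[order[i + 1]];
            pool[order[i]].payload = static_cast<long>(i);
        }
        pool[order[count - 1]] = {nullptr, 0};
        long total = 0;
        for (int rep = 0; rep < 20; ++rep)
            total += sum_list(&pool[order[0]]);
        return total == 0;                          // keep the work from being optimized away
    }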


Finally, a useful article about cache optimizations. All the other articles I've been able to find give vague hints, but no actual, practical, measurable advice. With these tools I can finally see what's going on in my code rather than making educated guesses.


What I would have been interested to see is the author's analysis of whether this was actually worth doing or not.


This is interesting; I would leave a comment there, but it's not obvious...

Looking at the implementation, I am curious how good the coverage is: what about the array new/delete operators? What about placement new? What about stack allocations? What about the static data area?
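
The article's instrumentation isn't reproduced here, but as a rough sketch of the coverage question: replacing the global allocation operators (hypothetical example below, not libtorrent code) catches scalar and array heap allocations, while placement new, stack objects, and static storage never pass through these hooks at all.

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Replacing the global operators intercepts scalar and array heap allocations.
    void* operator new(std::size_t n) {
        std::printf("new    %zu bytes\n", n);
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc{};
    }
    void* operator new[](std::size_t n) {
        std::printf("new[]  %zu bytes\n", n);
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc{};
    }
    void operator delete(void* p) noexcept   { std::free(p); }
    void operator delete[](void* p) noexcept { std::free(p); }

    struct widget { int hot; char cold[60]; };

    int main() {
        widget* a = new widget;              // seen by the hook
        widget* b = new widget[4];           // seen by the array hook
        alignas(widget) char buf[sizeof(widget)];
        widget* c = new (buf) widget;        // placement new: non-replaceable, never seen
        widget on_stack{};                   // stack allocation: invisible
        static widget in_static{};           // static data area: invisible
        c->~widget();
        delete[] b;
        delete a;
        (void)on_stack; (void)in_static;
    }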


Sad that this optimization has to be done manually in 2014…


It's because of backwards compatibility. The order of struct fields in memory is defined to be their declaration order by the C standard, and a lot of network protocol code will stop working if that assumption fails. So compilers are not free to reorder fields in memory.

The packing algorithm for Cap'n Proto [1] is cache-aware, within the bounds of also accommodating network optimizations, backwards-compatibility, etc. So yes, newer systems do perform this optimization.

[1] http://kentonv.github.io/capnproto/encoding.html
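
Not from the article, but a minimal illustration (with made-up field names) of why declaration order matters: the compiler must lay the fields out in the order written, so the first struct ends up padded to typically 24 bytes on a 64-bit target, while the hand-reordered version fits in 16 and keeps the members that are touched together on the same cache line more often.

    #include <cstdint>
    #include <cstdio>

    // Fields are laid out in declaration order, so each small member here
    // is followed by padding to align the next one.
    struct as_declared {
        std::uint8_t  flags;       // 1 byte + 7 bytes padding
        std::uint64_t bytes_done;  // 8 bytes
        std::uint16_t port;        // 2 bytes + 6 bytes tail padding
    };                             // typically 24 bytes

    // Same members, reordered by hand from largest to smallest.
    struct reordered {
        std::uint64_t bytes_done;  // 8 bytes
        std::uint16_t port;        // 2 bytes
        std::uint8_t  flags;       // 1 byte + 5 bytes tail padding
    };                             // typically 16 bytes

    int main() {
        std::printf("%zu %zu\n", sizeof(as_declared), sizeof(reordered));
    }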



