Memory cache optimizations (libtorrent.org)
138 points by dbaupp on Jan 2, 2014 | 11 comments



A dead comment (from a hellbanned account) by an infamous person here, whose comments don't usually make sense (because of schizophrenia), is actually quite good this time.

You can turn on dead comments in your HN account settings.

...Has anyone actually tested the performance improvements? Are there any?


I wish he weren't hellbanned. I enjoy the presence of his comments. They usually aren't very insightful, are sometimes not PC, and are usually a little frightening, but he is without a doubt part of the character of this site. I always keep dead comments on.


The cache-miss metric in OProfile may be a better solution than this.


Do you know if the new "perf" tool can also be used for the same purpose?


Yes, you can use perf to profile cache misses (perf record -e cache-misses).


Not only that, but you can also annotate the source and see which function and which instruction (C or asm) caused the misses. That way you know exactly which field is the one causing them.

Ditto for branch misprediction.
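
Not from the thread, but here is a minimal self-contained C++ toy (the file and symbol names are made up) showing the workflow the parent describes: a shuffled linked-list walk generates the misses, and perf annotate will pin them on the loads inside sum_list.

    // Build:    g++ -O2 -g list_walk.cpp -o list_walk
    // Profile:  perf record -e cache-misses ./list_walk && perf report
    //           perf annotate sum_list                    # per-instruction miss counts
    //           perf record -e branch-misses ./list_walk  # same workflow for mispredictions
    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <random>
    #include <vector>

    struct node { node* next; long payload; };

    // Pointer chasing through shuffled nodes defeats the prefetcher, so most
    // iterations take a cache miss on the n->next / n->payload loads.
    long sum_list(const node* n) {
        long s = 0;
        for (; n != nullptr; n = n->next)
            s += n->payload;
        return s;
    }

    int main() {
        const std::size_t count = 1 << 22;          // ~64 MB of nodes, larger than the LLC
        std::vector<node> pool(count);
        std::vector<std::size_t> order(count);
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::shuffle(order.begin(), order.end(), std::mt19937{42});
        for (std::size_t i = 0; i + 1 < count; ++i) {
            pool[order[i]].next = &pool[order[i + 1]];
            pool[order[i]].payload = static_cast<long>(i);
        }
        pool[order[count - 1]] = {nullptr, 0};
        long total = 0;
        for (int rep = 0; rep < 20; ++rep)
            total += sum_list(&pool[order[0]]);
        return total == 0;                          // keep the work from being optimized away
    }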


Finally, a useful article about cache optimizations. All the other articles I've been able to find give vague hints, but no actual, practical, measurable advice. With these tools I can finally see what's going on in my code rather than making educated guesses.


What I would have been interested to see is the author's analysis of whether this was actually worth doing or not.


This is interesting; I would leave a comment there, but it's not obvious...

Looking at the implementation, I am curious how good the coverage is: what about the array new/delete operators? What about placement new? What about stack allocations? What about the static data area?
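
The article's instrumentation isn't reproduced here, but as a rough sketch of the coverage question: replacing the global allocation operators (hypothetical example below, not libtorrent code) catches scalar and array heap allocations, while placement new, stack objects, and static storage never pass through these hooks at all.

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Replacing the global operators intercepts scalar and array heap allocations.
    void* operator new(std::size_t n) {
        std::printf("new    %zu bytes\n", n);
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc{};
    }
    void* operator new[](std::size_t n) {
        std::printf("new[]  %zu bytes\n", n);
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc{};
    }
    void operator delete(void* p) noexcept   { std::free(p); }
    void operator delete[](void* p) noexcept { std::free(p); }

    struct widget { int hot; char cold[60]; };

    int main() {
        widget* a = new widget;              // seen by the hook
        widget* b = new widget[4];           // seen by the array hook
        alignas(widget) char buf[sizeof(widget)];
        widget* c = new (buf) widget;        // placement new: non-replaceable, never seen
        widget on_stack{};                   // stack allocation: invisible
        static widget in_static{};           // static data area: invisible
        c->~widget();
        delete[] b;
        delete a;
        (void)on_stack; (void)in_static;
    }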


Sad that this optimization has to be done manually in 2014…


It's because of backwards compatibility. The order of struct fields in memory is defined to be their declaration order by the C standard, and a lot of network protocol code will stop working if that assumption fails. So compilers are not free to reorder fields in memory.

The packing algorithm for Cap'n Proto [1] is cache-aware, within the bounds of also accommodating network optimizations, backwards-compatibility, etc. So yes, newer systems do perform this optimization.

[1] http://kentonv.github.io/capnproto/encoding.html
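
Not from the article, but a minimal illustration (with made-up field names) of why declaration order matters: the compiler must lay the fields out in the order written, so the first struct ends up padded to typically 24 bytes on a 64-bit target, while the hand-reordered version fits in 16 and keeps the members that are touched together on the same cache line more often.

    #include <cstdint>
    #include <cstdio>

    // Fields are laid out in declaration order, so each small member here
    // is followed by padding to align the next one.
    struct as_declared {
        std::uint8_t  flags;       // 1 byte + 7 bytes padding
        std::uint64_t bytes_done;  // 8 bytes
        std::uint16_t port;        // 2 bytes + 6 bytes tail padding
    };                             // typically 24 bytes

    // Same members, reordered by hand from largest to smallest.
    struct reordered {
        std::uint64_t bytes_done;  // 8 bytes
        std::uint16_t port;        // 2 bytes
        std::uint8_t  flags;       // 1 byte + 5 bytes tail padding
    };                             // typically 16 bytes

    int main() {
        std::printf("%zu %zu\n", sizeof(as_declared), sizeof(reordered));
    }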



