Hacker News new | past | comments | ask | show | jobs | submit login

L1 cache lines are typically 64 bytes long, and you're only reading 16 bytes of it per level of the tree. And a random pointer tree traversal is the worst case for CPU branch prediction, too.

There are methods for optimising the use of various tiers of cache, trying to pack multiple nodes into one 64 byte line, arranging nodes such that left and right children are usually adjacent in memory so prediction losses are minimised, and other tricks.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: