>the entire lookup table fits in L1 cache you can treat KIND_TABLE[id] as only taking ~1-2 cycles.

You can do so, but on modern processors it's ~4 cycles to access the L1 cache and

        return 175 & id & ((id ^ 173) + 11)
takes 3 cycles (cycle 1: do the first & and the ^; cycle 2: do the +; cycle 3: do the second &).
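
To make the three-cycle count concrete, here is a rough sketch that writes the expression as a dependency graph and computes when each result is ready. It assumes every ALU op has 1 cycle of latency and that enough ALUs are free to start independent ops in the same cycle. The names DAG, ready_cycle, kind and t1..t4 are made up for illustration and are not from the original code.

```python
# Dependency graph for: 175 & id & ((id ^ 173) + 11)
# Each entry: result name -> (operator, inputs). Anything not listed
# (id and the constants) is treated as available at cycle 0.
DAG = {
    "t1": ("&", ["175", "id"]),  # first &
    "t2": ("^", ["id", "173"]),  # the ^, independent of t1
    "t3": ("+", ["t2", "11"]),   # the +, must wait for t2
    "t4": ("&", ["t1", "t3"]),   # second &, must wait for t1 and t3
}

def ready_cycle(node, latency=1):
    """Cycle at which `node` is available, assuming `latency` cycles per op."""
    if node not in DAG:
        return 0  # input or constant
    _, inputs = DAG[node]
    return latency + max(ready_cycle(i, latency) for i in inputs)

def kind(id):
    """The same expression, split into the steps named in DAG."""
    t1 = 175 & id
    t2 = id ^ 173
    t3 = t2 + 11
    return t1 & t3

print(ready_cycle("t4"))  # 3
```

The longest chain is ^ then + then &, so the result is ready after 3 cycles under these assumptions, which is below the ~4 cycles quoted for an L1 hit.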



Actually on Intel CPUs ALU operations take 1/3 of a cycle and the processor can schedule them back to back.


AFAIK this is not quite right. They take only 1/4th of a cycle on average, but that is because of pipelining. If an instruction depends on the result of an ALU operation, it still has to wait the full latency (1 cycle) before it can continue.
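
A toy model of that distinction, as a sketch only: it assumes up to 4 ALU ops can start per cycle (to match the 1/4 figure) and 1 cycle of latency per op. The function names and the alu_ports parameter are hypothetical, and real cores have many more constraints than this.

```python
import math

def cycles_independent(n_ops, alu_ports=4, latency=1):
    """Cycles for n_ops ALU ops that do not depend on each other.

    Up to alu_ports ops start each cycle; the last batch finishes
    `latency` cycles after it starts.
    """
    return math.ceil(n_ops / alu_ports) + (latency - 1)

def cycles_dependent_chain(n_ops, latency=1):
    """Cycles for n_ops ALU ops where each needs the previous result."""
    return n_ops * latency

n = 1000
print(cycles_independent(n) / n)      # 0.25 cycles per op on average
print(cycles_dependent_chain(n) / n)  # 1.0 cycle per op
```

So the 1/4 number describes throughput on independent work, while a dependent chain like the expression above is bound by the 1-cycle latency of each step.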

This makes sense, as the clock is also something of a 'driving force' for pushing signals through the chip from one part to another. (Some architectures have 'zero cost' operations, I believe, but these are usually baked into the pipeline and have to be turned on or off depending on need.)



