
I recently tested different approaches. I’ve been working on some code that downsamples a large set of 1-bit voxels to get shades of gray on the edges. For that, I had to count gigabytes of those bits as fast as possible.

Advanced manually-vectorized SIMD code worked several times faster than lookup tables, esp. on hardware that supports SSSE3 or XOP instructions.

And even when the hardware has none of SSE4, SSSE3, or XOP, the SSE2-only fallback is still faster than lookup tables. Here’s the code: http://stackoverflow.com/a/17355341/126995
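For readers who want a feel for the two approaches being compared, here is a minimal sketch in C. It is not the code from the linked answer, and the function names (`init_table`, `popcount_table`, `popcount_sse2`) are made up for illustration. The SSE2 path assumes the usual bit-parallel trick: sum adjacent bit pairs, then nibbles, then add up the per-byte counts with `_mm_sad_epu8`. On targets without SSE2 the same function just falls back to a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Baseline: 256-entry table, one lookup per byte. (Hypothetical names.) */
static uint8_t g_table[256];

static void init_table(void) {
    /* g_table[0] is 0 by static initialization. */
    for (int i = 1; i < 256; i++)
        g_table[i] = (uint8_t)((i & 1) + g_table[i >> 1]);
}

static uint64_t popcount_table(const uint8_t *data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += g_table[data[i]];
    return total;
}

/* SSE2-only bit count: 16 bytes per iteration, no table lookups. */
static uint64_t popcount_sse2(const uint8_t *data, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i m1 = _mm_set1_epi8(0x55);
    const __m128i m2 = _mm_set1_epi8(0x33);
    const __m128i m4 = _mm_set1_epi8(0x0F);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        /* 2-bit sums: x - ((x >> 1) & 0x55) */
        v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi16(v, 1), m1));
        /* 4-bit sums */
        v = _mm_add_epi8(_mm_and_si128(v, m2),
                         _mm_and_si128(_mm_srli_epi16(v, 2), m2));
        /* per-byte counts, each in 0..8 */
        v = _mm_and_si128(_mm_add_epi8(v, _mm_srli_epi16(v, 4)), m4);
        /* horizontal add: two partial sums, one per 64-bit half */
        __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
        total += (uint64_t)_mm_cvtsi128_si32(s)
               + (uint64_t)_mm_cvtsi128_si32(_mm_srli_si128(s, 8));
    }
#endif
    /* Scalar tail (and full fallback when SSE2 is unavailable). */
    for (; i < n; i++) {
        unsigned x = data[i];
        x = x - ((x >> 1) & 0x55u);
        x = (x & 0x33u) + ((x >> 2) & 0x33u);
        total += (x + (x >> 4)) & 0x0Fu;
    }
    return total;
}
```

Call `init_table()` once before `popcount_table`. A tuned version would probably accumulate the per-byte counts across several iterations before running `_mm_sad_epu8`; this sketch does the horizontal add every 16 bytes to keep it short, so don't read any timing conclusions into it.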



