An LZ Codec Designed for SSE Decompression (2016) (conorstokes.github.io)
81 points by mmastrac on May 11, 2017 | 14 comments


If you're interested in results, it's also worth reading the follow-up, which shows that on an equal footing LZSSE8 is a more general choice than LZSSE2, which works well on text and in high-compression scenarios (http://conorstokes.github.io/compression/2016/02/24/compress...).


Posts involving vectorization are always nice, especially if they involve compression. I'd love to see a vectorized version of arithmetic coding; that would definitely be interesting.



Finite State Entropy is the leaner, faster entropy encoder that you probably want to use instead of Arithmetic coding. It's also much more likely to be something that could be made parallel--at least for encoding, Arithmetic coding traditionally uses division, and there's no SSE integer division.

But that's okay, because ANS / FSE are here.
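
To make the division point concrete, here is a minimal sketch of one interval-narrowing step of a textbook range coder, assuming a static frequency table. The function names and the 16-bit range are illustrative assumptions, not taken from any particular codec. The second function shows the same step when the frequency total is a power of two, in which case the division reduces to a shift.

```python
def narrow(low, rng, cum_freq, freq, total):
    """One encoding step: shrink [low, low + rng) to the symbol's sub-interval."""
    step = rng // total  # integer division on every symbol
    return low + step * cum_freq, step * freq


def narrow_pow2(low, rng, cum_freq, freq, total_bits):
    """Same step when total == 1 << total_bits: the division becomes a shift."""
    step = rng >> total_bits
    return low + step * cum_freq, step * freq


# A symbol with cumulative frequency 3 and frequency 1, out of a total of 4.
print(narrow(0, 1 << 16, 3, 1, 4))
print(narrow_pow2(0, 1 << 16, 3, 1, 2))
```

Both calls narrow the interval identically; only the second avoids the division.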


There is LZNA (http://cbloomrants.blogspot.se/2015/05/05-09-15-oodle-lzna.h...) that uses vectorization to update and query its statistical models.


This is pretty cool. It competes well with lz4 even if it is amd64 only. I wonder what an ARM version would look like. Here's to hoping the author keeps working on this.


LZSSE author here. Evan Nemerson has started a portable version of LZSSE that uses his own SSE emulation layer, and he has apparently got it working on the Raspberry Pi 2.


For anyone wondering, that project is called LZSSE-SIMDe and is available here (https://github.com/nemequ/LZSSE-SIMDe).


Has anyone seen any work done on a virtual machine bytecode that leans on Intel AVX to do its work in parallel?


This just barely counts, but... The game Destiny uses a custom SIMD VM to script loading constant buffers for its shaders. They mentioned it at GDC this year.

It would be fun to write a SIMD-based VM in Terra :P


Many intermediate languages support vector extensions...


I was referring to actual execution of bytecode in parallel, not just support for vectorized math.


Probably wouldn't work in the general case, as each of the bytecode streams would generate memory accesses that are completely independent.

Not to restate the obvious, but SIMD relies on locality of data access to keep the SIMD unit fed with enough data bandwidth.
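
A toy illustration of that locality point, as a sketch only: the tuple-based bytecode format and the helper names below are hypothetical, made up for this example. It compares the addresses that four lanes would load in lockstep when they run the same kernel over adjacent elements versus when they run unrelated bytecode streams.

```python
def addresses_at_step(streams, step):
    """Return the address each lane's bytecode stream loads at `step`."""
    return [stream[step][1] for stream in streams]


def is_contiguous(addrs):
    """True when lanes touch consecutive addresses, so one wide load suffices."""
    return all(b - a == 1 for a, b in zip(addrs, addrs[1:]))


# Four lanes running the same kernel over adjacent elements.
same_kernel = [[("LOAD", base)] for base in (100, 101, 102, 103)]
# Four unrelated bytecode streams, each loading from an arbitrary address.
unrelated = [[("LOAD", addr)] for addr in (100, 7, 5000, 42)]

print(is_contiguous(addresses_at_step(same_kernel, 0)))
print(is_contiguous(addresses_at_step(unrelated, 0)))
```

In the first case the lanes can be fed by a single wide load; in the second, every lane needs its own memory access, which is the situation described above.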


Are you counting JITing?



