An LZ Codec Designed for SSE Decompression (2016)

XDirtyPunkX · on May 12, 2017

If you're interested in results, it's also worth reading the follow-up, which shows that on an equal footing LZSSE8 is a more general choice than LZSSE2, which works well on text and in high-compression scenarios (http://conorstokes.github.io/compression/2016/02/24/compress...).

alkoumpa · on May 11, 2017

Posts involving vectorization are always nice, especially if they involve compression. I'd love to see a vectorized version of arithmetic coding; that would definitely be interesting.

eutectic · on May 11, 2017

https://arxiv.org/abs/1402.3392

klodolph · on May 12, 2017

Finite State Entropy is the leaner, faster entropy encoder that you probably want to use instead of arithmetic coding. It's also much more likely to be something that could be made parallel: at least for encoding, arithmetic coding traditionally uses division, and there's no SSE integer division.

But that's okay, because ANS / FSE are here.
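To make the division point concrete, here is a rough sketch of a single encode step in a textbook range coder. It assumes a static frequency table and leaves out renormalization and carry handling; the function and parameter names are made up for illustration and are not taken from any particular codec.

```python
def narrow_interval(low, rng, cum_freq, freq, total):
    """One simplified range-coder encode step (hypothetical helper).

    low, rng  : current interval start and width
    cum_freq  : sum of the frequencies of all symbols ordered before this one
    freq      : frequency of the symbol being encoded
    total     : sum of all symbol frequencies
    """
    # This per-symbol integer division is the part with no direct
    # SSE integer instruction to map onto.
    step = rng // total
    low = low + cum_freq * step
    rng = freq * step
    return low, rng


# Alphabet with frequencies a=2, b=1, c=1 (total 4); encode "b",
# whose cumulative frequency is 2.
low, rng = narrow_interval(0, 1 << 32, cum_freq=2, freq=1, total=4)
```

If `total` is constrained to a power of two, the division becomes a shift, which is one common way to sidestep it.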

gliptic · on May 12, 2017

There is LZNA (http://cbloomrants.blogspot.se/2015/05/05-09-15-oodle-lzna.h...), which uses vectorization to update and query its statistical models.

oso2k · on May 11, 2017

This is pretty cool. It competes well with lz4 even if it is amd64 only. I wonder what an ARM version would look like. Here's hoping the author keeps working on this.

XDirtyPunkX · on May 12, 2017

LZSSE author here. Evan Nemerson has started a portable version of LZSSE that uses his own SSE emulation layer and has apparently got it working on the Raspberry Pi 2.

XDirtyPunkX · on May 12, 2017

For anyone wondering, that project is called LZSSE-SIMDe and is available here (https://github.com/nemequ/LZSSE-SIMDe).

Andys · on May 12, 2017

Has anyone seen any work done on a virtual machine bytecode that leans on Intel AVX to do its work in parallel?

corysama · on May 12, 2017

This just barely counts, but... The game Destiny uses a custom SIMD VM to script loading constant buffers for their shaders. They mentioned it at GDC this year.

It would be fun to write a SIMD-based VM in Terra :P

kobeya · on May 12, 2017

Many intermediate languages support vector extensions...

Andys · on May 12, 2017

I was referring to actual execution of bytecode in parallel, not just support for vectorized math.

sounds · on May 12, 2017

Probably wouldn't work in the general case, as each of the bytecode streams would generate memory accesses that are completely independent.

Not to restate the obvious, but SIMD relies on locality of data access to keep the SIMD unit fed with enough data bandwidth.
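For contrast, here is a minimal sketch of the case that does map onto SIMD: a single bytecode stream applied to several data lanes in lockstep, so every lane does the same operation on neighbouring data. Plain Python lists stand in for vector registers, and the opcode names are invented for illustration. Once each lane has its own program counter and its own addresses, this shape no longer applies.

```python
def run_lockstep(program, lanes):
    """Run one bytecode program over every lane at once (toy stack VM).

    Each stack slot holds a list with one value per lane, standing in
    for a SIMD register. All lanes share a single program counter.
    """
    stack = []
    for op, arg in program:
        if op == "load_input":      # push the per-lane input values
            stack.append(list(lanes))
        elif op == "push_const":    # broadcast one constant to all lanes
            stack.append([arg] * len(lanes))
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append([x + y for x, y in zip(a, b)])
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append([x * y for x, y in zip(a, b)])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Computes x * 3 + 1 for every lane with one pass over the bytecode.
program = [
    ("load_input", None),
    ("push_const", 3),
    ("mul", None),
    ("push_const", 1),
    ("add", None),
]
result = run_lockstep(program, [1, 2, 3, 4])
```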

Ono-Sendai · on May 12, 2017

Are you counting JITing?