The goal was to showcase that quantizing MoEs down to 1.58-bit without any further training does in fact work!