liuliu 22 hours ago | on: Lossless LLM compression for efficient GPU inferen...

That makes you think: if we could rewind time, maybe we should have just given half precision one more exponent bit (6 exponent, 9 mantissa) instead of doing this bfloat16 thing.
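
For a rough sense of the trade-off, here is a minimal sketch comparing the three layouts. It assumes the hypothetical 1-6-9 format would follow the usual IEEE 754 conventions (bias of 2^(e-1) - 1, top exponent code reserved for inf/NaN); `format_stats` is just an illustrative helper name, not anything from a real library.

```python
def format_stats(exp_bits, man_bits):
    """Return (largest finite value, smallest normal value, machine epsilon)
    for a 1-sign-bit binary float, assuming IEEE 754-style conventions."""
    bias = 2 ** (exp_bits - 1) - 1
    # The all-ones exponent code is reserved for inf/NaN, so the largest
    # usable exponent code is 2**exp_bits - 2.
    max_exp = (2 ** exp_bits - 2) - bias
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    min_normal = 2.0 ** (1 - bias)
    epsilon = 2.0 ** -man_bits  # gap between 1.0 and the next representable value
    return max_normal, min_normal, epsilon


formats = {
    "fp16 (5 exp, 10 man)": (5, 10),
    "hypothetical (6 exp, 9 man)": (6, 9),  # assumed layout, not a real standard
    "bfloat16 (8 exp, 7 man)": (8, 7),
}

for name, (e, m) in formats.items():
    max_n, min_n, eps = format_stats(e, m)
    print(f"{name:30s} max={max_n:.4g}  min_normal={min_n:.4g}  eps={eps:.4g}")
```

Under those assumptions the 6/9 layout tops out around 4.3e9 instead of fp16's 65504, while giving up one bit of precision rather than the three that bfloat16 gives up.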