How is this possibly true? The argument that a 16-bit DCT somehow gives better precision AND doesn't change the size of the encode makes no sense. If you get better precision, you have to keep those extra-precise bits, which are no longer zeroed out by truncation.

I haven't seen this argued in terms of a 16-bit DCT, but in terms of color space conversion. The gist is that not all 8-bit RGB values can be represented exactly in 8-bit YUV 4:2:0, so you're supposed to use 10-bit to get "proper" YUV values. But if you start from an 8-bit encode, you've already thrown away that extra precision, so why spend the (considerable) extra compute on 10-bit just to avoid truncating values that are already truncated?
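That first half of the argument is easy to sanity-check. A quick numpy sketch (my own toy, assuming full-range BT.709 and ignoring 4:2:0 chroma subsampling entirely): convert 8-bit RGB to YCbCr, quantise the YCbCr to 8 or 10 bits, convert back, and look at the round-trip error:

    # Toy round-trip: 8-bit RGB -> full-range BT.709 YCbCr -> quantise -> back.
    # Ignores 4:2:0 chroma subsampling; purely illustrative.
    import numpy as np

    Kr, Kb = 0.2126, 0.0722          # BT.709 luma coefficients
    Kg = 1.0 - Kr - Kb

    def rgb_to_ycbcr(rgb):           # rgb in [0, 1]
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y  = Kr * r + Kg * g + Kb * b
        cb = (b - y) / (2 * (1 - Kb)) + 0.5
        cr = (r - y) / (2 * (1 - Kr)) + 0.5
        return np.stack([y, cb, cr], axis=-1)

    def ycbcr_to_rgb(ycc):
        y, cb, cr = ycc[..., 0], ycc[..., 1] - 0.5, ycc[..., 2] - 0.5
        r = y + 2 * (1 - Kr) * cr
        b = y + 2 * (1 - Kb) * cb
        g = (y - Kr * r - Kb * b) / Kg
        return np.stack([r, g, b], axis=-1)

    def quantise(x, bits):           # round to an n-bit integer grid in [0, 1]
        levels = (1 << bits) - 1
        return np.round(x * levels) / levels

    rng = np.random.default_rng(0)
    rgb8 = rng.integers(0, 256, size=(100_000, 3)) / 255.0   # exactly representable in 8 bits

    for bits in (8, 10):
        back = ycbcr_to_rgb(quantise(rgb_to_ycbcr(rgb8), bits))
        err = np.abs(back - rgb8) * 255.0                    # error in 8-bit RGB steps
        print(f"{bits}-bit YCbCr: max err {err.max():.3f}, mean err {err.mean():.4f}")

At 8 bits you should see worst-case round-trip errors on the order of an RGB code value; at 10 bits they shrink by roughly the extra factor of four in precision.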

I have a project in progress to measure all of the variations, but from quick testing, encoding at the same CRF value takes much longer AND produces a larger file at 10-bit than at 8-bit. The larger file has a slightly higher VMAF score, as you'd expect from spending more bits. The remaining work is finding a set of encoding parameters that lets me measure the quality difference at the same output size, and the relative improvement across CRF vs size vs bit depth.
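For anyone who wants to repeat the quick test, it's roughly this shape (placeholder filenames and CRF value; assumes an ffmpeg build with 10-bit libx265 and libvmaf):

    # Rough sketch of the 8-bit vs 10-bit CRF comparison.
    # Assumes ffmpeg built with 10-bit libx265 and libvmaf; names are placeholders.
    import os
    import subprocess

    SRC = "source.y4m"   # 8-bit source clip (placeholder)
    CRF = "22"

    def encode(pix_fmt, out):
        subprocess.run([
            "ffmpeg", "-y", "-i", SRC,
            "-c:v", "libx265", "-preset", "medium", "-crf", CRF,
            "-pix_fmt", pix_fmt, out,
        ], check=True)

    def vmaf(distorted):
        # libvmaf filter: first input is the distorted clip, second is the reference
        subprocess.run([
            "ffmpeg", "-i", distorted, "-i", SRC,
            "-lavfi", "libvmaf", "-f", "null", "-",
        ], check=True)

    for pix_fmt, out in [("yuv420p", "out_8bit.mp4"), ("yuv420p10le", "out_10bit.mp4")]:
        encode(pix_fmt, out)
        print(out, os.path.getsize(out), "bytes")
        vmaf(out)        # the VMAF score appears in ffmpeg's log output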



Replying to just your 1st paragraph:

The process is: Raw input pixels (8 or 10 bit) minus predicted pixels (8 or 10 bit) -> residual pixels (8 or 10 bit + 1 sign bit).

You take these residual pixels and pass them through a 2D DCT, then scale and quantise them. At the end of this, the quantised DCT residual values are signed 16-bit numbers - you don't get to choose the bit-depth here; it's part of the standard (section 8.6). For every 16x16 pixel input, you get a 16x16 array of signed 16-bit numbers.
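Purely to illustrate the shape of the data, here's a toy version with a floating-point DCT-II and a flat quantisation step standing in for the integer transform and QP-derived scaling the standard actually specifies:

    # Toy version of the forward path: residual -> 2D DCT -> scale/quantise.
    # A floating-point DCT-II and a flat step stand in for the integer
    # transform and QP-derived scaling of the actual standard.
    import numpy as np
    from scipy.fft import dctn

    rng = np.random.default_rng(1)

    # Made-up residual block: prediction is usually good, so residuals are small.
    residual = rng.normal(0, 4, size=(16, 16)).round()

    coeffs = dctn(residual, norm="ortho")                  # 2D DCT of the residual
    qstep = 10.0                                           # stand-in for the QP-derived step
    quantised = np.round(coeffs / qstep).astype(np.int16)  # signed 16-bit values

    print(quantised.dtype, quantised.shape)                # int16 (16, 16)
    print("non-zero:", np.count_nonzero(quantised), "of", quantised.size)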

The last step is to pass all non-zero quantised DCT residual values through an entropy coder (usually an arithmetic coder), then you get the final bitstream.

The key point is that it doesn't matter whether the original raw pixel input is 8-bit or 10-bit: the quantised DCT residual values become 16-bit before being compressed and transmitted. The same is true for 12-bit raw pixel inputs.

This seems impossible; for 8-bit inputs you've doubled the size of the data (slightly less than doubled for 10-bit), so you must be making things worse! The key is that after scaling and quantisation, most of those 16-bit words are zero, and the ones that aren't cluster close to zero, so the entropy encoder doesn't have to spend many bits signalling them.
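You can put a rough number on that with a zeroth-order entropy estimate. It's only a crude stand-in for what a context-adaptive coder like CABAC actually achieves (and the residual block below is made up), but it shows why 16-bit words don't cost 16 bits:

    # Zeroth-order entropy of the quantised coefficients from the toy above:
    # a crude estimate of coding cost (a real context-adaptive coder typically
    # does better), but it shows how cheap mostly-zero 16-bit words are.
    import numpy as np
    from scipy.fft import dctn

    rng = np.random.default_rng(1)
    residual = rng.normal(0, 4, size=(16, 16)).round()
    quantised = np.round(dctn(residual, norm="ortho") / 10.0).astype(np.int16)

    values, counts = np.unique(quantised, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log2(p)).sum()                      # bits per coefficient

    print(f"zero coefficients: {float(np.mean(quantised == 0)):.0%}")
    print(f"~{entropy:.2f} bits/coefficient vs a 16-bit storage width")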

The last part comes when you reverse this process. The mathematical losses from scaling and quantising 10-bit inputs into the transmitted 16-bit values are less than the losses for 8-bit inputs. When you run the inverse quant, scale and iDCT, you end up with values that are closer to the original residual values at 10-bit than you do at 8-bit.
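The decoder-side round trip in the same toy setup looks like this; note that a float sketch doesn't model the fixed-point intermediate rounding where the real 8-bit vs 10-bit difference comes in:

    # Decoder side of the toy: dequantise, inverse DCT, compare to the residual.
    # Float arithmetic only; the real 8-bit vs 10-bit gap comes from fixed-point
    # intermediate rounding that this sketch does not model.
    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(1)
    residual = rng.normal(0, 4, size=(16, 16)).round()

    qstep = 10.0
    quantised = np.round(dctn(residual, norm="ortho") / qstep).astype(np.int16)

    # Inverse quant/scale, inverse transform; the decoder then adds the prediction back.
    reconstructed = idctn(quantised.astype(float) * qstep, norm="ortho")

    print("max reconstruction error:", np.abs(reconstructed - residual).max())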



