VC-2 is an intra-only, wavelet-based, ultra-low-latency codec developed by the BBC years ago for exactly this purpose. It is royalty-free, and currently the only implementations are in FFmpeg and in the official BBC repository, both CPU-based. I am planning to make a CUDA-accelerated version for my master's thesis, since the Vulkan implementations made at GSoC last year still suck quite a bit. I would suggest people look into this codec.
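For anyone wondering what "wavelet-based" means in practice, here is a minimal sketch of one level of an integer Haar transform written as lifting steps. VC-2 lists Haar among its supported wavelet filters, but this is the generic textbook form, not the bit-exact VC-2 filter (the rounding differs), and the function names and sample row are made up for illustration.

```python
def haar_forward(samples):
    """One level of integer Haar lifting on an even-length list of ints."""
    evens = samples[0::2]
    odds = samples[1::2]
    # Predict step: the high-pass band is the difference within each pair.
    high = [o - e for e, o in zip(evens, odds)]
    # Update step: the low-pass band approximates the average of each pair.
    low = [e + (h >> 1) for e, h in zip(evens, high)]
    return low, high


def haar_inverse(low, high):
    """Undo haar_forward exactly by running the lifting steps in reverse."""
    evens = [l - (h >> 1) for l, h in zip(low, high)]
    odds = [h + e for e, h in zip(evens, high)]
    out = []
    for e, o in zip(evens, odds):
        out.extend((e, o))
    return out


# Hypothetical row of pixel values, just for illustration.
row = [10, 12, 13, 13, 200, 202, 90, 80]
low, high = haar_forward(row)
print(low, high)
assert haar_inverse(low, high) == row  # integer lifting is lossless
```

Each pair of samples is processed independently of the others, which is part of why this kind of transform maps well onto a GPU.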
Some capture card vendors (Blackmagic comes to mind) have worked with NVIDIA to expose DMA access. This way, video frames are transferred from the card straight to GPU memory, bypassing system RAM and the CPU. I think all GPU manufacturers expose APIs to do this, but it's not that common in consumer products.
> Are there APIs which can sidestep the "load to CPU RAM" part?
On Windows, that API is Desktop Duplication. It delivers D3D11 textures, usually in BGRA8_UNORM format. When HDR is enabled, you need a slightly different API method, which can deliver HDR frames in RGBA16_FLOAT pixel format.
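To give a rough idea of how much data is involved, here is a back-of-the-envelope sketch of the raw frame sizes for the two formats. BGRA8_UNORM is 4 bytes per pixel and RGBA16_FLOAT is 8. The 1920x1080 resolution and 60 fps are my own assumptions, and real textures may have padded row pitches, so treat these as lower bounds.

```python
# Bytes per pixel for the two formats mentioned above.
BYTES_PER_PIXEL = {"BGRA8_UNORM": 4, "RGBA16_FLOAT": 8}


def frame_bytes(width, height, pixel_format):
    """Size of one tightly packed frame (assumes no row padding)."""
    return width * height * BYTES_PER_PIXEL[pixel_format]


def bytes_per_second(width, height, pixel_format, fps):
    """Raw data rate if every frame is copied once."""
    return frame_bytes(width, height, pixel_format) * fps


# Assumed capture settings, for illustration only.
for fmt in BYTES_PER_PIXEL:
    size = frame_bytes(1920, 1080, fmt)
    rate = bytes_per_second(1920, 1080, fmt, 60)
    print(f"{fmt}: {size / 1e6:.1f} MB per frame, {rate / 1e6:.0f} MB/s at 60 fps")
```

That is roughly 0.5 GB/s for SDR and 1 GB/s for HDR at those settings, which is the traffic you save on each round trip through CPU RAM by keeping frames on the GPU.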
In your experience, how does VC-2 compare to JPEG XS from a quality perspective? The JPEG XS resources I’ve seen say JPEG XS has higher visual quality, but I’m curious what it’s like in practice.
JPEG XS is an almost direct successor to VC-2. They use the same techniques, and if you read the JPEG XS whitepaper, it explicitly cites VC-2 as an inspiration and a target to surpass. JPEG XS is an improvement, there is no doubt about that, but unfortunately they decided to patent it for all uses. In both cases, publicly available software implementations are few and CPU-based; the rest are implemented in hardware inside business AV solutions.