I find it slightly funny that we go to such great lengths to encode noise, only for smart TVs to filter it out again with their noise reduction algorithms (at least you can turn it off, on most models).
Film grain is often added to computer-generated output that is to be composited onto live-action footage. If it were not, it would look fake. We also add motion blur and depth refocus.
Agreed though, it does feel weird as a vfx artist to be bringing down the quality of cg to that of film stock.
Ironic. AV1 is the descendant of On2's TrueMotion codecs -- better known as vp3-vp9-av1.
The very first codecs from On2 (then called DuckCorp) -- TrueMotion and TrueMotion2 (used in games such as FF7) -- used a compression technique that did a good job of filtering, then on playback simulating film grain. We used this effect to good advantage, demoing exclusively 24 fps film-sourced material and properly dealing with frame rate using 3/2 pulldown.
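For anyone who hasn't run into 3/2 pulldown: here is a rough sketch of the cadence, not On2's actual code. The function name is made up for illustration. Each group of four film frames is spread over ten interlaced fields (3, 2, 3, 2), so 24 film frames per second become 60 fields, or 30 interlaced video frames.

```python
def pulldown_32(frames):
    """Spread film frames over interlaced fields in a 3,2,3,2... cadence."""
    fields = []
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2  # alternate 3 fields, then 2
        for _ in range(repeat):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


fields = pulldown_32(["A", "B", "C", "D"])
# Pair consecutive fields into interlaced video frames.
video_frames = [(fields[i][0], fields[i + 1][0]) for i in range(0, len(fields), 2)]
print(video_frames)
```

Two of the five resulting video frames mix fields from different film frames, which is why a player has to detect and undo the cadence to show film material cleanly.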
I had no idea AV1 or any other codec provided for this. Pretty interesting!
What would undermine it is imperfect removal of real noise or grain from the source material: Then you'd have some remnants of the original grain, plus the synthesized grain... an incorrect result. It seems that this scheme requires perfectly clean input images.
I hope this idea is taken further, and it becomes the norm among codecs or wrappers to provide room for an author-defined post-processing shader to be applied after decompression.
It really doesn't. The whole reason this technique works is that our brain generally processes effects like noise and grain in terms of macro properties, and even if none of the pixels are anywhere like the right value, it will still look "right" as long as the noise distribution is closish.
It doesn't require perfect removal of the original film grain. A mishmash composed of remnants of the original grain combined with new synthetic grain can still look correct to our brains.
According to this paper, they're going to a lot of effort to reproduce the original grain very accurately. If they combine that with some half-assed residual grain from the original, why go to all the effort to faithfully reproduce it? Based on your assertion, it wasn't really necessary to begin with.
Various audio codecs have the same thing. They will synthesize background noise. The amount of background noise and the rough frequency spectrum are encoded in the stream.
If you removed the noise, it would sound a bit weird. For telephone calls in particular, the background noise lets you know that the phone call is connected. If you filter it out, people think that the call was dropped.
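A toy version of the comfort-noise idea, assuming the only thing sent is a single noise level (real codecs also send a rough spectral shape, which this sketch leaves out). The function names are invented for illustration.

```python
import math
import random


def describe_noise(samples):
    """Sender side: boil the background noise down to one RMS level."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def comfort_noise(rms, count, seed=0):
    """Receiver side: generate fresh noise at the signalled level."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, rms) for _ in range(count)]


# Fake background noise standing in for a quiet phone line.
source_rng = random.Random(7)
background = [source_rng.gauss(0.0, 0.02) for _ in range(8000)]

level = describe_noise(background)       # one number goes over the wire
regenerated = comfort_noise(level, 8000)  # none of the original samples do
print(level, describe_noise(regenerated))
```

None of the regenerated samples match the originals, but the level is about the same, which is all the listener notices.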
I can imagine getting a large amount of flaming for this idea, but wouldn't the logical extreme just be to basically define a container for a JS canvas with embeddable resources? In a more restricted form of course, to be sure that it will always render the same, and all compliant computers will have enough power to render it.
You could do things like 480p training videos with ultra-crisp text, and snow/fire/etc effects.
You'd probably have to prevent it from having network access or else you'd just be asking for the videos to rot over time, but you could give them an embedding API or even access to a few small bits of data, so you could have a YouTube video that changes with seasons and location.
If you were strict about the no network access thing you could give access to all sensors, since the data would never leave the device.
YouTube's already got some limited interactivity features, DVDs had them, why not all videos? It would mostly be a novelty, but you could also do really cool stuff like use Bluetooth beacons to skip around to certain parts for an interactive tour, although you might have to enforce downloading the whole thing before playing if you wanted 100% no privacy leaks.
That's an interesting idea. I reckon the complexity added with rendering vector graphics (text in particular, font rendering is notoriously difficult) outweighs the bandwidth savings, but still it seems like an area ripe for exploration. Canvas-like APIs might be too complicated to encode/decode efficiently, yet I suspect something closer to a Web/OpenGL fragment shader would be much more manageable (yet likely much worse for interactivity unfortunately). While that wouldn't quite mesh with the idea of vector-like text you proposed (without a heavy library or two thrown in), I suspect the engineering put into the existing graphics pipeline would make it a more feasible approach to augmented video. Looking at the stuff on Shadertoy.com and Inigo Quilez's work shows the capabilities of fragment shader based graphics, yet I suspect all of the magic would be in the details of the file format and encoding strategy. If anyone pokes around or explores a video/shader hybrid format let me know and post it on HN, I bet a bunch of people would be interested.
> ... but wouldn't the logical extreme just be to basically define a container for a JS canvas with embeddable resources?
Here's my attempt at creating a grainy video effect in the browser using a media stream (from the device's camera - the page should ask for permission first). Very much an MVP; I'm already working on making the dithering algorithm more efficient. https://codepen.io/kaliedarik/pen/OJOaOZz
> YouTube's already got some limited interactivity features, DVDs had them, why not all videos?
I don't know about videos, but again this is doable in the browser using the canvas. For instance, this example: https://scrawl-v8.rikweb.org.uk/demo/canvas-027.html - the swans, and both geese, are clickable (to their respective Wikipedia pages). The navigation around each goose is a clickable box, while the navigation on the swans uses a chroma-key effect to limit the clickable area.
The biggest problem with that idea is that it wouldn’t be possible to have hw decoders as efficient as the current ones.
But there is definitely a trend towards more customizable decoders, with a larger set of primitive operations (DCT, motion estimation/compensation, convolution filters) and parameters. The codec space is slowly moving towards a model where the decoders are VMs highly optimized for specific tasks.
> It seems that this scheme requires perfectly clean input images.
It doesn't, the base encode comes out superclean/stable (from my limited tests). The problem is glacial encoding speed which makes this thing not so interesting for home use. Anyway the final result with synthetic grain is wonderful for the bitrate used (I was testing 2 - 4 MBps for HD).
the synthesized noise only covers what is removed by the denoiser, so it all adds up to what the input is (assuming the left-in noise makes it through the encoder)
From the flow graph, the film grain estimation is done on the difference between the de-noised video, and the original. So it only picks up the difference that the denoiser creates.
> and estimating the film grain parameters from flat regions of the difference between the noisy and de-noised versions of the video sequence.
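A minimal sketch of that estimation step, under heavy assumptions: it only estimates a single grain strength (standard deviation), whereas the real encoder fits a fuller grain model. The function name, block size and flatness threshold are all made up, and the toy "denoised" image is simply the clean original, i.e. an ideal denoiser.

```python
import random
import statistics


def estimate_grain_sigma(noisy, denoised, block=8, flat_threshold=4.0):
    """Return the std-dev of (noisy - denoised) over flat blocks, or None."""
    height, width = len(noisy), len(noisy[0])
    residual = []
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            coords = [(y, x) for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            # Skip textured blocks: there the difference may contain real
            # detail that the denoiser smeared, not just grain.
            if statistics.pvariance([denoised[y][x] for y, x in coords]) > flat_threshold:
                continue
            residual.extend(noisy[y][x] - denoised[y][x] for y, x in coords)
    return statistics.pstdev(residual) if residual else None


# Toy input: flat grey on the left, hard stripes on the right, plus
# Gaussian "grain" with sigma 5 everywhere.
rng = random.Random(1)
clean = [[100.0 if x < 16 else (0.0 if x % 2 == 0 else 200.0) for x in range(32)]
         for _ in range(32)]
noisy = [[p + rng.gauss(0.0, 5.0) for p in row] for row in clean]
sigma = estimate_grain_sigma(noisy, clean)
print(f"estimated grain sigma: {sigma:.2f}")
```

Only the flat left half contributes to the estimate. Whatever grain the denoiser leaves in the picture never shows up in the difference, so it is not counted twice.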
I'm not sure I like the idea of a film's photography potentially looking different based on the decoder used. I would much rather dedicate additional bits to preserving the noise that was present in the original film stock or added during post-production.
That goal might already have been somewhat lost years ago. The video that almost everyone after the editors watches has been re-encoded at least once (more likely twice) with one or more different codecs that definitely change the character of the video from that of the source. Any same source content will already look noticeably different depending on which service or provider you’re using, since they each have their own opinionated video content encoding and distribution pipelines. If you’re talking about archival storage in the sense of preserving masters, then I definitely agree and the film grain removal should be disabled in the encoder, so the decoder-side synthesis won’t happen. Thankfully, that’s very easy to do (for example, the libaom-av1 encoder in ffmpeg supports a denoise-noise-level parameter, which can be set to zero to disable those scary parts).
No one is archiving with av1 and no one ever will. It's all film out, DPX sequences, and prores 4444 for the vast majority of things I've encountered as a post professional.
Funnily, I also don't like these techniques but that's because I'd rather eliminate the noise myself and have a noiseless image when possible. Grain, to me, is simply annoying. Adding it and preserving it has little value.
In subjective tests, normal people often report grain (real or reconstructed) as a defect so you're probably in the majority of viewers. Creators and codec nerds value them more than the audience.
So these grain removal techniques are in many ways a political compromise between people who just want the smaller video, and the people who want the grain.
I'd be fascinated to see how codecs with and without this feature diverge in terms of encoding changes. Presumably there were some big gains just sitting there but the minority that preferred grain stopped them being exploited.
Much of the drama around encoder quality has come down to this dichotomy. Choosing to lose the grain/detail was attacked as plastic or fake by the people who care about it, while people who didn't could point to better test scores. Which then caused more arguments about the validity of the measures.
I am a film person (director, camera guy, work in post production). Grain is an aesthetic choice and has an effect on the perception of an image.
Sure, for the typical Hollywood blockbuster it has not much value and can break the immersion, but for everything else grain can make you look at a picture, rather than into it. This could be the difference between looking through a window and looking at a painting for example. Additionally grain can emphasize the passing of time. A shot of a still life without grain is something different from the very same shot with grain.
Today it is just another tool in the aesthetic toolbox. And someone somewhere decides when to have it and when not to have it.
Nowadays, I agree, grain is more of an aesthetic choice for modern films. In fact, I generally don't enable de-noise when it's fairly clear that's what's happening.
However, for a lot of film shot pre-2010, grain is an artifact that's not there by choice or artistically. Other than perhaps the film 300.
It just so happens that most of my personal media fits in that box of being pre-2010 which is why I generally denoise.
Shows like 30 Rock or The Office, for example, don't have film grain because of some artistic choice.
Interestingly, though, "That '70s Show" does in a few cases, even though it's somewhat noisy by default. That's a tougher call.
Even before 2010, directors and their cinematographers put a lot of thought into what film stock to use in order to build the aesthetic they want for their film. Grain characteristics are one of the most prominent distinctions among different stocks.
Again, depends on what's being shot. Like I said, 300 is a good pre-2010 example of a film stock/grain specifically chosen for its artistic value.
However, for a bunch of film and especially tv shows, the choice had FAR less to do with aesthetics and was more related to cost. Big budget films certainly could pick any sort of equipment/stock they wanted. Lower budget productions didn't necessarily have that luxury.
I'm not saying it didn't happen, rather that the choice in stock was more often than not "cheap while being as clear as we can afford".
Again, you see this with modern film where grain is almost never added (except for specific scenes trying to give more of a dated effect). That, to me, says cinematographers aren't generally trying to pick their stock to add grain. Some do, but that's the exception and not the rule.
Exactly this, there have long been plugins for edit suites that let you specify the grain and color process of specific film stocks to get the look you want.
People today miss the point that things like grain can be an intentional directorial choice and not an artifact to be removed.
in that case, this is still probably a good feature for you since you can make an encoder that just doesn't re-synthesize noise and you'll get a de-noised picture.
Not really, because these encoders are (generally speaking) making pretty large compromises with their noise filters in order to be timely. I want to spend the extra time doing motion compensated denoising.
These noise filters will sometimes be temporal, but will rarely be motion compensated (because that's computationally expensive), and as a result they can't get as good a result as I can.
Storage is, but additional bandwidth is seldom spent on quality. It's more often spent on cramming in more bullshit, AKA another home-shopping channel.
This seems fantastic news for home archivists! But on the practical side, what is the adoption rate of this technology? Are there any user-accessible tools that use it? (Handbrake and VLC come to mind for encoding/decoding.)
Subjective testing is required to verify that the "tools" the codec has make sense. For day-to-day development and tweaking of the implementation developers use SSIM, which works better than PSNR for noisy images.
Film grain synthesis? Oh man, I don't know how I feel about this one. Without viewing any comparisons, my first reaction is...why not just higher bitrate instead?
Film grain is similar to noise which by definition compresses extremely poorly. By actually modeling the way the film grain is produced, they can avoid sending what is essentially random data over the wire while still preserving the visual effect of the film grain itself. That way, the bitrate can be spent to encode the actual video data at a higher quality.
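To make the "model it instead of sending it" point concrete, here is a deliberately simplified sketch of decoder-side synthesis. It is not AV1's actual algorithm: as far as I understand it, the real thing uses a larger autoregressive neighbourhood and scales the grain by local brightness. Every name and parameter below is an assumption for illustration. The point is that the stream only has to carry a seed and a handful of parameters rather than one random value per pixel.

```python
import random


def synthesize_grain(width, height, sigma, ar_left, ar_top, seed):
    """Generate a grain field from a few parameters and a seed."""
    rng = random.Random(seed)
    grain = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = grain[y][x - 1] if x > 0 else 0.0
            top = grain[y - 1][x] if y > 0 else 0.0
            # Autoregressive filter: each sample leans on its neighbours,
            # which gives the noise some spatial "clumpiness" like grain.
            grain[y][x] = ar_left * left + ar_top * top + rng.gauss(0.0, sigma)
    return grain


def apply_grain(decoded, grain):
    """Add grain to a decoded 8-bit picture, clamping to the valid range."""
    return [[min(255, max(0, round(p + g))) for p, g in zip(row, grow)]
            for row, grow in zip(decoded, grain)]


decoded = [[128] * 16 for _ in range(8)]  # stand-in for a clean decoded frame
grain = synthesize_grain(16, 8, sigma=4.0, ar_left=0.3, ar_top=0.3, seed=42)
grainy = apply_grain(decoded, grain)
```

Because the generator is seeded, every run of this code produces the same grain from the same parameters, even though none of it was transmitted.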
> The correct answer to this question is that often the choice is not between the original film grain and the synthesized one. When video is transmitted over a channel with limited bandwidth, the choice is often between not having the film grain at all (or having the grain significantly distorted by compression) and having synthesized grain that looks subjectively similar to the original one.
Film grain doesn’t compress well. If you buy a 1080p Blu-ray disk, you can get a bit rate of something like 40 Mbit/s. Go watch 1080p video on Netflix, and you’re going to get something closer to 5 Mbit/s. Yes, the codecs are different—yes, you can talk all you want about how everything is going to be fast when we all get 5G—but this is still the ground reality that most people are working with. The bit rate is more of a constraint you have to work within than a variable you can just tweak to get the results you want.
By comparison, ProRes 422 has a target bit rate of 147 Mbit/s for 1920x1080@29.97.
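To put those three numbers side by side, here is the back-of-the-envelope arithmetic for a two-hour film, using the approximate rates quoted above (the helper name is made up, and real streams vary their rate over time):

```python
def gigabytes_for(mbit_per_s, minutes):
    """Total size in GB (10^9 bytes) at a constant bit rate."""
    bits = mbit_per_s * 1e6 * minutes * 60
    return bits / 8 / 1e9


for name, rate in [("Netflix 1080p", 5), ("Blu-ray 1080p", 40), ("ProRes 422", 147)]:
    print(f"{name}: {gigabytes_for(rate, 120):.1f} GB for a 2-hour film")
```

That works out to roughly 4.5 GB, 36 GB and 132 GB respectively, so the streaming version has about an eighth of the Blu-ray's bits to spend on the same picture.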
They really can't - the bandwidth costs would be insane, not to mention most people don't have a constant 45 Mbps+ connection on their TVs or phones (or the hardware wouldn't be able to support it).
I know America isn't the rest of the world, but the state with the lowest average household internet speed is Alaska at 58.6 Mbps.
Obviously a streaming service can't expect all their customers to have this kind of high-speed internet, but dynamically changing video quality on the fly to meet bandwidth limitations is a problem they all solved a very long time ago. It is not unreasonable for Netflix to offer such high bitrate streams to customers with the bandwidth for it.
As for hardware, pretty much any hardware h.264 decoder made in the last decade should have no problem with blu-ray spec AVC streams.
Would you take a 1080p BluRay over 4K Netflix stream? I'm legitimately curious. I feel like we sometimes prioritize higher resolutions over overall detail.
I generally take a Blu-Ray 1080p over any streaming option.
Note that plenty of movies, in the theater, were never done in 4K to begin with. 4K is great for selling TVs, but cinematographers and directors seem to be more lukewarm on using 4K. I think most movies are still done at 2K. Your UHD Blu-ray Disc might just be an upscale.
This can be said about every lossy compression technique. Why do quantization that throws away details, and not send higher bitrate instead? Why do edge prediction that can smudge things, and not send higher bitrate instead? Why do inter-frame prediction which can cause wobbly motion and not send higher bitrate instead?
The answer always is that the technique allows better use of bandwidth, so you can have a better image without increasing bandwidth. Or if you're able to increase the bandwidth, you can have even better picture with the technique than without it (until the bandwidth is so high that you can send the video uncompressed, but that's not happening anytime soon for video on the web).
Think of how much money Netflix saves by streaming movies to your TV at 5 Mbps instead of 10 Mbps. Serving a single user, the cost difference is negligible, but across 120 million users it probably saves them millions in bandwidth costs.
I still buy blu-rays though so I am a firm believer in the "just throw more bits at it" solution.
That's a false dichotomy. At any bitrate Netflix chooses to stream at, this technique could improve perceived quality. Meanwhile Blu-rays also use codecs with similar advanced reconstructive techniques - even with the higher bitrate they are essential to maintain a high perceived quality.