I used to have a computer with an audio interface that had a wordclock input (BNC connector) and a huge rack-mounted wordclock (Antelope Isochrone), meant to theoretically reduce the jitter that a cheap internal crystal clock built into an interface might otherwise introduce. To work normally, the interface and the external clock both needed to be set to the same sample rate (say, 48 or 192 kHz, or whatever you wanted). We quickly discovered that if the interface was set lower than the clock, any audio playback would be sped up and high-pitched (and vice versa) -- not only playback from the DAW, but from anything, even YouTube videos and so forth. And of course A/V sync was maintained, so the picture would also be sped up to match.
I wonder if this effect could be completely virtualized as an audio driver, where you choose this middleware as the default output device in the OS, and it messes with the audio clock speed: essentially overclocking the upstream (OS) side whenever an ad is detected, and dropping samples (basically a rudimentary sample rate conversion) proportionately so the downstream (hardware) side never skips a beat. I don't know how an extension/userscript would be able to communicate with said middleware, but maybe there's a way.
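For the sample-dropping part, here's a minimal sketch of what that middleware's inner loop might do, in Python/NumPy purely for illustration (a real virtual output device would live in CoreAudio/WASAPI/PulseAudio territory; the `speedup` factor and the `process_block` glue are names I've made up for the sketch):

```python
import numpy as np

def drop_sample_resample(block: np.ndarray, speedup: float) -> np.ndarray:
    """Crude sample rate conversion by decimation: keep roughly one out of
    every `speedup` input samples, so the audio plays back speedup-times
    faster and higher-pitched, like the mismatched wordclock did.
    `block` is a 1-D array of mono samples from the upstream (OS) side."""
    if speedup <= 1.0 or len(block) == 0:
        return block
    n_out = max(1, int(len(block) / speedup))
    # Evenly spaced indices; everything in between is simply dropped
    # (no anti-aliasing filter -- this is the rudimentary version).
    idx = np.linspace(0, len(block) - 1, n_out).astype(int)
    return block[idx]

# Hypothetical glue for the virtual output device: while an ad is detected,
# accept blocks from the upstream (OS) side faster than real time and decimate
# them, so the downstream (hardware) side keeps its normal sample rate.
def process_block(upstream_block: np.ndarray, ad_detected: bool) -> np.ndarray:
    speedup = 8.0 if ad_detected else 1.0   # arbitrary factor, just for illustration
    return drop_sample_resample(upstream_block, speedup)
```

Dropping raw samples like this aliases badly (a less rudimentary version would low-pass filter first), but it reproduces the wordclock-mismatch effect: faster and higher-pitched on the way out, steady clock on the hardware side.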
Aside: I wonder what would happen with live streams. Probably just periodic buffering, not from congestion but from the digital-to-analog conversion consuming the stream faster than it's being created. Theoretically a very minuscule version of this problem always occurs whenever the ADC on the production side runs at a slightly lower clock rate (say, 47999 Hz) than the DAC on the consumer side (say, 48001 Hz), and presumably the player knows how to gracefully compensate to avoid occasional buffering (or buffering does occur but it's too brief for anyone to notice). Hmm.
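Back-of-the-envelope for that 2 Hz mismatch (the one-second buffer below is an arbitrary assumption, not anything a real player necessarily uses):

```python
# How fast does a 2 Hz clock mismatch drain the player's buffer?
producer_rate = 47_999      # Hz, ADC clock on the production side
consumer_rate = 48_001      # Hz, DAC clock on the consumer side

deficit_per_second = consumer_rate - producer_rate   # player comes up 2 samples short each second

buffer_seconds = 1.0                                  # assumed jitter buffer, for illustration
samples_buffered = producer_rate * buffer_seconds
hours_until_underrun = samples_buffered / deficit_per_second / 3600
print(hours_until_underrun)                           # ~6.7 hours before a rebuffer

# Graceful compensation: stretch playback by inserting ~2 samples per second
# (roughly one every half second), far too sparse and small to hear.
```

So the drift really would be invisible in practice: one inserted sample every half second or so, or a rebuffer every several hours if the player did nothing at all.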