
> Rendering live audio is quite demanding: the system has to deliver n seconds of audio data every n seconds to the audio hardware. If it doesn’t, the buffer runs dry, and the user hears a nasty glitch or crackle: that’s the hard transition from audio, to silence.

I think you can do better when a buffer runs dry. Instead of outputting silence, you could keep outputting the same frequency spectrum you were producing right before the event. That way you would not hear any cracks or pops.

And of course you can fade out the effect when the buffer stays dry for more than a few seconds.

Obviously, you'd have to do some filtering when the audio resumes, because that transition can also introduce cracks.
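Roughly what I have in mind, as a numpy sketch (the names, fade length, and crossfade length are all made up, so treat it as an illustration rather than a recipe):

    import numpy as np

    FADE_OUT_BUFFERS = 50   # reach silence after ~50 missed buffers
    CROSSFADE = 64          # samples blended when real audio resumes

    def conceal(last_good, missed_count):
        # hold the magnitude spectrum of the last good buffer, but randomise
        # the phases so we aren't just looping the buffer verbatim
        mags = np.abs(np.fft.rfft(last_good))
        phases = np.random.uniform(0, 2 * np.pi, mags.shape)
        fake = np.fft.irfft(mags * np.exp(1j * phases), n=len(last_good))
        gain = max(0.0, 1.0 - missed_count / FADE_OUT_BUFFERS)
        return fake * gain

    def resume(last_fake, real):
        # the "filtering when the audio resumes": a short crossfade so the
        # return of real samples doesn't introduce its own crack
        ramp = np.linspace(0.0, 1.0, CROSSFADE)
        out = real.copy()
        out[:CROSSFADE] = (1 - ramp) * last_fake[-CROSSFADE:] + ramp * real[:CROSSFADE]
        return out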




There is really no such thing as 'instantaneous frequency response'. For any frequency to meaningfully exist, you need data for the corresponding period. E.g. if the audio contains content down to 20 Hz, you need at least 1/40th to 1/20th of a second of data for that to materialize.

Put another way - what you are proposing is looping the buffer, which is what some devices do; portable CD players were kinda notorious for it, and it doesn't sound much better than cracks or pops. Computers also have a tendency to fall into buffer looping when the system hangs (which is likely just the failure mode of Realtek codecs).


> There is really no such thing as 'instantaneous frequency response'

Yes, that's true; I'm proposing something that uses an approximation of it.

Consider it from a different angle: the inner ear essentially performs a Fourier transform. At every moment the "instantaneous" spectrum determines which hair cells are triggered. Now what I propose is to keep triggering those same hair cells (and not any others) when the buffer runs dry. The exact way of accomplishing this is left as an exercise (though taking an FFT over short windows could be a good approximation).
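For instance, the "instantaneous" spectrum could be estimated from just the last few milliseconds of good audio, something like this (the window length and sample rate are guesses, and it assumes the buffer is at least that long):

    import numpy as np

    def instantaneous_spectrum(last_good, sample_rate=48000, window_ms=10.0):
        # estimate which "hair cells" are active from a short Hann-windowed
        # slice at the very end of the last good audio
        n = int(sample_rate * window_ms / 1000)   # 480 samples at 48 kHz
        tail = last_good[-n:] * np.hanning(n)
        return np.abs(np.fft.rfft(tail))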


> The exact way of accomplishing this is left as an exercise

Perhaps you should undertake this exercise and let us know how it sounds :)

EDIT: In my experience with audio, when I have a bug that introduces even the slightest discontinuity (or even just a cusp) in the audio, well short of a pop to silence, I can still hear a "weirdness". Ears are pretty attuned to things that sound unnatural. I'm not confident that essentially "forging" the audio is going to sound natural.


What if you train a deep neural network on the song so far, so it can generate plausible-sounding music whenever the buffer drops?

You can even hang intentionally to generate original music!

(/s, please don't)


As another comment mentioned, this is done by conferencing software to deal with packet loss. It sounds like they either loop the previous DCT frame and gradually fade out, or feed the time domain output into a reverb, then cross-fade from 100% dry to 100% wet if the buffer is about to run out (someone on HN mentioned a while back that this approach was patented).
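A time-domain caricature of the loop-and-fade variant might look like this (my guess, not any particular product's algorithm; real implementations work on the codec's frames, and the constants here are invented):

    import numpy as np

    DECAY = 0.7    # each concealed frame is quieter than the last
    XFADE = 32     # samples crossfaded at the seam of every repeat

    def conceal_lost_frame(prev_frame, n_lost):
        # replay the previous good frame, smoothing the join and fading out
        ramp = np.linspace(0.0, 1.0, XFADE)
        frame = prev_frame.copy()
        frame[:XFADE] = (1 - ramp) * prev_frame[-XFADE:] + ramp * prev_frame[:XFADE]
        return frame * (DECAY ** (n_lost + 1))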

You could maybe make an argument that it would be useful in live music settings to prevent a bad situation from sounding even worse, and maybe you'd put it in some audio software so you can sort of still enjoy playing music on a crappy system. But really, it's best to have hardware and software that can 100% guarantee keeping up with audio processing.


I like the idea, but one problem is that you usually encounter a buffer underrun when the CPU can't keep up, so adding an extra step would require something like leaving enough processing headroom in each buffer to halt the real work early and run the approximator.
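Something like a per-buffer deadline check, in other words (process_block and approximate are placeholders, and the 25% headroom figure is pulled out of thin air):

    import time

    HEADROOM = 0.25   # fraction of the buffer period reserved for the fallback

    def render(block, buffer_period_s, process_block, approximate):
        deadline = time.monotonic() + buffer_period_s * (1.0 - HEADROOM)
        out = process_block(block, deadline)   # real DSP, expected to watch the deadline
        if out is None or time.monotonic() > deadline:
            out = approximate(block)           # cheap spectrum-freeze fallback
        return out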



"Output the same frequency spectrum" - takes some considerable processing! We're usually working in time-domain, not frequency-domain. I think you're saying do a load of processing to fill the gap when you don't have time to do processing?


Audio software often works in the frequency domain too. CPUs have optimized (SIMD) instructions for this kind of work (video codecs use them too).

Also, the article speaks of multithreaded software, where deadlines can be missed because of complicated dependencies. The end stage where you correct for missing samples can work independently of them in its own thread.


For the record, I think this is a terrible idea.

However, the way I'd do it if needed: keep the two most recent good buffers. When you need to synthesize, start running a phase vocoder based on the hop between those two. You get frozen sinusoids and some random noise for the bins that don't have one, and almost no CPU use on the happy path of no underruns.
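A minimal sketch of that freeze, assuming non-overlapping buffers (hop size equal to buffer size) and no windowing, so expect some seams at buffer boundaries:

    import numpy as np

    class SpectralFreeze:
        def __init__(self, prev_good, last_good):
            X1, X2 = np.fft.rfft(prev_good), np.fft.rfft(last_good)
            self.mags = np.abs(X2)
            self.phase = np.angle(X2)
            self.dphase = np.angle(X2) - np.angle(X1)  # per-bin phase advance per hop
            self.n = len(last_good)

        def next_buffer(self):
            # advance every frozen sinusoid by one hop's worth of phase
            self.phase = self.phase + self.dphase
            return np.fft.irfft(self.mags * np.exp(1j * self.phase), n=self.n)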

Still, don't do it :)


> For the record, I think this is a terrible idea.

It really depends on who you are asking. Some people just hate those loud cracks and pops, and would love to have something that filters them out naturally.


So you always calculate the frequency spectrum just in case of a dropout, instead of getting the actual code right?

Sorry, but if you ever find yourself in such a situation, stop, pause, make yourself a tea, and consider how wise the thing you are doing really is.


I actually think it's actively harmful to hide problems that can be otherwise fixed. If the CPU is too busy to keep filling the audio buffer, the solution is to increase the buffer size to put less stress on the scheduler. I recently reduced my buffer size in Ableton Live, but I knew I had to increase it because I could hear pops. If these pops were being covered up, I wouldn't have realized my buffer size was too small and I'd be unknowingly introducing subtle artifacts into every recording.


Ok, but a large buffer size means more latency.

Also, the settings you use for development do not have to be the same as those used in production.


> Ok, but a large buffer size means more latency.

Depending on the use-case, that may not be a problem. Not all things are latency-sensitive.


With anything besides headphones there will always be latency anyway: roughly 1 ms per foot of distance from the speaker (sound travels about 1,100 feet per second).


Based on nothing more than user experience, conferencing software like Zoom tends to do something that sounds quite like what you're describing, complete with the fade to silence after about half a second.

So it makes sense for certain live situations, but it wouldn't be desirable in studio recordings.


Assuming it's a viable strategy with regard to processing resources (hint: it's not, for anything more than toys), you will still have audible artifacts, especially around transients. Filtering and additional processing will only alter the signal even more.


> Instead of outputting silence, you could keep outputting the same frequency spectrum you were producing right before the event.

How does one detect whether it's a musical silence or a buffer underrun?


You'd probably do this at a layer where you have access to the buffer stats to know that the buffer is nearly empty.



