A couple of things to note: MP3 is not appropriate for real-time use because a variable amount of silence is added to the start of each sample, intrinsic to the compression. You can sometimes get away with it for music, but if anything has to change based on game events, MP3 is unusable. A lot of work has gone into optimizing MP3, including at the OS and hardware level, but that doesn’t help games. In practice it’s commonly some custom spin on Vorbis.
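For a sense of scale, here’s the rough arithmetic on that added silence. The figures are typical LAME-style numbers, not guaranteed for any particular encoder:

```python
# Back-of-envelope on the "silence at the start" problem. Numbers are typical
# rather than exact: LAME-style encoders add roughly 576 + 529 = 1105 priming
# samples, and the real figure varies by encoder/decoder pair.
SAMPLE_RATE = 44_100
PRIMING_SAMPLES = 1105          # assumption; the LAME/Xing header has the real value

delay_ms = PRIMING_SAMPLES / SAMPLE_RATE * 1000
print(f"~{delay_ms:.1f} ms between 'play' and the actual transient")

def trim_priming(pcm, priming=PRIMING_SAMPLES):
    """Drop the priming samples so an event-triggered sound starts on time."""
    return pcm[priming:]
```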
Additionally, there can easily be 20-40 sounds playing at once, more if you haven’t optimized yet (which typically happens in the last few months before release). These also need to be slightly preloaded and then streamed from disk once playing, so source starvation has to be handled and the codec must not glitch when packets go missing.
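A minimal sketch of that preload-then-stream pattern, with both failure modes handled. Every name here is hypothetical, not any particular engine’s API:

```python
# Preload a few decoded blocks, then keep streaming; handle the two failure
# modes: the buffer running dry (source starvation) and a decode that yields
# nothing (e.g. a bad or missing packet).
from collections import deque

class StreamingVoice:
    PRELOAD_BLOCKS = 4  # decoded blocks kept ahead of the playhead

    def __init__(self, decoder):
        self.decoder = decoder          # iterator yielding blocks of PCM samples
        self.buffer = deque()
        for _ in range(self.PRELOAD_BLOCKS):
            self._pull()

    def _pull(self):
        block = next(self.decoder, None)
        if block:                       # tolerate a bad packet:
            self.buffer.append(block)   # skip it rather than glitch

    def render(self, block_size):
        self._pull()                    # keep topping up from disk
        if not self.buffer:
            return [0.0] * block_size   # starved: output silence, don't crash
        return self.buffer.popleft()
```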
It’s also all happening in a system that needs to be real-time and keep each frame within a budget on the order of milliseconds, though Moore’s law shifting toward parallelization has helped a lot. You’d be surprised how underpowered the consoles are in this regard (caveat: I haven’t developed on the upcoming generation, which is getting better).
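To put “on the order of milliseconds” in numbers, with typical figures rather than anything from a specific title:

```python
# Rough budgets: a 60 fps game frame and a 48 kHz audio callback with a
# common buffer size. Both deadlines are hard; miss them and you hitch or pop.
FPS = 60
SAMPLE_RATE = 48_000
BUFFER_SAMPLES = 512

frame_budget_ms = 1000 / FPS                              # ~16.7 ms per frame
audio_deadline_ms = BUFFER_SAMPLES / SAMPLE_RATE * 1000   # ~10.7 ms per callback

print(f"game frame budget : {frame_budget_ms:.1f} ms")
print(f"audio callback due: every {audio_deadline_ms:.1f} ms")
```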
As for loading and caching on demand: that’s limited by memory, and given the sheer number of samples used in games, it’s just not practical. As a specific example, in one very well known game there are over 1600 samples just for boxes hitting things (impact sounds). What I’m building right now is meant to make generative audio easier and reduce the number of samples needed, so better tools for processing sound could make this naive caching approach practical.
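A rough idea of why naive caching doesn’t scale, using made-up but plausible per-sample figures:

```python
# Memory for the 1600 impact samples alone if cached as raw PCM.
# Assumptions (mine, not the game's): ~1.5 s per sample, 48 kHz, 16-bit mono.
SAMPLES = 1600
SECONDS_EACH = 1.5
SAMPLE_RATE = 48_000
BYTES_PER_SAMPLE = 2   # 16-bit mono

total_bytes = SAMPLES * SECONDS_EACH * SAMPLE_RATE * BYTES_PER_SAMPLE
print(f"{total_bytes / 1e6:.0f} MB of RAM for boxes hitting stuff alone")
# ≈ 230 MB, and that's one effect category, before music, dialogue, ambience.
```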
> For specific example in a very well known game, there are over 1600 samples for boxes hitting stuff
That almost sounds as if it could be worthwhile to synthesize on demand from a very small set of (offset/overlaid) base samples through a deep chain of parameterized FX. With 1600 FX parameter preset tuples as the MVP, bonus points for involving game state context.
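Something like this shape of data is what I have in mind; every field name and FX here is invented purely for illustration:

```python
# Each of the ~1600 variations becomes a small preset tuple applied to a
# handful of base samples, instead of a baked .wav per variation.
from dataclasses import dataclass

@dataclass(frozen=True)
class ImpactPreset:
    base_sample: str               # one of a few recorded base hits
    pitch_semitones: float         # transpose per material / box size
    lowpass_hz: float              # duller for cardboard, brighter for metal
    reverb_send: float             # driven by room size from game state
    layer_offsets_ms: tuple = ()   # extra base samples overlaid with offsets

# The MVP: 1600 rows of this keyed by game-state context
# (material, mass bucket, environment) instead of 1600 wav files.
PRESETS = {
    ("cardboard", "light", "indoor"): ImpactPreset("hit_soft.wav", -2.0, 3_000, 0.4),
    ("metal", "heavy", "outdoor"):    ImpactPreset("hit_hard.wav", 1.5, 12_000, 0.1),
}
```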
That’s literally my startup. I won’t get deep into the reasons why good tools for this don’t exist yet, but if you imagine game development as regular development with orders of magnitude more chaos, you can understand how difficult it is to build anything for reuse. After 15 years in the industry, my approach is the same as yours.
> You’d be surprised how under powered the consoles are in this regard
As another commenter mentioned, these games shipped with compressed audio on consoles. Also, that generation of consoles has pretty good hardware codecs for audio (320 channels on the Xbox).
And MP3 was just an example of what I had at my disposal. But as an exercise I converted my 4-minute MP3 to Vorbis. Decoding it and converting to WAV took the same amount of time as before: about half a second on a very old and underpowered MacBook Air. Most of that time is spent writing 50 MB to disk.
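The timing is easy to reproduce with something like the following; ffmpeg here is just a stand-in for whatever decoder you have, and the file names are placeholders:

```python
# Time a single decode of a compressed file to WAV using the ffmpeg CLI.
import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["ffmpeg", "-y", "-loglevel", "error", "-i", "track.ogg", "track.wav"],
    check=True,
)
print(f"decode + WAV write took {time.perf_counter() - start:.2f} s")
```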
Yeah, it is curious that consoles shipped with compressed audio but PC didn’t. The prevailing wisdom on PC is that codecs are easier to deal with thanks to a dedicated audio thread. Decisions like that aren’t made lightly, so now I’m curious what the reason was.
Edit: reasoning is here: https://www.rockpapershotgun.com/2014/03/12/respawn-actually...
Minspec is a 2-core PC, probably to support a large player base, and as noted before there can be 20-40 audio files all streaming from disk and decoding at once. So sure, one file might decode fast, but 40 of them, all also streaming from disk while keeping the frame rate playable? Not happening on a 2-core PC.
Good points. But there’s still the possibility of decompressing during installation, which shouldn’t be too hard even for 2-core machines and is probably faster than downloading.
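As a sketch, install-time decompression could be as simple as a one-off batch transcode over the shipped assets; paths and layout here are invented for illustration:

```python
# One-off batch transcode of shipped .ogg assets to .wav at install time,
# driving the ffmpeg CLI.
import pathlib
import subprocess

ASSET_DIR = pathlib.Path("install/audio")     # hypothetical install location

for ogg in ASSET_DIR.rglob("*.ogg"):
    wav = ogg.with_suffix(".wav")
    # -y: overwrite, -loglevel error: keep the installer log quiet
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-i", str(ogg), str(wav)],
        check=True,
    )
```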
Also, according to the article they're packing all the locales. To me this seems like a bigger issue.