- Reading from ROM is slow relative to working RAM. The ROM is mapped into the same address space as the RAM, but reading a ROM address takes several clock ticks (I think 8?).
- So to do audio playback, the GBA "stripes" a chunk of data into RAM, sets a pointer to the beginning of that stripe, and then lets the audio chip pull data and advance that pointer. When the pointer reaches the end of the stripe, an interrupt fires, and its handler is supposed to copy the next chunk of audio into the stripe (overwriting the current contents) and reset the pointer.
... but if interrupts are disabled, that never happens, and the simpler logic in the audio chip just keeps incrementing the pointer forever and reading more data. It eventually reaches the ROM addresses and pulls directly from those. That's slow, but still fast enough that the chip doesn't starve. You just wouldn't run your audio this way in a normal game, because if the audio system were reading directly from ROM all the time, you'd have something like 1/8th the time to do everything else in each frame of animation. Also, reading from RAM lets you edit the samples to do audio effects.
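Here's a toy model of that in Python, in case it helps to see it. This is not GBA code: the sizes, the addresses, the `Machine` class and the `play` helper are all made up for illustration, and I've put ROM directly after RAM to keep it tiny (on real hardware they're just in the same address space, not adjacent). It only shows the shape of the behavior: with the interrupt handler running, the pointer keeps getting reset; without it, the reads walk off the end of the stripe into ROM.

```python
# Toy model: one flat address space, a small RAM region followed by ROM.
# Every size and value here is invented for illustration.
RAM_SIZE = 16
STRIPE_START = 8                     # the "stripe" lives at mem[8:16]
STRIPE_LEN = 8
ROM = bytes(range(100, 140))         # stand-in for cartridge contents
SONG = bytes(range(1, 25))           # stand-in for the audio the game wants played


class Machine:
    def __init__(self, interrupts_enabled):
        self.mem = bytearray(RAM_SIZE) + bytearray(ROM)  # RAM, then ROM
        self.interrupts_enabled = interrupts_enabled
        self.song_pos = 0
        self.ptr = STRIPE_START
        self.refill()                # load the first chunk up front

    def refill(self):
        """What the interrupt handler is supposed to do."""
        chunk = SONG[self.song_pos:self.song_pos + STRIPE_LEN]
        self.mem[STRIPE_START:STRIPE_START + len(chunk)] = chunk
        self.song_pos += STRIPE_LEN
        self.ptr = STRIPE_START      # reset the pointer to the stripe start

    def pull(self):
        """The audio side: read one byte, then bump the pointer."""
        value = self.mem[self.ptr]
        self.ptr += 1
        at_end = self.ptr == STRIPE_START + STRIPE_LEN
        if at_end and self.interrupts_enabled:
            self.refill()            # the "interrupt" fires
        return value                 # with interrupts off, ptr just keeps going


def play(interrupts_enabled, n):
    """Pull n bytes and return them as a list."""
    machine = Machine(interrupts_enabled)
    return [machine.pull() for _ in range(n)]


print(play(True, 12))    # song data the whole way
print(play(False, 12))   # 8 bytes of song, then ROM contents
```

With interrupts on, you get the first 12 bytes of `SONG`. With them off, you get the first 8 bytes of `SONG` and then `100, 101, 102, 103`, which is the start of the fake ROM.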
Remember that the music isn't just sitting there as raw audio. The raw audio is just the sounds of the individual instruments.
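A rough sketch of what I mean, again in Python with invented data. `INSTRUMENTS`, `SCORE` and `render` are hypothetical names, and real sound engines do a lot more (pitch, volume, looping), but the idea is that the song is a list of "play this instrument at this time" events, and the stored samples are just the short instrument sounds that get mixed together.

```python
# Hypothetical instrument samples: short lists of signed amplitudes.
INSTRUMENTS = {
    "kick": [8, 4, 2, 1],
    "beep": [3, -3, 3, -3],
}

# Hypothetical score: (start position in output samples, instrument name).
SCORE = [(0, "kick"), (2, "beep"), (6, "kick")]


def render(score, instruments, length):
    """Mix each triggered instrument sample into one output buffer."""
    out = [0] * length
    for start, name in score:
        for i, amp in enumerate(instruments[name]):
            if start + i < length:
                out[start + i] += amp   # overlapping notes just add together
    return out


print(render(SCORE, INSTRUMENTS, 10))
# [8, 4, 5, -2, 3, -3, 8, 4, 2, 1]
```

So the only "raw audio" stored is the `kick` and `beep` waveforms. The tune itself exists only as the score plus the mixing code.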