Thank you! There has been a lot of past work on MIDI generation, and many people have gotten great results using the approach you describe. Modern music generators like ours create audio files directly because MIDI can't represent all of the nuances of acoustic instruments (vocals, violin, trombone, etc.). The allure of modern generative AI (diffusion networks and autoregressive models) is that it is finally capable of generating high-quality audio that sounds natural.
If you're interested in really exciting work on applying AI to synthesizer patch creation, I recommend checking out Synplant 2: https://soniccharge.com/synplant2. Their tool can load any audio and then create a synth patch that sounds nearly identical to the input.
That's a solid point about the limitations of MIDI, @kantthpel. Synplant2 sounds like a neat bridge between traditional synthesis and AI's capabilities. Wonder if it could lead to a hybrid approach where AI-generated parameters enhance MIDI compositions, making them sound more natural without fully ditching the efficiency of MIDI. Could be a game-changer for composers looking to blend electronic and acoustic sounds.