To my ear, they all sound about okay for 4 seconds, until my brain recognizes that there's no tension being built or story being told. It's like every track is 4 seconds of music followed by 4 seconds of music followed by 4 seconds of music rather than a track with a real sense of progression.
Many have said in this thread already that maybe we ought to expect an ML approach in the next few months/years to be much better. I'm not so confident it will happen that soon. Audio might end up being a much harder problem than visuals, for a variety of reasons. Having the time domain built into the medium requires some concept of memory, and even modern neural nets seem to struggle to remember what they said before the most recent prompt.
Once again though, it's not impossible. It just requires the right techniques and enough people focused on it.
The thing is, even if you can make a machine reproduce it, it's missing the human component, and the fact that you (I) know it's not human-made already degrades the experience.
What AI gives you is a mash-up, a mix of people's intent, a mix of people's feelings. What I want is the result of a singular person expressing his singularity through his work; I don't want the "average of the best" music or the "average of the best" picture. This is good for content creation, when you need to pump out the maximum amount of "content" for people to "consume" (see Marvel, Netflix & co.), but not for art.
Art that leaves a mark is always weird/quirky/personal/deep/&c. The fact that a machine can replicate the result removes the most interesting part of the equation, the human part. It's like making your own bread vs. buying supermarket bread: the latter is cheaper and faster, and it might even taste better if you fucked yours up, but it's a completely different experience.
Not sure why this is downvoted, I think it’s exactly right. A huge part of what makes music feel meaningful is the parasocial relationship with the artist, and the cultural context the music captures and expresses.
That's... such a different way of relating to music!
Some of the most meaningful experiences I've had with music involved DJs whose names I didn't know, playing tracks produced by musicians whose names I didn't know and had no way to discover.
Didn’t say it’s the only way! But I think you would agree that most popular music is made by artists who have a very prominent public persona, expressed in different ways depending on the genre and subculture the music appeals to. As a fan you’re not just listening to the music as it is, but interpreted in terms of your thoughts and feelings about the artist. That context can make the music feel more meaningful.
Several years ago I implemented some very basic rules outlining distance metrics between two chords and ran some multiobjective evolutionary algorithms to generate, say, a 16-bar progression while trying to minimize those distances between any two subsequent jumps. I added two or three other objective functions for judging the progressions by my idea of structure (e.g., starting and ending on the same chord), and found the results to be very promising.
With a sufficiently sophisticated rule system (which could be built from existing music), an AI should be able to optimize for tension building or storytelling quite easily. Of course, it will only optimize for the definition of tension building or storytelling that it understands, via statistical methods or being told by the programmer explicitly. In the latter case the programmer is just doing one of the things composers always do, while in the former, the generated content is interesting - or not - in much the same way (IMHO) as transformer-based language generators like GPT-3.
Unfortunately I don't have the full working codebase anymore. I think I have enough that I could recreate it (something like the sketch below), but it was pretty rudimentary, and I've always intended to revisit the idea one day and flesh it out as at least a blog post or something (at which point I will submit it here). I only have a couple of the progressions, but they don't mean much on their own since I hand-picked them out of the result set based on personal taste rather than any formal algorithm.
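For the curious, here's a minimal sketch of the general shape of it, reconstructed from memory: the chord set, the distance metric, and the objective weights are all invented for illustration, and the real version kept the objectives separate for a proper multiobjective search rather than collapsing them into one weighted sum.

```python
# Toy reconstruction: evolve a 16-bar chord progression that
# (a) minimizes a crude voice-leading distance between consecutive
# chords and (b) penalizes not starting/ending on the tonic.
import random

CHORDS = {
    "C": {0, 4, 7}, "Dm": {2, 5, 9}, "Em": {4, 7, 11},
    "F": {5, 9, 0}, "G": {7, 11, 2}, "Am": {9, 0, 4},
}
NAMES = list(CHORDS)

def chord_distance(a, b):
    # For each note of chord a: distance in semitones (wrapping the
    # octave) to the nearest note of chord b, summed over the chord.
    return sum(min(min(abs(x - y), 12 - abs(x - y)) for y in CHORDS[b])
               for x in CHORDS[a])

def fitness(prog):
    smoothness = sum(chord_distance(p, q) for p, q in zip(prog, prog[1:]))
    structure = 0 if prog[0] == prog[-1] == "C" else 20  # structural penalty
    return smoothness + structure  # weighted sum stands in for multiobjective

def evolve(bars=16, pop_size=200, generations=300):
    pop = [[random.choice(NAMES) for _ in range(bars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 4]  # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            child = random.choice(survivors)[:]
            child[random.randrange(bars)] = random.choice(NAMES)  # point mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

print(" ".join(evolve()))
```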
Most real tracks don't have tension buildup or progression. Judging music based on that mostly just speaks to your preferences. As far as I could hear, the tracks were coherent and not just 4-second snippets glued together. Having said that, I don't think they were exceptional or anything.
Besides ambient music, what music doesn't have any sort of buildup of tension and subsequent release? Do you have any particular example tracks that don't have any tension/release at all?
I feel like most of the music people commonly listen to has that; it's an essential part of what makes music feel "human".
I think maybe you two aren't on the same page. I know you are referring specifically to tension and release as a music theory concept, which is for sure very common; even a single "tense" chord in a chorus resolving to the tonic is tension and release. I think the other person is speaking of "tension" in the context of progression: a song building up to a crescendo would be considered tension, or a dubstep drop, or a metal breakdown, etc. (Those are also tension and release in music theory, but I think the person is speaking broadly, in layman's terms.)
Bruce makes the point that early "AI" image generation was pretty shit, but a couple of specific developments (especially diffusion modelling) changed all that remarkably quickly; the corollary is that we might expect that for music too.
"This is a business I know well. Jukedeck is an example of how founders and investors do not conduct appropriate market research. There is a limited market demand for low-cost royalty-free music for videos. One could argue there is an oversupply* of royalty-free music relative to buyers. The quality is not good enough to disrupt the billion dollar Production music industry that is top heavy, a relative small amount of creators at the top get the majority of the money, the rest compete for the little that is left. Jukedeck has raised enough money ($3million #) to be around for a few years if they control their burn rate. But Jukedeck in its current form, is just another music startup destined for the Deadpool."
I feel the same way about the IEEE article about immersive 3D audio - where's the market?! It's not the future of music - people crave a live or festival / social atmosphere, not getting into an alien pod to jam out to Lizzo.
Well, Jukedeck was bought by TikTok, so their product was indeed valuable to someone. I guess the point of such AI systems is not to replace existing creators, but rather to fill new market needs.
Not sure that article is making the point you have in mind:
“people let playlists run on in the background when they serve inoffensive, bland music they can’t be bothered to turn off. If you look at the music Spotify has broken, it’s all chill-out stuff. That isn’t art, it’s wallpaper.”
As I said, we will see in 2 years. I am confident that your prediction (which was not "widespread music generation" but quite specific: on a platform where the majority of streams come from a handful of artists) will be spectacularly wrong.
I'm sure the names of the "handful of artists" will still be attached to the music, it's just the music will be AI generated. Most of those artists don't compose their hits anyway.
Yes, I expected goal-post moving along the lines of "even if a single beat is AI-generated, the entire song counts as AI-generated".
Or, the more likely one in this case: "the first draft of the beat was AI-generated, then human-edited, and the vocals were by the singer but they were processed, so it's AI-generated". Typical singularitarian bullshit.
No goal-post moving here, I predict that the entire song, every sound in it, including vocals, will be AI generated. Optionally with the name of your favorite human artist attached to it. Whether people will know (or care) that it's fully AI generated I don't know, it depends on how Spotify decides to promote such songs.
I initially thought your prediction was "AI-generated, with mostly human vocals and some editing" - which has no way of occurring in 2 years even for new songs, let alone if you include the legacy catalog.
This insane interpretation will not only not occur in 2 years, it will not occur this decade. Singularitarian delusion runs higher than even I imagined.
First, your prediction is analogous to something like "in 2 years, 50% of images used on {ad platform}/Getty will be AI-generated". Your prediction is actually "AI-generated without human editing". This is not true today and will not be true 2 years from now, even for images.
I am not a musician, have no musical (or any other artistic) talent, and have kept up (shallowly) with state-of-the-art music generation for a decade (because of the idiocy Kurzweil has spewed out).
I am "reacting strongly" because I detest singularitarians and everything they stand for. This board in particular has been a hotbed of delusion and idiocy for the last 6 months in every single AI thread I have read comments in.
I don't know about half, but if Spotify don't have a large AI team they're missing the boat. They have a perfect opportunity to segue their audience into cheap content they create themselves.
I am sorry, but we (I make music in my spare time) need to worry.
To echo what the comment above is saying: not because this AI-generated music is good, but because it is good enough for the vast majority! And this concerns other fields as well. The consumption economy has set the bar for music and entertainment in general very low. Most people don't know (or think) that the bar is low, because they have not experienced anything better and probably never will.
The thing is, and this applies to any other human made vs computer made thing: there will always be people who cannot or don't care to discern between quality and "whatever" (aka "good enough", but where "good" is a very low bar).
The people who would be satisfied with AI generated music are people who would not have paid for human made music in the first place.
You as a good musician will never lose a sale to an AI music source, because that buyer wouldn't have bought your music in the first place.
Now if you make your living providing easy listening background music, then you will eventually be replaced.
Not even that, I think. Even background music has more purpose than AI-generated music. But you are right ... some people may have a low threshold for being satisfied and will listen to this kind of music.
Look at Max Martin - his songs are massive hits because they are highly engineered and use a proven Swedish recipe / formula / composition techniques (e.g., balanced lines) that on paper are actually quite simple. On paper.
If it were as simple as it looks, everybody would be a Max Martin, or somebody would train an AI on his canon and maybe actually produce something decent. Music is based in emotion, and the bits and bytes aren't close to simulating anything like it - other than, probably, raw schizophrenia.
The current state of AI music? There’s no amount of money you could pay me to listen to it for an hour a day for a week. No way. It’s that bad.
Counterpoint: a moderately musically inclined person with Propellerhead’s Figure can come up with more pleasing and coherent music on their third or fourth try that would put any of these to shame. It’s just reality at present.
My wife is an illustrator, and she got really depressed when I showed her the image AI tools available today. Again, none of this stuff would replace her unique work, but it devalues the entire profession.
This. All the previous comments made really good statements, but, in the end, it's the value we give to art that might suffer from AI generated content, and the people who produce "real" art.
I'm sure textile weavers a century ago said the same thing, and the world replied that if a machine can do what you do even adequately, your profession was always overvalued.
In the end, it's an iron law of post-industrial capitalism that the market value of human effort eventually approaches zero.
Midjourney improved so much over the summer, it was staggering. Arms no longer grow and point wherever the wind blows them. Tables are covered in identifiable objects instead of nonsense shapes.
The gap between 'okay' and 'really good' is smaller than we think.
No, really ... it sounds shitty. It's the same with images ... if a human composes it, a "story" or "intention" is behind the product. AI can't provide that, because it's just an A without a real I.
Image generation models didn't improve. Someone figured out how to use them to do something new.
Anyway, models have been improving for a long time, but when was the last time you saw a news item about progress in machine translation? That was probably around 2016-17, when the hype cycle for neural machine translation was at its apex. You don't hear anything more about it in the news because the hype cycle has now dipped to its bottom-most pit of indifference, and the hype about being able to speak to a machine and communicate with any human on the planet in their native language has been replaced with, for example, the hype about being able to generate human-like art.
My point being: just because performance increased in the past on one particular task, that doesn't mean it will improve on all other tasks, or, indeed, that it will keep improving on the same task. For all we know, what we've seen so far of image generation is all we'll get for the next ten years or so.
I don't have a crystal ball, but I do have history books, so I don't make predictions. We can't even speak about the past with confidence, let alone the future.
These things make me sad. Not that the results are anywhere close to good enough to replace a human artist yet, but eventually it'll probably be at a level that's good enough for most people, and that'll be the day when music will be truly disposable. I imagine an endless stream of music equivalent to the average Netflix-produced movie. Perfect for people who want music to play all the time without actively listening to any of it.
Music is already disposable today. Just listen to most of the songs in the charts ... senseless crap.
Even so, in the end it's still better than this AI crap, at least to my ear.
Don't worry; AI will not replace a big part of music, if it ever replaces any.
Maybe AI will be used for generating catchy hook lines, but that's all. The final selection of what's catchy and what's boring will always be done by a human.
The music at the top of the charts is definitely not appealing to me, but it's not all senseless crap. The lyrics and emotions can be closer to a Marvel movie, but there's a pretty high level of production behind the bigger artists. It does nothing at all for me personally, but when you hear something like a Dua Lipa song, you realize there's real effort and skill going into it.
Now we just need something similar to https://github.com/nogasm with a different kind of "implant" that monitors my dopamine levels and optimizes based on that.
By the Transitive Property of the Monkey's Paw, I think you'd be in an endless loop of punching yourself in the face as the quickest and easiest way to release it.
I guess I have no taste. I had a lot of fun playing with the "live" music generator. Some of the settings produced interesting, catchy riffs.
As a musician and a listener, I vastly prefer real instruments played by real musicians, preferably acoustic. So I wouldn't actually listen to this stuff, but then I don't generally listen to any kind of electronic music. But this AI thing generates the same kind of shallow, bland, sometimes catchy stuff I hear all the time that's supposedly made by "creators."
Surprisingly good. Not Good good but I expected a lot worse.
I think there's great potential for that kind of music in video games, if they can procedurally generate it on the fly based on the current game state (think e.g. roguelikes).
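As a rough illustration of what I mean (everything here is hypothetical: the names, mappings, and thresholds are made up), a generator could derive its musical parameters from the current game state and re-roll them as the player moves through the dungeon:

```python
# Hypothetical sketch: map roguelike game state to music parameters
# that a procedural generator could consume on the fly.
from dataclasses import dataclass

@dataclass
class GameState:
    danger: float  # 0.0 = safe, 1.0 = boss fight
    depth: int     # current dungeon level

MAJOR = [0, 2, 4, 5, 7, 9, 11]  # brighter scale for upper floors
MINOR = [0, 2, 3, 5, 7, 8, 10]  # darker scale for deeper floors

def music_params(state: GameState) -> dict:
    return {
        "tempo_bpm": 80 + int(state.danger * 80),      # speed up as danger rises
        "scale": MINOR if state.depth > 5 else MAJOR,  # mode shifts with depth
        "note_density": 0.3 + 0.6 * state.danger,      # busier when threatened
    }

print(music_params(GameState(danger=0.9, depth=8)))
```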
I assume this is using a lot of soft synths and samples? Whatever they are, they sound really cheap. I hope I don't back myself into some Luddite corner here and this later turns out to be really impressive, like we are seeing in the image generation space. But given that we are living in a market of overabundance of stock photos, royalty free music, etc., I am sure that this is not endangering anything anytime soon.
And while comparing it to image generation, with Stable Diffusion and other models, a human has to be in the loop in order to generate the prompts, so we can't entirely replace them here either. How about an AI music generator that creates phrases, rhythms or sounds/VST presets based on a prompt for me?
If an average person can choose between listening to this, or listening to a musician that they have a personal connection to on their favorite streaming service, I wonder which one will be picked most of the time?
Flawed but interesting. The experimental page should have a nudge/spin-again option on each section; sometimes the results are musically very good and an interesting segue from the previous section, sometimes they're just kinda ass.
Also, I think it's a mistake to look at music (even electronic dance music) as just a series of transformations that journey from A to B. A lot of out-there music relies on fairly safe tonalities and very straightforward musical structures to provide an anchor for wild timbral experimentation. Listeners can enjoy the departure from conventional sonic reality in somewhat the same way a theme park roller coaster balances existential terror and reliable predictability.
Wow, this is terrible. Why would you release this in this state?
Edit: Wait, the Streams have had human involvement!?
>streams are ”harvested” by human operators: after choosing some initial settings, the operator lets the engine run and collects the score it composes.
> Without changing this score, the operator then produces a sound file for it, by choosing the electronic instruments that play the score, making a simple mix and recording the result, using standard music production software.
Imagine this being the curated best they could do, even after explicitly choosing instruments after the fact to best fit and then mixing it.
I highly highly recommend listening to the track called "Test_energylevel", it is absolutely bonkers, and more interesting than any of the other tracks I clicked on here. (You have to click on "Explore" and then scroll down a bit to find it)
It starts with a choir of ambient vocals singing "it's a sunshine", there's some bird noises and traffic sounds, a snippet of organ synth flourishes.
Then it all gets started - guitar, sitar, horns, strings a vocal duet singing "Shine your sweet loving down on me"
There's actually a ton going on, every few bars it changes things up, there's clever little harpsichord.
Then a male singer starts proclaiming "It's a sunshine daaaayyy", backed up by a chorus of "yeah, yeah, yeah"s. Honestly it's kind of catchy
The last 30 seconds or so are truly cursed, there's a voice in the right speaker moaning "wide eyed retina. mostly logical", which gets delayed, bitcrushed, and pingponged between the speakers.
Wow! I wonder why this specific track has so much going on compared to the others.
Wow, yeah it seems entirely plagiarized. It's still a very strange song, but I don't think the AI did anything noteworthy here. Thanks for looking into it!
It's going to be an interesting lesson for humankind to gradually come to the realization that anything worth reading, worth watching, or worth listening to cannot be born from an algorithm. Any science based on averaging a gazillion things can only produce things that are average at best.
I wonder how people will react when this does become "good enough".
I'm thinking of those that consider themselves prompt artists or prompt engineers, and consider DALL-E/SD to be merely tools that creatives use to create their art, just like photoshop, and how dare you insinuate that the work isn't my original creation...
If I type in "Symphony, Beethoven, dramatic strings, romantic theme turning dark then triumphant, tenor flute solo in third movement" and such a thing is produced... am I a musician? A classical composer perhaps?
I'm not trying to say that there is no act of creation on the part of the user of such systems, but I do think it's an interesting area of discussion, because to me there is some sort of qualitative difference here.
If folks remember Ballblazer by Lucasfilm Games, Russ Lieblich wrote a music player that would string together riffs like weird chiptune jazz solos. The walking bass was the best part. It was constrained by the template but could come up with some surprising harmonies.
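In spirit it was something like this toy sketch (the riff table and the compatibility rule here are invented; it only shows the basic idea of chaining riffs so each one starts near where the last ended, while the actual player did far more):

```python
# Toy riff-stringing in the Ballblazer style: pick each next riff so it
# connects smoothly to where the previous one left off.
import random

# Each riff is a short list of scale degrees.
RIFFS = [
    [0, 2, 4, 7], [7, 5, 4, 2], [2, 4, 5, 4], [4, 7, 9, 7], [7, 9, 11, 12],
]

def next_riff(prev):
    # Prefer riffs that start on (or within two degrees of) the note
    # the previous riff ended on; fall back to anything otherwise.
    end = prev[-1]
    candidates = [r for r in RIFFS if abs(r[0] - end) <= 2]
    return random.choice(candidates or RIFFS)

def solo(bars=8):
    line = [random.choice(RIFFS)]
    for _ in range(bars - 1):
        line.append(next_riff(line[-1]))
    return [note for riff in line for note in riff]  # flatten into one melody

print(solo())
```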
This is a wonderfully concise response and I plan to use it verbatim in the future to evidence my sincere and perhaps bottomless disdain for something. Bravo phrasing.
Interesting and listenable in a “don’t mind this, but forgettable” way. It’s currently okay as disposable background music.
But it seems very far from generating music that I’d actually connect with. Years, at least, away from creating a solid song, much less something that talented songwriters should worry about.
It's years, at least, away from creating anything that I'd connect with. But it's only a couple years away from filling all the lobbies and elevators of the world.
I think this sort of music is really well suited to video games, especially repetitive, loopy music like in earlier games. I even love this kind of thing during programming sessions, as there is a certain quality to such tracks that helps you gain focus quickly.
The good thing about this hitting the music industry is that lawsuits will follow very soon, and the stealing of the intellectual labor and property that all those models are based on can probably be ended a bit sooner.
This could have been better targeted for musicians as a VST tool or some other DAW extension. I'd be curious what it could do with my samples and recorded loops.
You have to start somewhere, and this probably has a place.
Still, having become accustomed to live acoustic music, I find this generated stuff not engaging. Maybe after a few more generations it will get there.
Many of the timbres the AI selected were not very rich and did a poor job of replicating natural instruments.
Also, almost every single track in every song is continuously overloaded with randomness, or perhaps I should call it juvenile attempts at improvisation.
Together, these make for a frenetic chiptune vibe, as I hear it.