
We're not likely to see one. DALL-E was trained by analyzing image pixels along with the captions for those images, and it is that caption information that lets it understand your prompts.

But music doesn't have a form that ties the audio so cleanly to a textual description, and where such labels do exist they tend to be oversimplified and not very helpful. Music is, in fact, hard to describe.

One project that might interest you is Every Noise at Once[0], which does an amazing job of grouping known artists and songs by their sonic similarity, which tends to correlate with similar style and listener appeal.

[0] https://everynoise.com/



I wrote the article; fun to find it here.

I thought about that quite a bit as I was researching. Image generation models were trained on pairs of images and captions, and that pairing is what allows CLIP to work.

But then I thought: music actually does have an official notation system, standard notation on the five-line staff.

Do you think this describes the music, or merely informs how it is to be performed?


It describes how a person might play it; that's all. It won't help anyone identify a song they've heard or pick one out by genre or style. That information isn't really "encoded" in sheet music.



