
While almost everything sounded really good and well put together, the voice itself has a constant... phaser? robotic vibrato? applied the whole time, which I found surprising. We've got better voice synthesis available already and other instruments don't suffer the same effect. I've not heard that before - does anyone know if that's specific to this model, or a more generic voice issue?



That static is present in most generations, but how bad it is depends on the generation itself (you get 2 variations of the same song every generation). I've had some amazingly crisp-sounding generations in vastly different genres and languages, such as opera tenor, UK rap, reggae, metal, country, Broadway musical, rock 'n' roll, Japanese, Italian, Swedish, various English accents/dialects, etc. Suno is a technical masterpiece. I understand why some people dislike the idea, but the point stands that we are HERE now, having started from a place where most people couldn't even imagine it being possible, and those who could said it wouldn't be this good.

Like many people have said, this tech will only get better.


Yes, it had this GLaDOS-like timbre.


My thoughts exactly, like I just finished Portal.


It’s called a vocoder. It lets (for instance) a monotone-sung piece of text follow a set of MIDI notes: the voice's spectral envelope is imposed on a carrier wave that plays the notes (I think. Please correct any inaccuracies!).

I use it sometimes in FL Studio when creating electronic music (plugin is called Vocodex).

Presumably they take the AI-generated voice, generate MIDI notes, and apply a vocoder to the voice so that it follows the notes.
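
To make the idea concrete, here is a rough sketch of a classic channel vocoder in plain Python. It is only an illustration under assumptions, not what Vocodex or Suno actually do: the function names, band centres, Q value, and smoothing constant are all invented for the example, and a steady tone stands in for the voice. The voice (modulator) is split into frequency bands, the loudness of each band is tracked over time, and those loudness curves are used to shape the same bands of a pitched carrier.

```python
import math

SAMPLE_RATE = 8000                       # Hz; kept low so the pure-Python loops stay fast
BAND_CENTERS = [300, 600, 1200, 2400]    # Hz; hypothetical band layout for this sketch


def bandpass(signal, center_hz, q=4.0, sample_rate=SAMPLE_RATE):
    """Biquad band-pass filter (RBJ cookbook form with 0 dB peak gain)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha               # b1 is zero for this filter type
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, smoothing=0.995):
    """Envelope follower: rectify, then smooth with a one-pole low-pass."""
    env, level = [], 0.0
    for x in signal:
        level = smoothing * level + (1 - smoothing) * abs(x)
        env.append(level)
    return env


def vocode(modulator, carrier, band_centers):
    """Shape each band of the carrier with the modulator's envelope in that band."""
    out = [0.0] * len(modulator)
    for fc in band_centers:
        env = envelope(bandpass(modulator, fc))
        car = bandpass(carrier, fc)
        for i in range(len(out)):
            out[i] += env[i] * car[i]
    return out


# Stand-in "voice": a 500 Hz tone for the first half, then silence.
n = SAMPLE_RATE
modulator = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) if i < n // 2 else 0.0
             for i in range(n)]
# Carrier: a 110 Hz sawtooth, standing in for a synth playing one MIDI note.
carrier = [2 * ((110 * i / SAMPLE_RATE) % 1.0) - 1 for i in range(n)]

out = vocode(modulator, carrier, BAND_CENTERS)
```

The output has the carrier's pitch but only sounds while the "voice" does. With only a handful of bands, the result is buzzy and robotic, which is the classic vocoder timbre.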


I think that's just an artifact, as they can also produce heavy metal scream singing etc. It just mimics something that was in the training data.

My guess is that they train the vocals and the music separately; the training data is trivial to create from any tracks with tools like https://vocalremover.org/.


You mean it sounds autotuned?


Not quite. I'm not skilled enough in mixing to know the right description for it, sorry. I can hear vibrato-like modulation/beating, but in the vocal part only.


Yeah, surprised at the number of comments here about how good it sounds. The voice is full of artifacts, which makes it quite uncomfortable to listen to.


It's got elements which are great and elements which fail hard. I can complain about one bit specifically but still recognise the massive improvements in other areas over what we've seen so far.



