Hacker News

I did! I tried creating a multi-speaker embedding model for a practical reason: saving on memory costs. It didn't fit individual speakers very well, so I'm going to have to add additional layers. I wish I'd saved audio results to share. If I can track down the model files, I might be able to publish my findings.
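To make the memory-saving idea concrete, here is a minimal sketch of the usual multi-speaker setup: all speakers share one synthesis network, and the only per-speaker state is a small embedding vector looked up from a table. Everything here (dimensions, the `condition` helper, the random "features") is a made-up illustration, not the commenter's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_SPEAKERS = 4
EMBED_DIM = 8

# One small row per speaker; this table is the only per-speaker storage,
# versus keeping an entire separate model per voice.
speaker_table = rng.normal(size=(NUM_SPEAKERS, EMBED_DIM))

def condition(frame_features: np.ndarray, speaker_id: int) -> np.ndarray:
    """Concatenate a speaker embedding onto every acoustic frame."""
    emb = speaker_table[speaker_id]
    tiled = np.broadcast_to(emb, (frame_features.shape[0], EMBED_DIM))
    return np.concatenate([frame_features, tiled], axis=1)

frames = rng.normal(size=(10, 16))            # 10 frames of 16-dim features
conditioned = condition(frames, speaker_id=2)
print(conditioned.shape)                      # (10, 24)
```

If the shared network isn't expressive enough, a low-dimensional embedding can't capture what makes each voice distinct, which matches the "didn't fit individual speakers very well" observation above.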

I think you're right that if we can get such a model to work, training new embeddings won't require much data.




Hmm. Would a multi-speaker model be able to interpolate between voices (e.g. halfway between Morgan Freeman and James Earl Jones)?
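In principle, yes: if voices live as vectors in a shared embedding space, "halfway between" two speakers is just a new point between their embeddings. A minimal sketch, with linear interpolation and made-up placeholder vectors (real speaker embeddings would come from a trained model):

```python
import numpy as np

def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linearly interpolate between speaker embeddings; t=0.5 is halfway."""
    return (1.0 - t) * a + t * b

freeman = np.array([0.2, -1.0, 0.5])   # placeholder, not a real embedding
jones = np.array([0.8, 0.0, -0.5])     # placeholder, not a real embedding

halfway = lerp(freeman, jones, 0.5)
print(halfway)  # [ 0.5 -0.5  0. ]
```

Whether the midpoint actually sounds like a plausible voice depends on how smooth the learned embedding space is; a poorly structured space can yield unnatural audio between known speakers.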




