I agree that they’ve gotten pretty good, but there’s still something they get wrong: their intonation is a kind of passable average. If you want to distinguish them from actual human speech, pay close attention to intonation/inflection. They’re still very usable, I’m not claiming otherwise.
I don’t think they’ll ever become indistinguishable, because humans have variability. Too much consistency and it starts to have an artificial smell. Look at ChatGPT, for example: you can read a perfectly written answer and still kind of sense it was written by ChatGPT.