suflaj wrote

Make no mistake: there is no TTS more humanlike than Azure right now, but the exact voice in the video was likely tweaked a bit to get the pronunciation right, or run through a filter.

Two days ago I was comparing all the state-of-the-art TTS systems, and while Google's Neural2 came close to the video, it doesn't offer a voice similar to the one used in it.

candidhorse4 (OP) wrote

Have you tried Murf.ai and WellSaid Labs?

suflaj wrote

Yes. Although they're impressive in the number of languages and voices they offer, they don't match Azure's more expressive prosody. I have listened to far too many robocalls, so that kind of magic is gone for me.

Someone else might find them more humanlike, since it's all subjective. Have they published any benchmark scores yet?

candidhorse4 (OP) wrote

I don't think they have. So, overall, which one do you think is best at replicating the human voice with all its nuances?

suflaj wrote

Azure

That's because of two issues both of them have, which Azure mitigates to an extent:

  • they both lack humanity, i.e. at best they can pass as a human reading from a prompt, but not as anything else
  • listeners without a trained ear and good headphones probably don't notice a certain ring both voices have, which a human voice cannot replicate. This effect might be added to make the voices sharper, but ultimately it makes it easier for people like me, as well as robovoice detectors, to identify them as TTS