Comments

who_ate_my_motorbike t1_j7akxkl wrote

I don't know what voice is being used, but it looks like almost entirely algorithmically generated content. Sometimes it doesn't quite understand the video segment, so it gets things wrong.

suflaj t1_j7bmih2 wrote

This sounds like Azure TTS, specifically the English (US) voice Eric.

suflaj t1_j7c73af wrote

Make no mistake: there is currently no TTS more humanlike than Azure, but the voice was likely tweaked a bit to get the exact pronunciation, or run through a filter.

Two days ago I was comparing all the state-of-the-art TTS services, and while Google's Neural2 came close to the video, it does not offer any voices similar to the one in the video.

suflaj t1_j7ckp0d wrote

Yes. Although it is impressive in the number of languages and voices it offers, it does not match Azure's more expressive prosody. I have listened to far too many robocalls, so that kind of magic is gone for me.

Someone else might consider it more humanlike, as it's all subjective. Have they published benchmark scores yet?

suflaj t1_j7cpf1d wrote

Azure

This is due to two issues that both of these have and that Azure mitigates to an extent:

  • they both lack humanity, i.e. at most they can pass as human prompt readers, but as nothing else
  • listeners without a trained ear and good headphones probably do not notice a certain ring that both of them have, which a human voice cannot replicate. This effect might be added to make the voices sharper, but ultimately it makes it easier for people like me, as well as robovoice detectors, to identify them as TTS