Submitted by gruevy t3_10yzq25 in MachineLearning

I've got a 4090 and some stuff that I think it would be fun to have narrated. I've looked at some of the paid online options and $20-$30/mo for 2 hours of AI TTS is not gonna cut it. Can anyone point me to software that I can run locally that'll give me high quality?

It seems like if people are making billions of waifus in stable diffusion there ought to be something like this out there.

16

Comments

ellemoe-is-elleva t1_j80pr75 wrote

Pyttsx, MBROLA, Mimic 3. I like Mimic 3, which is lightweight and can run in Docker or natively.
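
For anyone who wants to try the Pyttsx route, here's a minimal sketch. It assumes the Python 3 fork `pyttsx3` (`pip install pyttsx3`); the function name `narrate_to_file`, the default rate, and the file names are just made up for illustration. Keep in mind pyttsx3 drives whatever system voices are installed (SAPI5, NSSpeechSynthesizer, eSpeak), so don't expect neural-quality output from it.

```python
def narrate_to_file(text, out_path, rate=170, engine=None):
    """Render `text` to an audio file using an offline pyttsx3 engine.

    `engine` is optional so a stand-in object can be passed for testing;
    when omitted, a real pyttsx3 engine is created (assumes pyttsx3 is installed).
    """
    if engine is None:
        import pyttsx3  # imported lazily so the module loads without pyttsx3
        engine = pyttsx3.init()

    engine.setProperty("rate", rate)     # speaking rate in words per minute
    engine.save_to_file(text, out_path)  # queue the render to a file
    engine.runAndWait()                  # block until the queued job finishes
    return out_path


# Example call (hypothetical file names):
#   with open("chapter1.txt", encoding="utf-8") as f:
#       narrate_to_file(f.read(), "chapter1.wav")
```

The audio format you actually get can depend on the platform's speech driver, so check the output before batch-converting a lot of text.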

I started out with Mycroft, which has Mimic 3 built in, but you can also run it standalone and it's quite easy to set up. https://mycroft.ai/mimic-3/

If you want to go down the rabbit hole of speech synthesis and analysis, check out Praat (praat.org). It's quite an impressive piece of software.

6

ZBMakesSongs t1_j81dm5r wrote

If you want ML TTS, there are a lot of open source models out there. The problem is that most of them are trained on the same data, so you're going to get similar voice options for the most part. You can definitely train your own text-to-speech model, and pretty easily as well, but I'm assuming you don't want to go that route. Maybe try starting with Coqui TTS, but for reading long documents it definitely has its fair share of issues.
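
One common workaround for the long-document problem is to split the text into sentence-sized chunks and synthesize them one at a time. Here's a rough sketch of that idea, assuming the `TTS` package from Coqui (`pip install TTS`) and its `TTS.api.TTS` class. The helper names, the 400-character limit, and the model name are my own picks for illustration, not anything Coqui recommends.

```python
import re
from pathlib import Path


def chunk_text(text, max_chars=400):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def narrate_chunks(chunks, out_dir, model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    """Synthesize each chunk to its own WAV file (assumes Coqui TTS is installed)."""
    from TTS.api import TTS  # imported lazily; the model downloads on first use

    tts = TTS(model_name=model_name)  # model name is an assumption, swap as needed
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out_dir / f"part_{i:04d}.wav"
        tts.tts_to_file(text=chunk, file_path=str(path))
        paths.append(path)
    return paths
```

You'd then join the numbered WAV files with whatever audio tool you like. The regex split is crude (abbreviations like "Dr." will trip it), so a proper sentence tokenizer would probably do better on real prose.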

2

Remarkable_Ad9528 t1_j8bxx1t wrote

I've used React-Speech before in a project to test mental-math arithmetic. For example, my project would show a card with an addition/subtraction or multiplication/division problem, and the user's job was to speak the answer out loud. Using this library I was able to capture the user's answer as text and check whether or not they got it correct. Would something like this work for whatever you're trying to do?

2