Submitted by gruevy t3_10yzq25 in MachineLearning

I've got a 4090 and some stuff that I think would be fun to have narrated. I've looked at some of the paid online options, and $20-$30/mo for 2 hours of AI TTS is not gonna cut it. Can anyone point me to software that I can run locally that'll give me high-quality output?

It seems like if people are making billions of waifus in Stable Diffusion, there ought to be something like this out there.

16

Comments


Royal-Landscape9353 t1_j80oma3 wrote

Try TortoiseTTS on the highest quality setting

8

gruevy OP t1_j81gynl wrote

This one looks like what I'm looking for. Slow AF but I give it a book chapter and it gives me an audio narration. Seems pretty powerful if you have a lot of patience

3

ellemoe-is-elleva t1_j80pr75 wrote

pyttsx, MBROLA, Mimic 3. I like Mimic 3, which is lightweight and can run in Docker or natively.

I started out with Mycroft, which has Mimic 3 built in, but you can also run it standalone, and it's quite easy to set up. https://mycroft.ai/mimic-3/

If you want to go down the rabbit hole of speech synthesis and analysis, check out Praat (praat.org). It's quite an impressive piece of software.

6

ZBMakesSongs t1_j81dm5r wrote

If you want ML TTS, there are a lot of open-source models out there. The problem is that most of them are trained on the same data, so you're going to get similar voice options for the most part. You can definitely train your own text-to-speech model, and pretty easily as well, but I'm assuming you don't want to go that route. Maybe try starting with Coqui TTS, but for reading long documents it definitely has its fair share of issues.
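
One common workaround for the long-document issue is to split the text into sentence-aligned chunks and synthesize each chunk separately. Here's a rough sketch of the splitting step only, using just the Python standard library. The function name `chunk_text` and the 300-character limit are my own assumptions for illustration, not something any of these libraries require, and the sentence splitter is deliberately naive (it will trip on abbreviations like "Dr.").

```python
import re


def chunk_text(text, max_chars=300):
    """Split text into sentence-aligned chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than
    being cut mid-sentence. Both the name and the default limit are
    illustrative assumptions, not part of any TTS library.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        # Accept the sentence if it fits, or if the chunk is still empty
        # (so an oversized sentence becomes its own chunk).
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "It was a dark night. The rain fell hard! Who was at the door? Nobody knew."
    for i, chunk in enumerate(chunk_text(sample, max_chars=45)):
        print(i, chunk)
```

You'd then pass each chunk to whatever engine you end up using and concatenate the resulting audio files in order. How well that works depends on the engine, so treat the chunk size as something to tune.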

2

gruevy OP t1_j81hds3 wrote

I'll check it out. Looks interesting, but not as good as TortoiseTTS, judging by the samples. Definitely worth looking at tho, thx

1

Remarkable_Ad9528 t1_j8bxx1t wrote

I've used React-Speech before in a project to test mental-math arithmetic. For example, my project would show a card with an addition/subtraction or multiplication/division problem, and the user's job was to speak the answer out loud. Using this library, I was able to capture the user's answer as text and check whether or not they got it correct. Would something like this work for whatever you're trying to do?

2

gruevy OP t1_j8c2o6f wrote

Probably not, I want it to read long-form text such as fiction. Tortoise TTS worked out pretty well, but holy crap is it slow.

1

[deleted] t1_j80jvp3 wrote

Speech WebKit

1

gruevy OP t1_j80kuq5 wrote

A Google search didn't get me much. Can you be more specific?

1

gruevy OP t1_j80mwkq wrote

Yeah that's speech to text. I want text to speech. Thanks tho

1

[deleted] t1_j80o2s9 wrote

It does both buddy

1

gruevy OP t1_j80oc7f wrote

ah my bad then, must have misread. i'll take another look

1