I've got a 4090 and some stuff that I think it would be fun to have narrated. I've looked at some of the paid online options, and $20-$30/mo for 2 hours of AI TTS is not gonna cut it. Can anyone point me to software I can run locally that'll give me high-quality output?

It seems like if people are making billions of waifus in stable diffusion there ought to be something like this out there.

Comments

[deleted] t1_j80jvp3 wrote on February 10, 2023 at 7:03 PM

#1,773,125

Speech WebKit

gruevy OP t1_j80kuq5 wrote on February 10, 2023 at 7:10 PM

#1,773,186

Replying to [deleted] (#1,773,125)

google search didn't get me much, can you be more specific?

gruevy OP t1_j80mwkq wrote on February 10, 2023 at 7:23 PM

#1,773,294

Replying to [deleted] (#1,773,266)

Yeah that's speech to text. I want text to speech. Thanks tho

[deleted] t1_j80o2s9 wrote on February 10, 2023 at 7:31 PM

#1,773,349

Replying to gruevy (#1,773,294)

It does both buddy

gruevy OP t1_j80oc7f wrote on February 10, 2023 at 7:32 PM

#1,773,370

Replying to [deleted] (#1,773,349)

ah my bad then, must have misread. i'll take another look

Royal-Landscape9353 t1_j80oma3 wrote on February 10, 2023 at 7:34 PM

#1,773,396

Try TortoiseTTS on the highest quality setting

ellemoe-is-elleva t1_j80pr75 wrote on February 10, 2023 at 7:42 PM

#1,773,462

Pyttsx, MBROLA, Mimic 3. I like Mimic 3, which is lightweight and can run in Docker or natively.

I started out with Mycroft, which has Mimic 3 built in, but you can also run it standalone, and it's quite easy to set up. https://mycroft.ai/mimic-3/

If you want to go down the rabbit hole of speech synthesis and analysis, check out Praat (praat.org). It's quite an impressive piece of software.

ZBMakesSongs t1_j81dm5r wrote on February 10, 2023 at 10:19 PM

#1,774,609

If you want ML TTS, there are a lot of open source models out there. The problem is that most of them are trained on the same data, so you're going to get similar voice options for the most part. You can definitely train your own text to speech, and pretty easily as well, but I'm assuming you don't want to go that route. Maybe try starting with Coqui TTS, but for reading long documents it definitely has its fair share of issues.
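One common workaround for the long-document issue is to feed the model short chunks instead of a whole chapter at once. Here is a minimal sketch of that idea, not anything specific to Coqui TTS: `split_into_chunks` and the `max_chars` limit are made-up names for illustration, and the sentence splitting is deliberately naive (it will trip on abbreviations like "Dr.").

```python
import re


def split_into_chunks(text, max_chars=300):
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be handed to whichever engine you settle on, one call per chunk, so a failure or glitch only costs you one short clip instead of the whole document.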

gruevy OP t1_j81gynl wrote on February 10, 2023 at 10:43 PM

#1,774,791

Replying to Royal-Landscape9353 (#1,773,396)

This one looks like what I'm looking for. Slow AF but I give it a book chapter and it gives me an audio narration. Seems pretty powerful if you have a lot of patience
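If a chapter ends up being generated as several separate clips, they can be joined afterwards with nothing but the Python standard library. This is a rough sketch under assumptions: `concat_wavs` is a hypothetical helper name, the clips are assumed to be uncompressed PCM WAV files, and they all need to share the same channel count, sample width and sample rate.

```python
import wave


def concat_wavs(input_paths, output_path):
    """Join WAV files end to end (hypothetical helper for illustration).

    All inputs must have the same channels, sample width and sample rate.
    """
    params = None
    frames = []
    for path in input_paths:
        with wave.open(path, "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt  # The first file decides the output format.
            elif fmt != params:
                raise ValueError(f"{path} has a different format: {fmt} != {params}")
            frames.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for chunk in frames:
            out.writeframes(chunk)
```

Usage would look something like `concat_wavs(["part0.wav", "part1.wav"], "chapter.wav")`, where the file names are placeholders. It holds every clip in memory before writing, which is fine for a chapter but worth rethinking for a whole book.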

gruevy OP t1_j81hds3 wrote on February 10, 2023 at 10:46 PM

#1,774,811

Replying to ZBMakesSongs (#1,774,609)

I'll check it out. Looks interesting, but not as good as TortoiseTTS, judging by the samples. Definitely worth looking at tho, thx

Remarkable_Ad9528 t1_j8bxx1t wrote on February 13, 2023 at 4:03 AM

#1,793,830

I've used React-Speech before in a project to test mental-math arithmetic. For example, my project would show a card with an addition/subtraction or multiplication/division problem, and the user's job was to speak the answer out loud. Using this library I was able to capture the user's answer as text and check whether or not they got it correct. Would something like this work for whatever you're trying to do?

gruevy OP t1_j8c2o6f wrote on February 13, 2023 at 4:44 AM

#1,794,172

Replying to Remarkable_Ad9528 (#1,793,830)

Probably not, I want it to read long-form text such as fiction. Tortoise TTS worked out pretty well, but holy crap is it slow

[D] Locally-runnable text to speech AI?