[D] Which is the fastest, most lightweight, ultra-realistic TTS for real-time voice cloning?
Submitted by akshaysri0001 on February 7, 2023 at 5:15 PM in MachineLearning (11 comments)
gunshoes wrote on February 8, 2023 at 2:00 AM:
FastSpeech 2 would be your best bet.

nmfisher wrote on February 8, 2023 at 9:41 AM (reply to gunshoes):
FastSpeech 2 (FS2) is fine for training a TTS model from scratch, but I haven't come across a good FS2 model for cloning (which is basically zero-shot TTS).

gunshoes wrote on February 8, 2023 at 1:03 PM (reply to nmfisher):
You can throw in GSTs or use a speaker embedding to influence the energy/pitch outputs. The sound is meh, but it works.

nmfisher wrote on February 8, 2023 at 1:20 PM (reply to gunshoes):
That's why I added the qualifier "good" :)
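
Editor's sketch: the speaker-embedding suggestion in this thread can be illustrated roughly as follows. This is a toy, pure-Python illustration under stated assumptions, not FastSpeech 2's actual code or any library's API. It assumes the common approach of projecting a speaker embedding to the encoder's hidden size and adding it to every phoneme's hidden state before a pitch (or energy) predictor. All names (`condition_on_speaker`, `predict_scalar`, `init_linear`), the dimensions, and the random weights are hypothetical, and a single linear layer stands in for a real variance predictor.

```python
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one vector x (plain Python lists)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def init_linear(in_dim, out_dim, rng):
    """Hypothetical helper: small random weights, zero bias."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def condition_on_speaker(hidden, speaker_emb, proj):
    """Add the projected speaker embedding to every phoneme's hidden state."""
    weight, bias = proj
    spk = linear(speaker_emb, weight, bias)
    return [[h + s for h, s in zip(frame, spk)] for frame in hidden]


def predict_scalar(hidden, head):
    """Stand-in for a pitch/energy predictor: one value per phoneme."""
    weight, bias = head
    return [linear(frame, weight, bias)[0] for frame in hidden]


# Toy setup (all sizes are assumptions chosen for illustration).
rng = random.Random(0)
hidden_dim, spk_dim, n_phonemes = 8, 4, 5
hidden = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
          for _ in range(n_phonemes)]
proj = init_linear(spk_dim, hidden_dim, rng)
pitch_head = init_linear(hidden_dim, 1, rng)

speaker_a = [rng.uniform(-1, 1) for _ in range(spk_dim)]
speaker_b = [rng.uniform(-1, 1) for _ in range(spk_dim)]

# Same text encoding, different speakers: the predicted pitch contour shifts.
pitch_a = predict_scalar(condition_on_speaker(hidden, speaker_a, proj), pitch_head)
pitch_b = predict_scalar(condition_on_speaker(hidden, speaker_b, proj), pitch_head)
```

In a real system the speaker embedding would come from a pretrained speaker encoder run on reference audio, and the predictors would be trained networks. How well this transfers to unseen speakers is exactly the quality question the commenters disagree on.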