Submitted by HelloGoodbyeFriend t3_ycql5p in singularity

I’ve been hoarding a cappella tracks from my favorite artists so I can hopefully use them to train an AI to recreate their voices with custom lyrics and melodies in the future. Is anyone aware of any emerging AI companies working on this? I’ve heard some of the rapper ones, and I feel like Joe Rogan has been solved at this point lol. So I’d imagine vocalists are next up.

52

Comments


enkae7317 t1_itnrpt2 wrote

We can do it now, albeit very poorly. I'd imagine in 5 years we will have close to perfect mimicry of anybody's voice given enough sampling.

10

Sashinii t1_itns1pv wrote

2024, which is also my answer for most synthetic media predictions, as that's the year I think AI will become so advanced that people will use AI to create their own personalized entertainment.

8

FranciscoJ1618 t1_itnubrt wrote

I think it will be possible in 1 year or less.

RemindMe! 1 year

11

Desperate_Donut8582 t1_itnuo30 wrote

You already can. Haven’t there been text-to-speech websites for years now (SpongeBob, Trump, etc.)?

8

Sandbar101 t1_itnuyak wrote

Today, my guy. It’s just not public tech yet.

69

Verzingetorix t1_ito0fs6 wrote

The voice or the singing?

Like others said, computer-generated speech mimicry has already been demonstrated.

7

Jmsvrg t1_ito3o0i wrote

Descript has a consumer product you can train with less than an hour of audio. I use digital versions of both my podcast hosts with this tool (sparingly: single words and short phrases). The intonation needs work, but it’s pretty solid.

5

ihateshadylandlords t1_ito65i7 wrote

I’m thinking a product for the masses will be available in 10 years. I wouldn’t be surprised if a proof of concept has already been created.

6

Sonic_TertuL t1_ito6611 wrote

Check out the composer Holly Herndon and Holly+.

5

ishizako t1_itobyvg wrote

Singing is just speech that's modulated by the vocal cords to harmonize.

When we speak normally, all that happens is also just notes sounding out in sequences that we are familiar with and have memorized as words. With singing, the notes are simply chosen in an order that sounds good together.

Sampling a singing voice is no different from sampling a speaking voice. That said, with current AI models the same person’s singing voice and speaking voice would need to be trained separately, because a combined dataset of spoken and sung audio would confuse the AI about how to "pronounce" things: it cannot naturally tell the difference between sung and spoken words.

9

Primus_Pilus1 t1_itomevr wrote

A few years from now, music composers will be able to make completely virtual works of music using anyone’s voice (it’s just a particular timbre of the human throat instrument) and instruments to create virtual studio performances.

3

clusterstage t1_itp1aad wrote

Well, I came across Respeecher. You should really try it out.

2

DeviMon1 t1_itpbtm9 wrote

Damn, that’s pretty good. I’ve heard a few decent Juice WRLD ones too.

I think it'll take someone to make a convincing MJ cover of a popular new pop song to make this thing blow up and get everyone talking about it.

1

swiggidyswooner t1_itpehah wrote

Disney’s doing it with Darth Vader’s voice, so it’s not far off.

1

swampshark19 t1_itq90jp wrote

Can a singer claim copyright over the sound of their voice?

2

Additional-Cap-7110 t1_its5who wrote

Singing is going to be much harder. There’s so much variation. Speech just needs to sound natural, whereas singing requires much more of a performance and involves all kinds of other aspects: singing softly, loudly, vibrato, portamento, rhythm, not to mention the notes themselves.

This might make it clear. We can do synthesized percussion much better than we can do synthesized tonal instruments like violins, flutes, etc. Sampling percussion has always been the easiest thing to get realistic, and 100% synthesized instruments are no different.

If you want to sample percussion, all you really need aside from recording quality is multiple repetitions (round robins) and a shit-ton of dynamic layers. The best percussion sample libraries today will have maybe 10-20 dynamic layers and sometimes 5 to 10+ repetition samples. You don’t even need that many to make it sound convincing. But with instruments like vocals, violins, flutes, etc., that’s not even scratching the surface. These are complex in many more dimensions, and you need completely different techniques to capture them, and even then it’s still not quite right or it’s highly limited in its use.
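To make the "dynamic layers plus repetitions" idea concrete, here is a minimal sketch of how a sampler might pick a percussion sample: the hit's MIDI velocity (1-127) selects a dynamic layer, and each layer rotates through its round-robin repetitions so consecutive hits don't sound identical. The class name, file names, and the equal-width velocity mapping are all hypothetical assumptions for illustration; real sample players define their own layer ranges and selection rules.

```python
from itertools import cycle


class PercussionSampler:
    """Hypothetical sampler: velocity picks a dynamic layer,
    then the layer cycles through its round-robin repetitions."""

    def __init__(self, layers):
        # layers: list of lists of sample names, ordered softest to loudest
        self.layers = layers
        # One independent round-robin iterator per dynamic layer
        self._round_robins = [cycle(samples) for samples in layers]

    def layer_for(self, velocity):
        # Assumption: split MIDI velocity 1..127 into equal-width bands,
        # one band per dynamic layer (index 0 = softest)
        if not 1 <= velocity <= 127:
            raise ValueError("velocity must be in 1..127")
        return (velocity - 1) * len(self.layers) // 127

    def trigger(self, velocity):
        # Return the next repetition from the layer matching this velocity
        return next(self._round_robins[self.layer_for(velocity)])


# Hypothetical library: 10 dynamic layers x 5 repetitions each
snare = PercussionSampler(
    [[f"snare_v{v}_rr{r}.wav" for r in range(5)] for v in range(10)]
)
print(snare.trigger(127))  # loudest layer, first repetition
print(snare.trigger(127))  # loudest layer, next repetition
```

Even a scheme this simple covers most of what makes sampled percussion convincing, which is the commenter's point: a sung or bowed note varies along far more dimensions than "how hard" and "which repetition".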

2

Lawjarp2 t1_ituow68 wrote

Hope they don't copyright voices. Something that can be easily achieved should never be copyrighted. If we can create artificial singer voices and enjoy creating our own music, it will be a mini musical revolution.

2

sonderlingg t1_iuc4r9e wrote

Just imagine how many people work on it, how they want to be the first to create AGI, and how quickly their number increases.

Imagine how many new models are being trained on GPUs right now. Moore's law still works. Hardware keeps getting better and better.

We've already recreated many of the brain's algorithms (art, speech, face recognition, driving, and many more). All that's left is to teach a machine how to learn by itself.

And by the way, we can already copy a singer's voice; read the other comments.

1

sonderlingg t1_iucdjtg wrote

If AGI is benevolent, it will help with everything.

All things that may increase intelligence, like neurointerfaces, gene editing, and maybe unknown drugs, are other paths to the singularity. Everything is connected. That's why progress is exponential. Though the AI path seems the most likely to me.

1