Submitted by HelloGoodbyeFriend t3_ycql5p in singularity

I’ve been hoarding acapella tracks from my favorite artists so I can hopefully use them to train an AI to recreate their voice with custom lyrics & melodies in the future. Curious if anyone knows of any emerging AI companies working on this? I’ve heard some of the rapper ones, and I feel like Joe Rogan has been solved at this point lol. So I’d imagine vocalists are next up.

52

Comments


quasi_aesthetic t1_ito8qaq wrote

It's already being done. Here is a TED talk about it.

19

HelloGoodbyeFriend OP t1_itoo6os wrote

Holy shit. Thank you!!

5

quasi_aesthetic t1_itpws1p wrote

I just happened to hear her on the TED Radio Hour a few months back. I'm still amazed they could change his voice in real time!

3

modestLife1 t1_itqrqxu wrote

I saw Holly Herndon at a show in Austin in 2015 and shook her hand afterwards lol, I was shy. I tried to get through the TED talk but it was too unnerving. The singularity is coming.

3

FranciscoJ1618 t1_itnubrt wrote

I think it will be possible in 1 year or less.

RemindMe! 1 year

11

enkae7317 t1_itnrpt2 wrote

We can do it now, albeit very poorly. I'd imagine in 5 years we will have close to perfect mimicry of anybody's voice given enough sampling.

10

sonderlingg t1_itnzuno wrote

You greatly underestimate the pace of progress.

16

styxboa t1_iub3y00 wrote

Can you explain this to me? Persuade me that it'll happen in less than 5 years. It seems insane to me, but I'm not well educated enough on it.

2

sonderlingg t1_iuc4r9e wrote

Just imagine how many people work on it, and how badly they want to be the first to create AGI. And their number is growing quickly.

Imagine how many new models are being trained on GPUs right now. Moore's law still works. Hardware keeps getting better and better.

We've already recreated many of the brain's algorithms (art, speech, face recognition, driving and many more). All that's left is to teach a machine how to learn by itself.

And by the way, we can already copy a singer's voice; read the other comments.

1

styxboa t1_iuccxvj wrote

That makes sense. Thanks.

Do you think it'll rapidly help with things like CRISPR gene editing as well?

1

sonderlingg t1_iucdjtg wrote

If AGI is benevolent, it will help with everything.

All things that may increase intelligence, like neurointerfaces, gene editing and maybe unknown drugs, are other paths to the singularity. Everything is connected. That's why the progress is exponential. Though the AI route seems the most likely to me.

1

Sashinii t1_itns1pv wrote

2024, which is also my answer for most synthetic media predictions, as that's the year I think AI will become so advanced that people will use AI to create their own personalized entertainment.

8

Desperate_Donut8582 t1_itnuo30 wrote

You already can. Haven't there been text-to-speech websites for years now, with SpongeBob, Trump, etc.?

8

HelloGoodbyeFriend OP t1_itnwsus wrote

That’s speech, though. I’m curious about re-creating a singer's voice.

6

ishizako t1_itobyvg wrote

Singing is just speech that's modulated by the vocal cords to harmonize.

When we speak normally, all that happens is also just notes sounding out in sequences that we are familiar with and have memorized as words. With singing, the notes are just chosen in an order that sounds good together.

Sampling a singing voice is no different from sampling a speaking voice. However, with current AI models, the same person's singing voice and speaking voice would need to be trained separately, because a combined dataset of spoken and sung voice would confuse the AI about how to "pronounce" things, since it cannot naturally tell the difference between sung and spoken words.

9

Additional-Cap-7110 t1_its5who wrote

Singing is going to be much harder. There’s so much variation. Plus, speech just needs to sound natural, while singing requires much more of a performance, with all kinds of other aspects: singing softly or loudly, vibrato, portamento, rhythm, not to mention the notes themselves.

This might make it clear. We can synthesize percussion much better than tonal instruments like violins, flutes, etc. Sampling percussion has always been the easiest thing to get realistic, and 100% synthesized instruments are no different.

If you want to sample percussion, all you really need aside from recording quality is multiple repetitions of each hit and a shit-ton of dynamic layers. The best percussion sample libraries today have maybe 10-20 dynamic layers and sometimes 5 to 10+ repetition samples, and you don’t even need that many to make it sound convincing. But with instruments like vocals, violins, flutes, etc., that doesn’t even scratch the surface. These are complex along many more dimensions, and you need completely different techniques to capture them; even then, the result still isn’t quite right or is highly limited in its use.
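A minimal sketch of the percussion scheme described above: each hit is chosen by dynamic layer, then cycled through repetition ("round-robin") samples so repeated hits don't sound identical. It assumes MIDI velocities (1 to 127) split into equal-width bands and a hypothetical file-naming scheme; real samplers and libraries differ in how they map velocity and rotate repetitions. The point it illustrates is that a percussion instrument fits in a small table (for example, 16 layers × 6 repetitions = 96 recordings), which is exactly what a voice does not.

```python
class PercussionSampler:
    """Hypothetical sampler: picks a recording by dynamic layer, then
    cycles through round-robin repetitions within that layer."""

    def __init__(self, num_layers: int, num_round_robins: int):
        self.num_layers = num_layers
        self.num_round_robins = num_round_robins
        # Current round-robin position, tracked separately per dynamic layer.
        self._rr_pos = [0] * num_layers

    def layer_for_velocity(self, velocity: int) -> int:
        """Map a MIDI velocity (1-127) onto one of the dynamic layers."""
        if not 1 <= velocity <= 127:
            raise ValueError("MIDI velocity must be in 1..127")
        # Split the 127 possible velocities into equal-width bands.
        return (velocity - 1) * self.num_layers // 127

    def next_sample(self, velocity: int) -> str:
        """Return the file for this hit and advance that layer's round-robin."""
        layer = self.layer_for_velocity(velocity)
        rr = self._rr_pos[layer]
        self._rr_pos[layer] = (rr + 1) % self.num_round_robins
        # Hypothetical naming scheme: instrument, layer index, repetition index.
        return f"snare_L{layer:02d}_RR{rr:02d}.wav"


sampler = PercussionSampler(num_layers=16, num_round_robins=6)
for _ in range(3):
    print(sampler.next_sample(100))  # same loudness, different repetition each time
```

Three hits at the same velocity land in the same layer but play three different recordings; after the last repetition the cycle wraps back to the first.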

2

Verzingetorix t1_ito0fs6 wrote

The voice or the singing?

Like others said, computer generated speech mimicry has been demonstrated already.

7

ihateshadylandlords t1_ito65i7 wrote

I’m thinking a product for the masses will be available in 10 years. I wouldn’t be surprised if a proof of concept has already been created.

6

techhouseliving t1_itq3647 wrote

10 years? No way. Competition is a powerful driver. If it's not already available in beta, it'll be available from a dozen vendors in 6 months to a year, max.

4

challengethegods t1_iu15ui7 wrote

That sounds a lot like: "yea, dalle-mini is neat but it'll be at least 20 years until it can make anything close to what a real artist can do"

1

Jmsvrg t1_ito3o0i wrote

Descript has a consumer product you can train with less than 1 hour of audio. I use a digital version of both my podcast hosts with this tool (sparingly: single words and short phrases). The intonation needs work, but it's pretty solid.

5

Primus_Pilus1 t1_itomevr wrote

A few years from now, music composers will be able to make completely virtual works of music using anyone's voice (it's just a particular timbre of the human throat instrument) and instruments to create virtual studio performances.

3

clusterstage t1_itp1aad wrote

Well, I came across Respeecher. You should really try it out.

2

-ZeroRelevance- t1_itpa60a wrote

From what I’ve seen, NVIDIA’s implementation of Tacotron 2 can already be used to create some pretty convincing singing voices, though the examples I’ve seen are mostly rap, so I’m not sure how well it handles more complex singing styles.

2

DeviMon1 t1_itpbtm9 wrote

Damn, that's pretty good. I've heard a few decent Juice WRLD ones too.

I think it'll take someone to make a convincing MJ cover of a popular new pop song to make this thing blow up and get everyone talking about it.

1

swampshark19 t1_itq90jp wrote

Can a singer claim copyright over the sound of their voice?

2

Recent-Fish-9233 t1_itr4q1m wrote

Hopefully, and probably, yeah. The music industry is very strict with this sort of thing.

1

Lawjarp2 t1_ituow68 wrote

Hope they don't copyright voices. Something that can be easily achieved should never be copyrighted. If we can create artificial singer voices and enjoy creating our own music, it will be a mini musical revolution.

2

challengethegods t1_iu16dgd wrote

I can tell you now the AI overlords are not going to like all this copyright/trademark/patent bullshit that people are so obsessed with.

2

swiggidyswooner t1_itpehah wrote

Disney’s doing it with Darth Vader’s voice, so it's not far off.

1