
ishizako t1_itobyvg wrote

Singing is just speech that's modulated by the vocal cords to harmonize.

When we speak normally, we are also just sounding out notes in sequences that we are familiar with and have memorized as words. With singing, the notes are simply chosen in an order that makes them sound good with each other.

Sampling a singing voice is no different from sampling a speaking voice. With current AI models, though, the same person's singing voice and speaking voice would need to be trained separately, because a combined dataset of spoken and sung voice would confuse the AI about how to "pronounce" things: it cannot naturally tell the difference between sung and spoken words.

9

Additional-Cap-7110 t1_its5who wrote

Singing is going to be much harder. There's so much variation. Plus, speech just needs to sound natural, while singing requires much more of a performance, and there are all kinds of other aspects: singing softly, singing loudly, vibrato, portamento, rhythm, not to mention the notes themselves.

This might make it clear. We can do synthesized percussion much better than we can do synthesized tonal instruments like violins, flutes, etc. Sampled percussion has always been the easiest thing to get realistic, and 100% synthesized instruments are no different.

If you want to sample percussion, all you really need aside from recording quality is multiple repetitions and a shit-ton of dynamic layers. The best percussion sample libraries today will have maybe 10-20 dynamic layers and sometimes 5 to 10 or more repetition samples. You don't even need that many to make it sound convincing. But with instruments like vocals, violins, flutes, etc., that's not even scratching the surface. These are complex in many more dimensions, and you need completely different techniques to capture them, and even then it's still not quite right, or it's highly limited in its use.
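
To make the percussion case concrete, here is a rough sketch of how a sampler might pick which recording to play from dynamic layers and repetitions. Everything in it is an assumption for illustration: the class name `PercussionSampler`, the MIDI-style 1-127 velocity range, and the even split of velocities across layers are hypothetical and not taken from any particular sample library.

```python
class PercussionSampler:
    """Hypothetical sample picker: dynamic layers plus cycling repetitions."""

    def __init__(self, num_layers=10, num_repetitions=5):
        self.num_layers = num_layers
        self.num_repetitions = num_repetitions
        # One counter per layer so repeated hits cycle through the takes.
        self._next_rep = [0] * num_layers

    def layer_for(self, velocity):
        """Map a hit strength (assumed 1-127) to a dynamic layer index."""
        if not 1 <= velocity <= 127:
            raise ValueError("velocity must be in 1..127")
        # Assumption: velocities are split evenly across the layers.
        return (velocity - 1) * self.num_layers // 127

    def trigger(self, velocity):
        """Return (layer, repetition) identifying the recording to play."""
        layer = self.layer_for(velocity)
        rep = self._next_rep[layer]
        # Advance so the same recording is not played twice in a row.
        self._next_rep[layer] = (rep + 1) % self.num_repetitions
        return layer, rep


sampler = PercussionSampler(num_layers=10, num_repetitions=5)
hits = [sampler.trigger(127) for _ in range(6)]
print(hits)
```

The whole lookup is two numbers: how hard the hit was and which take comes next. A sung or bowed note has no equivalent small table, since vibrato, portamento and loudness all change continuously while the note is sounding.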

2