Antique-Bus-7787 t1_j1upz6y wrote

I don't think keeping the story coherent is a problem. As you said, you just need a bigger LLM that can hold a lot more data in its "cache" to create the overall scenario and then generate each scene one after the other.

What I'm more skeptical of are the images and the voices. TTS systems are good, but for now it's extremely complicated to add the right "emotions" and "punctuation" to the generated voices. Voice conversion is better, but you still need a starting voice.

The temporal coherence of videos is the second biggest problem, I think.
(The cost to produce all that, too.)
We'll see! But 2025 seems way too soon to me!

coumineol t1_j1urjx2 wrote

I'll give it to you that emotions in TTS are difficult, though I still say what is needed is not a novel algorithm but an enhanced version of today's algorithms. For example, the AI that generates the movie's scenario can also add marks to the text that indicate the correct emotion or punctuation, and I'm pretty convinced that within three years we can have a TTS algorithm that reasonably abides by those.
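
To make the "marks" idea concrete, here is a rough sketch of what the scenario generator's output could look like and how it might be split up before being handed to a TTS engine. The `[emotion]` tag syntax and the `split_by_emotion` helper are made up for illustration; they are not a real TTS standard or any library's API, and the actual speech synthesis step is left out entirely.

```python
import re
from typing import List, Tuple

# Hypothetical inline markup: "[emotion]" applies to the text that follows it.
TAG = re.compile(r"\[(\w+)\]")

def split_by_emotion(script: str, default: str = "neutral") -> List[Tuple[str, str]]:
    """Split a tagged script into (emotion, text) segments, in order."""
    segments: List[Tuple[str, str]] = []
    emotion = default
    pos = 0
    for match in TAG.finditer(script):
        # Text before this tag still belongs to the previous emotion.
        text = script[pos:match.start()].strip()
        if text:
            segments.append((emotion, text))
        emotion = match.group(1).lower()
        pos = match.end()
    # Whatever follows the last tag uses the last emotion seen.
    tail = script[pos:].strip()
    if tail:
        segments.append((emotion, tail))
    return segments

line = "We'll see. [excited] But 2025? [doubtful] That seems way too soon."
for emotion, text in split_by_emotion(line):
    print(f"{emotion}: {text}")
```

Each (emotion, text) pair would then go to whatever emotion-conditioned TTS model exists at that point, which is the part that still has to improve.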
