
coumineol t1_j1tzpht wrote

Take a pen and paper (or open your favorite drawing software). Working top-down, progressively break fully automated movie generation into the steps it needs, until you reach the smallest ingredients. Think about what kind of tech each of them requires. You will notice that no paradigm-breaking discovery is needed: all of them are just advanced versions of existing tools and technology. Now extrapolate, from the recent speed of development in AI, when we can actually get to that level. I'm quite sure you won't conclude that it will take decades.

4

banuk_sickness_eater t1_j1ulf90 wrote

Exactly. People throw out "decades" like it's a salient point when it's really just a knee-jerk, "safe" prediction that hasn't been thought through.

2

Nintell OP t1_j1upn8h wrote

Never thought about it like that🤔

1

Antique-Bus-7787 t1_j1upz6y wrote

I don't think keeping the story coherent is a problem. As you said, you just need a bigger LLM that can hold a lot more data in its "cache" to create the overall scenario and then create every scene one after the other.

What I'm more skeptical of are the images and the voices. TTS systems are good, but for now it's extremely complicated to add the right "emotions" and "punctuation" to the generated voices. Voice conversion is better, but you still need a starting voice.

Temporal coherence in video is the second biggest problem, I think.
(The cost of producing all this, too.)
We'll see! But 2025 seems way too soon to me!

1

coumineol t1_j1urjx2 wrote

I'll grant you that emotions in TTS are difficult, though I still say what is needed is not a novel algorithm but an enhanced version of today's algorithms. For example, the AI that generates the movie's scenario can also add marks to the text that indicate the correct emotion or punctuation, and I'm pretty convinced that within three years we can have a TTS algorithm that reasonably abides by them.
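
To make the "marks in the text" idea concrete, here is a rough sketch of the plumbing only. The `[emotion]` tag syntax and the `split_by_emotion` helper are made up for illustration; this is not an existing TTS format. It assumes the scenario generator emits such tags inline, and it just splits the script into (emotion, text) segments that an emotion-aware TTS engine could then read one by one.

```python
import re
from typing import List, Tuple

# Hypothetical inline tag such as "[angry]"; an assumption, not a real TTS standard.
TAG = re.compile(r"\[(\w+)\]")

def split_by_emotion(script: str, default: str = "neutral") -> List[Tuple[str, str]]:
    """Split a tagged script into (emotion, text) segments.

    Text before the first tag gets the default emotion; each tag applies
    to the text that follows it, up to the next tag.
    """
    segments: List[Tuple[str, str]] = []
    emotion = default
    pos = 0
    for match in TAG.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append((emotion, text))
        emotion = match.group(1)  # switch emotion for the following text
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append((emotion, tail))
    return segments

if __name__ == "__main__":
    line = "Hello there. [angry] Get out! [sad] I'm sorry."
    for emotion, text in split_by_emotion(line):
        print(f"{emotion}: {text}")
```

The hard part, of course, is the synthesis step that consumes these segments, which this sketch leaves out entirely.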

1