
ChronoPsyche t1_itg18gz wrote

>where is your proof that it can't do more than 2 minutes for make a video

...I read the actual research papers; that's how I know. Only one of them can do minutes. The other two can only do seconds at the moment.

For Imagen Video:

>Imagen Video scales from prior work of 64-frame 128×128 videos at 24 frames per second to 128-frame 1280×768 high-definition video at 24 frames per second.

128 frames at 24 frames per second is roughly a 5-second video.
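To make the arithmetic explicit, here's a quick sketch in plain Python (the frame counts and frame rate come from the quote above; nothing else is from the paper):

```python
# Clip length is just frame count divided by frames per second.
def clip_seconds(frames: int, fps: float) -> float:
    return frames / fps

print(clip_seconds(128, 24))  # ~5.3 s -- Imagen Video's quoted maximum
print(clip_seconds(64, 24))   # ~2.7 s -- the prior work it scales from
```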

For Meta's Make-A-Video:

>Given input text x translated by the prior P into an image embedding, and a desired frame rate fps, the decoder D^t generates 16 64×64 frames, which are then interpolated to a higher frame rate by ↑F, and increased in resolution to 256×256 by SR^t_l and 768×768 by SR_h, resulting in a high-spatiotemporal-resolution generated video ŷ.

16 frames, which they interpolate between to create a few-second video.
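The quote doesn't give the interpolation factor for ↑F, so the numbers below are illustrative; the point is that interpolation raises the frame rate for smoothness but barely changes the clip length:

```python
# Make-A-Video (per the quote): the decoder emits 16 low-frame-rate frames,
# then the ↑F module interpolates between them to reach a target fps.
base_frames = 16
interp_factor = 4                            # hypothetical: 3 new frames per gap
frames_after_interp = (base_frames - 1) * interp_factor + 1   # 61 frames
target_fps = 24                              # also illustrative
print(frames_after_interp / target_fps)      # ~2.5 s -- still only a few seconds
```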

And then there's Phenaki, which can generate the longest videos, at a few minutes.

>Generate temporally coherent and diverse videos conditioned on open domain prompts even when the prompt is a new composition of concepts (Fig. 3). The videos can be long (minutes) even though the model is trained on 1.4 seconds videos (at 8 fps).
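For a sense of scale, the quoted training setup works out to only about a dozen frames per training clip, while a minutes-long output is hundreds of frames; a rough sketch (the 2-minute output length is just an example, not from the paper):

```python
# Phenaki training clips, per the quote: 1.4 seconds at 8 fps.
train_frames = round(1.4 * 8)      # ~11 frames per training clip
print(train_frames)

# A "minutes"-long generation, e.g. 2 minutes at the same 8 fps (example length):
generated_frames = 2 * 60 * 8
print(generated_frames)            # 960 frames -- far longer than any training clip
```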


>Even if compute intensive you could do a film with it.

...You clearly have no clue what you are talking about. I would suggest reading up on the current state of the tech, and also reading the actual research papers.
