
visarga t1_itgu5bi wrote

Not exponential, let's not exaggerate. It's quadratic. If you have a sequence of N words, then you can have N x N pairwise interactions. This blows up pretty fast: at 512 words that's ~262K interactions, and at 4,000 words it's 16M. See why it can't fit much more than 4,000 tokens? It's that pesky O(N^2) complexity of self-attention.
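A quick sketch of that quadratic growth (the numbers above come straight from N x N):

```python
# Self-attention compares every token with every other token, so a
# sequence of N tokens produces an N x N matrix of pairwise scores.

def pairwise_interactions(n_tokens: int) -> int:
    """Entries in the N x N attention matrix for a sequence of N tokens."""
    return n_tokens * n_tokens

for n in (512, 4000):
    print(f"{n} tokens -> {pairwise_interactions(n):,} interactions")
# 512 tokens -> 262,144 interactions
# 4000 tokens -> 16,000,000 interactions
```

Doubling the context length quadruples the attention cost, which is why naive transformers hit a wall well before book-length inputs.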

There is a benchmark called "Long Range Arena" where you can check the state of the art in solving the "memory problem":

https://paperswithcode.com/sota/long-range-modeling-on-lra

1

ChronoPsyche t1_itgunqx wrote

Exactly what I am referring to. My bad, quadratic is what I meant.

1