
visarga t1_itgu5bi wrote

Not exponential, let's not exaggerate. It's quadratic. If you have a sequence of N words, then you can have N x N pairwise interactions. This blows up pretty fast: at 512 words that's ~262K interactions, and at 4,000 words it's 16M. See why it can't fit much more than 4,000 tokens? It's that pesky O(N^2) complexity of self-attention.
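A quick sketch of that quadratic growth (the numbers above come straight from N x N):

```python
# Self-attention compares every token with every other token, so a
# sequence of N tokens produces an N x N matrix of pairwise scores.

def pairwise_interactions(n_tokens: int) -> int:
    """Entries in the N x N attention matrix for a sequence of N tokens."""
    return n_tokens * n_tokens

for n in (512, 4000):
    print(f"{n} tokens -> {pairwise_interactions(n):,} interactions")
# 512 tokens -> 262,144 interactions
# 4000 tokens -> 16,000,000 interactions
```

Doubling the context length quadruples the attention cost, which is why naive transformers hit a wall well before book-length inputs.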

There is a benchmark called "Long Range Arena" where you can check the state of the art in solving the "memory problem":

https://paperswithcode.com/sota/long-range-modeling-on-lra

1

ChronoPsyche t1_itgunqx wrote

Exactly what I am referring to. My bad, quadratic is what I meant.

1