arg_max t1_j9rt2ew wrote

The thing is that the theory behind diffusion models is at least 40-50 years old. Forward diffusion is a discretization of a stochastic differential equation (SDE) that transforms the data distribution into a normal distribution. People figured out back in the 1970s that it is possible to reverse this process, i.e., to go from the normal distribution back to the data distribution using another SDE. The catch is that this reverse SDE contains the score function, i.e., the gradient of the log density of the data, and people just didn't know how to estimate that from data. Then some smart folks came along, picked up the ideas about denoising score matching from the 2000s, and did the necessary engineering to make it work with deep nets.
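For anyone who wants to see the mechanics, here's a minimal toy sketch of that forward/reverse pair (a variance-preserving SDE with Euler-Maruyama discretization; the `beta` schedule and the closed-form `score` are illustrative stand-ins, since in practice the score would come from a trained denoising network):

```python
import numpy as np

def forward_step(x, t, beta, dt, rng):
    # One Euler-Maruyama step of the forward (variance-preserving) SDE
    #   dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dw,
    # which gradually turns data samples into Gaussian noise.
    drift = -0.5 * beta(t) * x
    noise = rng.standard_normal(x.shape)
    return x + drift * dt + np.sqrt(beta(t) * dt) * noise

def reverse_step(x, t, beta, dt, score, rng):
    # One Euler-Maruyama step of the corresponding reverse-time SDE
    #   dx = [-0.5 * beta(t) * x - beta(t) * score(x, t)] dt + sqrt(beta(t)) dw,
    # integrated backwards from t ~ 1 (noise) to t ~ 0 (data).
    # score(x, t) stands in for grad_x log p_t(x), the quantity that a
    # denoising score matching network is trained to approximate.
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
    noise = rng.standard_normal(x.shape)
    return x - drift * dt + np.sqrt(beta(t) * dt) * noise

# Toy sanity check: if the data distribution is N(0, 1), this VP-type SDE
# leaves it invariant and the true score is simply score(x, t) = -x.
rng = np.random.default_rng(0)
beta = lambda t: 0.1 + 19.9 * t      # hypothetical linear noise schedule
x = rng.standard_normal(10_000)      # "noise" samples at t = 1
for i in range(1000):                # integrate the reverse SDE from 1 to 0
    t = 1.0 - i / 1000
    x = reverse_step(x, t, beta, 1e-3, lambda x, t: -x, rng)
# x is again approximately standard normal, as expected in this toy case.
```

The hard part historically was exactly the `score` argument above: the reverse dynamics were known, but nobody had a practical way to get grad_x log p_t(x) from samples until denoising score matching plus deep networks made it learnable.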

The point I am making is that this problem was theoretically well understood a long time ago; it just took humanity decades to actually be able to compute it. But for AGI, we don't have such a recipe. There's no single equation hidden in some old math book that will suddenly get us AGI. Reinforcement learning is really the only approach I can think of, but even there I just don't see how we would get there with the algorithms we are currently using.

7

SchmidhuberDidIt OP t1_j9rwh3i wrote

What about current architectures makes you think they won’t continue to improve with scale and multimodality, given a good tokenization scheme? Is it the context length? What about models like S4/RWKV?

2

Veedrac t1_j9tnubd wrote

Ah, yes, those well-understood equations for aesthetic beauty from the 1970s.

0