
HjalmarLucius t1_iqtl4yv wrote

Attention introduces multiplicative relationships, i.e. x*y whereas ordinary operations only have additive relationships.

9
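The multiplicative relationship HjalmarLucius describes can be sketched in a few lines of numpy: a feedforward layer only multiplies the data by fixed weights, while attention multiplies one data-dependent projection by another, giving x*y-style interactions between tokens. This is an illustrative sketch, not any particular library's implementation; all names and shapes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, dim 8

# Feedforward layer: output is x times FIXED weights W.
# No two data values are ever multiplied with each other.
W = rng.normal(size=(8, 8))
ff_out = x @ W

# Attention: scores multiply one projection of the data with another,
# so the output contains data-by-data interaction terms.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(8)  # multiplicative interaction between tokens
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
attn_out = weights @ v  # data-dependent mixing of v: again multiplicative
```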

graphicteadatasci t1_iqv0m4y wrote

This is the one. A DNN may be a universal function approximator, but only in the limit where data and n_parameters are infinite. With infinite data we could learn y as parameters, and multiplying those parameters with x gives x*y. But we don't have infinite data or infinite parameters, and even if we did, we don't have a stable method for training infinitely. So we need other stuff.

3
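The "learn y as parameters" point above can be made concrete: if y is fixed across the dataset, a plain linear layer represents x*y exactly by absorbing y into its weight, but once y varies per example, no single fixed weight works and a multiplicative path between the two inputs is needed. A toy numpy sketch (all values illustrative):

```python
import numpy as np

# Case 1: y is a constant of the data distribution.
# A linear model can bake y into its weight: w = y, so w*x = x*y exactly.
y = 3.0
w = y  # "learn y as a parameter"
x = np.array([1.0, 2.0, 5.0])
linear_out = w * x  # equals x*y for every x

# Case 2: y varies per example. One fixed w cannot satisfy
# w*x = x*y for different y's at once; the model needs an actual
# multiplicative interaction between the inputs x and y.
ys = np.array([1.0, 2.0, 3.0])
targets = x * ys  # no single scalar w gives w*x == targets here
```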

timonix t1_iqw49rz wrote

I saw a similar architecture where the outputs of one network were the filter coefficients of a second CNN.

1
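The architecture described above (often called a hypernetwork) has the same multiplicative flavor: one network's outputs become the convolution weights applied to another input. A minimal 1D sketch in numpy, with all names and sizes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Hypernetwork": maps a conditioning vector z to filter coefficients.
z = rng.normal(size=4)             # conditioning input
W_hyper = rng.normal(size=(3, 4))  # hypernetwork weights (kernel size 3)
kernel = np.tanh(W_hyper @ z)      # generated filter coefficients

# Second network: a conv layer whose weights are the generated kernel.
x = rng.normal(size=16)            # input signal
out = np.convolve(x, kernel, mode="valid")
# Note the multiplicative structure: every output term is a product of
# a hypernetwork output with a signal value.
```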