
Nameless1995 t1_iqus5e6 wrote

DNN weights are static: once trained, they are the same for all inputs. Attention weights are dynamic: they are recomputed from each input. In this sense, attention weights are a kind of "fast weights".
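
To make the contrast concrete, here is a rough sketch in plain Python. It is only an illustration under simplifying assumptions: the matrix `W`, the helper names, and the toy sequences are all made up, and the attention scores are raw dot products between tokens with no learned query/key projections or scaling.

```python
import math

# Static weights: fixed after training, identical for every input.
# (Hypothetical values, chosen only for illustration.)
W = [[0.5, -1.0],
     [2.0, 0.25]]

def dense(x):
    """Apply the same fixed matrix W to any input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(tokens):
    """Row i is a softmax over dot products of token i with every token.

    Assumption: queries and keys are the tokens themselves, so the
    weights are a function of the input and nothing else.
    """
    return [
        softmax([sum(a * b for a, b in zip(q, k)) for k in tokens])
        for q in tokens
    ]

seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[1.0, 0.0], [3.0, 0.0]]

# W is untouched by either sequence; the attention weights differ per sequence.
weights_a = attention_weights(seq_a)
weights_b = attention_weights(seq_b)
```

`dense` uses the same `W` no matter what it is fed, while `weights_a` and `weights_b` come out different because they are rebuilt from each sequence on the fly.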
