HjalmarLucius t1_iqtl4yv wrote
Reply to [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
Attention introduces multiplicative interactions between activations, i.e. terms of the form x*y, whereas ordinary layers only combine inputs additively: a weighted sum where the weights are fixed parameters, not functions of the input.
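A minimal numpy sketch of the contrast (all names and shapes here are illustrative, not from the thread): in a dense layer the mixing weights `W` are frozen after training, while in scaled dot-product attention the mixing weights are themselves computed from the input via `q @ k.T`, which is exactly the x*y interaction.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dense(x, W):
    # Ordinary layer: output is a weighted sum of inputs with
    # fixed weights W -- purely additive in x.
    return x @ W

def attention(x, Wq, Wk, Wv):
    # Scaled dot-product attention: the mixing weights come from
    # products of activations (q @ k.T), so they depend on x itself.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # the x*y terms
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, dimension 8 (hypothetical sizes)
W, Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(4))
print(dense(x, W).shape)             # (5, 8) -- weights independent of x
print(attention(x, Wq, Wk, Wv).shape)  # (5, 8) -- weights computed from x
```

Put differently: a stack of dense layers can only reweight inputs by amounts decided at training time, whereas attention lets one part of the input gate how much another part contributes at inference time.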