029187 OP t1_iqud0ju wrote
Reply to comment by RobKnight_ in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
True, but attention layers overcome locality immediately: a single attention layer lets every position attend to every other position, regardless of distance, and the mixing weights are computed from the input itself rather than being fixed.
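A minimal NumPy sketch of single-head self-attention (the names `Wq`, `Wk`, `Wv` and the toy sizes are my own, not from the thread) showing both points: the attention matrix `A` is computed from the input `X`, and every row of `A` is dense, so token 0 can draw on token 5 in one layer, no matter how far apart they are.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 6, 4
X = rng.normal(size=(seq_len, d))   # token embeddings (the input)

# Fixed projection weights -- these are learned, but the attention
# pattern below is NOT fixed: it depends on X at inference time.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
A = softmax(Q @ K.T / np.sqrt(d))   # (seq_len, seq_len), input-dependent
out = A @ V

# Every entry of A is positive: position 0 attends to all positions,
# including the most distant one, in a single layer.
print(A[0])
```

Contrast with a convolution or a local MLP, where the weight connecting two inputs is a fixed learned number and only nearby positions are mixed per layer; here the effective "weight" between two tokens is recomputed for every input.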