029187 OP t1_iqsz9t7 wrote
Reply to comment by hellrail in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
Yeah, I get why the non-locality is useful: CNNs group data locally, which doesn't make sense for graph-like data such as text, where the relevant word could be very far away in the sentence.
But a densely connected deep neural network should already have what it needs to learn any arbitrary function relating nodes in a graph.
RobKnight_ t1_iqu7c54 wrote
Deeper layers in CNNs are not constrained to locality; their effective receptive fields grow with depth.
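A quick sketch of that point (assuming stride-1, non-dilated convolutions, which is the simplest case): each stacked kernel-3 layer widens the receptive field by 2, so depth buys non-locality gradually rather than all at once.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, non-dilated 1-D convs:
    each layer with kernel size k adds (k - 1) to the field."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(receptive_field([3] * 5))  # 5 stacked 3-wide convs -> prints 11
```

So a position 50 tokens away only becomes reachable after roughly 25 such layers, which is the sense in which locality is a constraint only at shallow depth.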
029187 OP t1_iqud0ju wrote
True, but attention layers overcome locality immediately: every position can interact with every other position in a single layer, rather than gradually with depth.
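A minimal numpy sketch of that "immediate" non-locality (a simplification: identity projections stand in for the learned W_q, W_k, W_v matrices of a real attention layer): one scaled dot-product attention step produces a full n-by-n weight matrix, so token 0 can draw on token 7 without any intermediate layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy sequence: 8 tokens with 4-dim embeddings (random stand-ins).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))

# Single-head scaled dot-product attention, with q = k = v = x for
# illustration (real layers apply learned projections first).
q, k, v = x, x, x
weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # shape (8, 8)
out = weights @ v

# Every entry of `weights` is positive, so each output position mixes
# information from ALL input positions in this one layer; a kernel-3
# conv would only mix each token with its immediate neighbours.
```

The contrast with a dense layer is that `weights` here is computed from the input itself, not fixed after training, which is why the same layer can attend to different distant positions for different sentences.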