timonix t1_iqw49rz wrote
Reply to comment by HjalmarLucius in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
I saw a similar architecture where the outputs from one network were the filter coefficients of a second CNN.
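Roughly like this (a minimal PyTorch sketch of that idea, not the exact architecture from the thread; the class name `HyperConv` and all sizes are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConv(nn.Module):
    """One network predicts the conv filter coefficients of a second network."""
    def __init__(self, context_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # First network: maps a context vector to a flat bank of filter weights.
        self.hyper = nn.Sequential(
            nn.Linear(context_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_ch * in_ch * k * k),
        )

    def forward(self, x, context):
        # Predict the coefficients, then reshape them into conv weights.
        w = self.hyper(context).view(self.out_ch, self.in_ch, self.k, self.k)
        # The second network's convolution runs with the predicted filters.
        return F.conv2d(x, w, padding=self.k // 2)

layer = HyperConv(context_dim=8, in_ch=3, out_ch=16)
x = torch.randn(1, 3, 32, 32)   # image batch
ctx = torch.randn(8)            # conditioning vector
y = layer(x, ctx)               # -> shape (1, 16, 32, 32)
```

The point is that the filters are data-dependent: change `ctx` and the second network computes a different convolution, instead of having fixed learned weights.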