Submitted by abc220022 t3_100y331 in MachineLearning
lukeiy t1_j2luz7z wrote
Use another model to reduce this context to a vector, then append it to each token. This was the process used in Set Transformers (TSPN)
kdqg t1_j2oo4rl wrote
Also have a look at the slot attention mechanism, which does something similar but arguably more elegantly
Viewing a single comment thread. View all comments