
lukeiy t1_j2luz7z wrote

Use another model to reduce this context to a vector, then append it to each token. This was the approach used in Set Transformers (TSPN).
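
Roughly something like this minimal PyTorch sketch (the names `ContextPooler`, `append_context`, and the mean-pooling choice are just illustrative, not the reference implementation from the Set Transformer / TSPN papers):

```python
import torch
import torch.nn as nn

class ContextPooler(nn.Module):
    """Reduce a (batch, n_ctx, d_model) context to a single (batch, d_ctx) vector."""
    def __init__(self, d_model: int, d_ctx: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_ctx)

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # Mean-pool over the context dimension, then project down.
        return self.proj(ctx.mean(dim=1))            # (batch, d_ctx)

def append_context(tokens: torch.Tensor, ctx_vec: torch.Tensor) -> torch.Tensor:
    """Concatenate the pooled context vector onto every token embedding."""
    b, n, _ = tokens.shape
    ctx = ctx_vec.unsqueeze(1).expand(b, n, -1)       # broadcast to every token
    return torch.cat([tokens, ctx], dim=-1)           # (batch, n, d_model + d_ctx)

# Usage
tokens = torch.randn(4, 16, 64)                       # main sequence
context = torch.randn(4, 50, 64)                      # variable-length side context
pooler = ContextPooler(d_model=64, d_ctx=32)
augmented = append_context(tokens, pooler(context))  # (4, 16, 96)
```

The pooler can be any permutation-invariant encoder; mean pooling is just the simplest stand-in.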

2

kdqg t1_j2oo4rl wrote

Also have a look at the slot attention mechanism, which does something similar but arguably more elegantly.
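
A condensed sketch of the idea from Slot Attention (Locatello et al., 2020), simplified from the paper and not their exact code: a fixed set of slots competes for input features via attention that is normalized over the slot axis, then each slot is updated with a GRU.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots: int, dim: int, iters: int = 3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # Initial slots sampled from a learned Gaussian.
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over slots: inputs are divided among competing slots.
            attn = torch.einsum('bnd,bkd->bnk', k, q).mul(self.scale).softmax(dim=-1)
            attn = attn / attn.sum(dim=1, keepdim=True)   # weighted mean over inputs
            updates = torch.einsum('bnk,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, -1, d)
        return slots  # (batch, num_slots, dim)
```

The slots end up playing the same role as the pooled context vector above, except you get several of them and they specialize on different parts of the input.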

1