Viewing a single comment thread. View all comments

advstra t1_iw6n49p wrote

So in the paper from a quick skim read they're suggesting a new method for data representation (pairwise similarities), and you suggest adding style vectors (which is another representation method essentially as far as I know) can improve it for multimodal tasks? I think that makes sense, reminds me of contextual word embeddings if I didn't misunderstand anything.

2