
advstra t1_iw6n49p wrote

So, from a quick skim of the paper, they're suggesting a new method for data representation (pairwise similarities), and you're suggesting that adding style vectors (which, as far as I know, is essentially another representation method) could improve it for multimodal tasks? I think that makes sense; it reminds me of contextual word embeddings, if I didn't misunderstand anything.
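For anyone following along, a pairwise-similarity representation in its simplest form is just a similarity matrix over the samples. A rough sketch of what I mean (assuming cosine similarity; the paper may well use a different measure):

```python
import numpy as np

def pairwise_similarity(X):
    """Return an (n, n) matrix of cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return X_unit @ X_unit.T

# Toy usage: 4 samples, 8 features each.
X = np.random.randn(4, 8)
S = pairwise_similarity(X)  # S[i, j] = similarity between samples i and j
```

Part of the appeal for multimodal data, I'd guess, is that the (n, n) similarity matrix depends only on how samples relate to each other, not on the raw feature space of each modality.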

2

advstra t1_iw6lyol wrote

Yeah, I got lost a bit there, but I think that part is them trying to find a metaphor for what they were saying in the first half, before the "for example". Essentially, I thought they were suggesting that a Diffie-Hellman key exchange could help with multimodal or otherwise incompatible training data, instead of tokenizers (or feature fusion). I'm not sure how they're suggesting to implement that, though.
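For reference, the actual Diffie-Hellman exchange they're borrowing as a metaphor: two parties each combine their own private value with the other's public value and arrive at the same shared secret without ever transmitting it. A toy sketch with deliberately insecure parameters, purely illustrative:

```python
import random

# Toy Diffie-Hellman key exchange (tiny parameters, NOT secure).
p = 4294967291  # small prime modulus; real DH uses far larger groups
g = 5           # public base

a = random.randrange(2, p - 1)  # Alice's private key
b = random.randrange(2, p - 1)  # Bob's private key

A = pow(g, a, p)  # Alice publishes g^a mod p
B = pow(g, b, p)  # Bob publishes g^b mod p

# Each side combines its own secret with the other's public value.
shared_a = pow(B, a, p)  # (g^b)^a mod p
shared_b = pow(A, b, p)  # (g^a)^b mod p
assert shared_a == shared_b  # same shared secret on both sides
```

How that maps onto reconciling incompatible training modalities is the part I can't picture either.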

1

advstra t1_iw4in0h wrote

People are making fun of you, but this is exactly how CS papers sound (literally the first sentence of the abstract: "Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations."). And from what I could understand, more or less, you actually weren't that far off?

6