Submitted by AmalgamDragon t3_yf73ll in MachineLearning

My search-fu has failed me, so I'm wondering if anyone has heard of using a pair of NNs with identical architectures, but different seeds, to train a model that produces embedding vectors without defining a supervised learning task. The outputs of the NNs would be the embedding vectors directly. The two NNs would be trained synchronously, each using the other's output as its label, with training continuing until the two outputs become sufficiently close.
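To make it concrete, here's a rough sketch of what I'm picturing (PyTorch assumed; the architecture, data, and stopping threshold are just placeholders):

```python
import torch
import torch.nn as nn

def make_encoder(in_dim=32, emb_dim=16, seed=0):
    # identical architecture, different random initialization per seed
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

enc_a = make_encoder(seed=0)
enc_b = make_encoder(seed=1)
opt = torch.optim.Adam(list(enc_a.parameters()) + list(enc_b.parameters()), lr=1e-3)
mse = nn.MSELoss()

for step in range(1000):
    x = torch.randn(128, 32)              # placeholder batch of unlabeled data
    za, zb = enc_a(x), enc_b(x)
    # each network treats the other's (detached) output as its target
    loss = mse(za, zb.detach()) + mse(zb, za.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < 1e-4:                # stop once the outputs are "sufficiently close"
        break
```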

Has anyone heard of such a thing?

1

Comments


DaLameLama t1_iu2jo4l wrote

You need a way to prevent the training from collapsing to trivial solutions (like, both NNs output the same constant vector for all inputs).

Methods similar to your idea are Barlow Twins or VICReg.
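For a sense of how those methods avoid collapse, here's a simplified VICReg-style loss (a sketch only; the term weights and shapes here are placeholders, see the paper for the real thing):

```python
import torch
import torch.nn.functional as F

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = za.shape
    # invariance: pull the two embeddings of the same input together
    sim = F.mse_loss(za, zb)
    # variance: keep each embedding dimension's std above 1 (this is what prevents collapse)
    std_a = torch.sqrt(za.var(dim=0) + eps)
    std_b = torch.sqrt(zb.var(dim=0) + eps)
    var = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))
    # covariance: decorrelate dimensions so they don't encode redundant information
    za_c = za - za.mean(dim=0)
    zb_c = zb - zb.mean(dim=0)
    cov_a = (za_c.T @ za_c) / (n - 1)
    cov_b = (zb_c.T @ zb_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov
```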

6

AmalgamDragon OP t1_iu2qt5u wrote

Thanks! Your point about collapsing to constant vectors, plus the same issue being called out in the papers for the methods you mentioned, settles this for me (i.e. the approach as I described it isn't worth pursuing).

0

DaLameLama t1_iu2rebs wrote

Why not worth pursuing? LeCun still believes VICReg is amazing. Feel free to come up with your own twist :)

1

AmalgamDragon OP t1_iu2souc wrote

I meant the general approach I laid out in my original post. That said, I'm also not working with image data (or audio or NLP), and generalizing VICReg to other domains seems to be more theory than practice at the moment.

2

A_Again t1_iu23iwf wrote

Hello friend, learning from the contrast between embeddings without supervised labels is usually referred to as "contrastive predictive coding". Contrastive because you're using the contrast between embeddings, predictive because... well, you get it.

For your consideration though, the term "triplet loss" applies to what you're describing... just in a supervised mode. And dual-encoder retrieval systems are used for semantic search.
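Roughly, the supervised triplet setup looks like this (just a sketch; the encoder and batches are hypothetical, using PyTorch's built-in TripletMarginLoss):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))  # placeholder encoder
criterion = nn.TripletMarginLoss(margin=1.0)

anchor   = encoder(torch.randn(128, 32))  # some examples
positive = encoder(torch.randn(128, 32))  # same class/label as the anchors
negative = encoder(torch.randn(128, 32))  # different class/label
loss = criterion(anchor, positive, negative)  # pull positives in, push negatives away
```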

I'm on a bus rn and gotta run. I hope these at least help you in your search :)

3

IntelArtiGen t1_iu24fuh wrote

I'm not sure how it would really learn anything from the input if you don't define a more useful task. How would this setup penalize a "collapse" situation where both models always predict 0, for example, or any other constant value?

Contrastive learning algorithms build two embeddings from different parts (or views) of the same input, penalize collapse, and train the model to make those two embeddings as close as possible, since they are known to come from the same input even if they are from different parts of it. It looks a bit like what you said, but I don't know of an implementation that matches exactly what you described.
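As an illustration, a standard contrastive objective along those lines looks roughly like this (a simplified InfoNCE / SimCLR-style sketch; the encoder and augmentations are assumed):

```python
import torch
import torch.nn.functional as F

def info_nce(za, zb, temperature=0.1):
    # za, zb: embeddings of two "views" of the same batch of inputs
    za = F.normalize(za, dim=1)
    zb = F.normalize(zb, dim=1)
    logits = za @ zb.T / temperature       # similarity of every pair in the batch
    labels = torch.arange(za.size(0))      # positives sit on the diagonal
    # other samples in the batch act as negatives, which is what prevents collapse
    return F.cross_entropy(logits, labels)

# hypothetical usage: za = encoder(augment(x)); zb = encoder(augment(x)); loss = info_nce(za, zb)
```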

3

AmalgamDragon OP t1_iu25s20 wrote

> I'm not sure how it would really learn anything from the input if you don't define a more useful task. How would this setup penalize a "collapse" situation where both models always predict 0, for example, or any other constant value?

Yeah, it may not work well. I haven't been able to track down whether this has already been tried and found wanting.

1