say_wot_again t1_itn823q wrote
From the abstract, it seems very similar to common self-supervised techniques in computer vision. The difference is that in computer vision SSL, you use the model's confident outputs on clean data to train its performance on heavily augmented data, whereas here you use the model's outputs on "chain of thought" prompts to train its performance on normal prompts. Either way, the principle of "use the model's high-confidence outputs on easy examples to train it on hard examples" stays the same. It's always cool to see this sort of cross-pollination between vision and NLP, though the title seems designed to conjure up images of Westworld or Ex Machina.
Edit: it appears one massive difference is that in vision, the augmentations come from the modeler, whereas here the chains of thought actually come from the model's own outputs. So it's leveraging the inherent randomness in LLM sampling to generate new training data, relying on the idea that answers that appear frequently across sampled outputs are likelier to be correct. This IS pretty cool, and meaningfully different from the vision SSL case insofar as it requires much less manual intervention.
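To make that concrete, here's a minimal sketch of what that majority-vote pseudolabeling could look like. Everything here is an assumption on my part (`model.generate`, the prompt format, the naive answer extraction), not code from the paper:

```python
from collections import Counter

def extract_final_answer(completion: str) -> str:
    # Naive placeholder: assume the final answer is the last non-empty line.
    return completion.strip().splitlines()[-1]

def sample_cot_answers(model, question, n_samples=32, temperature=0.7):
    """Sample several chain-of-thought completions for one question.
    `model.generate` is a stand-in for whatever sampling API the LLM exposes."""
    answers = []
    for _ in range(n_samples):
        completion = model.generate(
            prompt=f"Q: {question}\nA: Let's think step by step.",
            temperature=temperature,  # sampling randomness yields diverse chains
        )
        answers.append(extract_final_answer(completion))
    return answers

def build_pseudolabeled_example(model, question, agreement_threshold=0.5):
    """Majority-vote over the sampled answers; keep the example only if the
    most common answer shows up often enough to be trusted."""
    answers = sample_cot_answers(model, question)
    (top_answer, count), = Counter(answers).most_common(1)
    if count / len(answers) >= agreement_threshold:
        # Fine-tune on the plain prompt -> majority answer, so the model learns
        # to produce the answer without the chain-of-thought scaffolding.
        return {"prompt": f"Q: {question}\nA:", "target": top_answer}
    return None  # low agreement: discard rather than train on noise
```

The agreement threshold plays the same role as the confidence threshold in vision pseudolabeling: it trades coverage of the unlabeled pool for pseudolabel quality.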
say_wot_again t1_itrmhsx wrote
Reply to comment by DeezNUTSampler in [R] Large Language Models Can Self-Improve by Lajamerr_Mittesdine
Here's an example of what I had in mind. Pseudolabels for the unlabeled data are generated on the clean images, but the student model is trained on a strongly augmented version of each image. It's not contrastive learning, since the objective is still explicitly object detection; instead, "easy vs. hard" maps to the original image vs. the strongly augmented one.
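In pseudocode, that teacher-student loop looks roughly like this. The helpers (`weak_aug`, `strong_aug`, detector models returning boxes/labels/scores, a student that returns a scalar loss) are assumptions for the sake of the sketch, not any particular library's API:

```python
import torch

def consistency_train_step(teacher, student, optimizer, unlabeled_images,
                           weak_aug, strong_aug, conf_threshold=0.9):
    # Pseudolabels come from the teacher on the clean / weakly augmented view.
    with torch.no_grad():
        preds = teacher(weak_aug(unlabeled_images))  # the "easy" examples
    # Keep only confident detections as pseudo-ground-truth.
    targets = []
    for p in preds:
        keep = p["scores"] > conf_threshold
        targets.append({"boxes": p["boxes"][keep], "labels": p["labels"][keep]})
    # Supervise the student on the strongly augmented view (the "hard" examples).
    # Assumes strong_aug is photometric only (color jitter, noise, blur) so the
    # teacher's boxes stay valid; geometric augs would need box transforms too.
    loss = student(strong_aug(unlabeled_images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```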