ayse_ww t1_ixgbva3 wrote on November 23, 2022 at 5:48 AM
Reply to comment by bradenjh in [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
This is quite interesting. Is such a self-training scheme similar to a recurrent network?