ayse_ww

ayse_ww t1_ixgbva3 wrote on November 23, 2022 at 5:48 AM

Reply to comment by bradenjh in [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh

This is quite interesting. Is such a self-training scheme similar to a recurrent network?