Pwhids t1_itn9glu wrote
Reply to [R] Large Language Models Can Self-Improve by Lajamerr_Mittesdine
They show that the large LMSI models can be distilled into smaller models while maintaining accuracy, but I wonder what model size is necessary for the LMSI training itself to be viable. They only show results for the 540B model. I'd be very curious to see a study on whether there is a certain model size at which this kicks in.
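For anyone unfamiliar, the distillation step just means training a smaller student model to imitate the bigger model's outputs. Very roughly something like the toy sketch below (stand-in models, a made-up temperature, and random data purely for illustration; the paper itself does this at the sequence level by training smaller models on the 540B model's self-generated chain-of-thought data, not with this exact loss):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in the paper the "teacher" would be the 540B LMSI model and
# the "student" a much smaller model; here both are tiny MLPs just to show
# the soft-label distillation objective.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # assumed hyperparameter, not from the paper

for step in range(100):
    x = torch.randn(32, 16)              # stand-in batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)      # teacher predictions act as soft labels
    student_logits = student(x)

    # KL divergence between the softened student and teacher distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```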