Pwhids t1_itn9glu wrote
Reply to [R] Large Language Models Can Self-Improve by Lajamerr_Mittesdine
They show that the large LMSI models can be distilled into smaller models while maintaining accuracy, but I wonder what model size is necessary for the LMSI training itself to be viable. They only show results for the 540B model. I'd be very curious to see a study on whether there is a certain model size at which this kicks in.
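For anyone unfamiliar, the distillation step just means training a smaller student model to imitate the bigger model's outputs. Very roughly something like the toy sketch below (stand-in models, a made-up temperature, and random data purely for illustration; the paper itself does this at the sequence level by training smaller models on the 540B model's self-generated chain-of-thought data, not with this exact loss):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in the paper the "teacher" would be the 540B LMSI model and
# the "student" a much smaller model; here both are tiny MLPs just to show
# the soft-label distillation objective.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # assumed hyperparameter, not from the paper

for step in range(100):
    x = torch.randn(32, 16)              # stand-in batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)      # teacher predictions act as soft labels
    student_logits = student(x)

    # KL divergence between the softened student and teacher distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```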