Submitted by Lajamerr_Mittesdine t3_ycipui in MachineLearning
Pwhids t1_itn9glu wrote
They show that the large LMSI models can be distilled into smaller models while maintaining accuracy, but I wonder what model size is necessary for the LMSI training itself to be viable. They only show results for 540B. I'd be very curious to see a study of whether there is a certain model size at which this kicks in.
[deleted] t1_itnl1x0 wrote
[deleted]