Pwhids t1_itn9glu wrote on October 24, 2022 at 10:01 PM

They show that the large LMSI models can be distilled into smaller models while maintaining accuracy, but I wonder what model size is necessary for the LMSI training itself to be viable. They only show results for the 540B model. I'd be very curious to see a study on whether there is a certain model size at which this kicks in.

[deleted] t1_itnl1x0 wrote on October 24, 2022 at 11:28 PM

[deleted]