ackbladder_ OP t1_j14gx51 wrote on December 21, 2022 at 4:34 PM

Thanks for your reply. I assume the trade-off isn’t linear, so I’m hoping to find a ‘Goldilocks’ point where performance isn’t heavily affected, or is affected only to the extent that the model still passes a given task, just not as well. I’ll look up knowledge distillation.

svantana t1_j14jwo4 wrote on December 21, 2022 at 4:53 PM

Yeah, "distillation" is a key term here. Also, paperswithcode has joint data on performance and parameter counts, which gives a nice overview of the current Pareto front. rwightman's repos are another nice resource.
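
For anyone wondering what "Pareto front" means in practice here, a rough sketch: given (parameter count, accuracy) pairs, keep only the models that no smaller-or-equal model matches or beats. The model names and numbers below are made up purely for illustration and are not taken from paperswithcode or any real benchmark; `pareto_front` is a hypothetical helper, not part of any library.

```python
def pareto_front(models):
    """Return the models on the size/accuracy Pareto front.

    `models` is a list of (name, params_millions, accuracy) tuples.
    A model is kept only if it is more accurate than every model
    with fewer or equal parameters.
    """
    front = []
    best_acc = float("-inf")
    # Sort by parameter count ascending; on ties, most accurate first.
    for name, params, acc in sorted(models, key=lambda m: (m[1], -m[2])):
        if acc > best_acc:
            front.append((name, params, acc))
            best_acc = acc
    return front


# Hypothetical numbers, for illustration only.
models = [
    ("tiny", 5, 70.0),
    ("small", 20, 78.0),
    ("bloated", 50, 77.0),  # bigger than "small" but less accurate
    ("base", 90, 82.0),
    ("large", 300, 84.0),
]

for name, params, acc in pareto_front(models):
    print(f"{name}: {params}M params, {acc}% accuracy")
```

The "Goldilocks" point would then be the smallest model on that front that still clears whatever accuracy bar the task needs.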