Submitted by ackbladder_ t3_zrpsfm in MachineLearning
ackbladder_ OP t1_j14gx51 wrote
Thanks for your reply. I assume the trade-off isn’t linear, so I’m hoping to find a ‘Goldilocks’ point where performance isn’t heavily affected, or is affected only to the extent that the model still passes a given task, just not as well. I’ll look up knowledge distillation.
svantana t1_j14jwo4 wrote
Yeah, "distillation" is a key term here. Also, paperswithcode has joint data on performance and parameter counts, which gives a nice overview of the current Pareto front. rwightman's repos are another nice resource.
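To make the "Pareto front" idea concrete, here is a minimal sketch of how one might pick the non-dominated models out of a list of (name, parameter count, accuracy) entries. The model names and numbers are made up purely for illustration and are not taken from paperswithcode or any real benchmark; `pareto_front` is a hypothetical helper, not part of any library.

```python
# Hypothetical (name, params in millions, accuracy in %) entries.
# These values are invented for illustration only.
MODELS = [
    ("model_a", 5.0, 70.0),
    ("model_b", 10.0, 68.0),   # bigger and worse than model_a
    ("model_c", 20.0, 78.0),
    ("model_d", 25.0, 77.0),   # bigger and worse than model_c
    ("model_e", 60.0, 82.0),
]


def pareto_front(models):
    """Return models for which no smaller-or-equal model is at least as accurate.

    Sorts by parameter count (ascending), breaking ties by accuracy
    (descending), then keeps each model that strictly improves on the
    best accuracy seen so far. Exact duplicates are kept only once.
    """
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front = []
    best_acc = float("-inf")
    for name, params, acc in ordered:
        if acc > best_acc:
            front.append((name, params, acc))
            best_acc = acc
    return front


if __name__ == "__main__":
    for name, params, acc in pareto_front(MODELS):
        print(f"{name}: {params}M params, {acc}% accuracy")
```

The 'Goldilocks' point would then be somewhere along this front: the smallest model whose accuracy still clears whatever threshold the task needs.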