ackbladder_ OP t1_j14gx51 wrote

Thanks for your reply. I assume the trade-off isn’t linear, so I’m hoping to find a ‘Goldilocks’ point where performance either isn’t heavily affected, or is affected only to the point that the model still passes a given task, just not as well. I’ll look up knowledge distillation.

3

svantana t1_j14jwo4 wrote

Yeah, "distillation" is a key term here. Also, paperswithcode has joint data on performance and parameter counts, which gives a nice overview of the current Pareto front. rwightman's repos are another nice resource.
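
In case "Pareto front" is unfamiliar, here is a rough sketch of the idea: keep only the models that no other model beats on both size and accuracy. The function name `pareto_front`, the model names, and all the numbers are made up for illustration; none of it is real paperswithcode data.

```python
from typing import List, Tuple

# (name, parameters in millions, accuracy in %); all values are hypothetical
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated by a smaller-or-equal,
    at-least-as-accurate alternative."""
    # Sort by parameter count ascending; break ties by accuracy descending.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front: List[Model] = []
    best_acc = float("-inf")
    for m in ordered:
        # A model joins the front only if it beats every smaller model.
        if m[2] > best_acc:
            front.append(m)
            best_acc = m[2]
    return front


models = [
    ("tiny", 5.0, 70.0),
    ("small", 20.0, 78.0),
    ("bloated", 60.0, 77.0),  # bigger than "small" but less accurate
    ("large", 90.0, 82.0),
]

print([m[0] for m in pareto_front(models)])
```

A 'Goldilocks' pick would then be the smallest model on that front that still clears whatever accuracy your task needs.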

4