trendymoniker t1_ivf84sd wrote

Easy answer is distilled or efficiency-focused models like DistilBERT or EfficientNet. You can also get an intuition for the process by taking a small, easy dataset, like MNIST or CIFAR, and running a big hyperparameter search over models. There will be small models which perform close to the best models.
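For anyone who hasn't seen how distillation works in practice, here is a minimal sketch of the soft-target loss commonly used to train a small student against a big teacher. It is not necessarily the exact recipe behind any specific model named above; the function names, `temperature=2.0`, and `alpha=0.5` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and the usual hard-label term.

    temperature and alpha are assumed hyperparameters, not values from the thread.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Ordinary cross-entropy against the ground-truth label (temperature 1)
    hard = -math.log(softmax(student_logits)[true_label])

    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard
```

When the student's logits match the teacher's, the soft term is zero and only the hard-label term remains; the further the student drifts from the teacher, the larger the soft term gets.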

These days nobody uses ResNet or Inception but there was a time they were the bleeding edge. Now it's all smaller, more precise stuff.

The other dimension where you can win over big models is hardcoding in your priors.
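One common reading of "hardcoding priors" (my assumption, the comment doesn't spell it out) is baking known structure into the architecture, for example translation equivariance via convolution. A rough sketch of how much that saves in parameters on an MNIST-sized 28x28 input; the helper names are hypothetical and the layer shapes are picked just for illustration:

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases for a 2D convolution; the kernel is shared across positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


pixels = 28 * 28  # MNIST-sized single-channel image

# A dense layer mapping every pixel to every output pixel: no prior about locality
dense = dense_params(pixels, pixels)

# A single 3x3 filter slid over the image: locality and weight sharing built in
conv = conv2d_params(3, 3, 1, 1)

print(dense, conv)  # 615440 10
```

Both layers can produce a 28x28 output map (with padding for the convolution), but the convolution gets there with 10 parameters because the "same pattern can appear anywhere" prior is built into the layer instead of learned.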
