Submitted by ackbladder_ t3_zrpsfm in MachineLearning
Deep-Station-1746 t1_j146uw3 wrote
Reply to comment by Deep-Station-1746 in Reduce parameter count in an NN without sacrificing performance [P] by ackbladder_
The laziest option is fp16 conversion, which is as easy as `model.half()` on most torch-based models and halves the physical size of the model. You could also try knowledge distillation (read up on how DistilBERT was made, for example). Then there's stuff that is more arch-specific: if you have a transformer, you could use xformers' memory-efficient attention, for example. The list goes on and on.
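A minimal sketch of the fp16 trick, assuming PyTorch is installed; the tiny `nn.Sequential` model here is just a stand-in for whatever model you actually have:

```python
import torch
import torch.nn as nn

# toy stand-in for a real model (illustration only)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

def param_bytes(m: nn.Module) -> int:
    # total storage used by the model's parameters, in bytes
    return sum(p.numel() * p.element_size() for p in m.parameters())

fp32_bytes = param_bytes(model)
model = model.half()  # casts parameters and buffers to torch.float16
fp16_bytes = param_bytes(model)

print(f"fp32: {fp32_bytes} bytes, fp16: {fp16_bytes} bytes")
```

Since each float16 parameter takes 2 bytes instead of 4, the parameter storage drops by exactly half; just be aware that some ops (and some CPUs) handle fp16 poorly, so test accuracy after converting.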
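For distillation, the core idea is to train a smaller student on a blend of the true labels and the teacher's temperature-softened outputs. A hedged sketch of that loss (the function name, `T`, and `alpha` are my choices, not anything from DistilBERT specifically):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target KL term and ordinary cross-entropy (illustrative)."""
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # standard supervised loss on the hard labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

You'd run the (frozen) teacher and the student on the same batch, feed both sets of logits plus the labels into this loss, and backprop through the student only.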