chimp73 t1_j45vsgb wrote

Bitter lesson 3.0: The entire idea of fine-tuning a large pre-trained model goes out the window when you consider that the creators of the foundation model can afford to fine-tune it even more than you can: fine-tuning is extremely cheap for them, and they have far more compute. Instead of providing API access to intermediaries, they can simply sell services to customers directly.

52

chimp73 t1_j2vw251 wrote

I made a summary of the related work section with some help from ChatGPT:

> Pruning has been applied to smaller models, but has not been studied in large models like GPT with over 10 billion parameters. Previous pruning methods have required retraining the model after pruning, which is time-consuming and resource-intensive at GPT scale. SparseGPT has been developed to prune large GPT models without any retraining. There has also been significant research on post-training quantization of GPT-scale models, which reduces the precision of the weights and activations to cut memory and compute requirements. The SparseGPT method can be combined with these quantization methods to compress the model further.
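For intuition on what "pruning without retraining" means, here's a minimal NumPy sketch of one-shot magnitude pruning: zero out the smallest-magnitude weights in a single pass, with no fine-tuning afterwards. Note this is *not* the SparseGPT algorithm (which uses a more sophisticated layer-wise reconstruction to pick which weights to drop); `magnitude_prune` is just a hypothetical helper for illustration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """One-shot pruning: zero the `sparsity` fraction of entries
    with the smallest absolute value. No retraining step."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))      # stand-in for one weight matrix
w50 = magnitude_prune(w, 0.5)    # 50% of entries set to zero
print(f"sparsity: {np.mean(w50 == 0.0):.2f}")
```

At GPT scale the interesting question (which SparseGPT addresses) is how to choose the pruned weights so that accuracy survives *without* the retraining pass that simple magnitude pruning usually needs.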

3