Submitted by Secure-Technology-78 t3_10mdhxb in MachineLearning
CKtalon t1_j62n9yw wrote
Reply to comment by data-drone in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
About 10-12 times more than the tokens seen.
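(A rough back-of-the-envelope sketch of where that 10-12x figure comes from. The numbers below are assumptions, not from this thread: BLOOM is commonly cited as ~176B parameters trained on ~366B tokens, and the Chinchilla paper's rule of thumb is roughly 20 training tokens per parameter.)

```python
# Back-of-the-envelope check of the "10-12x" undertraining claim.
# All figures are assumed, commonly cited values, not from the thread.

bloom_params = 176e9        # BLOOM parameter count (assumed)
bloom_tokens_seen = 366e9   # tokens BLOOM was trained on (assumed)
chinchilla_ratio = 20       # compute-optimal tokens per parameter (Chinchilla rule of thumb)

optimal_tokens = bloom_params * chinchilla_ratio   # ~3.5T tokens
shortfall = optimal_tokens / bloom_tokens_seen     # ~9.6x

print(f"Chinchilla-optimal tokens: {optimal_tokens:.2e}")
print(f"Undertraining factor: {shortfall:.1f}x")
```

Under those assumptions the compute-optimal budget works out to ~3.5T tokens, about 10x what BLOOM actually saw, which is consistent with the figure above.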
NoFairYouCheated t1_j68z10h wrote
Are there any papers or blog posts discussing this undertraining?
CKtalon t1_j695owv wrote
No papers that I know of, but there are blog posts about it performing quite badly: https://www.surgehq.ai/blog/how-good-is-hugging-faces-bloom-a-real-world-human-evaluation-of-language-models
Then, based on the Chinchilla paper, you can infer that the poor performance is a result of undertraining.