anony_sci_guy t1_j681trq wrote
Reply to comment by starfries in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Yeah, there's some stuff published out there. It's related to pruning (a link to a ton of papers on it); the lottery ticket method handles this case well because you're re-training from scratch, just with a "lucky" selection of the initialized weights. Results-wise, I never got anything to improve, because of the distributional shift caused by re-randomizing a subset of weights in the middle of training. I still saw the same level of performance as without re-randomizing, but that basically just showed that the way I was re-randomizing wasn't helping or hurting, because those neurons weren't important...
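For concreteness, here is a minimal PyTorch-style sketch of what "re-randomizing a subset in the middle of training" could look like; the layer size, the magnitude-based mask, and the init scheme are illustrative assumptions, not the actual setup the comment describes.

```python
import torch
import torch.nn as nn

# Minimal sketch: re-randomize the "pruned" (masked-out) half of one layer's
# weights mid-training. The layer shape, the magnitude-based mask, and the
# init are illustrative placeholders.
layer = nn.Linear(512, 512)
keep_mask = layer.weight.abs() > layer.weight.abs().median()  # keep the larger half

with torch.no_grad():
    fresh = torch.empty_like(layer.weight)
    nn.init.kaiming_uniform_(fresh)  # a fresh draw from a standard init scheme
    # Kept weights retain their trained values; the rest get re-randomized.
    layer.weight.copy_(torch.where(keep_mask, layer.weight, fresh))

# The distribution gap the comments describe: the trained subset has drifted to
# a different scale than the fresh draws, so the two subsets no longer look like
# samples from the same distribution.
print(layer.weight[keep_mask].std().item(), layer.weight[~keep_mask].std().item())
```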
anony_sci_guy t1_j63nj0u wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
This was exactly my first thought too - free up all those extra parameters and re-randomize them. The problem could be that there's a big distributional gap between the pre-tuned weights and the freshly re-randomized ones, so you'd want different step sizes for the two groups. I've played with it before and ran into exactly this problem, but got too lazy to actually implement a solution. (I'm actually a biologist, so I don't really have the bandwidth to dig into the ML side as much.)
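One way the "different step sizes" idea could be sketched, assuming PyTorch: since both subsets live in the same weight tensor, this version scales the gradient element-wise with a hook rather than using separate optimizer parameter groups. The random mask and the 10x factor are made-up placeholders, not values from the comment.

```python
import torch
import torch.nn as nn

# Sketch of "different step sizes" for re-randomized vs. pre-tuned weights,
# assuming PyTorch. Both subsets live in the same tensor, so instead of
# separate optimizer param groups this scales the gradient element-wise.
# The random mask and the 10x factor are made-up placeholders.
layer = nn.Linear(512, 512)
rerandomized_mask = torch.rand_like(layer.weight) < 0.5  # stand-in for the re-initialized subset

lr_scale = torch.where(
    rerandomized_mask,
    torch.full_like(layer.weight, 10.0),  # bigger steps for the fresh weights
    torch.ones_like(layer.weight),        # normal steps for the already-tuned weights
)

# The hook multiplies the gradient per element before the optimizer sees it.
layer.weight.register_hook(lambda grad: grad * lr_scale)

optimizer = torch.optim.SGD(layer.parameters(), lr=1e-3)
```

With plain SGD, scaling the gradient element-wise like this is equivalent to giving each subset its own learning rate; with adaptive optimizers like Adam the effect differs because of the per-element normalization.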
anony_sci_guy t1_j3swstr wrote
Reply to comment by [deleted] in [D] Found very similar paper to my submitted paper on Arxiv by [deleted]
I feel like everyone in the new generation has this happen once, learns from it, and has started pre-printing everything...
anony_sci_guy t1_j6mr4k6 wrote
Reply to comment by starfries in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Glad it helped! The first thing I tried was just re-initializing the weights the same way as at the beginning of training, but I don't remember how much I dug into modifying it before moving on. That's great you're seeing some improvements though! Would love to hear how the rest of your experiment goes!! =)