Historical_Ad2338 t1_is3636c wrote on October 12, 2022 at 11:30 PM

Genuinely shocking. Figure 6 of "Scaling Laws for Neural Language Models" found that single-layer models weren't supposed to scale as well (at the same parameter count), though ofc the fine details of this new paper are different.

mrpogiface t1_is400t9 wrote on October 13, 2022 at 3:15 AM

Yeah, I don't think the OP paper did any scaling experiments, so I'm a bit sceptical about the long term, but it would be awesome for efficiency if it worked out.

Also, it turns out that the scaling laws in the paper you linked weren't quite right either (à la Chinchilla), so who knows, maybe something was missed that shows up once you move out of the infinite-data regime.