FirstOrderCat t1_it9b2t8 wrote on October 21, 2022 at 9:43 PM

Reply to comment by visarga in U-PaLM 540B by xutw21

>The new solution is 2x better than before.

Is it only about 2 points better? They only show a very small range (6 points) on the Y axis.

Spoffort t1_itbjrj0 wrote on October 22, 2022 at 11:37 AM

I know what you mean, but look at the x axis, where compute is. The model is not 2 times better (your point about the y axis); it needs 2 times less compute for a given outcome (x axis). If you want, I can explain it further 😄
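A rough sketch of that difference, with made-up numbers: the log-linear curve, its intercept and slope, and the function names below are all hypothetical and are not taken from the paper's plot. It only illustrates how a 2x saving on the compute axis can look like a gain of a few points on the score axis.

```python
import math

# Hypothetical log-linear scaling curve: score = INTERCEPT + SLOPE * log10(compute).
# These constants are invented for illustration, not read off the actual figure.
INTERCEPT = -290.0
SLOPE = 12.0

def baseline_score(compute_flops):
    """Score of the (hypothetical) baseline model at a given training compute."""
    return INTERCEPT + SLOPE * math.log10(compute_flops)

def improved_score(compute_flops):
    """Assumed improved model: matches the baseline trained with 2x the compute."""
    return baseline_score(2.0 * compute_flops)

def baseline_compute_for(score):
    """Invert the baseline curve: compute needed to reach a given score."""
    return 10 ** ((score - INTERCEPT) / SLOPE)

compute = 1e25

# Reading the y axis: score gain at the same compute (a few points at most).
gain_points = improved_score(compute) - baseline_score(compute)

# Reading the x axis: compute each model needs to reach the same score.
target = baseline_score(compute)
baseline_needed = baseline_compute_for(target)
improved_needed = baseline_needed / 2.0  # by construction of improved_score
compute_ratio = baseline_needed / improved_needed

print(f"gain at fixed compute: {gain_points:.2f} points")
print(f"compute ratio at fixed score: {compute_ratio:.1f}x")
```

With a curve this flat in log-compute, the vertical gap stays small even though the horizontal gap is a full factor of 2.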

FirstOrderCat t1_itc6pne wrote on October 22, 2022 at 3:00 PM

It looks like they hit a point of diminishing returns somewhere around 0.5*1e25 FLOPs.

After that, the model trains much more slowly. They could continue training further and say they "saved" another 20M TPU hours.