Viewing a single comment thread. View all comments

TemperatureAmazing67 t1_jbzcn6a wrote

> Extensions of LLMs (like PaLM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would

The problem is that we have scaling laws for NNs: we simply do not have enough data to train 50T parameters. We would need to find that data somehow, and answering that question would cost a lot.
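To make the scaling-law point concrete, here's a back-of-the-envelope sketch using the Chinchilla rule of thumb of roughly 20 training tokens per parameter (the exact ratio is an approximation, not an exact law, and the parameter counts are taken from the comment above):

```python
# Rough data requirement under the Chinchilla heuristic
# (~20 training tokens per parameter; an approximation only).

def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Return the roughly compute-optimal number of training tokens."""
    return params * tokens_per_param

current = 500e9       # the ~500B-parameter model from the comment
hypothetical = 50e12  # the hypothetical 50T-parameter model

print(f"~{chinchilla_tokens(current) / 1e12:.0f}T tokens for a 500B model")
print(f"~{chinchilla_tokens(hypothetical) / 1e12:.0f}T tokens for a 50T model")
# A 50T model would want on the order of a quadrillion tokens,
# far beyond any existing public text corpus.
```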

3

Co0k1eGal3xy t1_jbzi8wc wrote

  1. Double descent: models with more parameters are MORE data efficient.
  2. Most of these LLMs barely complete 1 epoch, so there is no concern about overfitting currently.
1