hebekec256 OP t1_jbz0mpm wrote

Yes, I understand that, but LLMs and extensions of LLMs (like PALM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture why it would just break; otherwise I can't see why they wouldn't do it, since the risk-to-reward ratio seems favorable to me.

0

TemperatureAmazing67 t1_jbzcn6a wrote

>extensions of LLMs (like PALM-E) are a heck of a lot more than an abacus. I wonder what would happen if Google just said, "screw it", and scaled it from 500B to 50T parameters. I'm guessing there are reasons in the architecture that it would

The problem is that we have scaling laws for neural networks, and we simply do not have the data for 50T parameters. We would need to get that data somehow. Answering this question would cost a lot.
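To put rough numbers on the data problem, here is a back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of about 20 training tokens per parameter for compute-optimal training, and the usual approximation of 6 FLOPs per parameter per token. Both constants and the function names are assumptions for illustration, not figures from this thread.

```python
# Back-of-the-envelope scaling estimate.
# ASSUMPTION: ~20 tokens per parameter (a widely quoted compute-optimal rule of thumb).
# ASSUMPTION: training cost ~ 6 * params * tokens FLOPs (standard approximation).
TOKENS_PER_PARAM = 20
FLOPS_PER_PARAM_TOKEN = 6


def compute_optimal_tokens(n_params: int) -> int:
    """Tokens needed to train n_params under the assumed rule of thumb."""
    return n_params * TOKENS_PER_PARAM


def training_flops(n_params: int, n_tokens: int) -> int:
    """Approximate total training FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    for n_params in (500 * 10**9, 50 * 10**12):  # 500B and 50T parameters
        n_tokens = compute_optimal_tokens(n_params)
        flops = training_flops(n_params, n_tokens)
        print(f"{n_params:.1e} params -> {n_tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under these assumptions, going from 500B to 50T parameters multiplies the token requirement by 100 (to roughly 10^15 tokens) and the training compute by 10,000.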

3

Co0k1eGal3xy t1_jbzi8wc wrote

  1. Double descent: more parameters are MORE data-efficient.
  2. Most of these LLMs barely complete 1 epoch, so there is no concern about overfitting currently.

1

MinaKovacs t1_jbz2gqw wrote

I think the math clearly doesn't work out; otherwise, Google would have monetized it already. ChatGPT is not profitable or practical for search. The cost of hardware, power consumption, and slow performance are already at the limits. It will take something revolutionary, beyond binary computing, to make ML anything more than expensive algorithmic pattern recognition.

−1