porcenat_k t1_itssly1 wrote

The size of language models has been growing exponentially. We should expect 100 trillion parameter dense models by next year. https://i0.wp.com/silvertonconsulting.com/wp-content/uploads/2021/04/Screen-Shot-2021-04-15-at-3.18.31-PM.png?ssl=1

I think that is possible once firms begin using H100 GPUs.

manOnPavementWaving t1_itt06eo wrote

With the H100, training time optimistically improves only by a factor of 9. That's not nearly enough to bridge the 200x gap between the current largest model and a 100 trillion parameter model, and that's in parameter scaling alone, ignoring data scaling. PaLM training took 1,200 hours on 6,144 TPU v4 chips, plus an additional 336 hours on 3,072 TPU v4 chips. A 100 trillion parameter model would literally be too big to train before the end of 2023.
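To make that gap concrete, here's a back-of-envelope sketch using the common ~6·N·D approximation for training FLOPs. The PaLM figures (540B parameters, ~780B tokens) are from its paper; the 20-tokens-per-parameter data budget for the hypothetical 100T model is a Chinchilla-style assumption, not anything the labs have committed to:

```python
# Back-of-envelope: total compute to train a 100T-parameter model,
# using the rough approximation of ~6 FLOPs per parameter per token.

def train_flops(params, tokens):
    """Approximate total training FLOPs: ~6 * N * D."""
    return 6 * params * tokens

# PaLM: 540B parameters trained on ~780B tokens.
palm = train_flops(540e9, 780e9)          # ~2.5e24 FLOPs

# Hypothetical 100T model with a Chinchilla-style ~20 tokens/parameter.
big = train_flops(100e12, 20 * 100e12)    # ~1.2e30 FLOPs

print(f"PaLM:  {palm:.1e} FLOPs")
print(f"100T:  {big:.1e} FLOPs")
print(f"Ratio: {big / palm:.0f}x")        # roughly 475,000x PaLM's compute
```

So even granting a 9x speedup from H100s, the compute requirement is still five orders of magnitude beyond PaLM's training run under these assumptions.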

porcenat_k t1_itt56ff wrote

100 billion parameter models seemed impossible too, back when the largest neural networks were a few million parameters. I'm expecting 10 trillion parameters to be human-level AGI.

manOnPavementWaving t1_itt6vrn wrote

That wasn't one year before the prediction of a hundred billion parameters, though. I'm not doubting that they'll come; I'm doubting the timeline.

I'm interested in why you think a 10 trillion parameter model would be human-level AGI.

porcenat_k t1_ituc77f wrote

Artificial neural networks are sufficient mathematical representations of biological cortices; there is a huge amount of evidence supporting this. All that's left to do is compare human and animal brains to our AI models. The human brain doesn't use all 100 trillion parameters on any one task. In fact, the brain is divided into regions that allocate compute resources to vision, language, audio, etc. Not even half the brain devotes that many resources to one major region, so the upper bound would be 50 trillion parameters; 1 trillion is too small. There aren't 100 different major cortical regions. There are 10, all working on the same architecture but processing different modalities. Conservatively, 10 trillion parameters are allocated to each major region. Let's take a language model with 10 trillion weights. At that capacity it should understand language completely. Then, having read all of PubMed for example, it would be more knowledgeable than all medical professionals on the planet. A 100 trillion parameter model, I've calculated, would be more than a billion times more intelligent than the 10 trillion parameter model in terms of IQ, while also having the benefit of all human knowledge, never being tired, and being immortal.

manOnPavementWaving t1_itudq0y wrote

What study shows the equivalence of neural network parameters and connections in the brain? What calculations did you do to get to "a billion times more intelligent"?

porcenat_k t1_itxcyl3 wrote

https://ai.facebook.com/blog/studying-the-brain-to-build-ai-that-processes-language-as-people-do/

Here is a link to one of the most recent developments. There are plenty more.

>What calculations did you do to to get to "a billion times more intelligent"?

That's a long discussion based on assumptions I find to be very reasonable. If you insist, I can go into it at length. To simplify, consider the empirical fact that the second most intelligent species, the chimpanzee, has a cortex just 3x smaller than a human's. The gap in intelligence resulting from such an increase is breathtaking. Indeed, quantity leads to vast qualitative leaps. Chimpanzees and gorillas, given even trillions of years, have no chance of inventing any but the simplest tools. If 3x above chimpanzee is human intelligence, what is 10x above human?

manOnPavementWaving t1_ityolvz wrote

They actually do invent tools, but that's not the important thing. What made humans intelligent is having a big brain and having lots of time. If we were to put a human newborn and a baby chimpanzee in a jungle and monitor them, they wouldn't seem all that different in intelligence.

It's fine if you take that into your calculations, but then it can't be attributed to just the bigger brain. The problem is that the 100 trillion parameter model won't have hundreds of thousands of years, or billions of copies of itself.

Cool reference, though! Interesting work.
