porcenat_k t1_itxcyl3 wrote on October 27, 2022 at 12:09 AM

Reply to comment by manOnPavementWaving in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

https://ai.facebook.com/blog/studying-the-brain-to-build-ai-that-processes-language-as-people-do/

Here is a link to one of the most recent developments. There are plenty more.

>What calculations did you do to get to "a billion times more intelligent"?

That's a long discussion based on assumptions I find very reasonable. If you insist, I can go into it at length. To simplify, consider the empirical fact that the second most intelligent species, the chimpanzee, has a cortex just 3x smaller than a human's. The gap in intelligence that results from such an increase is breathtaking. Indeed, quantity leads to vast qualitative leaps. Chimpanzees and gorillas trillions of years from now have no chance of inventing even the simplest tools. If 3x above chimpanzee is human intelligence, what is 10x above human?

porcenat_k t1_ituc77f wrote on October 26, 2022 at 11:28 AM

Reply to comment by manOnPavementWaving in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

Artificial neural networks are sufficient mathematical representations of biological cortices. There is a huge amount of evidence that this is the case. All that’s left to do is compare human and animal brains to our AI models. The human brain doesn’t use all 100 trillion parameters on any one task. In fact, the brain is divided into regions that allocate compute resources to vision, language, audio, etc. Not even half our brain is devoted to any one major region, so the upper bound would be 50 trillion parameters. 1 trillion is too small: there aren’t 100 different major cortical regions. There are 10, all working on the same architecture but processing different modalities. Conservatively, 10 trillion parameters are allocated to each major region. Let’s take a language model with 10 trillion weights. At that capacity it should be able to understand language completely. Then, having read all of PubMed, for example, it would be more knowledgeable than all medical professionals on the planet. A 100 trillion parameter model, I’ve calculated, would be more than a billion times more intelligent than the 10 trillion parameter one, in terms of IQ, while also having the benefit of all human knowledge, never being tired, and being immortal.
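As a rough sketch of the arithmetic above: the 100 trillion total, the 10 regions, and the "not even half" bound are the figures from this comment, while the even split across regions and the function name are assumptions made only for illustration.

```python
# Back-of-the-envelope version of the parameter budget described above.
# All figures are the comment's own estimates, not measured values.
TOTAL_BRAIN_PARAMS = 100e12  # ~100 trillion "parameters", per the comment
MAJOR_REGIONS = 10           # number of major cortical regions assumed in the comment


def params_per_region(total: float, regions: int) -> float:
    """Evenly split a total parameter budget across regions (assumed simplification)."""
    if regions <= 0:
        raise ValueError("regions must be positive")
    return total / regions


# Conservative estimate: an even share for each major region.
per_region = params_per_region(TOTAL_BRAIN_PARAMS, MAJOR_REGIONS)

# Upper bound: "not even half" of the brain goes to any one region.
upper_bound = TOTAL_BRAIN_PARAMS / 2

print(f"per region:  {per_region / 1e12:.0f} trillion parameters")
print(f"upper bound: {upper_bound / 1e12:.0f} trillion parameters")
```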

porcenat_k t1_itu9urc wrote on October 26, 2022 at 11:02 AM

Reply to comment by ReasonablyBadass in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

Indeed. The few tweaks I’d say are needed are continual learning and longer short-term memory. Both are active research subfields. All that’s left to do is scale model size, which I consider to be way more important than data. Human beings understand basic concepts and don’t need to read the entire internet for that, because we have evolved bigger brains.

porcenat_k t1_itt56ff wrote on October 26, 2022 at 2:47 AM

Reply to comment by manOnPavementWaving in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

100 billion parameter models seemed impossible too, back when the size of neural networks was a few million. I'm expecting 10 trillion parameters to be human level AGI.

porcenat_k t1_itt4w3g wrote on October 26, 2022 at 2:44 AM

Reply to comment by manOnPavementWaving in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

>Why do you expect such a jump when the industry has been stuck at half a trillion for the past year? All previous jumps were smaller and cost significantly less.

A combination of software and hardware improvements currently being worked on, using Nvidia GPUs. https://azure.microsoft.com/en-us/blog/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed/

With regard to Chinchilla, I don't think they disproved anything. See my comment history if you care enough. I've debated quite extensively on this topic.

porcenat_k t1_itssly1 wrote on October 26, 2022 at 1:07 AM

Reply to comment by TFenrir in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

The size of language models has been growing exponentially. We should expect 100 trillion parameter dense models by next year. https://i0.wp.com/silvertonconsulting.com/wp-content/uploads/2021/04/Screen-Shot-2021-04-15-at-3.18.31-PM.png?ssl=1

I think that is possible once firms begin using H100 GPUs.

porcenat_k t1_itsrnjb wrote on October 26, 2022 at 1:00 AM

Reply to comment by manOnPavementWaving in Where does the model accuracy increase due to increasing the model's parameters stop? Is AGI possible by just scaling models with the current transformer architecture? by elonmusk12345_

"trends predict 5-10 trillion parameter dense models by now, bet your ass they don't exist), the data available is getting too few".

I beg to differ. Indeed, we should expect to see 10 to 20 trillion parameter models this year. Based on industry movements, I'm expecting Meta or OpenAI to produce such a model by the end of this year, if not Q1 2023. We don't have enough data for Chinchilla compute-optimal models. DeepMind's scaling laws are flawed in a number of fundamental ways. One of them is that sample efficiency, generality and intelligence increase with scale. Large vanilla models require less data in order to achieve better performance. We can train multi-trillion parameter dense models with the same or, better yet, less data than it took to train GPT-3. It is certainly possible to train such a model with massive compute clusters running on thousands of A100 GPUs, which is exactly what is being done right now. The cheap methods being focused on right now are a temporary crutch which I project will be put away once firms are able to adopt new GPUs such as the H100.
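To put a number on "not enough data for Chinchilla compute-optimal models", here is a minimal sketch. It assumes the commonly quoted approximations of roughly 20 training tokens per parameter for Chinchilla-optimal training and roughly 6 FLOPs per parameter per token for training compute, plus a GPT-3 training set of about 300 billion tokens. These are rules of thumb, not exact values, and the function names are made up for illustration.

```python
# Rough Chinchilla-style data and compute estimates for very large dense models.
# The constants are approximate rules of thumb (assumptions), not exact values.
TOKENS_PER_PARAM = 20         # ~20 tokens per parameter, Chinchilla heuristic
FLOPS_PER_PARAM_TOKEN = 6     # ~6 FLOPs per parameter per training token
GPT3_TRAINING_TOKENS = 300e9  # GPT-3 was trained on roughly 300B tokens


def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count for a model with n_params weights."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, using the C ~ 6 * N * D rule of thumb."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


for n_params in (1e12, 10e12, 20e12):
    tokens = chinchilla_tokens(n_params)
    print(
        f"{n_params / 1e12:>4.0f}T params: "
        f"{tokens / 1e12:.0f}T tokens "
        f"({tokens / GPT3_TRAINING_TOKENS:.0f}x GPT-3's data), "
        f"{training_flops(n_params, tokens):.1e} FLOPs"
    )
```

Under these assumptions a 10 trillion parameter model would call for on the order of 200 trillion tokens, which is the data gap the argument above is about.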