Submitted by xutw21 t3_ybzh5j in singularity
TFenrir t1_itmb3es wrote
Reply to comment by gibs in Large Language Models Can Self-Improve by xutw21
Hmmm, I would say that "prediction" is actually a foundational part of all intelligence, from my layman's understanding. I was listening to a podcast (Lex Fridman) about the book... Thousand minds? Something like that, and there was a compelling explanation for why prediction plays such a foundational role. Yann LeCun is also quoted as saying that prediction is the essence of intelligence.
I think this is fundamentally why we are seeing so many gains out of these new large transformer models.
gibs t1_itnalej wrote
I've definitely heard that idea expressed on Lex's podcast. I would say prediction is necessary but not sufficient for producing sentience. And language models are neither. I think the kinds of higher level thinking that we associate with sentience arise from specific architectures involving prediction networks and other functionality, which we aren't really capturing yet in the deep learning space.
TFenrir t1_itni82q wrote
I don't necessarily disagree, but I also think sometimes we romanticize the brain a bit. There are a lot of things we are increasingly surprised to achieve with language models, scale, and different training architectures. Like Chain of Thought seems to have become not just a tool to improve prompts, but a way to help with self-regulated fine-tuning.
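To sketch what I mean by Chain of Thought feeding back into fine-tuning (this is my own illustrative code, not the paper's implementation; the helper name and data shapes are made up): the self-improve recipe samples several chain-of-thought answers per question, takes a majority vote, and keeps the reasoning paths that reached the winning answer as new fine-tuning targets.

```python
from collections import Counter

def select_self_training_data(samples):
    """samples: list of (reasoning, answer) pairs sampled from the model
    for one question. Returns the majority-voted answer and the reasoning
    paths that reached it -- in the self-improvement loop, those kept
    paths become fine-tuning data for the same model."""
    counts = Counter(answer for _, answer in samples)
    majority, _ = counts.most_common(1)[0]
    kept = [(reasoning, answer) for reasoning, answer in samples
            if answer == majority]
    return majority, kept
```

So no human labels are needed: the model's own most self-consistent reasoning is treated as the training signal.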
I'm reading papers where Google combines more and more of these new techniques, architectures, and general lessons and they still haven't finished smushing them all together.
I wonder what happens when we smush more? What happens when we combine all these techniques, UL2/Flan/lookup/models making child models, etc etc.
All that being said, I think I actually agree with you. I am currently intrigued by different architectures that allow for sparse activation and are more conducive to transfer learning. I really liked this paper:
gibs t1_itnnx1y wrote
Just read the first part -- that is a super interesting approach. I'm convinced that robust continual learning is a critical component for AGI. It also reminds me of another of Lex Fridman's podcasts where he had a cognitive scientist guy (I forget who) whose main idea about human cognition was that we have a collection of mini-experts for any given cognitive task. They compete (or have their outputs summed) to give us a final answer to whatever the task is. I think the paper's approach of automatically compartmentalising knowledge into functional components is another critical part of the architecture for human-like cognition. Very very cool.
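That compete-or-sum picture of mini-experts is basically a mixture-of-experts with sparse activation. A toy sketch of the idea (my own simplification, not from the paper; experts and the gate are just stand-in functions here): a gating network scores every expert, only the top-k actually run, and their outputs are summed with softmax weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gate, top_k=1):
    """experts: list of callables x -> float (the 'mini-experts').
    gate: callable x -> list of scores, one per expert.
    Only the top_k experts by gate score are activated (sparse
    activation); their outputs are summed, weighted by the softmax
    of their gate scores."""
    scores = gate(x)
    top = sorted(range(len(experts)),
                 key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

With top_k=1 the experts purely compete (winner takes all); with larger top_k their outputs are blended, which matches the "compete or sum" framing. Sparsity is also what makes this cheap: most experts never run on a given input.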