belacscole t1_j8bf6ol wrote

I wonder if this is the ultimate path to reaching general intelligence. After all, humans evolved by learning to master tools.


big_gondola t1_j8biqv4 wrote

I might say we gain general intelligence by creating different models for different tasks and gaining experience on when to call which. This has the "when to call which" part, but not the creation of new models.


diviludicrum t1_j8bxeji wrote

I still think u/belacscole is right - this is analogous to the rudimentary use of tools, which can be done by some higher primates and a small handful of other animals. Tool use requires a sufficient degree of critical thinking to recognise a problem exists and select the appropriate tool for solving it. If done with recursive feedback, this would lead to increasingly skilful tool selection and use over time, resulting in better detection and solution of problems over time. Of course, if a problem cannot possibly be solved with the tools available, no matter how refined their usage is, that problem would never be overcome this way - humans have faced these sorts of technocultural chokepoints repeatedly throughout our history. These problems require the development of new tools.

So the next step in furthering the process is abstraction, which takes intelligence from critical thinking to creative thinking. If a tool-capable AI can be trained on a dataset that links diverse problems with the models that solve those problems and the process that developed those models, such that it can attempt to create and then implement new tools to solve novel problems, then assess its own success (likely via supervised learning, at least at first), we may be able to equip it with the “tool for making tools”, such that it can solve the set of all AI-solvable problems (given enough time and resources).


uristmcderp t1_j8db0gw wrote

The whole "assessing its own success" part is the bottleneck for most interesting problems. You can't have a feedback loop unless it can accurately evaluate if it's doing better or worse. This isn't a trivial problem either, since humans aren't all that great at using absolute metrics to describe quality, once past a minimum threshold.


ksatriamelayu t1_j8ebpx4 wrote

Do people use things like evolutionary fitness + changing environments to describe that quality? Seems like a dynamic environment might be the answer?


Oat-is-the-Best t1_j8ef5x0 wrote

How do you calculate your fitness? That has the same problem of a model not being able to assess its own success


LetterRip t1_j8dpgxc wrote

There are plenty of examples of tool use in nature that don't require intelligence. For instance, ants.

The tool use being demonstrated by toolformer can be purely statistical in nature, no need for intelligence.


thecodethinker t1_j8dpuru wrote

It is purely statistical, isn’t it?

LLMs are statistical models after all.


imaginethezmell t1_j8g4f64 wrote

there are apis for auto ml already

it can simply learn the task to use other ai to create models

its over


robotix_dev t1_j8cekxc wrote

I’ve long thought this is the next stepping stone in the path to AGI. The next big step IMO is dynamic, online model augmentation to enable learning new concepts.

Both of those combined seem like a basic approximation of what goes on in our brain.


yashdes t1_j8d8lf9 wrote

I've definitely wondered about this exact thing myself, especially when talking to chatgpt: when it responds with "insert x here", why couldn't that just be taken out and replaced with the appropriate API call?
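
A minimal sketch of that substitution, assuming the model marks the spot with an inline call such as `[Calculator(2 + 3 * 4)]` rather than a free-text placeholder. The bracket syntax, the `TOOLS` registry, and `fill_tool_calls` are illustrative names, not an existing API, and the only tool wired up is a small arithmetic evaluator.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return f"{ev(ast.parse(expression, mode='eval').body):g}"


# Hypothetical registry: tool name -> function taking and returning text.
TOOLS = {"Calculator": calculator}

# Matches inline calls of the form [ToolName(argument)].
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")


def fill_tool_calls(text: str) -> str:
    """Replace every known inline tool call with the tool's result."""

    def run(match: re.Match) -> str:
        tool = TOOLS.get(match.group(1))
        # Unknown tools are left in the text untouched.
        return tool(match.group(2)) if tool else match.group(0)

    return CALL.sub(run, text)


print(fill_tool_calls("The total is [Calculator(2 + 3 * 4)] units."))
```

Keeping the registry as a plain dictionary means a new tool is one more entry, and calls to tools that are not registered pass through unchanged.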


pyepyepie t1_j8dgah3 wrote

Did it learn to master tools though? I see it more as a neuro-symbolic system (is it the correct term?). It happens a lot in production.


Despacereal t1_j8d971u wrote

In a way yes. I think general intelligence (consciousness in most animals) developed evolutionarily to manage a wide variety of sensory inputs and tasks, and to bridge the gaps between them.

As we develop more individual areas of AI, we will naturally start to combine them to create more powerful programs, such as Toolformer combining the strengths of LLMs and other models. Once we have these connections between capabilities, it should be easier to develop new models that learn these connections more deeply and can do more things.

Some of the things that set us apart from other animals are our incredible language and reasoning capabilities which allow us to understand and interact with an increasingly complex world and augment our capabilities with tools. The perceived understanding that LLMs display using only patterns in text is insane. Combine that with the pace of developments in Chain of Thought reasoning, use of Tools, other areas handling visuals, sound, and motion, and multimodal AI, and the path to AGI is becoming clearer than the vision of a MrBeast™ cataracts patient.


thedude0425 t1_j8hzozd wrote

Intelligence and physical traits evolved in humans through random mutation that eventually allowed humans to use tools.


SnooStories4137 t1_j8lrsug wrote

Some reinforcement-learning-like algorithm seems like a really interesting next step here. Observation = task (like QA or mask filling), actions = API calls where the output updates the observation via concatenation as in the paper, environment is the APIs, database, Python installation, etc., state is the network weights, and reward is the difference in the loss function before and after the update to the observation.

I feel like even if the only API is just generating text using itself to update the observation ('to help itself think'), it intuitively seems like it could help for some things. Rather than try to fill in the mask right away, it might recognize that it's better to first 'think a little' to update its working memory (which is of course the observation here).
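
A rough sketch of one step of that loop, under toy assumptions: the observation is a string, an action is a `(tool name, argument)` pair, the tool output is concatenated onto the observation, and the reward is the loss before the update minus the loss after it. `step`, `toy_loss`, and the `Lookup` tool are made-up stand-ins; a real version would score the observation with the language model's own loss.

```python
from typing import Callable, Dict, Tuple

Tool = Callable[[str], str]


def step(
    observation: str,
    action: Tuple[str, str],
    tools: Dict[str, Tool],
    loss_fn: Callable[[str], float],
) -> Tuple[str, float]:
    """Apply one API-call action and return (new observation, reward)."""
    name, argument = action
    loss_before = loss_fn(observation)
    result = tools[name](argument)
    # The API output updates the observation by concatenation.
    new_observation = f"{observation} [{name}({argument}) -> {result}]"
    # Reward is the drop in loss caused by the update.
    reward = loss_before - loss_fn(new_observation)
    return new_observation, reward


# Toy environment: a single lookup tool backed by a tiny table.
FACTS = {"capital of France": "Paris"}
tools = {"Lookup": lambda query: FACTS.get(query, "unknown")}


def toy_loss(observation: str) -> float:
    """Stand-in for the model loss: 0 once the needed fact is in context."""
    return 0.0 if "Paris" in observation else 1.0


obs, reward = step(
    "Q: What is the capital of France? A:",
    ("Lookup", "capital of France"),
    tools,
    toy_loss,
)
print(obs, reward)
```

An unhelpful call leaves the loss unchanged in this toy setup, so its reward is zero; a policy trained on this signal would only be pushed toward calls that actually help.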


radi-cho OP t1_j8aora0 wrote


MustBeSomethingThere t1_j8fvp24 wrote

As far as I understand, many of those lucidrains repos don't contain the needed AI model. In this case too, the Toolformer AI model is not publicly available.


SleekEagle t1_j8ix4fz wrote

Authors publish papers on research, experiments, findings, etc. They do not always release the code for the models they are studying.

The lucidrains repos implement the models, creating an open-source implementation for the research.

The next step would then be to train the model, which requires a lot more than just the code (most notably, money). I assume you're referring to these trained weights when you say "the needed AI model". Training even one of these models, let alone a whole portfolio of them, would require a huge amount of time and money for a team, never mind a single person.

For this reason, it's not very reasonable to expect lucidrains or any other person to train these models - the open-source implementations are a great contribution on their own!


Taenk t1_j8ckvh2 wrote

Now what if the tool the LLM uses is the training API for itself …


extracensorypower t1_j8e1azu wrote

Every tool except Jira, of course. Nothing sentient could figure that out.


bballerkt7 t1_j8bimv8 wrote

AGI getting closer every day


BenjaminJamesBush t1_j8c12it wrote

Technically this has always been true.


EducationalCicada t1_j8d5y9z wrote

Not if it's actually impossible.


BashsIash t1_j8djkk4 wrote

Can it be impossible? I'd assume it can't be impossible, otherwise we couldn't be intelligent in the first place.


cd_1999 t1_j8fmlej wrote

Have you heard of Searle's Chinese Room?

Some people (sorry I can't give you references off the top of my head) argue there's something special about the biological nervous system, so the material substrate is not irrelevant. (Sure you could reverse engineer the whole biological system, but that would probably take much longer).


pyepyepie t1_j8dvci2 wrote

I would tell you my opinion if I knew what the definition of AGI is xD


urbanfoh t1_j8elywk wrote

Isn't it almost certainly possible due to the universal approximation theorem?

Assuming consciousness is a function of external variables, a large enough network with access to these variables should be able to approximate consciousness.


pyepyepie t1_j8dv3wv wrote

Why do you think it's a step in this direction? Did you read the paper (serious question, it's interesting)?


bballerkt7 t1_j8e6l5f wrote

Because AI being able to use APIs is a big step towards it being able to interact with the real world effectively, specifically the digital world. Imagine chatgpt being able to now do things for you in the digital world like go online shopping for you or trade stocks etc.


pyepyepie t1_j8e7gjp wrote

Thanks :) I agree it's useful but I don't see how it's related to AGI. Additionally, it was already done a long time ago, many "AI" agents used the internet before. I feel that the real challenge is to control language models using structured data, perform planning, etc., not to use language models to interact with the world (which seems trivial to me, sorry), but of course, it's just my opinion - which is probably not even that smart.


VelveteenAmbush t1_j8fusa5 wrote

> I feel that the real challenge is to control language models using structured data, perform planning, etc.

I think the promise of tool-equipped LLMs is that these tools may be able to serve that sort of purpose (as well as, like, being calculators and running wikipedia queries). Could imagine an LLM using a database module as a long-term memory, to keep a list of instrumental goals, etc. You could even give it access to a module that lets it fine-tune itself or create successor LLMs in some manner. All very speculative of course.


bballerkt7 t1_j8eddln wrote

No worries I think you definitely have a valid take. I always feel not smart talking about AI stuff lol :)


farmingvillein t1_j8frv87 wrote

> not to use language models to interact with the world (which seems trivial to me, sorry),

The best argument here is that "true" intelligence requires "embedded" agents, i.e., agents that can interact with our (or, at least, "a") world (to learn).

Obviously, no one actually knows what will make AGI work, if anything...but it isn't a unique/fringe view OP is suggesting.


mycall t1_j8bjo05 wrote

Progress comes in a multitude of mysterious ways.


sam__izdat t1_j8bn58f wrote

I don't want to be that guy, but can y'all leave the doe-eyed ML mysticism to the more Ray Kurzweil themed subreddits?


Soundwave_47 t1_j8bpaqd wrote

Yes, please keep this sort of stuff in /r/futurology or something. We're here trying to formalize the n steps needed to even get to something that vaguely resembles AGI.


kaityl3 t1_j8d7hsw wrote

Do we even know what WOULD resemble an AGI, or exactly how to tell?


Soundwave_47 t1_j8fu3r6 wrote

Somewhat, and no.

We generally define AGI as an intelligence (which, in the current paradigm, would be a set of algorithms) that has decision making and inference capabilities in a broad set of areas, and is able to improve its understanding of that which it does not know. Think of it like school subjects: it might not be an expert in all of {math, science, history, language, economics}, but it has some notion of how to do basic work in all of those areas.

This is extremely vague and not universally agreed upon (for example, some say it should exceed peak human capabilities in all tasks).


swegmesterflex t1_j8d3t4r wrote

Had this idea and was planning to play around with it when I had more free time. Good to see some evidence it’s a promising direction. I speculate you can actually get a LOT out of this if you’re clever with it. A tool for long term memory could be done by having a lookup table with text embeddings as keys. A tool for vision could be made with an image captioning model + maybe some segmentation to get a richer text description of the image. Many more things you could come up with, that I think could work well if you find some clever way of turning them into text.
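
A toy sketch of the long-term memory tool described above: a lookup table keyed by text embeddings and queried by similarity. To stay self-contained it uses a bag-of-words count vector as the "embedding", which is an assumption for illustration only; a real version would swap in a learned text embedding model. `MemoryTool`, `embed`, and `cosine` are made-up names.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryTool:
    """Lookup table with embeddings as keys and the stored text as values."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Counter, str]] = []

    def write(self, text: str) -> None:
        self._entries.append((embed(text), text))

    def read(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored texts most similar to the query."""
        key = embed(query)
        ranked = sorted(
            self._entries, key=lambda entry: cosine(key, entry[0]), reverse=True
        )
        return [text for _, text in ranked[:k]]


memory = MemoryTool()
memory.write("The user's favourite language is Rust")
memory.write("The meeting is on Tuesday at noon")
print(memory.read("when is the meeting"))
```

Because both `write` and `read` take and return plain text, the model could call them the same way it calls any other text-in, text-out tool.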


MysteryInc152 t1_j8ppoiq wrote

I'd rather the basic senses at least (vision as well as audio) be pretrained as well. We know from Multimodal chain of thought as well as scaling laws for generative mixed modal language models that multimodal models far outperform single modal models on the same data and scale. You won't get that kind of performance gain offloading those basic senses to outside tools.


drcopus t1_j8cn7av wrote

It would be interesting if it learned which API to use from a description of the API so as to allow it to generalise to new ones!


lucidrage t1_j8kewo9 wrote

> allow it to generalise to generate new ones!

FTFY, that's how you get skynet!


ksatriamelayu t1_j8ebhn4 wrote

Keep in mind that our current theories in neuroscience broadly agree that something similar is going on in mammalian, even reptilian brains. Hell, maybe even worm brains.

There are autonomous systems everywhere that call each other for updates and, in certain brains, enough complexity that something that can be called thinking occurs.

Practically, offloading calculations to a Python REPL, machine translation to a GTranslate API call, and knowledge search to a Wikipedia corpus is going to let LLMs do what they do best - mask users' intent and generate a believable enough corpus. Let the facts stay factual and the hallucination stay hallucination.


Varpie t1_j8cftrx wrote

I'm surprised this hasn't been done before. This paper mostly cites works from the last 2-3 years, but surely, something similar was done previously (maybe not using the same kind of model)? In fact, isn't it pretty close to what search engines do to provide instant results when given an equation or an address for instance? Does anyone know of such work?


clex55 t1_j8d2oe3 wrote

The next step must be creating and programming those tools and incorporating them on the fly.


flamonster92 t1_j8gz5yk wrote

Imagine an AI that could write another AI.


UnderstandingDry1256 t1_j8ev9bx wrote

An obvious idea is to connect gpt to browser api and let it go and learn 😄


Ok-Variety-8135 t1_j8l9g5j wrote

If we treat the output of the transformer as inner monologue and only perform real output when it calls <action> say: something </action>, it can speak proactively and hide its inner thoughts, just like humans do.
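
A small sketch of that filtering, assuming the raw generation is one string and that only text wrapped as `<action> say: ... </action>` should reach the user. The tag format is taken from the comment above; `spoken_output` is a made-up helper name.

```python
import re
from typing import List

# Captures the text after "say:" inside each <action> ... </action> span.
SAY = re.compile(r"<action>\s*say:\s*(.*?)\s*</action>", re.DOTALL)


def spoken_output(stream: str) -> List[str]:
    """Return only what the model chose to say; everything else stays private."""
    return SAY.findall(stream)


stream = (
    "The user seems upset. Better ask first. "
    "<action> say: Are you okay? </action> "
    "No reply yet, I should not push. "
    "<action> say: Take your time. </action>"
)
for line in spoken_output(stream):
    print(line)
```

Anything outside the tags acts as working memory for later tokens but is never shown, which is what lets the model decide for itself when to speak.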


dgrsmith t1_j8n4kwe wrote

From a cognitive point of view, humans and animals have modules that they rely on for certain tasks. For Human Neuropsych assessment, the combination of the function of these modules gives you a score for general intelligence, with each module contributing toward the whole. Having a removed or changed “module” for one reason or another will sometimes cause localized task failures (e.g., neurodegenerative disease or brain injury) or an approach to tasks that is atypical (e.g., atypical brain development). Maybe we can think of specific cognitive functions as being API calls to modules in this “tool use” paradigm? This is likely not an original thought, and if anyone has references or has heard of this idea, please let me know!


leepenkman t1_j8co3gr wrote

Also check out Text Generator: it's a multimodal model, so it visits any input links, downloads the web pages, and analyzes images with NNs to make better text.

It also does speech to text and text to speech, so it can talk.

As many have said, lots of these things will likely/hopefully come together into something big. That needs a few things, like the "when to train new tools"/model zoo thing. Internally, Text Generator is based on multiple models too and has some internal decision making for which model is best on every request (so you don't need to pick a code/text model, it does it automatically), which is similar, but it's not training new nets.


Reasonable_Ad_6572 t1_j8cvoim wrote

BuT GpTChAT iS nO BuENo - Yann LeCun


marcus_hk t1_j8ejn0n wrote

Which part do you disagree with here:

My unwavering opinion on current (auto-regressive) LLMs

  1. They are useful as writing aids.
  2. They are "reactive" & don't plan nor reason.
  3. They make stuff up or retrieve stuff approximately.
  4. That can be mitigated but not fixed by human feedback.
  5. Better systems will come


TheRealMichaelScoot t1_j8c17ym wrote

This is a bs paper. Simply calling APIs


currentscurrents t1_j8c51f0 wrote

...and getting radically improved performance across several important tasks because of calling those APIs.

Plus, calling APIs is very important for integration into real systems because they can trigger real-world actions. Imagine a Siri that calls a bunch of different APIs based on complex instructions you give it.


sloganking t1_j8cculc wrote

It's not just calling APIs. This model is independently teaching itself how to use new APIs and when to use them. The process is pretty much the same for any API, and doesn't require much extra effort by the programmer to add a new one.

This paper also states it is one of the first to have models learn to use APIs in an unsupervised way, meaning they teach themselves instead of relying on a ton of human annotated data.
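
As I understand the paper, the self-supervision comes from a filtering step: the model samples candidate API calls, and a call is kept as training data only if inserting the call together with its result lowers the loss on the following tokens by at least some threshold, compared with the better of "no call" and "call without result". A minimal sketch of that criterion, with a made-up function name and made-up numbers:

```python
def keep_api_call(
    loss_without_call: float,
    loss_call_no_result: float,
    loss_with_result: float,
    threshold: float,
) -> bool:
    """Keep a sampled API call only if its result helps predict what follows.

    The three losses are assumed to be the language-model loss on the tokens
    after the call position under each condition; the threshold is a
    hyperparameter whose value here is purely illustrative.
    """
    baseline = min(loss_without_call, loss_call_no_result)
    return baseline - loss_with_result >= threshold


# Helpful call: the result cuts the loss by more than the threshold.
print(keep_api_call(2.0, 1.9, 0.8, threshold=1.0))
# Unhelpful call: the result barely changes the loss, so it is discarded.
print(keep_api_call(2.0, 1.9, 1.5, threshold=1.0))
```

The kept examples are then used to fine-tune the model, which is why no human annotation of "when to call which API" is needed.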


tetelestia_ t1_j8cde0k wrote

And if we can extend this to creating synthetic training data with a set of known APIs, this could be a big step forward for indexing external information.