
currentscurrents OP t1_je631oa wrote


  • This is a survey paper. The authors summarize a variety of arguments about whether or not LLMs truly "understand" what they're learning.

  • The major argument in favor of understanding is that LLMs are able to complete many real and useful tasks that seem to require understanding.

  • The major argument against understanding is that LLMs are brittle in non-human ways, especially to small changes in their inputs. They also don't have real-world experience to ground their knowledge in (although multimodal LLMs may change this).

  • A key issue is that no one has a solid definition of "understanding" in the first place. It's not clear how you would test for it. Tests intended for humans don't necessarily test understanding in LLMs.

I tend to agree with their closing summary. LLMs likely have a type of understanding, and humans have a different type of understanding.

>It could thus be argued that in recent years the field of AI has created machines with new modes of understanding, most likely new species in a larger zoo of related concepts, that will continue to be enriched as we make progress in our pursuit of the elusive nature of intelligence.


Purplekeyboard t1_je8m61n wrote

> LLMs likely have a type of understanding, and humans have a different type of understanding.

Yes, this is more of a philosophy debate than anything else, hinging on the definition of the word "understanding". LLMs clearly have a type of understanding, but as they aren't conscious it is a different type than ours. Much as a chess program has a functional understanding of chess, but isn't aware and doesn't know that it is playing chess.


dampflokfreund t1_je8zlkp wrote

We don't have a proper definition of consciousness nor a way to test it either, by the way.


TitusPullo4 t1_je959tq wrote

Consciousness is having a subjective experience. It is well defined. Though we do lack ways to test for it.


trashacount12345 t1_jeddoei wrote

This is the agreed upon definition in philosophy. I’m not sure what another definition would be besides “it’s not real”.


ninjasaid13 t1_jeh2s4o wrote

>Consciousness is having a subjective experience.

and what's the definition of subjective?


Amster2 t1_je9h981 wrote

I'm not sure they aren't conscious. They can clearly reference themselves, and seem to understand they are an LLM with an information cutoff in 2021, etc.

It behaves like it is self-aware. How can we determine if they really are or not?


braindead_in t1_je8ucis wrote

In Nonduality, understanding or knowledge is the nature of pure consciousness, along with existence and bliss. I think of it as an if-then statement in programming. Once a program enters into an if condition, it understands and knows what has to be done next.


throwaway957280 t1_je7du40 wrote

Worth noting this paper predates ChatGPT (3.5) by a few months.


currentscurrents OP t1_je7faup wrote

This seems to be the delay of the publishing process; it went up on arxiv in October but is getting attention now because it was finally published March 21st.

I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.


sdmat t1_je7pvwm wrote

We can test with things that are highly unlikely to be in the training data.

Model: GPT-4

How can Mars colonists use jeans to improve habitat survivability?

Mars colonists can use jeans to improve habitat survivability by repurposing them in several creative and resourceful ways. Here are some ideas on how jeans can be utilized to enhance the living conditions on Mars:

Insulation: Jeans can be cut into pieces and used as insulation material for habitats, providing an additional layer of thermal protection against the harsh Martian environment.

Air and Water Filtration: Denim fibers can be used to create simple air and water filters, helping to remove dust particles and impurities from both air and water resources.

Gardening and Agriculture: Jeans can be shredded and used as a component of a growth medium for plants, providing a source of organic material that can help retain moisture and support plant growth in Martian greenhouses.

Patching Material: The durable fabric of jeans can be used to patch holes or reinforce weak areas in habitat structures or protective suits.

Textile-Based Electronics: By weaving conductive threads into the denim fabric, jeans can be transformed into flexible, wearable electronic devices or sensors. These could be employed for monitoring vital signs, environmental conditions, or providing communication capabilities.

Emergency Tourniquets: In case of injuries, jeans can be cut into strips and used as improvised tourniquets to control bleeding.

Storage Solutions: Jeans can be fashioned into storage bags or pouches for organizing tools, equipment, and other essential items within the habitat.

Crafting and Construction: Denim fibers can be combined with other materials to create a variety of items, such as ropes, mats, or even structural components for the habitat.

While these suggestions highlight the potential for repurposing jeans in a Mars colony, it's essential to note that relying solely on jeans for habitat survivability would not be sufficient. However, creative reuse of available resources can undoubtedly contribute to the overall sustainability and resilience of a Martian habitat.


midasp t1_je80uot wrote

And exactly what does that prove?


sdmat t1_je83jw4 wrote

Objectively prove? Nothing. But subjectively there is a stark difference in the quality of suggestions and apparent depth of understanding from earlier LLMs. E.g. 3.5 suggested using jeans for radiation shielding "because denim is a thick material".

I did try a web search and directly asking the model for references. Unsurprisingly jeans for Mars colonization doesn't seem to be an existing concept, so it's almost certainly not in the training set.


currentscurrents OP t1_je83z1p wrote

I think these are all ideas from the internet, but it did understand that they would be appropriate for the task of making jeans useful on Mars.

It seems to have understood the instructions and then pulled relevant information out of its associative memory to build the response.


Purplekeyboard t1_je8l78y wrote

The point is that GPT-3 and GPT-4 can synthesize information to produce new information.

One question I like to ask large language models is "If there is a great white shark in my basement, is it safe for me to be upstairs?" This is a question no one has ever asked before, and answering the question requires more than just memorization.

Google Bard answered rather poorly, and said that I should get out of the house or attempt to hide in a closet. It seemed to be under the impression that the house was full of water and that the shark could swim through it.

GPT-3, at least the form of it I used when I asked it, said that I was safe because sharks can't climb stairs. Bing Chat, using GPT-4, was concerned that the shark could burst through the floorboards at me, because great white sharks can weigh as much as 5000 pounds. But all of these models are forced to put together various bits of information on sharks and houses in order to try to answer this entirely novel question.


NotDoingResearch2 t1_je8l7oz wrote

Is this accurate though? Serious question as I’m not an expert on the use of jeans on Mars.


sdmat t1_je8p59n wrote

I think we can safely say this is:

> it's essential to note that relying solely on jeans for habitat survivability would not be sufficient.

I don't have a degree in exojeanology, but the ideas all seem to at least be at the level of smart generalist brainstorming.


  • Contextually appropriate - these are plausibly relevant to the needs of a Martian colony
  • Nontrivial - no "wear them as clothing" (GPT3.5 did this)
  • Logical and well articulated - each is a clearly expressed and internally consistent concept
  • Passes the "common sense" test - no linguistically valid statements that are ridiculous if you have general knowledge of the world. E.g. "Use jeans to signal ships in orbit", or GPT3.5's suggestion to use jeans as radiation shielding because denim is a thick fabric.

They aren't necessarily good ideas in the sense that NASA should be writing this down. But that isn't the point.

I would argue that behaviourally GPT4 demonstrates a great deal of understanding here and a notable lack of the "brittleness and unhumanlike errors" that 3.5 shows on the same question.


3_Thumbs_Up t1_je8tvtc wrote

>We can test with things that are highly unlikely to be in the training data.

We can also test things where there's an infinite number of alternatives, so that memorization would be impossible.

If GPT could solve every arithmetic problem thrown at it, then it's obvious that it has developed some understanding of arithmetic, as it's simply impossible to memorize the answer for every possible problem.

However, the fact that it fails on arithmetic of large numbers could be an indication that it doesn't understand, but failure could also be caused by other factors, such as lack of enough working memory or similar (humans would fail at multiplying large numbers in their head as well).

So I think one could prove understanding, but proving lack of understanding seems harder.
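
A rough sketch of the memorization argument above, in Python. It only counts how many distinct multiplication problems exist for operands of a given length, assuming ordered pairs of positive integers without leading zeros. The function name `count_multiplication_problems` is made up for illustration, and nothing here says anything about how a model actually stores or computes answers.

```python
def count_multiplication_problems(digits):
    """Count ordered pairs (a, b) where both operands have exactly `digits` digits.

    Assumption: operands are positive integers with no leading zeros,
    so there are 9 * 10**(digits - 1) choices for each operand.
    """
    operands = 9 * 10 ** (digits - 1)
    return operands ** 2


# The count grows by a factor of 100 for every extra digit per operand.
for d in (3, 6, 12):
    print(d, count_multiplication_problems(d))
```

For 3-digit operands that is already 810,000 distinct problems, and the count grows a hundredfold with each extra digit, which is the sense in which a lookup table stops being a plausible explanation.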


sdmat t1_je93qz6 wrote

The arithmetic thing is interesting. GPT4 usually fails at multiplying a pair of 3-digit numbers as pure token prediction:

> Multiply 438 * 939
>
> The product of 438 and 939 is 411,182.

But it does actually know how to perform long form multiplication and with the right prompt can get the correct answer this way:

> Multiply 438 by 939 using long form multiplication. Check each step for correctness as you perform it and revise if there is an error. Add trailing zeros to the parts. Perform the final addition in steps by successively adding each part.
>
> Alright, let's multiply 438 by 939 using long-form multiplication. I'll write out the steps and verify their correctness as I go along.
>
>       438
>     x 939
>     _________
>
> Step 1: Multiply 438 by 9 (the digit in the ones place of 939)
>
>       438
>     x   9
>     _________
>      3942 (9 * 438 = 3942, Correct)
>
> Step 2: Multiply 438 by 30 (3 in the tens place of 939, so we multiply by 3 and add a trailing zero)
>
>       438
>     x   3
>     _________
>      1314 (3 * 438 = 1314, Correct)
>
> Now, add a trailing zero to this product: 13140
>
> Step 3: Multiply 438 by 900 (9 in the hundreds place of 939, so we multiply by 9 and add two trailing zeros)
>
>       438
>     x   9
>     _________
>      3942 (9 * 438 = 3942, Correct)
>
> Now, add two trailing zeros to this product: 394200
>
> Step 4: Add the three products from steps 1, 2, and 3
>
>        3942 (Step 1)
>      +13140 (Step 2)
>     +394200 (Step 3)
>     __________
>
> Step 5: Add the products in steps
>
> First, add Step 1 and Step 2:
>
>        3942
>      +13140
>     _________
>       17082
>
> Now, add this sum to the product from Step 3:
>
>       17082
>     +394200
>     _________
>      411282
>
> So, the product of 438 and 939 is 411,282.

It still often makes mistakes doing this, but it clearly has a good grasp of the concept.

It's able to correctly perform the high level long multiplication procedure for large numbers (haven't had the patience to see just how large) but is let down by the reliability of the arithmetic in constituent steps.

A lot of humans have the same problem.
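
For anyone who wants to check the transcript above, here is a minimal Python sketch of the same long multiplication procedure: one partial product per digit of the second number, with trailing zeros applied, then a running sum. The name `long_multiply` is made up for illustration, it assumes non-negative integers, and it is only a way to verify the steps, not a claim about what the model does internally.

```python
def long_multiply(a, b):
    """Multiply a by b the long-form way (assumes non-negative integers).

    Returns the list of partial products (ones place first) and their sum.
    """
    partials = []
    place = 0
    remaining = b
    while remaining > 0:
        # Peel off the lowest digit of b.
        remaining, digit = divmod(remaining, 10)
        # Multiply by that digit, then add `place` trailing zeros.
        partials.append(a * digit * 10 ** place)
        place += 1

    # Add the parts successively, as the prompt asks the model to do.
    total = 0
    for part in partials:
        total += part
    return partials, total


partials, product = long_multiply(438, 939)
print(partials)  # [3942, 13140, 394200]
print(product)   # 411282
```

The partial products match Steps 1 to 3 in the transcript, and the total confirms 411,282 (so the first, direct answer of 411,182 was off by exactly 100).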


planetofthemapes15 t1_je7efmz wrote

This should basically disqualify it IMO, thanks for bringing up that point

Edit: There are other suggestions that GPT-4 has abstract understanding. This paper is based on data collected before the release of GPT-4 or even GPT-3.5 (October 22). For those drive-by downvoting my comment, explain why this paper is valuable in the face of contrary evidence, such as work that is actually based on the bleeding-edge technology which has generated all the recent interest in LLMs.


pengo t1_je99h3k wrote

There are two meanings of understanding:

  1. My conscious sense of understanding which I can experience and I have no ability to measure in anyone else, unless someone solves the hard problem.
  2. Demonstrations of competence, which we say "show understanding", which can be measured, such as exam results. Test results might be a proxy for measuring conscious understanding in humans, but do not directly test it, and have no connection to it whatsoever in machines.

That's it. They're two different things. Two meanings of understanding. The subjective experience and the measurement of understanding.

Machines almost certainly have no consciousness, but can demonstrate understanding. There's no contradiction in that because showing understanding does not imply having (conscious) understanding. A tree falling doesn't mean someone has to experience the sensation of hearing it, and that doesn't mean it didn't fall. And if you hear a recording of a tree falling, then no physical tree fell. They're simply separate things. A physical thing, and a mental state of mind. Just like conscious understanding and demonstrations of understanding.

Why pretend these are the same thing and quiz people about it? Maybe the authors can write their next paper on the "debate" over whether season means a time of year or something you do with paprika.

Really sick of this fake "debate" popping up over and over.


Barton5877 t1_jeadc3o wrote

On 2:

Competence is used sociologically to describe ability to perform, such as speak or act, in a manner demonstrating some level of mastery - but isn't necessarily a sign of understanding.

I'd be loath to have to design a metric or assessment by which to "measure" understanding. One can measure or rate competence, but the degree to which the person "understands" what they are doing, why, how, for what purpose and so on is another matter.

In linguistics, there's also a distinction between practical and discursive reason that can be applied here: ability to reason vs ability to describe the reasoning. Again, understanding escapes measurement, insofar as what we do and how we know what we are doing isn't the same as describing it (which requires both reflection on our actions and translation into speech that communicates them accurately).

The long and short of it being that "understanding" is never going to be the right term for us to use.

That said, there should be terminology for describing the conceptual connectedness that LLMs display. Some of this is in the models and design. Some of it is in our projection and psychological interpretation of their communication and actions.

I don't know to what degree LLMs have "latent" conceptual connectedness, or whether this is presented only in the response to prompts.


pengo t1_jechdk0 wrote

> The long and short of it being that "understanding" is never going to be the right term for us to use.

Yet still I'm going to say "Wow, ChatGPT really understands the nuances of regex xml parsing" and also say, "ChatGPT has no understanding at all of anything" and leave it to the listener to interpret each sentence correctly.

> I don't know to what degree LLMs have "latent" conceptual connectedness, or whether this is presented only in the response to prompts.

concept, n.

  1. An abstract and general idea; an abstraction.

  2. Understanding retained in the mind, from experience, reasoning and imagination

It's easy to avoid using "understanding" for being imprecise but it's impossible not to just pick other words which have the exact same problem.


Barton5877 t1_jee1fw4 wrote

That the definition of concept you're citing here uses the term "understanding" is incidental - clearly it's a definition of concept in the context of human reasoning.

Whatever terminology we use ultimately for the connectedness of neural networks pre-trained on language is fine by me. It should be as precise to the technology as possible whilst conveying effects of "intelligence" that are appropriate.

We're at the point now where GPT-4 seems to produce connections that come from a place that's difficult to find or reverse engineer - or perhaps which simply come from token selections that are surprising.

That's what I take away from a lot of the discussion at the moment - I have no personal insight into the model's design, or the many parts that are stitched together to make it work as it does (quoting Altman here talking to Lex).


bgighjigftuik t1_je6yfrg wrote

Associative memory on steroids. That's my bet on LLMs' "understanding"