**currentscurrents**
OP
wrote

Reply to comment by **throwaway957280** in **[R] The Debate Over Understanding in AI's Large Language Models**

This seems to be down to the delay of the publishing process; it went up on arXiv in October but is getting attention now because it was finally published March 21st.

I think the most interesting change since October is that GPT-4 is much better at many of the tricky sentences that linguists used to probe GPT-3. But it's still hard to prove the difference between "understanding" and "memorization" if you don't know what was in the training data, and we don't.

#
**sdmat**
wrote

We can test with things that are *highly unlikely* to be in the training data.

Model: GPT-4

How can Mars colonists use jeans to improve habitat survivability?

Mars colonists can use jeans to improve habitat survivability by repurposing them in several creative and resourceful ways. Here are some ideas on how jeans can be utilized to enhance the living conditions on Mars:

Insulation: Jeans can be cut into pieces and used as insulation material for habitats, providing an additional layer of thermal protection against the harsh Martian environment.

Air and Water Filtration: Denim fibers can be used to create simple air and water filters, helping to remove dust particles and impurities from both air and water resources.

Gardening and Agriculture: Jeans can be shredded and used as a component of a growth medium for plants, providing a source of organic material that can help retain moisture and support plant growth in Martian greenhouses.

Patching Material: The durable fabric of jeans can be used to patch holes or reinforce weak areas in habitat structures or protective suits.

Textile-Based Electronics: By weaving conductive threads into the denim fabric, jeans can be transformed into flexible, wearable electronic devices or sensors. These could be employed for monitoring vital signs, environmental conditions, or providing communication capabilities.

Emergency Tourniquets: In case of injuries, jeans can be cut into strips and used as improvised tourniquets to control bleeding.

Storage Solutions: Jeans can be fashioned into storage bags or pouches for organizing tools, equipment, and other essential items within the habitat.

Crafting and Construction: Denim fibers can be combined with other materials to create a variety of items, such as ropes, mats, or even structural components for the habitat.

While these suggestions highlight the potential for repurposing jeans in a Mars colony, it's essential to note that relying solely on jeans for habitat survivability would not be sufficient. However, creative reuse of available resources can undoubtedly contribute to the overall sustainability and resilience of a Martian habitat.

#
**midasp**
wrote

And exactly what does that prove?

#
**sdmat**
wrote

Objectively prove? Nothing. But subjectively there is a stark difference in the quality of suggestions and apparent depth of understanding from earlier LLMs. E.g. 3.5 suggested using jeans for radiation shielding "because denim is a thick material".

I did try a web search and directly asking the model for references. Unsurprisingly jeans for Mars colonization doesn't seem to be an existing concept, so it's almost certainly not in the training set.

#
**currentscurrents**
OP
wrote

I think these are all ideas from the internet, but it did understand that they would be appropriate for the task of making jeans useful on Mars.

It seems to have understood the instructions and then pulled relevant information out of its associative memory to build the response.

#
**Purplekeyboard**
wrote

The point is that GPT-3 and GPT-4 can synthesize information to produce new information.

One question I like to ask large language models is "If there is a great white shark in my basement, is it safe for me to be upstairs?" This is a question no one has ever asked before, and answering the question requires more than just memorization.

Google Bard answered rather poorly, and said that I should get out of the house or attempt to hide in a closet. It seemed to be under the impression that the house was full of water and that the shark could swim through it.

GPT-3, at least the form of it I used when I asked it, said that I was safe because sharks can't climb stairs. Bing Chat, using GPT-4, was concerned that the shark could burst through the floorboards at me, because great white sharks can weigh as much as 5000 pounds. But all of these models are forced to put together various bits of information on sharks and houses in order to try to answer this entirely novel question.

#
**NotDoingResearch2**
wrote

Is this accurate though? Serious question as I’m not an expert on the use of jeans on Mars.

#
**sdmat**
wrote

I think we can safely say this is:

> it's essential to note that relying solely on jeans for habitat survivability would not be sufficient.

I don't have a degree in exojeanology, but the ideas all seem to at least be at the level of smart generalist brainstorming.

Specifically:

- Contextually appropriate - these are plausibly relevant to the needs of a Martian colony
- Nontrivial - no "wear them as clothing" (GPT3.5 did this)
- Logical and well articulated - each is a clearly expressed and internally consistent concept
- Passes the "common sense" test - no linguistically valid statements that are ridiculous if you have general knowledge of the world. E.g. "Use jeans to signal ships in orbit", or GPT3.5's suggestion to use jeans as radiation shielding because denim is a thick fabric.

They aren't necessarily *good* ideas in the sense that NASA should be writing this down. But that isn't the point.

I would argue that behaviourally GPT4 demonstrates a great deal of understanding here and a notable lack of the "brittleness and unhumanlike errors" that 3.5 shows on the same question.

#
**3_Thumbs_Up**
wrote

>We can test with things that are *highly unlikely* to be in the training data.

We can also test things where there's an infinite number of alternatives so that memorization would be impossible.

If GPT could solve every arithmetic problem thrown at it, then it's obvious that it has developed some understanding of arithmetic, as it's simply impossible to memorize the answer for every possible problem.

However, the fact that it fails on arithmetic of large numbers could be an indication that it doesn't understand, but failure could also be caused by other factors, such as lack of enough working memory or similar (humans would fail at multiplying large numbers in their head as well).

So I think one could prove understanding, but proving lack of understanding seems harder.
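A minimal sketch of that kind of test, for illustration only: generate random multiplication problems so the answers cannot all have been memorized, then score the replies. The `ask_model` callable is a hypothetical stand-in for whatever model interface is available (it is not a real API), and `exact_stub` is an assumed placeholder so the sketch runs without a model.

```python
import random
import re
from typing import Callable, Optional


def make_problems(n: int, digits: int, seed: int = 0) -> list[tuple[int, int]]:
    """Draw n random pairs of numbers with the given digit count."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]


def parse_answer(reply: str) -> Optional[int]:
    """Take the last number in the reply, ignoring thousands separators."""
    matches = re.findall(r"\d[\d,]*", reply)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


def accuracy(ask_model: Callable[[str], str],
             problems: list[tuple[int, int]]) -> float:
    """Fraction of problems where the model's final number is the product."""
    correct = 0
    for a, b in problems:
        reply = ask_model(f"Multiply {a} * {b}")
        if parse_answer(reply) == a * b:
            correct += 1
    return correct / len(problems)


def exact_stub(prompt: str) -> str:
    """Hypothetical placeholder model: computes the product exactly."""
    a, b = (int(x) for x in re.findall(r"\d+", prompt))
    return f"The product of {a} and {b} is {a * b:,}."


if __name__ == "__main__":
    problems = make_problems(n=50, digits=3)
    print(accuracy(exact_stub, problems))
```

A perfect score only counts as evidence if the problem space is far too large to memorize, and a poor score does not by itself separate "doesn't understand" from "ran out of working memory", which is the asymmetry described above.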

#
**sdmat**
wrote

The arithmetic thing is interesting. GPT4 usually fails at multiplying a pair of 3 digit numbers as pure token prediction:

> Multiply 438 * 939
>
> The product of 438 and 939 is 411,182.

But it does actually know how to perform long form multiplication and with the right prompt can get the correct answer this way:

>Multiply 438 by 939 using long form multiplication. Check each step for correctness as you perform it and revise if there is an error. Add trailing zeros to the parts. Perform the final addition in steps by successively adding each part.
>
>Alright, let's multiply 438 by 939 using long-form multiplication. I'll write out the steps and verify their correctness as I go along.
>
>         438
>     x   939
>     _________
>
>Step 1: Multiply 438 by 9 (the digit in the ones place of 939)
>
>         438
>     x     9
>     _________
>        3942 (9 * 438 = 3942, Correct)
>
>Step 2: Multiply 438 by 30 (3 in the tens place of 939, so we multiply by 3 and add a trailing zero)
>
>         438
>     x     3
>     _________
>        1314 (3 * 438 = 1314, Correct)
>
>Now, add a trailing zero to this product: 13140
>
>Step 3: Multiply 438 by 900 (9 in the hundreds place of 939, so we multiply by 9 and add two trailing zeros)
>
>         438
>     x     9
>     _________
>        3942 (9 * 438 = 3942, Correct)
>
>Now, add two trailing zeros to this product: 394200
>
>Step 4: Add the three products from steps 1, 2, and 3
>
>        3942 (Step 1)
>      +13140 (Step 2)
>     +394200 (Step 3)
>     __________
>
>Step 5: Add the products in steps
>
>First, add Step 1 and Step 2:
>
>        3942
>      +13140
>     _________
>       17082
>
>Now, add this sum to the product from Step 3:
>
>       17082
>     +394200
>     _________
>      411282
>
>So, the product of 438 and 939 is 411,282.

It still often makes mistakes doing this, but it clearly has a good grasp of the concept.

It's able to correctly perform the high level long multiplication procedure for large numbers (haven't had the patience to see just how large) but is let down by the reliability of the arithmetic in constituent steps.

A lot of humans have the same problem.
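For reference, the long multiplication procedure in the transcript above can be sketched in a few lines. This is only an illustration of the partial-products method (one part per digit of the second number, with trailing zeros, then a running sum); it says nothing about how the model computes internally, and the function name `long_multiply` is made up for the example.

```python
def long_multiply(a: int, b: int) -> tuple[list[int], int]:
    """Multiply a by b via partial products, one per digit of b."""
    parts = []
    # Walk the digits of b from the ones place upward.
    for place, digit in enumerate(reversed(str(b))):
        # Multiply by the digit, then append `place` trailing zeros.
        parts.append(a * int(digit) * 10 ** place)

    # Perform the final addition in steps, adding each part in turn.
    total = 0
    for part in parts:
        total += part
    return parts, total


parts, total = long_multiply(438, 939)
print(parts)           # [3942, 13140, 394200]
print(total)           # 411282
print(total - 411182)  # 100
```

The parts match Steps 1 to 3 of the transcript, and the last line shows that the direct "pure token prediction" answer of 411,182 was off by exactly 100, i.e. a single wrong digit.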

