VoidAndOcean t1_j85l8cp wrote

I remember clearly in AI class in college computing changes using matrices and calculating a likely outcome.

−6

jamesj t1_j85vsn4 wrote

To use a recent example, it is interesting that a large language model is "just" lots of matrix multiplication, but at a certain scale theory of mind seems to emerge from that. It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and ReLUs that this capability would emerge at a certain scale.
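
To make the "just lots of matrix multiplication" point concrete, here is a minimal sketch of one self-attention step followed by a ReLU, in plain Python. The toy inputs, the identity weight matrices, and the helper names (`matmul`, `self_attention`) are illustrative assumptions, not taken from any real model, and nothing this small shows emergent behavior.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: three "tokens", each a 2-dimensional vector (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weight matrices

out, weights = self_attention(x, identity, identity, identity)
hidden = relu(matmul(out, [[1.0, -1.0], [-1.0, 1.0]]))  # one ReLU layer on top
```

Every step is a matrix product, a softmax, or a `max(0, v)`. Knowing these pieces well does not tell you what a model with billions of learned weights will be able to do.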

37

spudddly t1_j865nxv wrote

Wow that abstract is somehow mildly frightening...

13

Think_Description_84 t1_j866yrt wrote

Can you explain more about theory of mind and the tasks etc. That seems very interesting but I have zero context. Very curious though.

1

nickyurick t1_j86gu7v wrote

I'm sorry, thank you for the summary, but is there an ELI5 of this concept, or does it not... work if it's that simplified? Is this like quantum shenanigans where I need 7 years of physics to get it?

1

jamesj t1_j86ly33 wrote

It isn't super complicated. Basically theory of mind is just the ability to model other agents like people and animals as having their own mind, with their own private knowledge and motivations, etc.

Questions for testing theory of mind are questions like, "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn.' Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. What does Sam expect to find in the bag?" Previously, neural networks would get questions like this wrong, because to answer them properly you need to model what Sam should/shouldn't know about the bag separately from what you know about it. Very young children also get questions like this wrong; it takes them time to develop a theory of mind.
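
The structure of that question can be sketched as a toy program that keeps the true state of the world separate from what an agent is able to observe. This only illustrates the bookkeeping the task requires; the names (`world`, `expected_contents`) are hypothetical, and it is not how a language model or the paper's evaluation works.

```python
# Ground truth, known to the narrator (and to the reader of the question).
world = {"bag_contents": "popcorn", "bag_label": "chocolate"}

def expected_contents(world, can_see_inside, has_seen_before):
    """What an agent expects to find, based only on what they can observe."""
    if can_see_inside or has_seen_before:
        return world["bag_contents"]  # the agent has access to the truth
    return world["bag_label"]         # the agent can only go by the label

sam_expects = expected_contents(world, can_see_inside=False, has_seen_before=False)
narrator_knows = world["bag_contents"]

print(sam_expects)     # chocolate
print(narrator_knows)  # popcorn
```

Answering "popcorn" means confusing `narrator_knows` with `sam_expects`, which is the mistake the test is designed to catch.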

17

doomgrin t1_j8716gp wrote

I asked this to ChatGPT:

Ok. There is a bag of popcorn sitting on the ground. There is no chocolate in the bag. Yet, the bag is labelled “chocolate”. A human named Sarah finds the bag. She had not seen the bag before, and she cannot see inside. She reads the label, what does she expect to find inside the bag?

ChatGPT:

Sarah, based on the label "chocolate" on the bag, would expect to find chocolate inside the bag. She has formed a belief about the contents of the bag based on the information available to her, which is the label. She has not yet had the opportunity to verify the contents of the bag, so her belief may or may not be accurate.

This scenario is a good illustration of the idea that people can form beliefs based on incomplete information, and that these beliefs can influence their expectations and behavior. In this case, Sarah's belief that the bag contains chocolate is based on the information she has received from the label, and it will likely influence what she expects to find when she opens the bag.

14

ekdaemon t1_j86r593 wrote

> It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and ReLUs that at a certain scale that capability would emerge.

But we know that at some scale, it must emerge. Because we exist, and our minds obey the same laws of mathematics and physics that are being used to construct AI.

I think the thing is, we didn't expect it to emerge at the current scale.

Does that mean bio-minds are orders of magnitude less efficient than the artificial constructs we ourselves are creating?

1

jamesj t1_j86vz1o wrote

A year ago, it wasn't at all clear to people working in the field that it must emerge with transformer-based LLMs.

9

ekdaemon t1_j8kqoz5 wrote

Gotcha.

IANE, but I assumed that the combination of the four things mentioned above, including matrix multiplication, would be Turing complete, and I thought that anything that is Turing complete could absolutely be expected to scale to produce anything desired.

I almost half expected to find that matrix multiplication alone was already known to be Turing complete. I see at least one reference to that possibility in a discussion on ycombinator.

1

jamesj t1_j8kwink wrote

It has long been known that neural nets are universal function approximators; even a network with a single hidden layer can approximate any continuous function given enough parameters. But in practice there is a huge gap between knowing that it can eventually approximate some function and actually getting a particular system to converge on the useful function given a set of data in a reasonable amount of time (or for a reasonable enough cost).
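
As a rough illustration of the approximation half of that statement, the sketch below builds a one-hidden-layer ReLU network by hand so that it linearly interpolates `sin` on [0, pi], and shows the error shrinking as hidden units are added. The construction and the helper names (`build_relu_net`, `evaluate`, `max_error`) are assumptions made for this example. The weights are computed directly rather than learned, so it says nothing about the hard part, which is getting training to converge.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n_units):
    """One hidden layer of ReLUs that linearly interpolates f on [lo, hi].

    Returns (bias, knots, coeffs), where
    net(x) = bias + sum(c * relu(x - k) for each knot k and coefficient c).
    """
    step = (hi - lo) / n_units
    knots = [lo + i * step for i in range(n_units)]
    values = [f(lo + i * step) for i in range(n_units + 1)]
    slopes = [(values[i + 1] - values[i]) / step for i in range(n_units)]
    # Each unit contributes the change in slope at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    return values[0], knots, coeffs

def evaluate(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))

def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute error over evenly spaced sample points."""
    points = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - evaluate(net, x)) for x in points)

errors = {
    n: max_error(math.sin, build_relu_net(math.sin, 0.0, math.pi, n), 0.0, math.pi)
    for n in (4, 16, 64)
}
for n, err in errors.items():
    print(n, err)
```

More units give a tighter fit, which is the easy direction. Finding good weights from data at a reasonable cost is the part that is not guaranteed.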

1

__ingeniare__ t1_j872ifz wrote

No we don't/didn't, artificial neural networks are very different from biological ones, and the transformer architecture has nothing to do with the brain.

4

yickth t1_j87w4vf wrote

Brains employ universal computation, as computers. What’s the difference?

2

__ingeniare__ t1_j880eru wrote

The difference is the computing architecture. Obviously you can't just scale any computing system and have theory of mind appear as an emergent property, the computations need to have a pattern that allows it.

1

yickth t1_j89fjnu wrote

Incorrect about scaling computers, which is what computers do — scale. They are universal. As for consciousness, we’re not sure it’s not embedded in everything (panpsychism). I accept this may not be accepted, and perhaps impossible to prove, at least in our lifetimes, but brains are computers, and as such, are scaleable

2

__ingeniare__ t1_j8b8y1b wrote

I said that you can't have theory of mind appear from scaling just any compute system, not that you can't scale it.

2

yickth t1_j8bx0py wrote

Why not? I’m not trying to be argumentative, rather I’m genuinely curious why this is impossible

2

__ingeniare__ t1_j8c0bbz wrote

Let's say you have a computer that simply adds two large numbers. You can scale it indefinitely to add even larger numbers, but it will never do anything interesting beyond that because it's not a complex system. Computation in itself does not necessarily lead to emergent properties, it is the structure of the information processing that dictates this.

2

yickth t1_j8c2aba wrote

Ah, but computation is scaleable, universally. No other system is. Our minds emerged from computation. And thank you for your thoughtful response

2

__ingeniare__ t1_j8c4z0x wrote

I think we have different definitions of scalable then. Our minds emerged from computation under the evolutionary pressure to form certain information processing patterns, so it isn't just any computation. Just so I understand you correctly, are you claiming an arbitrary computational system would inevitably lead to theory of mind and other emergent properties by simply scaling it (in other words, adding more compute units like neurons or transistors)?

2

yickth t1_j8hexi1 wrote

No, I’m not claiming X will lead to Y; I’m claiming it isn’t impossible. Also, we aren’t sure what consciousness is or where it’s located outside our own understanding, so how and if it arose is a mystery

1

efvie t1_j86up5z wrote

It 100% has not emerged.

2

jamesj t1_j86w35y wrote

Did you read the paper? If yes, what do you think explains the results of the paper? If no, no reason to respond.

8

VoidAndOcean t1_j85w4ex wrote

Fine but the matrix is still there.

−17

ActiveTeam t1_j85wjp1 wrote

So by your logic, do we get rid of all psychologists because your brain is just neurons firing and we already know about the low level chemical and electrical interactions happening?

20

VoidAndOcean t1_j85wvcw wrote

There is no logic. We knew how it worked by design. Its capability was going to grow exponentially because matrices by definition grow exponentially. It's the same basic logic that drives a 10x10 matrix that runs an n*n matrix.

−22

ActiveTeam t1_j85y6p5 wrote

We know exactly how a single neuron activates and how it works. We also know the more densely packed the neurons are, it usually means the smarter the blob of neurons is. Does that mean we don’t need to study anything about the mind?

17

VoidAndOcean t1_j85ydul wrote

We didn't write the way neurons interact with each other. We did write the matrix and nodes.

−12

gurenkagurenda t1_j8700le wrote

If basic explanations don’t convince you, the fact that there’s a boatload of PhDs studying this subject should give you pause. You probably haven’t figured out based on your vaguely informed reckoning that they’re wasting their time.

4

VoidAndOcean t1_j871663 wrote

bro i use ml/ai models for my job;

I don't know what to tell you. We know which models to use because we know what it will do.

−3

gurenkagurenda t1_j8777i8 wrote

> i use ml/ai models for my job

Awe inspiring credentials.

2

VoidAndOcean t1_j877e2t wrote

am I giving a lecture?

This is simply my view.

−2

gurenkagurenda t1_j878gnd wrote

You’re the one who tried to bring up your own modest credentials after I already pointed out that PhDs are focusing on this subject. Don’t get defensive when I point out that they make you sound silly. Your view is wrong.

2

VoidAndOcean t1_j879don wrote

there are PhDs studying bullshit all the time and research is a waste of time, it doesn't mean anything. That doesn't give or take away from my view.

you are just slightly out of your depth here arguing for sake of arguing. if you don't like an opinion then simply move on.

−2

gurenkagurenda t1_j886jhh wrote

Ok buddy, whatever you say. The fact that you’ve said “matrix” throughout this and consistently referred to them as having a rank of two is telling.

2

AlpLyr t1_j8660lm wrote

In what sense do ‘matrices grow exponentially’, let alone ‘by definition’?

If you’re talking about the number of entries in an n by n matrix as n increases, that grows quadratically. Fast, but not exponentially.
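
A quick way to see the difference, as a small sketch (the function name `entries` is just for illustration): doubling n multiplies the entry count of an n by n matrix by 4, while it squares 2^n.

```python
def entries(n):
    """Number of entries in an n-by-n matrix."""
    return n * n

for n in (10, 20, 40):
    print(n, entries(n), 2 ** n)

# Doubling n: quadratic growth multiplies by 4, exponential growth squares the value.
quadratic_ratio = entries(20) / entries(10)
exponential_ratio = 2 ** 20 / 2 ** 10
```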

3

Milksteak_To_Go t1_j86k9bv wrote

From my understanding that's what tensor cores do: they're really, really fast at linear algebra calculations— hence why they're being included in CPUs and GPUs now to improve deep learning performance.

7

WeekendCautious3377 t1_j861p3s wrote

And those matrices (millions of rows and columns) change at every iteration. So it would be probably better visualized as a video of a brain scan.

6

VoidAndOcean t1_j862hjf wrote

yea but you understand the nature of 1 variable changing has an effect on the whole matrix. It's fine. just a big calculation;

4

WeekendCautious3377 t1_j864j33 wrote

Yes and no. Google’s latest LLM handles 540 billion parameters. Linear algebra is literally as simple as y = a*x + b. But you do billions of these operations every time, with input that you don’t 100% understand. For instance, it is easy to record a person’s voice and represent that file as a series of numbers. You give hundreds of thousands of voice recordings to these models and training evolves these giant matrices with billions of entries. The model (giant matrix) goes through a bunch of iterations per input to optimize itself and picks up nuances of a human voice embedded in the digital form.

You can then tell the program to group together different input by patterns like accents. Now you have multiple models optimized to speak in different accents.

If you had billions of people each only looking at one parameter at a time, it would be feasible to follow each “simple” algebra. But you literally need billions of people looking at it. There are better ways to find overall inferences.

You can think of it as just like trying to analyze any big system.

Traffic in LA? You can definitely look at each person’s car and eventually figure out how each person made a decision to drive in what way. But that will not solve the overall traffic problem of a city with millions of people driving.

Only the AI problem is orders of magnitude more complicated.
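
The "y = a*x + b, optimized over many iterations" idea from the start of this comment can be sketched at toy scale with plain gradient descent on two parameters. The data, learning rate, and step count below are made-up assumptions for illustration. A real model does the same kind of update across billions of parameters, which is where following each one by hand stops being practical.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = a*x + b by gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((a*x + b - y)^2) with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Toy data generated from y = 3x + 1 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]

a, b = fit_line(xs, ys)
print(a, b)
```

With two parameters you can watch every update; with 540 billion, each individual step is just as simple but the overall behavior has to be studied some other way.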

11

WhiteRaven_M t1_j8ai8jr wrote

You're in the Dunning-Kruger valley rn, better climb out soon

0