
currentscurrents OP t1_je7faup wrote

This seems to be due to the lag of the publishing process: it went up on arXiv in October, but it's getting attention now because it was finally published on March 21st.

I think the most interesting change since October is that GPT-4 handles many of the tricky sentences that linguists used to probe GPT-3 much better. But it's still hard to tell "understanding" apart from "memorization" if you don't know what was in the training data, and we don't.

17

currentscurrents OP t1_je631oa wrote

TL;DR:

  • This is a survey paper. The authors summarize a variety of arguments about whether or not LLMs truly "understand" what they're learning.

  • The major argument in favor of understanding is that LLMs are able to complete many real and useful tasks that seem to require understanding.

  • The major argument against understanding is that LLMs are brittle in non-human ways, especially to small changes in their inputs. They also don't have real-world experience to ground their knowledge in (although multimodal LLMs may change this).

  • A key issue is that no one has a solid definition of "understanding" in the first place. It's not clear how you would test for it. Tests intended for humans don't necessarily test understanding in LLMs.

I tend to agree with their closing summary. LLMs likely have a type of understanding, and humans have a different type of understanding.

>It could thus be argued that in recent years the field of AI has created machines with new modes of understanding, most likely new species in a larger zoo of related concepts, that will continue to be enriched as we make progress in our pursuit of the elusive nature of intelligence.

81

currentscurrents t1_je34ui9 wrote

This is code for running the LLaMa model, sort of like llama.cpp.

It's a reimplementation of Facebook's original GPL-licensed open-source client under the more permissive Apache license. The GPL requires derivative works to be released under the GPL as well, so the original can't be used in closed-source projects.

This doesn't affect the license for the model weights, which you will still have to download from somewhere else.

12

currentscurrents t1_je1ai1i wrote

I asked it for a parody and got something similar to, but different from, Weird Al's song: https://pastebin.com/FKrZiEi9

When I asked it to be original I got quite different lyrics: https://pastebin.com/uwpqAnyz

Here are the actual lyrics for reference. This reminds me of how you can get LLMs to be less toxic/biased just by telling them to treat people fairly.
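
As a rough sketch of what that kind of prompt-level steering looks like (the instruction wording and the openai v1-style Python client usage here are my own illustration, not taken from any particular paper):

```python
from openai import OpenAI  # assumes the openai Python package, v1+ interface

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# The only "debiasing" mechanism here is a plain-language instruction in the
# system message telling the model to treat people fairly.
messages = [
    {"role": "system",
     "content": "Treat all people and groups fairly and respectfully; avoid stereotypes."},
    {"role": "user",
     "content": "Write a short joke about programmers."},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```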

2

currentscurrents t1_je14pi5 wrote

Clearly, the accuracy is going to have to get better before it can replace Google. It's pretty accurate when it knows what it's talking about, but if you go "out of bounds" the accuracy drops off a cliff without warning.

But the upside is that it can integrate information from multiple sources and you can interactively ask it questions. Google can't do that.

3

currentscurrents t1_je12d3k wrote

Nobody knows exactly what it was trained on, but there are several existing datasets of published books.

>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

They still might. But they don't have a strong motivation; it doesn't directly impact their revenue, because nobody's going to sit in the ChatGPT window and read a 300-page book one prompt at a time.

6

currentscurrents t1_jdrt3gv wrote

I think all tests designed for humans are worthless here.

They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.
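
As an illustration of what a synthetic test could look like (just a sketch I'm making up here, not an established benchmark), you could generate fresh items at evaluation time so the answers can't already be sitting in the training data:

```python
import random

def make_probe(rng: random.Random) -> tuple[str, int]:
    """Generate a fresh multi-step word problem and its answer.

    Because the numbers are sampled at test time, the exact item (and its
    answer) can't have appeared in any training corpus, so a correct reply
    has to come from reasoning rather than recall.
    """
    boxes_per_day = rng.randint(10, 99)
    days = rng.randint(2, 9)
    start = rng.randint(boxes_per_day * days, boxes_per_day * days + 500)
    question = (
        f"A warehouse holds {start} boxes. {boxes_per_day} boxes are shipped "
        f"out each day for {days} days. How many boxes remain?"
    )
    return question, start - boxes_per_day * days

rng = random.Random(0)
for _ in range(3):
    question, answer = make_probe(rng)
    print(question, "->", answer)
    # In a real evaluation, `question` would go to the model under test and
    # its reply would be scored against `answer`.
```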

11

currentscurrents t1_jdrpl3u wrote

I'm not really surprised. Anybody who's extensively used one of these tools has probably already run into their reasoning limitations.

Today's entire crop of self-supervised models can learn complex ideas, but they have a hard time manipulating them in complex ways. They can do a few operations on ideas (style transfer, translation, etc) but high-level reasoning involves many more operations that nobody understands yet.

But hey, at least there will still be problems left to solve by the time I graduate!

39

currentscurrents t1_jdn7spo wrote

Bigger models are more sample-efficient; they get more out of a given amount of data.

Scale is a triangle of three factors: model size, data size, and compute. If you want to make more efficient use of data, you need to increase the other two.

In practice, LLMs are not data-limited right now; they're limited by compute and model size. That's why you see models like LLaMa that throw huge amounts of data at smaller models.
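
As a rough illustration of that trade-off, here's a back-of-the-envelope sketch using the commonly cited C ≈ 6·N·D approximation for transformer training FLOPs (the parameter and token counts below are made up for illustration, not any model's real training budget):

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough training cost from the commonly cited C ~= 6 * N * D rule of thumb."""
    return 6 * params * tokens

# Two ways to spend roughly the same compute budget (illustrative numbers only):
big_model_few_tokens  = training_flops(70e9, 1.4e12)  # larger model, less data
small_model_more_data = training_flops(13e9, 7.5e12)  # smaller model, much more data

print(f"{big_model_few_tokens:.2e} FLOPs vs {small_model_more_data:.2e} FLOPs")
```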

4

currentscurrents t1_jdn0opn wrote

The Nvidia H100 marketing material does advertise a configuration for linking 256 of them to train trillion-parameter language models:

>With NVIDIA NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine to solve trillion-parameter language models.

Doesn't necessarily mean GPT-4 is that big, but it's possible. Microsoft and Nvidia were working closely to build the new Azure GPU cloud.

7