visarga

visarga t1_j202im8 wrote

> Co-founder of Neeva

OK, so a direct competitor in search is commenting on Google. Maybe they want to imply they also have a language model that is special and closed, and worthy of investment.

I don't believe what he says; there are no signs of that happening. On the contrary, it seems the head of the pack is just 6-12 months ahead, and everything trickles down pretty quickly. There are still many roadblocks to AGI, and no lab is within striking distance.

We already have capable language models; now we need something else - validation systems, so we can use our language models without worrying that they will catastrophically hallucinate or miss something trivial. We want to keep the useful 90% and drop the bad 10%. It is possible to integrate web search, knowledge bases and Python code execution into the model to keep it from messing up. This is what I see ahead, not the end of open research.
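To make that concrete, here is a minimal sketch of what such a validation layer could look like; `ask_llm` and `web_search` are hypothetical placeholders, not a real API.

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your language model here")

def web_search(query: str) -> str:
    raise NotImplementedError("call your search backend here")

def answer_with_validation(question: str) -> str:
    draft = ask_llm(f"Answer concisely: {question}")
    evidence = web_search(question)
    verdict = ask_llm(
        "Does the evidence support the draft answer? Reply SUPPORTED or UNSUPPORTED.\n"
        f"Evidence: {evidence}\nDraft: {draft}"
    )
    if "UNSUPPORTED" in verdict:
        # fall back to an answer written strictly from the retrieved evidence
        return ask_llm(f"Answer using only this evidence: {evidence}\nQuestion: {question}")
    return draft
```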

4

visarga t1_j1u0k93 wrote

It works both ways - to generate bullshit, and to estimate the social impact. In a recent paper, researchers prompted GPT-3 with detailed persona descriptions, then asked a bunch of questions, like in a phone poll. They discovered that GPT-3 has detailed knowledge of the biases and likely responses of various populations and can approximate real poll results.

> Researchers with Brigham Young University have written a paper which I think is among the most significant things I’ve ever covered in this newsletter. Specifically, they do three social science experiments on GPT-3 and discover that GPT-3 has biases that are “fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups.”

source: GPT-3 can simulate people very, very well – social science might change
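As a rough illustration of the trick (not the paper's actual code): prepend a demographic "backstory" to the question and sample many completions. The sketch below assumes the pre-1.0 OpenAI client and its legacy Completion endpoint; the persona text, model name and sample size are all made up.

```python
# Persona-conditioned "silicon sampling" sketch; every detail here is illustrative.
import openai

persona = (
    "I am a 45-year-old factory worker from Ohio. I did not finish college, "
    "I attend church weekly, and I usually vote Republican."
)
question = "Question: Which party's candidate will you vote for in the next election?\nAnswer:"

responses = []
for _ in range(100):  # one synthetic "respondent" per sample
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"{persona}\n{question}",
        max_tokens=10,
        temperature=1.0,
    )
    responses.append(completion.choices[0].text.strip())

# the distribution of `responses` approximates how this demographic subgroup would answer
```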

Soon, every influencer, politician, activist or busybody will optimise their messages before posting to do maximum damage. All they need is to profile the target population and select the right personas for GPT-3.

5

visarga t1_j1my2tt wrote

If you want chatGPT to incorporate information from sources, you have to paste search results into the context, which can easily run to 4,000 tokens. For each interaction afterwards you pay the same 4,000-token price, because the history stays that long. You would end up paying about $1 after 10 replies.

You would need to do this whenever you want to summarise, ask questions about a reference article, or just use chatGPT as a layer on top of search, like you.com/chat.

It's not cheap enough to use in bulk, for example to validate Wikipedia references; you'd need to call the model millions of times.
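Back-of-the-envelope, assuming davinci-class pricing of roughly $0.02 per 1K tokens (the listed rate at the time; treat the exact figure as an assumption):

```python
# Rough cost estimate for a chat that re-sends a ~4,000-token context each turn.
# The $0.02 per 1K tokens figure is an assumption (davinci-class pricing).
price_per_1k_tokens = 0.02
context_tokens = 4000      # pasted search results + accumulated history
replies = 10

cost = replies * context_tokens / 1000 * price_per_1k_tokens
print(f"${cost:.2f} for {replies} replies")  # -> $0.80, roughly the $1 mentioned above
```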

12

visarga t1_j1jrold wrote

When a student is having difficulty understanding a concept, the AI can provide limitless assistance. It can repeat concepts that may not have been fully grasped, give concrete examples that help the student understand better, or provide graphical illustrations. It can also answer any questions the student might have. The AI can give the student as much practice as needed, with exercises tailored to their level, taking away the fear of difficulty. Over time, the AI will be able to track which strategies work or don't work for different students, and use language to motivate them. This AI will be more pedagogical than a human can be, given the vast amount of experience it has to draw on - it can learn from many more students than a human ever could. It can also be fun; there's no reason the lesson can't be taught in a pirate accent or played out in VR. AI is pretty good at imagination, so I guess it won't be repetitive and boring.

5

visarga t1_j1hwxat wrote

Reply to comment by overlordpotatoe in Hype bubble by fortunum

Yes, it is possible for a model to have understanding, to the extent that it can learn to check the validity of its own outputs. That would mean creating an agent-environment-goal setup and letting it learn to win rewards. Grounding speech in experience is the key.

Evolution through Large Models

> This paper pursues the insight that large language models (LLMs) trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming (GP). Because such LLMs benefit from training data that includes sequential changes and modifications, they can approximate likely changes that humans would make. To highlight the breadth of implications of such evolution through large models (ELM), in the main experiment ELM combined with MAP-Elites generates hundreds of thousands of functional examples of Python programs that output working ambulating robots in the Sodarace domain, which the original LLM had never seen in pre-training. These examples then help to bootstrap training a new conditional language model that can output the right walker for a particular terrain. The ability to bootstrap new models that can output appropriate artifacts for a given context in a domain where zero training data was previously available carries implications for open-endedness, deep learning, and reinforcement learning. These implications are explored here in depth in the hope of inspiring new directions of research now opened up by ELM.
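Roughly, the ELM loop looks something like the sketch below; `llm_rewrite` and `fitness` are placeholders for the code model and a Sodarace-style evaluator, and the real method uses MAP-Elites niches rather than a flat population.

```python
import random

def llm_rewrite(program: str) -> str:
    """Ask a code LLM for a plausible small modification (placeholder)."""
    raise NotImplementedError

def fitness(program: str) -> float:
    """Run or simulate the program and score the result (placeholder)."""
    raise NotImplementedError

# seed with a trivial program; the LLM acts as the mutation operator
population = [("def walker():\n    return 0\n", float("-inf"))]

for generation in range(1000):
    parent, parent_score = random.choice(population)
    child = llm_rewrite(parent)       # human-like edit proposed by the LLM
    child_score = fitness(child)      # verified by the environment, not by the LLM
    if child_score > parent_score:
        population.append((child, child_score))  # keep improvements
```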

3

visarga OP t1_j1dh3br wrote

Reply to comment by visarga in Will we run out of data? by visarga

You can generate junk data easily, but it is hard to generate quality data. Human text is diverse and interesting. Still, in the last 1-2 years many teams have been generating data - math, code, diverse prompted tasks - and not just solutions, but sometimes also new problems and tests.

For example, it used to be necessary to label thousands of responses to tasks in order to train the human feedback model that is used to fine-tune GPT-3. So only OpenAI had a very good dataset, developed in-house. And for that reason GPT-3 ruled.

But more recently, "Constitutional AI" takes a list of behavioural rules, the so-called constitution, and uses them to generate its own feedback data, reaching almost the same effect as a human-labelled feedback model. So it is automating AI alignment.
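A rough sketch of that critique-and-revise loop, with `ask_llm` as a placeholder model call and a one-rule "constitution" purely for illustration:

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your language model here")

constitution = ["Choose the response that is least harmful and most honest."]

def self_revise(question: str) -> tuple[str, str]:
    draft = ask_llm(question)
    critique = ask_llm(
        f"Critique the answer below according to the rule: {constitution[0]}\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    revision = ask_llm(
        "Rewrite the answer so that it follows the rule, using the critique.\n"
        f"Question: {question}\nAnswer: {draft}\nCritique: {critique}"
    )
    # (draft, revision) pairs become preference data for training the feedback model,
    # with no human labelling in the loop
    return draft, revision
```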

1

visarga OP t1_j1ad4ku wrote

Reply to comment by Ne_Nel in Will we run out of data? by visarga

We're just beginning the journey of generating language data; until 2 years ago it was unthinkable. Today we have a bunch of generated datasets for math, code, multi-task tuning and rule-based behaviour. The trick is to validate whatever is generated.

2

visarga OP t1_j18n1i1 wrote

An interesting fact: the current dataset size is about 1T words. All the skills of language models come from this one TeraWord. We can get to 10 TWords after we finish scraping everything; after that it depends on finding other sources. Speech data is 10,000 TWords, though.

4

visarga t1_j15tcrf wrote

> like a mathematical proof that using one intelligence to design another intelligence of equal complexity is an undecidable problem

No, it's not like that. Evolution is not a smart algorithm, yet it created us and all life. It is a "search and learn" algorithm: it does massive search, and the result of that massive search is us.

AlphaGo wasn't initially smart. It was just a dumb neural net running on a dumb GPU. But after playing millions of games in self-play, it was better than humans. The way it plays is by combining search + learning.

So a simpler algorithm can create a more advanced one, given a massive budget of search and ability to filter and retain the good parts. Brute forcing followed by learning is incredibly powerful. I think this is exactly how we'll get from chatGPT to AGI.

3

visarga t1_j15s5tw wrote

It's not "complex patterns between neurons" we should care about, what will drive AI is more and better data. We have to beef up datasets of step by step problem solving in all fields. It's not enough to get the raw internet text, and we already used a big chunk of it, there is no 100x large version coming up.

But I agree with you here:

> whatever problems that remain over the horizon, there's a sort of exponential space that we are now in where those unknowns will quickly be reeled in

We can use language models to generate more data, as long as we can validate that it is correct. Fortunately, validating a solution is more reliable than open-ended text generation.

For example, GPT-3 in its first incarnations didn't have chain-of-thought abilities, so no multi-step problem solving. Only after training on a massive dataset of code did this ability emerge. Code is problem solving.

The ability to execute novel prompts comes from fine-tuning on a dataset of about 1,000 supervised tasks - question-answer pairs of many kinds. After seeing 1,000 tasks, the model can combine them and solve countless more.
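For illustration, those supervised tasks look roughly like this (made-up examples in the spirit of FLAN-style multi-task tuning):

```python
# Hypothetical instruction-tuning examples: each record is one task instance,
# phrased as a prompt plus the expected answer. Real collections span ~1,000+ tasks.
instruction_data = [
    {"task": "sentiment",   "input": "Review: 'Great battery life.' Positive or negative?", "output": "positive"},
    {"task": "translation", "input": "Translate to French: Good morning",                   "output": "Bonjour"},
    {"task": "qa",          "input": "Who wrote 'The Old Man and the Sea'?",                "output": "Ernest Hemingway"},
    {"task": "summarize",   "input": "Summarize: The meeting was moved from Monday to Friday because ...",
     "output": "The meeting was moved to Friday."},
]
```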

So it matters what kind of data is in the dataset. By discovering what data is missing and what the ideal mixing proportions are, AI will advance further. This process can be largely automated; it mostly costs GPUs and electricity. That is why it could solve the data problem - it does not depend on us creating more data.

2

visarga t1_j0whrvn wrote

DALL-E 1, Flamingo and Gato are like that. It is possible to concatenate the image tokens with the text tokens and have the model learn cross-modal inference.

Another way is to use a very large collection of text-image pairs and train a pair of models to match the right text to the right image (CLIP).

Both approaches display generalisation; for example, CLIP works as a zero-shot image classifier, which is very convenient, and it can guide diffusion models to generate images.
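For example, zero-shot classification with the open-source CLIP release is just a few lines (a sketch; the labels and image path are placeholders):

```python
import torch
import clip                     # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a duck", "a photo of a horse", "a photo of an astronaut"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # image-text similarity scores
    probs = logits_per_image.softmax(dim=-1)   # zero-shot class probabilities

print(dict(zip(labels, probs[0].tolist())))
```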

The BLIP model can even generate captions, which are used to fix low-quality captions in training sets.

4

visarga t1_j0u0fag wrote

> it will learn on it's own.

For example, in any scientific field "literature review" papers get published from time to time. They cover everything relevant to a specific topic, trying to offer a quick overview with jumping-off points. We can ask GPT-3 to summarise and write review papers automatically.

We can also think of Wikipedia - 5 million topics, each with its own article. We could use GPT-3 to write one article for each scientific concept, no matter how obscure, one review for each book, one article about each character in any book, and so on. We could have 1 trillion articles extracting everything that is known. Then we'd have AI analyse these topics for contradictions, which surface naturally when you put together all the known information about a topic.

This would be a kind of wikiGPT, a model that learns all the facts from a generated corpus of reviews. It only costs electricity to make.

7

visarga t1_j0tz7zh wrote

What current AIs are lacking is a playground. The AI playground needs games, simulations, code execution, databases, search engines, other AIs. Using them, the AI would get to work on solving problems. Initially we collect problems, then we generate more and more - coding, math, science, anything that can be verified. We add the experiments to the training set and retrain the models. We make models that can invent new tasks, solve them, evaluate the solutions for errors and significance, and do all this completely on their own, using just electricity and GPUs.

Why? This will add into the mix something the AI lacks - experience. AI is well read but has no experience. If we allow the model to collect its own experience, it would be a different thing. For example, after training on a thousand tasks GPT-3 learned to solve new tasks at first sight, and after training on code it learned multi-step reasoning (chain of thought). Both of these - supervised multi-task data and code - are collections of solved problems, samples of experience.

27

visarga t1_j0pakor wrote

> AI is fundamentally just predicting text

So it is a 4-stage process. Each stage has its own dataset and produces its own emergent skill.

  • stage 1 - next word prediction, data: web text, skills: general knowledge, hard to control
  • stage 2 - multi-task supervised training, data: 2000 NLP tasks, skills: learn to execute prompts at first sight, doesn't ramble off topic anymore
  • stage 3 - training on code, data: Github + Stack Overflow + arXiv, skills: multi-step reasoning
  • stage 4 - human preferences -> fine tuning with reinforcement learning, data: collected by OpenAI with labellers, skills: the model obeys a set of rules and caters to human expectations (well behaved)

I don't think "pretend you're an AGI" is sufficient, it will just pretend but not be any smarter. What I think it needs is "closed loop testing" done on a massive scale. Generate 1 million coding problems, solve them with a language model, test the solutions, keep the correct ones, teach the model to write better code.

Do the same for math, for sciences where you can simulate the answer to test it, for logic - practically any field that has a cheap way to test. Collect the data, retrain the model.

This is the same approach taken by reinforcement learning - the agents create their own datasets. AlphaGo created its Go dataset by playing games against itself, and it became better than the best human. AlphaTensor beat the best human-designed algorithms for matrix multiplication. This is the power of learning from a closed loop of testing - it can easily go superhuman.

The question is how we can enable the model to perform more experiments and learn from all that feedback.

6

visarga t1_j0mzjox wrote

Let me tell you one weird trick all artists hate. What the model stores is actually an average of gradients collected from training examples, not an average of the training examples themselves. Gradients represent what has been learned from each example, and they can be added together regardless of the content of the examples without becoming all jumbled up.

For instance, one can add the gradient derived from an image of a duck to that derived from an image of a horse. This is only possible in the space of gradients, as opposed to the space of images. If it weren't for this trick we would not be discussing art in this sub.
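A minimal PyTorch illustration of that point (random tensors standing in for the duck and horse images): calling backward() twice simply sums the two gradients in the same buffers.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 64 * 64, 10)    # toy stand-in for a diffusion network
loss_fn = nn.MSELoss()
target = torch.randn(1, 10)

duck = torch.randn(1, 3 * 64 * 64)    # stand-in for a duck image
horse = torch.randn(1, 3 * 64 * 64)   # stand-in for a horse image

# gradients from two unrelated images accumulate in the same .grad buffers
for image in (duck, horse):
    loss = loss_fn(model(image), target)
    loss.backward()                   # adds this image's gradient to the running sum

# a single optimizer step applies the combined update
torch.optim.SGD(model.parameters(), lr=1e-3).step()
```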

But are gradients derived from an image subject to copyright restrictions, even when mixed together over billions of examples? Individual influences are almost "averaged out" by the sheer number of examples. That's how SD breaks training examples down into first principles and can then generate an astronaut on a horse even though it has never seen one - something only possible if you go all the way back to basic concepts.

3

visarga t1_j0m7kn4 wrote

I can confirm this. I did NER, and most tokens are not named entities, so they fall into the "other" class. It's really hard to define what "other" means; even with lots of text the model is unsure. No matter how much "other" data I provided, I couldn't train a negative class properly.

2

visarga t1_j0k46hz wrote

It's a hard problem; nobody has a definitive solution. From my lectures and experience:

  • interval calibration (easy)

  • temperature scaling (easy; see the sketch below)

  • ensembling (expensive)

  • Monte Carlo dropout (not great)

  • using prior networks or auxiliary networks (for OOD detection)

  • error-correcting output codes (ECOC)

  • conformal prediction (slightly different task than confidence estimation)

Here's my Zeta-Alpha confidence estimation paper feed.
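As an example of the easiest item on that list, temperature scaling learns a single scalar on a held-out set (a sketch following Guo et al., 2017; `val_logits` and `val_labels` stand for whatever your validation split provides):

```python
import torch
import torch.nn as nn

class TemperatureScaler(nn.Module):
    """Divide logits by a learned temperature T > 0 to calibrate confidences."""
    def __init__(self):
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))   # T = exp(log_t) stays positive

    def forward(self, logits):
        return logits / self.log_t.exp()

def fit_temperature(scaler, val_logits, val_labels):
    opt = torch.optim.LBFGS([scaler.log_t], lr=0.01, max_iter=100)
    nll = nn.CrossEntropyLoss()
    def closure():
        opt.zero_grad()
        loss = nll(scaler(val_logits), val_labels)
        loss.backward()
        return loss
    opt.step(closure)
    return scaler

# usage: calibrated = fit_temperature(TemperatureScaler(), val_logits, val_labels)
#        probs = calibrated(test_logits).softmax(dim=-1)
```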

6