
visarga wrote

> AI is fundamentally just predicting text

So it is a four-stage process. Each stage has its own dataset and produces its own emergent skills.

  • stage 1 - next word prediction, data: web text, skills: general knowledge, hard to control
  • stage 2 - multi-task supervised training, data: 2000 NLP tasks, skills: learns to execute new prompts at first sight, no longer rambles off topic
  • stage 3 - training on code, data: GitHub + Stack Overflow + arXiv, skills: multi-step reasoning
  • stage 4 - human preferences -> fine-tuning with reinforcement learning, data: collected by OpenAI with human labellers, skills: the model obeys a set of rules and caters to human expectations (well behaved)

I don't think "pretend you're an AGI" is sufficient; it will just pretend without getting any smarter. What I think it needs is "closed-loop testing" done on a massive scale: generate 1 million coding problems, solve them with a language model, test the solutions, keep the correct ones, and use them to teach the model to write better code.
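A minimal sketch of that generate / solve / test / keep loop, under heavy assumptions: the "problems" are a toy family (add, subtract, multiply two ints), and `toy_model` is a hypothetical stand-in for a language model that just guesses a function body. All names here (`OPS`, `generate_problems`, `toy_model`, `passes_tests`, `closed_loop_round`) are illustrative, not from any real system, and the retraining step is left out.

```python
import random
from dataclasses import dataclass

# Hypothetical toy task family: a description plus a reference implementation.
OPS = {
    "add": ("return the sum of a and b", lambda a, b: a + b),
    "sub": ("return a minus b", lambda a, b: a - b),
    "mul": ("return the product of a and b", lambda a, b: a * b),
}
# Fixed test inputs; on (2, 3) the three ops give 5, -1, 6, so a wrong op always fails.
INPUTS = [(2, 3), (7, -4), (-6, 5)]


@dataclass
class Problem:
    prompt: str
    tests: list[tuple[int, int, int]]  # (a, b, expected)


def generate_problems(n: int, rng: random.Random) -> list[Problem]:
    """Generate n problems, computing expected outputs from the reference op."""
    problems = []
    for _ in range(n):
        desc, ref = OPS[rng.choice(list(OPS))]
        tests = [(a, b, ref(a, b)) for a, b in INPUTS]
        problems.append(Problem(f"Write f(a, b) to {desc}.", tests))
    return problems


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical model stand-in: ignores the prompt and guesses a body."""
    body = rng.choice(["a + b", "a - b", "a * b"])
    return f"def f(a, b):\n    return {body}\n"


def passes_tests(source: str, tests: list[tuple[int, int, int]]) -> bool:
    """Run candidate source and check it against every test case."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # a real system must sandbox untrusted code
        f = namespace["f"]
        return all(f(a, b) == expected for a, b, expected in tests)
    except Exception:
        return False


def closed_loop_round(n_problems: int, samples_per_problem: int, seed: int = 0):
    """One round: keep only (problem, solution) pairs that pass their tests."""
    rng = random.Random(seed)
    kept = []
    for problem in generate_problems(n_problems, rng):
        for _ in range(samples_per_problem):
            candidate = toy_model(problem.prompt, rng)
            if passes_tests(candidate, problem.tests):
                kept.append((problem, candidate))
                break  # first verified solution is enough for this sketch
    return kept
```

The kept pairs are what you'd feed back as fine-tuning data for the next round; the point is that the tests, not a human, decide what counts as correct.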

Apply the same procedure to math, to sciences where you can simulate the answer to test it, to logic, and to practically any field that has a cheap way to test. Collect the data, retrain the model.
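For math, "a cheap way to test" can be as simple as substituting the answer back in. A minimal sketch, assuming the task is "find a root of this polynomial" and that `proposals` are hypothetical model outputs; `evaluate`, `is_root` and `filter_verified` are illustrative names. Checking a root is one exact evaluation (Horner's rule with `Fraction`, so no float rounding), even when finding it is harder.

```python
from fractions import Fraction


def evaluate(coeffs: list[int], x: Fraction) -> Fraction:
    """Evaluate c0 + c1*x + c2*x^2 + ... exactly using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def is_root(coeffs: list[int], candidate) -> bool:
    """Cheap verifier: the candidate is a root iff the polynomial evaluates to 0."""
    return evaluate(coeffs, Fraction(candidate)) == 0


def filter_verified(problems, proposals):
    """Keep only (problem, answer) pairs whose answer passes the check."""
    return [(p, ans) for p, ans in zip(problems, proposals) if is_root(p, ans)]
```

For example, `filter_verified([[6, -5, 1], [-1, 2], [1, 0, 1]], [3, "1/2", 1])` keeps the first two pairs (3 is a root of x^2 - 5x + 6, 1/2 is a root of 2x - 1) and drops the third (x^2 + 1 has no real root).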

This is the same approach taken in reinforcement learning: the agents create their own datasets. AlphaGo generated much of its Go training data by playing games against itself, and it became better than the best human players. AlphaTensor discovered matrix multiplication algorithms that beat the best known human-designed ones for some matrix sizes. This is the power of learning from a closed loop of testing: it can easily go superhuman.

The question is how can we enable the model to perform more experiments and learn from all that feedback.


archpawn wrote

> I don't think "pretend you're an AGI" is sufficient, it will just pretend but not be any smarter.

You're missing my point. Pretending can't make it smarter, but it can make it dumber. If we get a superintelligent text prediction system, we'll still have to trick it into predicting someone superintelligent, or it will just pretend to be dumb.
