Submitted by These-Assignment-936 t3_10y2mu0 in MachineLearning

I just finished reading the paper "Pre-Trained Language Models for Interactive Decision Making" (https://arxiv.org/abs/2202.01771). As I understand it, the authors are using a language model to "generate" an optimal path to an objective, in test environments like VirtualHome and BabyAI. Reinforcement and imitation learning are evaluated as ways for the model to self-improve.

This is the first time I've seen a language model used to "solve a problem" that isn't a language one. It seems to open up so many new possibilities. Has this been done before? Are there other examples of LMs being used as decision engines? What's the state of the art? Any interesting applications you've seen?

Side question: I imagine there were AI approaches to navigating VirtualHome and BabyAI that were NOT language-model based. What is the standard modeling approach to these kinds of problems?

117

Comments


currentscurrents t1_j7wf3u0 wrote

>What is the standard modeling approach to these kinds of problems?

The standard approach is reinforcement learning. It works, but it's not very sample-efficient and takes many iterations to train.

LLMs are probably so good at this because of their strong meta-learning abilities: during pretraining they not only learn the task at hand but also pick up good strategies for learning new tasks.

This has some really interesting implications. Pretraining seems to drastically improve sample efficiency even if the pretraining was on a very different task. Maybe we could pretrain on a very large amount of synthetic, generated data before doing our real training on our finitely-sized real datasets.
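Concretely, the recipe might look something like this (a rough sketch with HuggingFace Transformers; the trajectory strings and the whole setup are made up for illustration, not what the paper actually does):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Rough sketch of "pretrain on something else, then fine-tune on the real task":
    # start from an off-the-shelf pretrained LM and take a few gradient steps on
    # decision trajectories rendered as text. The trajectories below are invented.

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    trajectories = [
        "goal: make coffee | walk to kitchen -> grab mug -> pour coffee",
        "goal: watch tv | walk to living room -> grab remote -> switch on tv",
    ]

    model.train()
    for text in trajectories:
        batch = tok(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss  # standard next-token loss
        loss.backward()
        opt.step()
        opt.zero_grad()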

40

avocadoughnut t1_j7xvd0p wrote

Makes me wonder if pretraining makes the model converge on essentially a more efficient architecture that we could be using instead. I'm hoping this thought has already been explored; it would be interesting to read about.

11

Sm0oth_kriminal t1_j7y6wv6 wrote

This is probably only the case when there's a very low “compression ratio” of model parameters to learned entropy.

Basically, if the model has “too many” parameters it can be distilled, but we've found empirically that, until that point is hit, transformers scale extremely well and are generally better than any other known architecture.

Another topic is sparsification, which takes a trained model and cuts out some percentage of weights that have minimal effect on the output, then fine-tunes that model. You can check out Neural Magic and their associated work… they can run models on CPUs that would normally require GPUs.
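For a flavor of what prune-then-fine-tune looks like, here's a minimal PyTorch sketch (illustrative only, not Neural Magic's actual pipeline, which also relies on sparse-aware inference kernels to get the CPU speedups):

    import torch
    import torch.nn.utils.prune as prune

    # Toy model standing in for a real network.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    )

    # Zero out the 80% of weights with the smallest absolute value in each linear layer.
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.8)

    # ... fine-tune here so the surviving weights compensate for the pruned ones ...

    # Bake the pruning masks into the weight tensors.
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.remove(module, "weight")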

4

avocadoughnut t1_j7yaq8w wrote

I'm considering a higher level idea. There's no way that transformers are the end-all-be-all model architecture. By identifying the mechanisms that large models are learning, I'm hoping a better architecture can be found that reduces the total number of multiplications and samples needed for training. It's like feature engineering.

8

nikgeo25 t1_j7yjicm wrote

Know any papers related to their work? Magic sounds deceptive...

1

Borrowedshorts t1_j7xz19r wrote

Yes, and this example actually isn't all that impressive. Google demonstrated that an LLM can significantly improve decision making for a real-world robot, giving it a type of 'common sense'. Check out PaLM-SayCan, a collaboration of two models that can perform real-world robotic tasks with the assistance of a language model.

18

___luigi t1_j7wf94b wrote

I guess this is similar to the trend we have seen with “Decision Transformers” and how they have been proposed as a practical alternative to RL.

3

memberjan6 t1_j7z16sz wrote

The Alpha family of AI models was created to be a decision engine.

I would think a monolithic LLM wouldn't be as effective as delegating to a dedicated decision-making AI model.

2

Live4evil t1_j7x4wea wrote

I wonder if an LLM would be able to play chess any better than an RL model. I imagine you'd need to serialize the board into a natural-language format.

1

pseudonerv t1_j7xo8ml wrote

We can train on the algebraic notation: feed it all the games and see how it goes, or have the LLM self-play. ChatGPT can emit the notation, but all it generates are illegal moves.

Edit: after some corrections, it actually proposed the correct moves. Here are the prompts:

ME: The following is a series of chess moves after opening: 1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 0-0 5. Bf4 d5 What is the best next move?

ChatGPT: The next move depends on the playing style and strategy of both players. However, here are a few options that are often played in this position:

  • e3 - This is a flexible move that allows the player to develop their bishop or castle kingside. It can also be useful to control the center.
  • dxc5 - Taking the pawn on c5 can lead to a tactical or strategic advantage for White. This move can open up the c-file and create a strong pawn structure for White.
  • Be2 - Developing the bishop to e2 can help to control the center and prepare to castle kingside.
  • h3 - This is a prophylactic move to prevent Black from playing Bh4, which can be a nuisance for White.

It's important to note that all these moves have potential drawbacks and should be evaluated based on the player's individual preferences, the position of the other pieces on the board, and Black's response.

ME: Some of the moves are illegal. Look at the board again.

ChatGPT: I apologize for the mistake. Here are the corrected moves:

  1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 d5 5. cxd5 Nxd5

In this position, some of the most common moves are:

  • e3 - This is a flexible move that allows the player to develop their bishop or castle kingside. It can also be useful to control the center.
  • Bd3 - Developing the bishop to d3 can help to control the center and prepare to castle kingside.
  • Qc2 - Developing the queen to c2 can help to control the center and prepare to castle kingside.
  • 0-0 - Castling kingside is a common move that helps to get the king to a safer square and connect the rooks.

It's important to keep in mind that these moves should be evaluated based on the player's individual preferences, the position of the other pieces on the board, and Black's response.

ME: No. The opening moves are: 1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 0-0 5. Bf4 d5

ChatGPT: Thank you for correcting me. Here are the moves as given:

  1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 0-0 5. Bf4 d5

In this position, some of the most common moves for White are:

  • e3 - This is a flexible move that allows the player to develop their bishop or castle kingside. It can also be useful to control the center.
  • Bg5 - Developing the bishop to g5 can put pressure on the Black Knight on f6.
  • Qb3 - Developing the queen to b3 can put pressure on the Black pawn on d5 and prepare to castle queenside.
  • cxd5 - Taking the pawn on d5 can open up the center and create a strong pawn structure for White.

It's important to keep in mind that these moves should be evaluated based on the player's individual preferences, the position of the other pieces on the board, and Black's response.
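For what it's worth, the suggestions above can be checked mechanically. A small sketch using the python-chess library on the given position and ChatGPT's first batch of candidate moves:

    import chess  # pip install python-chess

    # Reach the position after 1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5
    board = chess.Board()
    for san in ["Nf3", "Nf6", "c4", "g6", "Nc3", "Bg7", "d4", "O-O", "Bf4", "d5"]:
        board.push_san(san)

    # Check each suggested continuation for legality in this position.
    for candidate in ["e3", "dxc5", "Be2", "h3"]:
        try:
            board.parse_san(candidate)
            print(candidate, "is legal here")
        except ValueError:
            print(candidate, "is illegal here")  # e.g. dxc5: there is no pawn on c5 to capture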

3

cthorrez t1_j7yjxdr wrote

That's not reasoning. It's spitting out semi-random moves. If you keep giving it more and more chances, you increase the probability of getting a set that contains some legal moves.

5

These-Assignment-936 OP t1_j7xmugq wrote

I’m not clever enough to fully understand what is required, but I’d be curious to see how an LM would behave in a DOTA setting. IIRC the OpenAI bot was RL-based and (1) took a lot of training cycles, (2) didn’t generalize all that well to different styles of adversarial play. Some of the qualities suggested by the papers linked by other commenters would seem advantageous…

Anybody else getting 90s text adventure flashbacks?

1

Better_Ad4061 t1_j7xyb2r wrote

I'm trying to make a decision transformer, but I can't quite figure out how to prompt it. I trained it on a chess dataset of (state, reward, move) but I don't know how to "prompt" it with the reward I would like.

1

visarga t1_j7yc08k wrote

You prompt it with the reward. Let's say your top reward is 1.

you predict model(past history, state, 1) -> move
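In code, roughly (a sketch only: the model interface below is made up to illustrate return-conditioning, not any particular library's API):

    import torch

    # Return-conditioned prompting in the decision-transformer spirit.
    # `model` is assumed to take interleaved (return-to-go, state, move) tokens and
    # return logits over the move vocabulary for the final position.

    def choose_move(model, history, current_state, target_return=1.0):
        """history: list of (return_to_go, state, move) triples already played."""
        tokens = []
        for rtg, state, move in history:
            tokens += [("rtg", rtg), ("state", state), ("move", move)]
        # At decision time, append the return you WANT (e.g. 1 = win) and the
        # current state, then ask the model to fill in the missing move slot.
        tokens += [("rtg", target_return), ("state", current_state)]
        logits = model(tokens)            # shape: (move_vocab_size,)
        return int(torch.argmax(logits))  # greedy; could also sample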

1