
thiru_2718 t1_j83kkbh wrote

Poker depends on looking far enough ahead to play game-theory-optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?
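
To make "converging to GTO" concrete, here's a toy regret-matching loop on rock-paper-scissors (a stand-in sketch, not poker; real solvers use counterfactual regret minimization over the full game tree). The average strategy converges to the 1/3-1/3-1/3 Nash mix:

```python
import random

# Regret matching on rock-paper-scissors: a minimal stand-in for how
# poker solvers (via counterfactual regret minimization) approach Nash.
ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    # No positive regret yet: fall back to uniform play.
    return [p / total for p in positives] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS

for _ in range(100_000):
    strategy = strategy_from_regrets(regrets)
    for a in range(ACTIONS):
        strategy_sum[a] += strategy[a]
    my_action = random.choices(range(ACTIONS), weights=strategy)[0]
    opp_action = random.choices(range(ACTIONS), weights=strategy)[0]  # self-play
    # Accumulate regret: how much better each alternative would have done.
    for a in range(ACTIONS):
        regrets[a] += PAYOFF[a][opp_action] - PAYOFF[my_action][opp_action]

total = sum(strategy_sum)
print([round(s / total, 3) for s in strategy_sum])  # roughly [0.333, 0.333, 0.333]
```

The key point is that the strategy that converges is the *average* over many iterations of self-play, and it's that kind of averaging over an exponentially large tree that pure next-move prediction doesn't obviously give you.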

But honestly, I don't know much about these RL-type problems. How is AlphaGo structured?

3

lmtog OP t1_j84vc3x wrote

That's what I'm not quite sure about. I assume the result would not be close to the Nash equilibrium.

But I don't know since I have not worked with transformers before.

I think it comes down to whether we can train a transformer with feedback on which hands were played well and which were not. Looking at the other responses, it seems like that is not possible.
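
From the little I understand, if it were possible, the feedback loop would look roughly like this minimal REINFORCE-style policy-gradient sketch (PyTorch; the 10-dim state encoding, the fold/call/raise action set, and the random rewards are all placeholder assumptions, not a real poker setup):

```python
import torch
import torch.nn as nn

# Minimal policy-gradient sketch: per-hand rewards push up the
# log-probabilities of actions that led to winning hands and push down
# those that led to losing ones. A transformer policy would slot in
# wherever `policy` is; everything below is a placeholder.
policy = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # fold/call/raise
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(64, 10)   # fake batch of encoded game states
logits = policy(states)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()        # the actions the policy actually took
rewards = torch.randn(64)      # stand-in for "this hand won/lost X chips"

# REINFORCE: weight each action's log-prob by its baseline-subtracted reward.
loss = -(dist.log_prob(actions) * (rewards - rewards.mean())).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whether something like that converges anywhere near Nash in a game with hidden information is the part I'm unsure about.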

1