Submitted by lmtog t3_10zix8k in MachineLearning

Looking at the current research, it seems like Monte Carlo CFR is the de facto standard (e.g. Pluribus).

But could transformers be trained to play poker as well?

Let's say we encode hands as tokens like 5h (5 of hearts) and also pass along info about the current game state, like p1:raise:2bb, p2:fold, and p3:call:2bb. Would the model be able to predict which hands I should be playing? Let's say we train the model by having it play against itself and feeding the results back in to train it.
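As a rough illustration of the kind of encoding I mean (the vocabulary and token names here are just made up for this example, not from any particular library):

```python
# Illustrative sketch: turning a poker situation into a token sequence
# a transformer could consume. Vocabulary and token names are invented
# for this example.

RANKS = "23456789TJQKA"
SUITS = "shdc"

# Build a vocabulary: 52 cards plus a few action/state tokens.
CARD_TOKENS = [r + s for r in RANKS for s in SUITS]   # e.g. "5h"
ACTION_TOKENS = ["p1:raise:2bb", "p2:fold", "p3:call:2bb", "<sep>", "<pad>"]
VOCAB = {tok: i for i, tok in enumerate(CARD_TOKENS + ACTION_TOKENS)}

def encode_state(hole_cards, actions):
    """Map hole cards plus the action history to integer token ids."""
    tokens = list(hole_cards) + ["<sep>"] + list(actions)
    return [VOCAB[t] for t in tokens]

# "5h Kd" in the hole, then the betting so far:
ids = encode_state(["5h", "Kd"], ["p1:raise:2bb", "p2:fold", "p3:call:2bb"])
print(ids)  # integer sequence ready for an embedding layer
```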

This is just an idea and I have not dug into transformers too much, so there might be something I'm missing.

What are your thoughts on this?

1

Comments

thiru_2718 t1_j83kkbh wrote

Poker depends on looking far enough ahead to play game-theory-optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?

But honestly, I don't know much about these RL-type problems. How is AlphaGo structured?

3

lmtog OP t1_j84vc3x wrote

That's what I'm not quite sure about. I assume the result would not be close to the Nash equilibrium.

But I don't know since I have not worked with transformers before.

I think it comes down to whether we can train a transformer with feedback on which hands were good and which were not. Looking at the other responses, it seems like that is not possible.

1

IronRabbit69 t1_j84njph wrote

Tabular CFR can be approximated with a neural network, as Noam Brown (first author of Pluribus) and co-authors show in follow-up work: https://arxiv.org/abs/1811.00164

But you're comparing apples to oranges a bit by asking whether transformers can replace CFR. Transformers are a neural-net architecture. You could of course encode poker state as text and feed it to a transformer that predicts the right move to play. But how do you train that network? CFR is a self-play learning algorithm (sort of like AlphaGo's MCTS) that learns good policies.
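To give a flavor of what CFR actually does, here is a toy sketch of regret matching, the update rule at the heart of CFR, for a single decision point. This is illustrative only: the counterfactual values are faked with noise, and real CFR runs this at every information set of the game tree.

```python
import random

# Toy sketch of regret matching, the core update inside CFR.
# One decision point with three abstract actions.

ACTIONS = ["fold", "call", "raise"]
regret_sum = [0.0] * len(ACTIONS)
strategy_sum = [0.0] * len(ACTIONS)

def current_strategy():
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(ACTIONS)] * len(ACTIONS)  # uniform if no positive regret

for _ in range(10000):
    strategy = current_strategy()
    for i, s in enumerate(strategy):
        strategy_sum[i] += s
    # Counterfactual values would come from traversing the game tree;
    # here we fake them with noise just to show the update rule.
    action_values = [random.gauss(0, 1) for _ in ACTIONS]
    expected = sum(s * v for s, v in zip(strategy, action_values))
    for i, v in enumerate(action_values):
        regret_sum[i] += v - expected  # regret for not having played action i

# The average strategy over iterations is what converges toward
# a Nash equilibrium in two-player zero-sum games.
total = sum(strategy_sum)
print([s / total for s in strategy_sum])
```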

2

lmtog OP t1_j84uk0n wrote

I think the training part is what I was missing.

I thought you would train a transformer like a normal neural net, in the sense that you tell it which outputs you like and which are wrong.

Looking into it a bit more, I assume you could get an output, but nothing close to the Nash equilibrium.
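For what it's worth, the "feed the result back" idea I had in mind would roughly correspond to a policy-gradient update like REINFORCE. A minimal sketch, where the network, sizes, and the fake episode are all placeholders rather than a working poker bot:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: training any policy network (a transformer
# included) from self-play outcomes with REINFORCE. The architecture
# and sizes below are arbitrary placeholders.

vocab_size, seq_len, n_actions = 64, 8, 3
policy = nn.Sequential(nn.Embedding(vocab_size, 32),
                       nn.Flatten(),
                       nn.Linear(32 * seq_len, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def play_hand():
    """Stand-in for a self-play episode: returns (state, action, reward)."""
    state = torch.randint(vocab_size, (1, seq_len))    # fake tokenized state
    logits = policy(state)
    action = torch.distributions.Categorical(logits=logits).sample()
    reward = torch.randn(())                           # fake chip outcome
    return state, action, reward

for _ in range(100):
    state, action, reward = play_hand()
    log_prob = torch.distributions.Categorical(logits=policy(state)).log_prob(action)
    loss = -(reward * log_prob).mean()  # push up probability of winning actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

From what I read, plain self-play policy gradients like this are not guaranteed to converge to a Nash equilibrium in imperfect-information games, which seems to be exactly why CFR-style methods are used instead.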

Thank you for the feedback.

1

bubudumbdumb t1_j84mygn wrote

The strength of transformers lies in transferring representations learned over large corpora of text or images. Those are unlikely to bring capabilities that generalise to poker, so traditional RL and Monte Carlo approaches are likely to have the upper hand. Poker's challenges are not linguistic or visual-perception challenges.

1

lmtog OP t1_j84uw2j wrote

But technically it should be possible to train the model on hands in the representation I mentioned, and get an output that is a valid poker play?

1

bubudumbdumb t1_j84w7r2 wrote

Correct, but the goal is not just to train, it's to infer well. I am not saying it wouldn't work, just that I don't see why the priors of a transformer model would work better than RNNs or LSTMs at modeling the rewards of each play. Maybe there is something I don't get about poker that maps the game to graphs that can be learned through self-attention.
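To make the comparison concrete: both architectures can consume the same token sequence, so the question is purely about which inductive bias fits the game better. A throwaway PyTorch sketch with arbitrary hyperparameters:

```python
import torch
import torch.nn as nn

# Illustrative only: the same tokenized game state can be fed to an
# LSTM or a transformer encoder; the debate is about inductive bias,
# not about what either architecture can technically consume.

vocab_size, d_model, seq_len = 64, 32, 8
tokens = torch.randint(vocab_size, (1, seq_len))
emb = nn.Embedding(vocab_size, d_model)(tokens)        # (1, 8, 32)

lstm_out, _ = nn.LSTM(d_model, d_model, batch_first=True)(emb)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
tfm_out = nn.TransformerEncoder(layer, num_layers=2)(emb)

print(lstm_out.shape, tfm_out.shape)  # both (1, 8, 32): drop-in alternatives
```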

2