Submitted by Character_Bluejay601 t3_ypatwb in MachineLearning

Any ML/DL peeps interested in taking a stab at implementing the ReBeL algorithm for poker from the Facebook AI group, along with maybe making a proper game API for training, all as open source? If you're not familiar with the paper, here is the link: ReBeL

Let Me Know!

48

Comments


flapflip9 t1_ivj6h5q wrote

The academic work on poker AI is pretty extensive, so definitely start there. Heads-up no-limit Nash equilibrium solvers have been available for years; I've even seen 6-max Nash equilibrium solvers. The only limiting factor seems to be the need for discretization: to keep the game tree from branching out uncontrollably, decisions get reduced to things like fold/call/raise 20%/raise 50%/all-in. Perfectly fine from an academic standpoint, just not very transferable to human play.
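A rough sketch of what that kind of discretized action menu looks like in code; the pot fractions and names are illustrative, not taken from any particular solver:

```python
from enum import Enum, auto

class ActionType(Enum):
    FOLD = auto()
    CALL = auto()
    RAISE = auto()    # sized as a fraction of the pot
    ALL_IN = auto()

def legal_abstract_actions(pot: int, stack: int, to_call: int):
    """Return the discretized menu of (action, chip amount) pairs at a node."""
    actions = [(ActionType.FOLD, 0), (ActionType.ALL_IN, stack)]
    if to_call <= stack:
        actions.append((ActionType.CALL, to_call))
    for frac in (0.20, 0.50):                 # the two raise sizes kept
        size = int(frac * pot)
        if to_call < size < stack:            # must beat the call, fit the stack
            actions.append((ActionType.RAISE, size))
    return actions

# e.g. pot 100, stack 980, 20 to call -> fold, all-in, call 20, raise 50
print(legal_abstract_actions(pot=100, stack=980, to_call=20))
```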

All this to say that the challenging part of such a project isn't the framework/AI part, but rather the efficient implementation of the decision tree, hand range bucketing, hand ranking, etc. (see the toy bucketing sketch below). If the output of such work is just another heads-up equilibrium solver, then it isn't particularly novel, unfortunately. Even if it outperformed its peers/prior published work, it would be a <1% improvement, as all these Nash solvers are pretty close to optimal today.
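For a flavor of the "hand range bucketing" part, here is a toy sketch that groups the 1,326 starting hands into coarse strength buckets. The strength proxy is a made-up illustration; real solvers bucket by equity distributions and much richer features:

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def crude_strength(hand):
    """Toy preflop strength proxy: high cards plus pair/suited bonuses."""
    (r1, s1), (r2, s2) = hand
    score = RANKS.index(r1) + RANKS.index(r2)
    if r1 == r2:
        score += 13          # pocket pairs jump ahead
    if s1 == s2:
        score += 2           # suitedness is worth a little
    return score

def bucket_hands(num_buckets=10):
    hands = list(combinations(DECK, 2))       # all 1,326 starting hands
    hands.sort(key=crude_strength)
    buckets = {}
    for i, hand in enumerate(hands):
        buckets.setdefault(i * num_buckets // len(hands), []).append(hand)
    return buckets

print({b: len(hs) for b, hs in bucket_hands().items()})
```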

11

abstractcontrol t1_ivjjbfq wrote

Poker really brings out all the weaknesses of deep learning; it is hardly a solved thing. For example, if you log into Stars and play a HU SNG, you'll see that you start with 1,000-chip stacks and 10/20 blinds. That means you have roughly 960 distinct raise sizes, plus call and fold, to account for in just that small game. You also have large reward variance that deep RL algorithms can't deal with properly. Some algorithms, like categorical DRL, are just too memory inefficient to be used even on moderately large games. You'd be amazed how much memory having around 1,000 different actions takes up once you start using mini-batches.
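A back-of-the-envelope check of that action count, assuming integer-chip raise sizes (the exact raise rules vary by site, so treat this as a sketch):

```python
stack, big_blind = 1000, 20

min_raise_to = 2 * big_blind              # smallest legal open-raise: to 40
raise_count = stack - min_raise_to + 1    # every integer amount 40..1,000
total_actions = raise_count + 2           # plus call and fold

print(raise_count, total_actions)         # 961 963 -- roughly the ~960 quoted
```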

The academic SOTA is to just stick a tabular algorithm on top of some deep net, which is hardly elegant. All these algorithms are just hacks, and I wouldn't use them for real-money play.

6

icosaplex t1_ivjuazh wrote

That's actually what makes ReBeL interesting. It's far from the first poker AI to achieve similar levels of accuracy at equilibrium approximation in these various settings. But because of the particular way it integrates neural function approximation to do the heavy lifting that prior agents didn't, it apparently gets away with discretizing only the bet sizes. A lot of the other common machinery is absent: no hand abstraction (i.e., manually coding in when superficially different hands are equivalent or nearly so), no discretization of action probabilities, no hand range bucketing or ranking, no special heuristics for the particular round of the game you're on, etc. The neural net apparently just learns it all.
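For anyone curious what "learning it all" means mechanically: the paper's core object is a value network over public belief states (the public situation plus both players' probability distributions over hands). A minimal sketch of that interface, where the architecture, names, and sizes are my own assumptions rather than the authors' code:

```python
import torch
import torch.nn as nn

NUM_HANDS = 1326  # 52 choose 2 possible hole-card combos per player

class PBSValueNet(nn.Module):
    """Maps a public belief state to per-hand values for both players."""

    def __init__(self, public_dim: int, hidden: int = 256):
        super().__init__()
        # Input: public features (board, pot, bets) concatenated with each
        # player's probability distribution over their possible hands.
        self.net = nn.Sequential(
            nn.Linear(public_dim + 2 * NUM_HANDS, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * NUM_HANDS),  # a value per hand, per player
        )

    def forward(self, public_feats, beliefs_p1, beliefs_p2):
        return self.net(torch.cat([public_feats, beliefs_p1, beliefs_p2], dim=-1))
```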

No doubt it would still be a serious project to re-implement from scratch and get the training to work.

2

suedepaid t1_ivjwvxc wrote

I would be super interested!!! I've run their Liar's Dice implementation locally and tried to adapt the algo for my own game, but would love to push into poker.

Where/how would you want to collaborate?

2

LetterRip t1_ivk4gfy wrote

> The academic SOTA is to just stick a tabular algorithm on top of some deep net, which is hardly elegant. All these algorithms are just hacks and I wouldn't use them for real money play.

They absolutely crush the best players in the game, and beat anyone short of the best by absurd amounts.

While there is a huge action space, it turns out that very few bet sizes are needed on the early streets (four is generally adequate), and the final street can be solved on the fly.
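A sketch of that per-street scheme: a small fixed menu of pot-fraction bets on early streets, with the final street left to be re-solved exactly at play time. The fractions here are illustrative assumptions:

```python
BET_FRACTIONS = {
    "preflop": (0.5, 1.0, 2.0, None),   # None = all-in; four sizes per street
    "flop":    (0.33, 0.75, 1.5, None),
    "turn":    (0.5, 1.0, 2.0, None),
    # no "river" entry: the final street has no future branching, so the
    # subgame is small enough to solve on the fly without a fixed menu
}

def bet_sizes(street: str, pot: int, stack: int) -> list[int]:
    if street not in BET_FRACTIONS:
        raise ValueError("solve the final street's subgame exactly instead")
    sizes = []
    for frac in BET_FRACTIONS[street]:
        amount = stack if frac is None else min(int(frac * pot), stack)
        if amount not in sizes:
            sizes.append(amount)
    return sizes

print(bet_sizes("flop", pot=120, stack=900))  # [39, 90, 180, 900]
```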

5

flapflip9 t1_ivkuf7s wrote

Wouldn't the game tree get too large to store in GPU memory for poker? Unless, of course, you start making abstractions and compromises to fit hardware constraints. I used to rely on PioSolver (a commercially available Nash equilibrium solver) a lot in my younger years; a shallow-stacked post-flop tree could maybe be squeezed into 64 GB of RAM and computed in a few seconds. But the entirety of the game tree, with preflop gameplay... my superstitious peasant brain is telling me you can't trim your model down to a small enough size to make it work. On the flip side, given how crazy well these large NLP/CV models are doing, learning poker should be a breeze.

1

flapflip9 t1_ivkw5cs wrote

For HU poker specifically: from what I recall, researchers derived rough upper bounds on how far their model is from Nash equilibrium. It was down to something like less than 1 big blind (BB) per 100 hands or so, implying someone playing a perfect NE could exploit such a bot by no more than that amount. So if you're playing $100-BB HU, you'd hope to make at most $100 per 100 hands; not exactly a get-rich-quick scheme. I'm most likely off about the exact upper limit here, but I recall it being so small that for all practical purposes, HU poker is considered solved (the NE, that is).
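In numbers (the bound itself is the rough recollection above, not a published figure):

```python
exploitability_bb_per_100 = 1.0   # rough upper bound recalled above
big_blind_usd = 100.0

edge_per_100_hands = exploitability_bb_per_100 * big_blind_usd
print(edge_per_100_hands)         # 100.0 USD per 100 hands, i.e. ~$1/hand
```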

Chess is different: the game tree is way bigger, you can't just lump possible moves together, some tactical lines only reward the player 7+ moves in the future, etc. No-limit hold'em has a lot of leaf nodes (all-ins and folds) and a tree depth of about 7 on average or so. It's crazy how much more complex chess is. Think of how, to this day, we don't even know what "the best" opening is; there are a few hundred perfectly playable openings (as ranked by AI), each leading you down a completely different game path.

1

icosaplex t1_ivn5lha wrote

Yep, it would be very large if you stored the entire game tree. But as I understand it, using a neural net in the right way, you don't have to anymore, the same way AlphaZero doesn't have to store the entire astronomically large game tree for chess. Instead, you rely on the neural net to learn and generalize across states.

Doing this in imperfect-information games like poker in a theoretically sound way (i.e., one that would converge to a true equilibrium in the limit of infinite model capacity and training time) obviously requires a lot more care, and you presumably also get the other practical challenges of neural function approximation, e.g., having to make sure it explores widely enough, doesn't overfit, etc. But it's still good enough, apparently, to be superhuman, and apparently if done right you can throw away practically all abstractions and just let the neural net learn on its own how to generalize between all those states.
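A toy sketch of the idea: depth-limited search where a learned evaluator stands in for the unexpanded subtree, AlphaZero-style. This toy is perfect-information max-search only; ReBeL's actual trick of carrying belief distributions so this stays sound under imperfect information is exactly the "lot more care" part:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    payoff: float = 0.0                       # used only at terminals
    children: list["Node"] = field(default_factory=list)

def value_net(node: Node) -> float:
    """Stub for a learned evaluator; a real net generalizes across states."""
    return node.payoff  # placeholder heuristic

def search(node: Node, depth: int, max_depth: int) -> float:
    if not node.children:                     # terminal: exact payoff
        return node.payoff
    if depth == max_depth:                    # frontier: query the net instead
        return value_net(node)                # ...of expanding the subtree
    return max(search(c, depth + 1, max_depth) for c in node.children)

# A 2-ply toy tree searched to depth 1: the grandchild is never expanded
# or stored; the value net is consulted at the frontier instead.
tree = Node(children=[Node(payoff=1.0, children=[Node(payoff=5.0)]),
                      Node(payoff=3.0)])
print(search(tree, depth=0, max_depth=1))     # 3.0
```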

1

suedepaid t1_ivs1fm3 wrote

Still want to give it a shot?

2

Character_Bluejay601 OP t1_ivs80lj wrote

Seems like this post has gotten some attention. This is a challenging problem, and I believe we don't have to create the most efficient implementation; we're just taking a stab at it. Whoever is interested, shoot me a message or reply here.

1

SerialPoopist t1_iy6br95 wrote

A private Discord server for this could be nice.

2