Submitted by Character_Bluejay601 t3_ypatwb in MachineLearning
icosaplex t1_ivjuazh wrote
Reply to comment by flapflip9 in [Project] Rebel Poker AI by Character_Bluejay601
That's actually what makes Rebel interesting. It's far from the first poker AI to reach similar accuracy at approximating equilibria in these settings. But because of the particular way it integrates neural function approximation to do the heavy lifting that prior agents handled with handcrafted machinery, it apparently gets away with discretizing only the bet sizes. A lot of the other common stuff is absent: no hand abstraction (i.e. manually coding in when superficially different hands are equivalent or nearly so), no discretization of action probabilities, no hand-range bucketing or ranking, no special heuristics for the particular round of the game you're on, etc. The neural net apparently just learns it all.
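For illustration, here's a minimal sketch of what the one surviving abstraction, bet-size discretization, might look like. The function name and the pot fractions are made up for the example, not taken from the paper:

```python
def discretize_actions(pot: int, stack: int, min_raise: int) -> list[int]:
    """Map the continuous bet-size space down to a small fixed set of sizes."""
    fractions = [0.5, 1.0, 2.0]          # hypothetical grid: half pot, pot, 2x pot
    sizes = {int(f * pot) for f in fractions}
    sizes.add(stack)                     # all-in is always kept as an action
    # Keep only legal sizes: at least the minimum raise, at most our stack.
    return sorted(s for s in sizes if min_raise <= s <= stack)

# e.g. discretize_actions(pot=100, stack=1000, min_raise=20) -> [50, 100, 200, 1000]
```

Everything else that older agents baked in by hand is left for the net to pick up from training.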
No doubt it would still be a serious project to re-implement from scratch and get the training to work.
flapflip9 t1_ivkuf7s wrote
Wouldn't the game tree get too large to store in GPU memory for poker? Unless of course you start making abstractions and compromises to fit hardware constraints. I used to rely on PioSolver (a commercially available Nash equilibrium solver) a lot in my younger years; a shallow-stacked post-flop tree could maybe be squeezed into 64 GB of RAM and computed in a few seconds. But the entirety of the game tree, with preflop gameplay included... my superstitious peasant brain is telling me you can't trim the model down to a small enough size to make it work. On the flip side, given how crazy well these large NLP/CV models are doing, learning poker should be a breeze.
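A rough back-of-envelope supports that intuition. Published estimates (e.g. Johanson 2013) put heads-up no-limit hold'em on the order of 10^160 decision points, so even at one byte per node, 64 GB covers a vanishingly small fraction of the tree:

```python
# Ballpark figures only: ~10**160 decision points is a public estimate
# for heads-up no-limit hold'em, not an exact count.
nodes = 10**160
ram_64gb = 64 * 2**30  # bytes
print(f"fraction storable in 64 GB at 1 byte/node: {ram_64gb / nodes:.1e}")
# -> ~6.9e-150
```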
icosaplex t1_ivn5lha wrote
Yep, it would be very large if you stored the entire game tree. But as I understand it, using a neural net in the right way, you don't have to anymore, the same way AlphaZero doesn't have to store the entire astronomically large game tree for chess. Instead you rely on the neural net to learn and generalize across states.
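As a toy sketch of that idea in the perfect-information (AlphaZero) setting: search only to a fixed depth, and where the stored subtree would have continued, substitute the net's value estimate. All the names here are hypothetical, and Rebel's actual search works on belief states rather than plain minimax:

```python
def evaluate(state, depth: int, value_net) -> float:
    """Depth-limited search: the value net stands in for the missing subtree."""
    if state.is_terminal():
        return state.payoff()
    if depth == 0:
        # No stored subtree below this point: ask the net to generalize.
        return value_net(state.features())
    return max(
        evaluate(state.apply(a), depth - 1, value_net)
        for a in state.legal_actions()
    )
```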
Doing this in imperfect-information games like poker in a theoretically sound way (i.e. one that would converge to a true equilibrium in the limit of infinite model capacity and training time) obviously requires a lot more care, plus you presumably get the usual practical challenges of neural function approximation - e.g. having to make sure it explores widely enough, doesn't overfit, etc. But apparently it's still good enough to be superhuman, and if done right you can throw away practically all abstractions and just let the neural net learn on its own how to generalize between all those states.
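For context on the "theoretically sound" part: the convergence machinery in this family of methods is counterfactual regret minimization (Rebel runs CFR on subgames over belief states), and the per-infoset update at its heart is regret matching. A minimal sketch of just that update, nothing Rebel-specific:

```python
import numpy as np

def regret_matching(cumulative_regret: np.ndarray) -> np.ndarray:
    """Turn cumulative counterfactual regrets into a strategy over actions."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        # Play each action in proportion to its positive regret.
        return positive / total
    # No positive regret yet: fall back to uniform play.
    return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))
```

Averaged over iterations, strategies produced this way converge toward equilibrium; the neural net's job is to make that work without enumerating the full tree.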