abstractcontrol

abstractcontrol t1_ivjjbfq wrote

Poker really brings out all the weaknesses of deep learning; it is hardly a solved problem. For example, if you log into Stars and play a HU SNG, you'll see that you start with 1,000-chip stacks and 10/20 blinds. That means you have 960 different raise sizes, plus call and fold, to account for in just that small game. You also have large reward variance that deep RL algorithms can't deal with properly. Some algorithms, like categorical DRL, are just too memory-inefficient to be used even on moderately large games. You'd be amazed at how much memory around 1,000 different actions takes up once you start using mini-batches.
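To put a rough number on the memory point, here is a back-of-the-envelope sketch. It assumes a C51-style categorical head that outputs one 51-atom distribution per action, float32 storage, and a mini-batch of 512. The atom count, batch size, and function name are my illustrative assumptions, not figures from any particular implementation. It counts only the output tensor of a single forward pass and ignores gradients, target-network outputs, and hidden layers.

```python
# Rough size of the output tensor of a categorical (distributional) Q-head.
# All default values below are illustrative assumptions.

N_RAISES = 960              # raise sizes quoted above for 1,000 stacks at 10/20
N_ACTIONS = N_RAISES + 2    # plus call and fold


def head_output_bytes(batch_size, n_actions, n_atoms=1, bytes_per_value=4):
    """Bytes needed to hold one forward pass of the output layer.

    n_atoms=1 corresponds to an ordinary scalar Q-value per action;
    n_atoms=51 is the atom count commonly used for categorical DRL.
    """
    return batch_size * n_actions * n_atoms * bytes_per_value


scalar_bytes = head_output_bytes(batch_size=512, n_actions=N_ACTIONS)
categorical_bytes = head_output_bytes(batch_size=512, n_actions=N_ACTIONS, n_atoms=51)

print(f"actions:          {N_ACTIONS}")
print(f"scalar Q head:    {scalar_bytes / 2**20:.1f} MiB")
print(f"categorical head: {categorical_bytes / 2**20:.1f} MiB")
```

Under these assumptions the categorical head needs 51 times the memory of a scalar head for the output alone, and that multiplies again once you keep gradients and target distributions around or train on sequences of decisions rather than single steps.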

The academic SOTA is to just stick a tabular algorithm on top of some deep net, which is hardly elegant. All these algorithms are just hacks, and I wouldn't use them for real-money play.
