Viewing a single comment thread. View all comments

gaymuslimsocialist t1_j45gwsl wrote

Any of the papers in the AlphaZero family do this: they combine tree search with learned policy and value functions.


FallUpJV t1_j45sb9f wrote

Would you recommend any of those or is there one in particular that introduces well such techniques?

Edit : I'm asking this because I'm not very familiar with what tree search means in the first place


cdsmith t1_j460nf2 wrote

Tree search means precisely that: searching a tree. In the context of AlphaZero, the tree is the game tree. That is:

  • I can move my pawn to e4. Then:
    • You could move your knight to c6
      • ...
    • Or you could move your pawn to e6
      • ...
    • Or ...
  • Or, I could move my pawn to d4. Then:
    • You could take my pawn with your pawn on c5.
      • ...
    • Or you could move your knight to c6.
      • ...
    • Or you could move your pawn to d5.
      • ...
    • Or ...
  • Or, I could ...

That's it. The possible moves at each game state, and the game states that they lead to, form a tree. (Actually more like a DAG, since transpositions are possible, but it's often simplified by calling it a tree.) Searching that tree up to a certain depth amounts to thinking forward that many moves in the game. The way you search the tree is some variation on minimax: that is, you want to choose the best move for yourself now, but that means at the next level down, you want to pessimistically only consider the best move for your opponent (which is the worst one for you), etc. Variations come in terms of what order you visit the various nodes of the tree. You could just do a straight-forward depth-first traversal up to a certain depth, in which case this is traditional minimax search. You can refuse to ever visit some nodes, because you know they can't possibly matter, and that's alpha-beta pruning. You could even visit nodes in a random order, changing the likelihood of visiting each node based on a constantly updated estimate of how likely it is to matter, and that's roughly what happens in monte carlo tree search. Either way, you're just traversing that tree in some order.

AlphaZero combines this with machine learning by using two empirically trained machine learning algorithms to tweak the traversal order of the tree, by identifying moves that seem likely to be good, as well as to evaluate partially completed games to estimate how good they look for each player. But ultimately, the machine learning models just plug into certain holes in the tree search algorithm.


dr3aminc0de t1_j46y5o7 wrote

How does this translate to the other Alpha* AIs? I’m thinking of AlphaCraft (star craft) which has too many move options to model it as a tree.