MetaAI_Official OP t1_izfcvy5 wrote

We disentangle the complexity of the action space from the complexity of the planning algorithm by using a policy proposal network. For each game state we sample a few actions from the network - sets of unit-order pairs - and then do planning only among these actions. Now, in the case of continuous actions we would have to modify the policy proposal network, but that has already been explored for other games with continuous action spaces such as StarCraft. - AB
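
To make the "sample a few actions, then plan only among them" idea concrete, here is a minimal sketch in Python. It is not the actual implementation: the proposal distribution `PROPOSAL`, the scoring function `toy_value`, and the helper names are all hypothetical, and a simple argmax over a made-up value function stands in for the real planning algorithm. The only point is that the planner's cost depends on the number of sampled candidates `k`, not on the size of the full action space.

```python
import random

# Hypothetical proposal distribution for one game state. Each action is a
# set of unit-order pairs, written here as a tuple of (unit, order) tuples.
PROPOSAL = {
    (("A PAR", "MOVE BUR"), ("F BRE", "HOLD")): 0.5,
    (("A PAR", "HOLD"), ("F BRE", "MOVE MAO")): 0.3,
    (("A PAR", "MOVE PIC"), ("F BRE", "MOVE ENG")): 0.15,
    (("A PAR", "HOLD"), ("F BRE", "HOLD")): 0.05,
}


def sample_candidates(policy, k, rng):
    """Draw up to k distinct actions, weighted by the proposal probabilities."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    target = min(k, len(actions))
    candidates = set()
    # Sample with replacement and de-duplicate; the loop is bounded so that
    # very low-probability actions cannot stall it.
    for _ in range(10 * k):
        if len(candidates) >= target:
            break
        candidates.add(rng.choices(actions, weights=weights)[0])
    return sorted(candidates)


def toy_value(action):
    """Made-up score: count how many units are ordered to move."""
    return sum(1 for _unit, order in action if order.startswith("MOVE"))


def plan(policy, value_fn, k=2, seed=0):
    """Restrict the search to k sampled candidates and pick the best one.

    The argmax is a stand-in for a real planning algorithm; it only ever
    looks at the sampled candidates, never at the full action space.
    """
    rng = random.Random(seed)
    candidates = sample_candidates(policy, k, rng)
    best = max(candidates, key=value_fn)
    return best, candidates


if __name__ == "__main__":
    best, candidates = plan(PROPOSAL, toy_value, k=2)
    print("candidates considered:", len(candidates))
    print("chosen action:", best)
```

For continuous actions, only `sample_candidates` would need to change (for example, drawing from a parametric distribution instead of a finite table); `plan` would still just compare a handful of sampled candidates.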