Sunday, December 24, 2017

Monte Carlo Tree Search / Upper Confidence Bounds Introduction

Monte Carlo Tree Search is the search tree representation I decided to go for in this Puyo AI project.

First: you must have an understanding of game trees. A game AI "thinks" by exploring the Game Tree. Each node represents a board state, with the root-node being a representation of the current board. The children of this node are all of the immediate actions the AI could take. For example, in Chess, a child node could be "Pawn to E5", or in Puyo it could be "Rotate Twice and Drop in Column 3". The specifics of how this game tree is built up depends on the algorithm.

The "value" of a node is a heuristic that the programmer makes up. In chess, this could be "Queens are 9 points and Rooks are 5 points". In Go, most AIs try to evaluate a probability of which player will win from a particular board position. (This move has a 40% chance of winning, or that move has a 60% chance of winning).

In the case of Monte Carlo Tree Search, there are four steps that are tightly looped.

  1. Selection -- A process where the most "interesting" node is chosen. In the case of UCT (Upper Confidence Bounds applied to Trees), selection searches the nodes with the highest value and the fewest visits.
  2. Expansion -- Once a node is selected, a new child-node is added to the tree at that location.
  3. Simulation -- This calculates the "value" of a particular node. In the case of UCT, you play a random game all the way to the "end". For my Puyo AI, I usually cap the simulation to 20-moves (subject to future research). 
  4. Backpropagation -- Once statistics are calculated from the simulation, you change the value of the tree to match the simulation results. From there, the process begins at #1: Selection and the loop continues.

In the animation above, I assume that backpropagation is a simple "maximum" function, but every step of this process has been customized by various researchers. In any case, the growth of this tree is the AI pondering about future steps. In the case of AlphaGo, a self-trained neural network was used as both the "Selection" and "Simulation" algorithms. In the case of UCT, simple-random searches serve as simulation, while selection is a simple function. So there are lots of different ways to implement this methodology.

The overall concept of MCTS is just the above four steps, and many Google Searches will get you this far. But in my opinion, the devil is in the details... details that don't seem to be talked about too much yet. I'll elaborate on these details in a later post.

No comments:

Post a Comment