Monte Carlo tree search
1. setting
- Explore the state space of an environment, e.g. a board game
- We will build a tree where the root is the starting state, the children of the root are the states reachable in one move, and so on
- each node in this tree will store the ratio of winning playouts to total playouts from that state (a concrete sketch of this node structure follows the list)
- a playout is a random simulation of a game starting from that state
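
As a concrete picture of what each node stores, here is a minimal Python sketch; the names (Node, wins, playouts, win_ratio) are illustrative assumptions rather than part of any particular library.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object                                  # the game state this node represents
    children: list = field(default_factory=list)   # states reachable in one move
    wins: int = 0                                  # winning playouts that passed through this node
    playouts: int = 0                              # total playouts that passed through this node

    @property
    def win_ratio(self) -> float:
        """Ratio of winning playouts to total playouts (0 if never visited)."""
        return self.wins / self.playouts if self.playouts else 0.0
```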
2. procedure
- selection – starting at the root, choose child nodes until we reach a leaf. How should we make our choices? One heuristic is to prefer nodes with many successful playouts (exploitation); another is to prefer nodes that lead to parts of the state space we haven't visited much (exploration). The UCB1 formula is a standard way to balance the two (see the sketch after this list)
- expansion – once we reach a leaf, if the game can be continued from that state, add a child node for one of its unexplored moves
- simulation – run a playout of the game from the new node's state, e.g. by selecting moves at random until the game ends
- backpropagation – use the result of the playout to update the new node and every node along the path back to the root
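
Putting the four steps together, here is a minimal sketch of the whole procedure in Python. It assumes a hypothetical GameState interface with legal_moves(), apply(move) (returning a new state rather than mutating), is_terminal(), and winner(); none of these names come from the notes or any specific library. Selection uses the standard UCB1 rule to trade off exploitation against exploration, and the Node class extends the earlier sketch so the block stands alone.

```python
import math
import random

class Node:
    """Search-tree node; extends the earlier sketch with a parent pointer,
    the move that produced it, and the list of moves not yet expanded."""
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                              # move that led here from the parent
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.wins = 0                                 # winning playouts through this node
        self.playouts = 0                             # total playouts through this node

    def ucb1(self, c=math.sqrt(2)):
        """UCB1 = exploitation term + exploration term."""
        if self.playouts == 0:
            return float("inf")                       # always try unvisited children first
        return (self.wins / self.playouts
                + c * math.sqrt(math.log(self.parent.playouts) / self.playouts))

def select(node):
    """Selection: descend by UCB1 until a node with untried moves (or a terminal node)."""
    while not node.untried_moves and node.children:
        node = max(node.children, key=lambda child: child.ucb1())
    return node

def expand(node):
    """Expansion: add one child for a move that has not been tried yet."""
    move = node.untried_moves.pop()
    child = Node(node.state.apply(move), parent=node, move=move)
    node.children.append(child)
    return child

def simulate(state, player):
    """Simulation: play uniformly random moves until the game ends.
    Returns 1 if `player` wins, 0 otherwise (losses and draws lumped together)."""
    while not state.is_terminal():
        state = state.apply(random.choice(list(state.legal_moves())))
    return 1 if state.winner() == player else 0

def backpropagate(node, result):
    """Backpropagation: update the counts on the path back to the root.
    For simplicity the result is counted from the root player's perspective at
    every node; a two-player implementation would alternate perspectives."""
    while node is not None:
        node.playouts += 1
        node.wins += result
        node = node.parent

def mcts(root_state, player, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)                           # 1. selection
        if not leaf.state.is_terminal():
            leaf = expand(leaf)                       # 2. expansion
        result = simulate(leaf.state, player)         # 3. simulation
        backpropagate(leaf, result)                   # 4. backpropagation
    # pick the most-visited child of the root as the move to play
    return max(root.children, key=lambda child: child.playouts).move
```

Calling mcts(start_state, player) would run the given number of select/expand/simulate/backpropagate cycles and return the root move whose subtree was visited most often, a common way to pick the final move.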