Search

Table of Contents

Definitions

Minimax

Minimax is a decision rule used for minimizing the possible loss of the worst case (maximum loss) scenario.

Tree search

alpha-beta pruning

Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree.

It stops completely evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not to be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

It is an adverserial search algorithm.

Psuedo-code

def alpha_beta(node, depth, alpha, beta, is_maximizing_player):
    if depth == 0 or node.is_terminal_node:
        return node
    if is_maximizing_player:
        v = - np.inf
        for child in node.children:
            v = max(v, alpha_beta(child, depth - 1, alpha, beta, False))
            a = max(alpha, v)
            if beta <= alpha:
                break
        return v
    else:
        v = np.inf
        for child in node.children:
            v = min(v, alpha_beta(child, depth - 1, alpha, beta, True))
            beta = min(beta, v)
            if beta <= alpha:
                break
        return v

Monte Carlo Tree Search (MCTS)

Overview

  • Each node holds:
    • search_e10e2b430f95617381cdd6d6b52aed29fb971dff.png - the number of times this game-state has been visited
    • search_fe7108f38f2db55db9cb7eee3f63077437a2e510.png - the value of this game-state, which is computed by some function and the accumulated value of it's child-nodes

mcts_tic_tac_toe.png

Algorithm

  1. Start at some root state search_e492c7e9a676f3e91761745b1b16c14bee661dc3.png
  2. Perform action search_8d44cd5b3aaa1d39a7bc0113d3715143f965f94b.png to obtain the search_97eb714dfbd8abb06c6ee1fb2cb049cdaa7defd1.png "child-state"
  3. Compute values search_ab46261adcaca18ea00f0604f3ca4aa48b4feb58.png for each search_97eb714dfbd8abb06c6ee1fb2cb049cdaa7defd1.png child
  4. Update the parent state search_e492c7e9a676f3e91761745b1b16c14bee661dc3.png with the number of times we played in this state, i.e. search_56116e82ff6e28885f78d4078798d1681c76d32d.png where search_7ce2b3ed6d0fd41d3ffcdfc5015d78866e6c2969.png denotes the search_e10e2b430f95617381cdd6d6b52aed29fb971dff.png of the each of the child states
  5. Unroll child-nodes in a smiliar way, always propagating the counts and values up the tree.
  6. When some terminating condition has been reached, choose the optimal child of the root state search_e492c7e9a676f3e91761745b1b16c14bee661dc3.png as the next move.