# Notes on: Lanctot, M., Lisý, V., & Bowling, M. (2014): Search in imperfect information games using online Monte Carlo counterfactual regret minimization

## 1 Notation

- **Chance** \(c\) refers to the fixed stochastic strategy of the environment (e.g. which card comes up on the flop in Texas Hold'em)
- \(h \in H\) denotes a *history*
- \(i \in N\) indexes the players, and \(P(h)\) denotes the player whose turn it is *next*
- \(z \in Z \subseteq H\) denotes a *terminal history*, i.e. a full game from start to end
- \(u_i(z)\) denotes the *payoff* for player \(i\) at the terminal history \(z\)
- \(a \in \mathcal{A}\) denotes the action space, and \(A(I)\) denotes the set of actions available at infoset \(I \in \mathcal{I}_i\)
- \(ha\) denotes the *history* \(h\) followed by action \(a\), i.e. \(h' = ha\), and \(h \sqsubset h'\) is notation for "\(h'\) is a successor of \(h\)"

The probability of a terminal history \(z\) under \(\sigma\) is

\begin{equation*} \pi^{\sigma}(z) = \prod_{i \in N} \pi_i^{\sigma}(z), \end{equation*}

where

\begin{equation*} \pi_i^{\sigma}(z) := \prod_{ha \sqsubset z,\ P(h) = i} \sigma_i \Big( I(h), a \Big), \end{equation*}

i.e. \(\pi_i^{\sigma}(z)\) is the product of the probabilities of the actions taken by player \(i\) along \(z\), and \(\pi^{\sigma}(z)\) is the joint probability over all players.

- \(\sigma_{-i}\) and \(\pi_{-i}^{\sigma}\) refer to player \(i\)'s *opponents' strategies* and to the product of the *opponents'* and *chance's* action probabilities, respectively

The behavioral strategy for player \(i\) is

\begin{equation*} \begin{split} \sigma_i: \quad & \mathcal{I}_i \to \mathcal{P} (\mathcal{A}) \\ & I \mapsto \sigma_i(I) := p \Big( A(I) \Big) \end{split} \end{equation*}

The **counterfactual value** of infoset \(I\) is

\begin{equation*} v_i(\sigma, I) = \sum_{(h, z) \in Z_I} \pi_{-i}^{\sigma}(h) \, \pi^{\sigma}(h, z) \, u_i(z), \end{equation*}

where

\begin{equation*} Z_I = \left\{ (h, z) \mid z \in Z,\ h \in I, \ h \sqsubset z \right\} \end{equation*}
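As a concrete check of the reach-probability definitions above, here is a minimal Python sketch on a hand-built two-action toy game. The game, the infoset labels, and the helper names (`reach_prob`, `reach_prob_others`) are my own illustration, not from the paper.

```python
from math import prod

# Toy game: a terminal history z is a tuple of (player, infoset, action)
# triples; a behavioral strategy sigma maps player -> infoset -> action -> prob.
# (Both the game and the labels below are hypothetical illustrations.)
sigma = {
    1: {"I1": {"bet": 0.7, "check": 0.3}},
    2: {"I2": {"call": 0.4, "fold": 0.6}},
}

def reach_prob(z, i, sigma):
    """pi_i^sigma(z): product of player i's own action probabilities along z."""
    return prod(sigma[p][I][a] for (p, I, a) in z if p == i)

def reach_prob_others(z, i, sigma):
    """pi_{-i}^sigma(z): product over everyone except player i
    (chance would be treated as just another 'player' here)."""
    return prod(sigma[p][I][a] for (p, I, a) in z if p != i)

z = ((1, "I1", "bet"), (2, "I2", "call"))    # player 1 bets, player 2 calls

pi_1 = reach_prob(z, 1, sigma)               # 0.7
pi_minus_1 = reach_prob_others(z, 1, sigma)  # 0.4
pi_z = pi_1 * pi_minus_1                     # pi^sigma(z) = 0.7 * 0.4 (≈ 0.28)
```

Note that `prod` over an empty generator returns 1, which matches the convention that a player with no decisions along \(z\) contributes reach probability 1.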

## 2 Equilibria and exploitability

An \(\epsilon \text{-Nash equilibrium}\), \(\sigma\), is an ordered tuple \((\sigma_1, \dots, \sigma_N)\) such that for any given player \(i\), the benefit of switching from \(\sigma_i\) to any alternative \(\sigma_i'\) is bounded by \(\epsilon\), i.e.

\begin{equation*} \Big( \max_{\textcolor{red}{\sigma_i'} \in \Sigma_i} \underset{\sigma_1, \dots, \textcolor{red}{\sigma_i'}, \dots, \sigma_N}{\mathbb{E}} [u_i(z)] \Big) - \underset{\sigma_1, \dots, \textcolor{green}{\sigma_i} , \dots, \sigma_N}{\mathbb{E}} [u_i(z)] \le \epsilon \end{equation*}
If \(\epsilon = 0\), it's called a **Nash equilibrium**.
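To make the \(\epsilon\)-Nash condition concrete, here is a small sketch in matching pennies (a normal-form game, so strategies are mixtures over matrix rows/columns). The payoff matrix and the `nash_gap` helper are my own illustration; the key fact used is that among all deviations, some pure strategy is a best response.

```python
import numpy as np

# Matching pennies: payoff matrix for player 1; player 2 receives the negative.
U1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])

def nash_gap(s1, s2):
    """Per-player gain from deviating to a best response, i.e. the left-hand
    side of the epsilon-Nash condition for each player."""
    ev1 = s1 @ U1 @ s2               # E[u_1] under (s1, s2)
    ev2 = -ev1                       # zero-sum
    gain1 = (U1 @ s2).max() - ev1    # a best response can be taken pure
    gain2 = (-(s1 @ U1)).max() - ev2
    return gain1, gain2

uniform = np.array([0.5, 0.5])
print(nash_gap(uniform, uniform))    # both gaps 0 -> a Nash equilibrium

biased = np.array([0.8, 0.2])
print(nash_gap(uniform, biased))     # player 1 gains ~0.6 by always playing heads
```

So the profile `(uniform, biased)` is an \(\epsilon\)-Nash equilibrium only for \(\epsilon \ge 0.6\).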

A **zero-sum game** is a game such that

\begin{equation*} \sum_{i \in N} u_i(z) = k \quad \forall z \in Z \end{equation*}

for some constant \(k\) (strictly speaking, *zero-sum* refers to \(k = 0\); the general case is a *constant-sum* game).

The **exploitability** of a profile \(\sigma = (\sigma_1, \dots, \sigma_N)\) is defined as

\begin{equation*} e(\sigma) = \sum_{i \in N} \Big( \max_{\sigma_i' \in \Sigma_i} \underset{\sigma_1, \dots, \sigma_i', \dots, \sigma_N}{\mathbb{E}} [u_i(z)] - \underset{\sigma_1, \dots, \sigma_i, \dots, \sigma_N}{\mathbb{E}} [u_i(z)] \Big) \end{equation*}

i.e. the total gain available to the players from each best-responding to the others' strategies.
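The definition can be computed directly in a small normal-form game. The sketch below (rock-paper-scissors; the `exploitability` helper is my own name) sums the per-player best-response gains, showing that the uniform profile has exploitability 0 while a rock-heavy strategy is punished by paper.

```python
import numpy as np

# Rock-paper-scissors: payoff for player 1 (zero-sum, so u_2 = -u_1).
U1 = np.array([[0.0, -1.0, 1.0],
               [1.0, 0.0, -1.0],
               [-1.0, 1.0, 0.0]])

def exploitability(s1, s2):
    """Sum over both players of the gain from best-responding to the other."""
    ev1 = s1 @ U1 @ s2
    gain1 = (U1 @ s2).max() - ev1        # player 1's best-response gain
    gain2 = (-(s1 @ U1)).max() - (-ev1)  # player 2's, using u_2 = -u_1
    return gain1 + gain2

uniform = np.ones(3) / 3
rock_heavy = np.array([0.6, 0.2, 0.2])

print(exploitability(uniform, uniform))     # 0: the uniform profile is a Nash eq.
print(exploitability(rock_heavy, uniform))  # ~0.4: paper exploits the rock bias
```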