# Notes on: Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012): Foundations of Machine Learning

## 1 Notation

- \(\mathcal{X}\) denotes the input space
- \(\mathcal{Y}\) denotes the output space
- \(c: \mathcal{X} \to \mathcal{Y}\) denotes a *concept*
- \(C\) denotes a *class of concepts*
- \(h \in H\) denotes a *hypothesis*
- \(D\) denotes the *underlying distribution*
- \(S = \big( x_1, \dots, x_m \big)\) is a sample of \(m\) points drawn i.i.d. according to \(D\) with labels \(\big( c(x_1), \dots, c(x_m) \big)\) for some concept \(c \in C\)
- \(n\) denotes an upper bound on the cost of the computational representation of any element \(x \in \mathcal{X}\)
- \(\text{size}(c)\) denotes the maximal cost of the *computational representation* of \(c \in C\)

## 2 PAC Learning

A concept class \(C\) is said to be **PAC-learnable** if there exists an algorithm \(\mathcal{A}\) and a polynomial function \(\text{poly}(\cdot, \cdot, \cdot, \cdot)\) such that for any \(\varepsilon > 0\) and \(\delta > 0\), for all distributions \(D\) on \(\mathcal{X}\), and for any target concept \(c \in C\), the following holds for any sample size \(m\) satisfying

\begin{equation*} m \ge \text{poly} \big( 1 / \varepsilon, 1 / \delta, n, \text{size}(c) \big) : \end{equation*}

\begin{equation*} \Pr_{S \sim D^m} \big[ R(h_S) \le \varepsilon \big] \ge 1 - \delta , \end{equation*}

where \(h_S\) is the hypothesis returned by \(\mathcal{A}\) given the sample \(S\) and \(R(h_S) = \Pr_{x \sim D} \big[ h_S(x) \ne c(x) \big]\) is its generalization error. If \(\mathcal{A}\) further *runs* in time \(\text{poly} \big( 1 / \varepsilon, 1 / \delta, n, \text{size}(c) \big)\), then \(C\) is said to be **efficiently PAC-learnable**. When such an algorithm \(\mathcal{A}\) exists, it is called a **PAC-learning algorithm** for \(C\).
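As a concrete instance of the polynomial bound, the book's axis-aligned rectangle example (learning a rectangle in \(\mathbb{R}^2\) with the tightest-fit algorithm) yields the explicit sample complexity

\begin{equation*} m \ge \frac{4}{\varepsilon} \log \frac{4}{\delta} , \end{equation*}

so that, e.g., \(\varepsilon = \delta = 0.1\) requires \(m \ge 40 \log 40 \approx 148\) sample points.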

We often call

- \(1 - \delta\) the *confidence*
- \(1 - \varepsilon\) the *accuracy*

A concept class \(C\) is thus PAC-learnable if the hypothesis returned by the algorithm after observing a number of points polynomial in \(1 / \varepsilon\) and \(1 / \delta\) is **approximately correct** (with error at most \(\varepsilon\)) with **high probability** (at least \(1 - \delta\)), which justifies the PAC terminology.
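To make the \((\varepsilon, \delta)\)-guarantee tangible, here is a minimal Python sketch (not from the book) of the rectangle example above: it draws \(m\) labeled points, fits the tightest enclosing rectangle, and checks empirically that the fraction of runs with error above \(\varepsilon\) stays below \(\delta\). The distribution \(D\) is assumed uniform on \([0, 1]^2\), and the specific target rectangle and all function names are illustrative choices.

```python
import math
import random

# Target concept c: an axis-aligned rectangle (left, right, bottom, top).
# This particular rectangle and the uniform D are assumptions for the demo.
TARGET = (0.2, 0.7, 0.3, 0.8)

def label(rect, x, y):
    """c(x): True inside the rectangle, False outside."""
    left, right, bottom, top = rect
    return left <= x <= right and bottom <= y <= top

def tightest_fit(sample):
    """The tightest-fit learner: smallest rectangle containing all
    positively labeled points (an empty rectangle if there are none)."""
    pos = [(x, y) for x, y, y_label in sample if y_label]
    if not pos:
        return (1.0, 0.0, 1.0, 0.0)  # contains nothing: always predicts False
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def estimate_error(hypothesis, n_test=20_000):
    """Monte Carlo estimate of the generalization error R(h) under D."""
    mistakes = 0
    for _ in range(n_test):
        x, y = random.random(), random.random()
        mistakes += label(TARGET, x, y) != label(hypothesis, x, y)
    return mistakes / n_test

eps, delta = 0.1, 0.1
m = math.ceil(4 / eps * math.log(4 / delta))  # sample size from the bound above

trials, failures = 200, 0
for _ in range(trials):
    S = [(x, y, label(TARGET, x, y))
         for x, y in ((random.random(), random.random()) for _ in range(m))]
    if estimate_error(tightest_fit(S)) > eps:
        failures += 1

print(f"m = {m}, fraction of trials with R(h) > eps: {failures / trials}")
# With high probability this fraction stays well below delta = 0.1.
```

Note that the tightest-fit hypothesis is always contained in the target rectangle, so it can only err by predicting False on positive points; the book's proof of the bound above exploits exactly this, via a union bound over four border strips of the target.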