Bias-Variance Tradeoff

Bias
- Wikipedia definition
- ESL definition
Variance
Bias-Variance tradeoff

Bias

Wikipedia definition

Defined as:

$\begin{equation*} \text{Bias}(\hat{\theta}) = \mathbb{E} \big[ \hat{\theta} - \theta \big] \end{equation*}$

where the expectation is taken over $p(x | \theta)$ , i.e. averaging over all possible observations.

Here we have assumed that the real data-generating process belongs to the same model family as the one we use as an estimator. The bias is then the difference between the expected value of the estimated parameter and the real parameter of that assumed model.
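To make the expectation over $p(x | \theta)$ concrete, here is a minimal Monte Carlo sketch. It is not taken from either source: the choice of estimator (the variance estimate that divides by $n$), the Gaussian data, and the helper names `mle_variance` and `estimate_bias` are all assumptions made for illustration. For this estimator the bias is known to be $-\sigma^2 / n$, so the simulated value can be compared against it.

```python
import random


def mle_variance(sample):
    """Variance estimate that divides by n (known to be biased)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((v - mean) ** 2 for v in sample) / n


def estimate_bias(estimator, true_value, draw_sample, n_trials, rng):
    """Approximate E[theta_hat] - theta by averaging over many simulated datasets."""
    total = 0.0
    for _ in range(n_trials):
        total += estimator(draw_sample(rng))
    return total / n_trials - true_value


# Illustrative settings (assumptions, not from the notes).
rng = random.Random(0)
sigma = 2.0  # true standard deviation, so theta = sigma**2
n = 5        # observations per simulated dataset


def draw_sample(r):
    """Draw one dataset from the assumed model N(0, sigma^2)."""
    return [r.gauss(0.0, sigma) for _ in range(n)]


bias = estimate_bias(mle_variance, sigma ** 2, draw_sample, 20000, rng)
theoretical_bias = -sigma ** 2 / n
print("simulated bias:  ", bias)
print("theoretical bias:", theoretical_bias)
```

Each call to `draw_sample` plays the role of one possible set of observations, so the average over trials approximates the expectation in the definition.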

ESL definition

From The Elements of Statistical Learning, we have the following definition:

$\begin{equation*} \begin{split} Err(x) &= \mathbb{E}_{\tau} \Big[ \big(Y - \hat{f}(x) \big)^2 \Big] \\ &= \Big( \mathbb{E} \big[ \hat{f}(x) \big] - f(x) \Big)^2 + \mathbb{E} \Big[ \big( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \big)^2 \Big] + \sigma_e^2 \\ &= \text{Bias}^2 + \text{Variance} + \text{Irreducible Error} \end{split} \end{equation*}$

where $x$ is a single observation. We therefore interpret $\mathbb{E}[\hat{f}(x)]$ as fitting the same model over and over on different training sets and then taking the expectation of the predictions of all these models at $x$, much like we do in bagging (Bootstrap Aggregation).

Notice how this differs from the Wikipedia definition, where we assume that the estimator belongs to the same model family as the real model and differs from it only in its (potentially) different parameters.
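The decomposition can be checked numerically with a small simulation. The sketch below is only an illustration under assumed settings: the true function $f(x) = \sin(x)$, the straight-line estimator, the noise level, and the helper names `fit_line` and `decompose` are all hypothetical choices, not something taken from ESL. The estimator is deliberately from a different model family than $f$, so the bias term is nonzero.

```python
import math
import random


def f(x):
    """Assumed true regression function (illustrative choice)."""
    return math.sin(x)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def decompose(x, sigma, n_train, n_trials, rng):
    """Estimate Err(x), Bias^2 and Variance at a single point x.

    Each trial draws a fresh training set, refits the line, predicts at x,
    and compares the prediction with a fresh noisy observation Y at x.
    """
    preds = []
    sq_errors = []
    for _ in range(n_trials):
        xs = [rng.uniform(0.0, math.pi) for _ in range(n_train)]
        ys = [f(a) + rng.gauss(0.0, sigma) for a in xs]
        intercept, slope = fit_line(xs, ys)
        pred = intercept + slope * x
        y_new = f(x) + rng.gauss(0.0, sigma)
        preds.append(pred)
        sq_errors.append((y_new - pred) ** 2)
    mean_pred = sum(preds) / n_trials
    bias_sq = (mean_pred - f(x)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_trials
    err = sum(sq_errors) / n_trials
    return err, bias_sq, variance


rng = random.Random(0)
sigma = 0.5  # noise standard deviation, so the irreducible error is sigma**2
err, bias_sq, variance = decompose(math.pi / 2, sigma, 20, 5000, rng)
print("Err(x):                   ", err)
print("Bias^2 + Variance + noise:", bias_sq + variance + sigma ** 2)
```

The two printed numbers should agree up to Monte Carlo error. Averaging the predictions in `preds` is the simulated counterpart of $\mathbb{E}[\hat{f}(x)]$, with each training set drawn fresh rather than bootstrapped from one dataset as bagging would do.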

Variance