# Bias-Variance Tradeoff

## Bias

### Wikipedia definition

Defined as:

$$\operatorname{Bias}_\theta[\hat{\theta}] = \mathbb{E}_{x \mid \theta}[\hat{\theta}] - \theta$$

where the expectation is taken over $x \mid \theta$, i.e. averaging over all possible observations $x$.

Here we have assumed that the *real* model follows the same *model*
as the one we want to use as an estimator, and we're then looking at how the
expected value of our estimated parameter differs from the *real* parameter of
that assumed model.
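
As a concrete sanity check (a hypothetical setup, not from the definition itself), we can estimate this bias numerically for an estimator whose bias is known analytically, e.g. the MLE of a Gaussian's variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of the assumed "real" model (a Gaussian here).
mu, sigma2, n = 0.0, 4.0, 10

# The MLE of the variance divides by n, which is known to be biased:
# E[sigma2_hat] = (n - 1) / n * sigma2, so Bias = -sigma2 / n = -0.4 here.
estimates = np.array([
    np.var(rng.normal(mu, np.sqrt(sigma2), size=n))  # np.var uses ddof=0 -> MLE
    for _ in range(100_000)
])

# Monte Carlo approximation of E[sigma2_hat] - sigma2.
bias = estimates.mean() - sigma2
print(bias)  # close to -0.4
```

Averaging the estimator over many simulated datasets is exactly the "averaging over all possible observations" in the definition, just done numerically.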

### ESL definition

From The Elements of Statistical Learning, we have the following definition:

$$\operatorname{Bias}\big(\hat{f}(x_0)\big) = \mathbb{E}\big[\hat{f}(x_0)\big] - f(x_0)$$

where $x_0$ is a *single* observation; we therefore interpret $\mathbb{E}[\hat{f}(x_0)]$ as fitting
the same model over and over on fresh training sets, and then taking the expectation of the predictions
of all these models, basically like we do in *bagging* (Bootstrap Aggregation).

Notice how this differs from the Wikipedia definition, where we assume that
the estimator follows the same model as the *real* model, just with
(potentially) different parameters.
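
A rough sketch of this interpretation (the quadratic target, the linear estimator and the noise level are all assumptions picked for illustration): refit the same model on many fresh training sets and average its predictions at a single point $x_0$:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return x ** 2              # assumed "real" target function

x0 = 1.5                       # the single observation point from the ESL definition
x_train = np.linspace(-2, 2, 30)

# Refit the same (misspecified) linear model on many fresh training sets,
# then average the predictions at x0 as a Monte Carlo stand-in for E[f_hat(x0)].
preds = []
for _ in range(5_000):
    y = f(x_train) + rng.normal(0.0, 0.5, size=x_train.shape)
    slope, intercept = np.polyfit(x_train, y, deg=1)
    preds.append(slope * x0 + intercept)

# Negative here: a straight line systematically underestimates x^2 at x0 = 1.5.
bias = np.mean(preds) - f(x0)
print(bias)
```

Because the linear model can never represent $x^2$, this bias stays no matter how many models we average, which is exactly why it's a property of the estimator and not of any single fit.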

## Variance
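
To keep this section parallel with the bias one (this is just the standard definition, mirroring the notation above), the variance of the estimator measures how much it fluctuates around its *own* expectation:

$$\operatorname{Var}[\hat{\theta}] = \mathbb{E}_{x \mid \theta}\Big[\big(\hat{\theta} - \mathbb{E}_{x \mid \theta}[\hat{\theta}]\big)^2\Big]$$

and in the ESL per-point notation, $\operatorname{Var}\big(\hat{f}(x_0)\big) = \mathbb{E}\big[(\hat{f}(x_0) - \mathbb{E}[\hat{f}(x_0)])^2\big]$. Unlike the bias, this does not reference the *real* parameter or target at all.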

## Bias-Variance tradeoff
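
For reference (a standard result, stated e.g. in ESL, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}[\varepsilon] = \sigma^2$), the expected squared error at a point $x_0$ decomposes as:

$$\mathbb{E}\big[(y - \hat{f}(x_0))^2\big] = \sigma^2 + \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\operatorname{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x_0) - \mathbb{E}[\hat{f}(x_0)])^2\big]}_{\operatorname{Var}}$$

where $\sigma^2$ is the irreducible error.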

From what I understand, this all makes sense for models where we can analytically determine the bias and variance of our estimator. But what about more complex models, where there is no clear way of performing a bias-variance decomposition?

From what I can tell, people then use the following not-so-rigorous "definitions":

**Bias** relates to *underfitting*, which can be observed when the training loss does not decrease further with more training while the error is still quite large. This might be due to one of the following:

- Our model is not "complex" enough to handle the target function → increase complexity
- We're stuck in a local minimum → might be worth changing optimizer (something with momentum, e.g. Adam)
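
A minimal sketch of the "not complex enough" case (the sine target and the polynomial models are assumptions for illustration): for nested least-squares models the training loss can only improve as capacity grows, so a plateau at a large value points at bias:

```python
import numpy as np

rng = np.random.default_rng(3)

x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + rng.normal(0.0, 0.1, size=50)  # assumed target plus noise

# Training MSE of polynomial fits of increasing degree: a degree-1 model is
# too simple for sin(3x), so its training loss stays large (high bias);
# adding capacity brings the training loss down.
mses = []
for deg in (1, 3, 5):
    coeffs = np.polyfit(x, y, deg)
    mses.append(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(deg, mses[-1])
```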

**Variance** relates to *overfitting*, which can be observed when the difference between the loss on the training and test data is quite large, i.e. the model doesn't generalize well. This usually means we need to make use of some **regularization** methods.
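
And a sketch of the variance side (the target, noise level, polynomial degree and ridge penalty are all assumptions chosen for illustration): an over-flexible polynomial fit shows a large train/test gap, and ridge regularization shrinks it:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return np.sin(3 * x)                       # assumed target function

def design(x, degree):
    return np.vander(x, degree + 1)            # polynomial features

def ridge_fit(X, y, lam):
    # Ridge regression solved as an augmented least-squares problem,
    # which stays numerically stable even for lam = 0.
    A = np.vstack([X, np.sqrt(lam) * np.eye(X.shape[1])])
    b = np.concatenate([y, np.zeros(X.shape[1])])
    return np.linalg.lstsq(A, b, rcond=None)[0]

x_train = np.linspace(-1, 1, 20)
y_train = f(x_train) + rng.normal(0.0, 0.3, size=20)
x_test = rng.uniform(-1, 1, 200)
y_test = f(x_test) + rng.normal(0.0, 0.3, size=200)

degree = 12                                    # deliberately over-flexible
gaps = {}
for lam in (0.0, 1.0):
    w = ridge_fit(design(x_train, degree), y_train, lam)
    train_mse = np.mean((design(x_train, degree) @ w - y_train) ** 2)
    test_mse = np.mean((design(x_test, degree) @ w - y_test) ** 2)
    gaps[lam] = test_mse - train_mse           # the generalization gap
    print(lam, train_mse, test_mse)
```

The penalty trades a bit of extra training error (bias) for a smaller gap between train and test loss (variance), which is the tradeoff in the section title.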