# Exercises

## Information Theory, Inference and Learning Algorithms

### 6.14

Since there is no *covariance* between the different dimensions, i.e. the $x_i$ are independent of all $x_j$ with $j \neq i$, we know that

$$\mathbb{E}\big[r^2\big] = \mathbb{E}\left[\sum_{i=1}^{N} x_i^2\right] = \sum_{i=1}^{N} \mathbb{E}\big[x_i^2\big]$$

where each of the $x_i$ are following $\mathcal{N}(0, \sigma^2)$, hence

$$\mathbb{E}\big[x_i^2\big] = \sigma^2.$$

Hence,

$$\mathbb{E}\big[r^2\big] = N \sigma^2.$$

The variance is then given by

$$\operatorname{Var}\big(r^2\big) = \operatorname{Var}\left(\sum_{i=1}^{N} x_i^2\right) = \sum_{i=1}^{N} \operatorname{Var}\big(x_i^2\big) + \sum_{i \neq j} \operatorname{Cov}\big(x_i^2, x_j^2\big)$$

where

$$\operatorname{Var}\big(x_i^2\big) = \mathbb{E}\big[x_i^4\big] - \mathbb{E}\big[x_i^2\big]^2.$$

But since there is no covariance between the different $x_i^2$, the second sum *vanishes*, and since

$$\mathbb{E}\big[x_i^4\big] = 3 \sigma^4$$

(which we knew from hint 6.14 in the book), each term is $\operatorname{Var}(x_i^2) = 3\sigma^4 - \sigma^4 = 2\sigma^4$. Hence

$$\operatorname{Var}\big(r^2\big) = 2 N \sigma^4.$$

This all means that for large $N$, we will have

$$r^2 \approx N \sigma^2 \pm \sqrt{2N}\, \sigma^2.$$

And since $\sqrt{2N}\, \sigma^2$ will be negligible for large $N$, compared to $N \sigma^2$ (of course assuming $\sigma$ is finite), then

$$r \approx \sigma \sqrt{N}$$

as wanted. The "thickness" will simply be $2 \sigma_r \approx \sqrt{2}\, \sigma$, i.e. twice the standard deviation of $r$, since for large $N$

$$\sigma_r \approx \frac{\operatorname{std}(r^2)}{2\, \mathbb{E}[r]} = \frac{\sqrt{2N}\, \sigma^2}{2 \sigma \sqrt{N}} = \frac{\sigma}{\sqrt{2}}.$$
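The two moments derived above can be sanity-checked by simulation. This is only an illustrative sketch: the choices of $N$, $\sigma$, and the number of samples are arbitrary, and only Python's standard library is used.

```python
import random

# Monte Carlo check of E[r^2] = N*sigma^2 and Var(r^2) = 2*N*sigma^4
# for x_i drawn i.i.d. from N(0, sigma^2).
# N, sigma, and trials are arbitrary illustrative choices.
random.seed(0)
N, sigma, trials = 500, 2.0, 2000

r2_samples = [
    sum(random.gauss(0.0, sigma) ** 2 for _ in range(N))
    for _ in range(trials)
]

mean_r2 = sum(r2_samples) / trials
var_r2 = sum((v - mean_r2) ** 2 for v in r2_samples) / trials

print(mean_r2 / (N * sigma**2))     # ratio should be close to 1
print(var_r2 / (2 * N * sigma**4))  # ratio should be close to 1
```

Both printed ratios should land near 1, matching $\mathbb{E}[r^2] = N\sigma^2$ and $\operatorname{Var}(r^2) = 2N\sigma^4$.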

Either by:

- Computing an $N$-dimensional integral :)
- Empirically looking at $p(\mathbf{x})$ for some $N$, and making use of the symmetry of the Gaussian to infer that all $\mathbf{x}$ with the same radius have the same probability, and that $p(\mathbf{x})$ decreases as $\mathbf{x}$ moves away (in whatever "direction" / dimension) from the mean

We can observe that the majority of the probability mass is clustered about this "shell".
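The clustering can also be demonstrated empirically by estimating how much mass falls within a few standard deviations of the shell radius $\sigma\sqrt{N}$. A minimal sketch, where $N$, $\sigma$, the number of trials, and the choice of a three-standard-deviation half-width are all arbitrary assumptions:

```python
import math
import random

# Estimate the fraction of probability mass inside the thin shell
# |r - sigma*sqrt(N)| < 3*sigma/sqrt(2), using std(r) ~ sigma/sqrt(2).
# N, sigma, and trials are arbitrary illustrative choices.
random.seed(1)
N, sigma, trials = 500, 1.0, 2000

radius = sigma * math.sqrt(N)          # shell radius sigma*sqrt(N)
half_width = 3 * sigma / math.sqrt(2)  # three standard deviations of r

inside = sum(
    abs(math.sqrt(sum(random.gauss(0.0, sigma) ** 2 for _ in range(N))) - radius)
    < half_width
    for _ in range(trials)
)

print(inside / trials)  # should be close to 1
```

Almost every sample lands inside the shell, even though its thickness $\sim \sigma$ is tiny compared to its radius $\sigma\sqrt{N}$.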

## Bibliography

- [mackay2003information] MacKay, D. J. C., Information Theory, Inference and Learning Algorithms, Cambridge University Press (2003).