# Nonparametric Bayes

## Concepts

- Infinitely exchangeable
  - The order of the data does not matter for the joint distribution.

## Beta distribution

### Overview

- A distribution over the parameter of a binomial distribution!
- So in a sense you're "drawing distributions".
- I like to think of it as putting a random variable on the model's parameter itself, instead of going straight to a point estimate of the binomial parameter.

- Remember that the Gamma function is a factorial when its argument is an *integer*: $\Gamma(n) = (n - 1)!$.
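
The two points above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the prior parameters `a`, `b` and the number of trials are hypothetical values chosen only for illustration.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Gamma(n) equals (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Hypothetical prior parameters (not taken from the notes).
a, b = 2.0, 5.0

# Each draw of theta is itself the success probability of a binomial,
# which is the sense in which we are "drawing distributions".
thetas = rng.beta(a, b, size=5)

# One binomial draw with 10 trials for each sampled theta.
counts = rng.binomial(n=10, p=thetas)
```

Each entry of `thetas` picks out a different binomial distribution, and `counts` holds one observation from each of them.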

## Dirichlet distribution

### Overview

- Generalization of the Beta distribution to more than two categories, i.e. a **distribution over the parameters of a multinomial distribution.**
- If you were to plot the Dirichlet distribution over some parameters $\rho = (\rho_1, \dots, \rho_K)$, its support is the simplex/surface of allowed values for these parameters.
  - "Allowed" meaning that they form valid probabilities within the multinomial model, i.e. $\rho_k \geq 0$ and $\sum_{k=1}^{K} \rho_k = 1$.

- Has nice conjugacy properties: it is the conjugate prior of the multinomial (and categorical) distribution, so the posterior is again a Dirichlet.
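
As a small illustration of the simplex constraint and of conjugacy, here is a sketch assuming NumPy; the concentration vector `alpha` and the number of trials are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concentration parameters for K = 3 categories.
alpha = np.array([1.0, 2.0, 3.0])

# Each row is one draw of multinomial parameters and lies on the simplex.
rho = rng.dirichlet(alpha, size=4)
row_sums = rho.sum(axis=1)

# Conjugacy: after observing multinomial counts, the posterior over the
# parameters is again Dirichlet, with the counts added to alpha.
counts = rng.multinomial(20, rho[0])
posterior_alpha = alpha + counts
```

The posterior update is just addition of counts, which is what makes the Dirichlet convenient as a prior.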

### Generating Dirichlet from Beta

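If $(\rho_1, \dots, \rho_K) \sim \mathrm{Dir}(a_1, \dots, a_K)$, then *marginalizing* over $\rho_2, \dots, \rho_K$ leaves a Beta distribution for the first component:

$$\rho_1 \sim \mathrm{Beta}\Big(a_1, \sum_{k=2}^{K} a_k\Big), \qquad \frac{(\rho_2, \dots, \rho_K)}{1 - \rho_1} \sim \mathrm{Dir}(a_2, \dots, a_K) \quad \text{independently of } \rho_1$$

So we can generate a Dirichlet draw from Beta draws alone: draw $\rho_1$, then repeat the same step on the remaining mass $1 - \rho_1$.

This is what we call *stick breaking*.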
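
A minimal sketch of this construction, assuming NumPy: each step breaks off a Beta-distributed fraction of whatever is left of a unit-length stick. The function name `dirichlet_via_beta` and the parameter vector `a` are hypothetical, introduced only for this example.

```python
import numpy as np


def dirichlet_via_beta(a, rng):
    """Draw one Dirichlet(a) sample using only Beta draws (stick breaking)."""
    a = np.asarray(a, dtype=float)
    K = len(a)
    rho = np.empty(K)
    remaining = 1.0  # length of the stick that has not been assigned yet
    for k in range(K - 1):
        # Fraction of the remaining stick given to component k.
        v = rng.beta(a[k], a[k + 1:].sum())
        rho[k] = remaining * v
        remaining *= 1.0 - v
    # The last component receives whatever is left.
    rho[K - 1] = remaining
    return rho


rng = np.random.default_rng(0)
a = [1.0, 2.0, 3.0]  # hypothetical parameters
samples = np.array([dirichlet_via_beta(a, rng) for _ in range(20000)])
```

The empirical mean of `samples` should be close to `a / sum(a)`, the mean of the corresponding Dirichlet distribution.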

## Dirichlet process

### Overview

- Taking the number of parameters $K$ to go to $\infty$.
- Allows an arbitrary number of clusters, so the number of clusters can grow with the data.

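### Taking $K \to \infty$

We do what we do in Generating Dirichlet from Beta, the "stick breaking".
But in the Dirichlet *process* stick breaking we do

$$V_1 \sim \mathrm{Beta}(1, \alpha), \qquad \rho_1 = V_1$$

And then we just continue doing this, drawing as follows:

$$V_k \sim \mathrm{Beta}(1, \alpha), \qquad \rho_k = V_k \prod_{j=1}^{k-1} (1 - V_j)$$

The resulting distribution of $\rho = (\rho_1, \rho_2, \dots)$ is then

$$\rho \sim \mathrm{GEM}(\alpha)$$

where $\mathrm{GEM}$ is called the **Griffiths-Engen-McCloskey (GEM) distribution**.

To obtain a *Dirichlet process* we then do:

$$\phi_k \overset{\text{iid}}{\sim} G_0, \qquad G = \sum_{k=1}^{\infty} \rho_k \delta_{\phi_k} \sim \mathrm{DP}(\alpha, G_0)$$

where $G_0$ can be any *probability measure*.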
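
The infinite construction cannot be simulated exactly, but it can be approximated by stopping after a fixed number of breaks. The sketch below assumes NumPy and uses a truncation level `K`, which is an approximation introduced here and not part of the definition. The helper names `gem_truncated` and `dp_truncated`, and the standard normal base measure, are hypothetical choices for illustration.

```python
import numpy as np


def gem_truncated(alpha, K, rng):
    """First K stick-breaking weights with Beta(1, alpha) break fractions."""
    v = rng.beta(1.0, alpha, size=K)
    # Stick length remaining before break k: prod_{j < k} (1 - v_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining


def dp_truncated(alpha, base_sampler, K, rng):
    """Truncated draw: weights rho and atoms phi of a discrete measure."""
    rho = gem_truncated(alpha, K, rng)
    phi = base_sampler(K, rng)  # K i.i.d. atoms from the base measure
    return rho, phi


rng = np.random.default_rng(0)
rho, phi = dp_truncated(
    alpha=2.0,
    base_sampler=lambda k, r: r.normal(0.0, 1.0, size=k),  # assumed base measure
    K=200,
    rng=rng,
)
```

The weights sum to slightly less than one because the tail of the stick is discarded; the missing mass shrinks geometrically in `K`.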

### Dirichlet process mixture model

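Start out with a Gaussian Mixture Model with infinitely many components:

$$\rho \sim \mathrm{GEM}(\alpha), \qquad \mu_k \overset{\text{iid}}{\sim} \mathcal{N}(\mu_0, \Sigma_0), \quad k = 1, 2, \dots$$

Where our $\mu_0$ and $\Sigma_0$ are the prior parameters for the means of the Gaussian clusters. Which is the same as saying $G = \sum_{k=1}^{\infty} \rho_k \delta_{\mu_k} \sim \mathrm{DP}(\alpha, \mathcal{N}(\mu_0, \Sigma_0))$.

So, $G$ is a sum over Dirac deltas and so will only take non-zero values where its argument corresponds to some $\mu_k$. That is, it describes the probability $\rho_k$ of each cluster being assigned to.

$$z_n \sim \mathrm{Categorical}(\rho), \qquad \mu_n^* = \mu_{z_n}$$

i.e. $\mu_n^* \sim G$, which means that drawing a cluster assignment $z_n$ for our $n$th data point, where the drawn cluster has mean $\mu_{z_n}$, is equivalent to drawing the mean itself from $G$.

$$x_n \sim \mathcal{N}(\mu_n^*, \Sigma)$$

i.e. the $n$th data point is then drawn from a normal distribution with the sampled mean $\mu_n^*$ and some variance $\Sigma$.

The shape / variance could also depend on the cluster if we wanted to make the model a bit more complex. We would just have to add some draw for $\Sigma_k$ in our model.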
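
The generative story of the mixture model can be sketched as follows, assuming NumPy, one-dimensional data, and a truncated stick-breaking approximation. All numeric values (`alpha`, `K`, `N`, `mu0`, `sigma0`, `sigma`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: concentration, truncation level, number of points.
alpha, K, N = 1.0, 100, 500
mu0, sigma0 = 0.0, 5.0  # prior over cluster means
sigma = 0.5             # shared within-cluster standard deviation

# Truncated stick-breaking weights, renormalized so they sum to one.
v = rng.beta(1.0, alpha, size=K)
rho = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
rho /= rho.sum()

# Cluster means drawn from the base measure.
mu = rng.normal(mu0, sigma0, size=K)

# Assign each data point to a cluster, then draw it around that cluster's mean.
z = rng.choice(K, size=N, p=rho)
x = rng.normal(mu[z], sigma)
```

Although `K` clusters are available, only a handful typically receive data points, and that number tends to grow slowly as `N` increases.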

## Lecture 2

### Notation

- $\rho = (\rho_1, \rho_2, \dots)$, which sums to 1 with probability one.
- $\delta_{\phi_k}$ is the *Dirac delta* for the element $\phi_k$.

### Stuff

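- $\mathrm{GEM}(\alpha)$ can be described as follows:
  - Take a stick of length 1.
  - "Break" the stick at the point corresponding to $V_1 \sim \mathrm{Beta}(1, \alpha)$: $\rho_1 = V_1$
  - "Break" the *rest* of the stick by $V_2 \sim \mathrm{Beta}(1, \alpha)$: $\rho_2 = (1 - V_1) V_2$
  - "Break" the *rest* of the stick: $\rho_3 = (1 - V_1)(1 - V_2) V_3$
  - …

Then

$$\rho = (\rho_1, \rho_2, \dots) \sim \mathrm{GEM}(\alpha)$$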

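We let

$$\phi_k \overset{\text{iid}}{\sim} G_0, \quad k = 1, 2, \dots$$

where $G_0$ is some underlying distribution.

Then we define the random variable

$$G = \sum_{k=1}^{\infty} \rho_k \delta_{\phi_k}$$

where $\rho \sim \mathrm{GEM}(\alpha)$ are the stick-breaking weights and $\delta_{\phi_k}$ is the *Dirac delta* for the element $\phi_k$.

- The $\phi_k$ can even be *functions*, if $G_0$ is a distribution on a separable Banach space!

Then

$$G \sim \mathrm{DP}(\alpha, G_0)$$

where $\mathrm{DP}$ denotes a **Dirichlet process**.

Observe that $G$ defines a *measure*! Hence a $\mathrm{DP}$ is basically a *distribution over measures*!

So we have a **random measure**, where the σ-algebra on the space of measures is the one generated by the evaluation maps $\mu \mapsto \mu(A)$ for $A \in \mathcal{A}$, where $\mathcal{A}$ is the original σ-algebra.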
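
One way to see the "distribution over measures" idea concretely: fix a set $A$ and look at the mass $G(A)$ that each draw of $G$ assigns to it. That mass is an ordinary random variable. The sketch below assumes NumPy, a truncated stick-breaking approximation, a standard normal base measure, and the set $A = (-\infty, 0]$; all of these are hypothetical choices for illustration. For a Dirichlet process, $G(A)$ has mean $G_0(A)$ and variance $G_0(A)(1 - G_0(A)) / (\alpha + 1)$, which the simulation should roughly reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: concentration, truncation level, number of draws of G.
alpha, K, draws = 2.0, 200, 5000

# One row per independent (truncated) draw of the random measure G.
v = rng.beta(1.0, alpha, size=(draws, K))
remaining = np.concatenate(
    (np.ones((draws, 1)), np.cumprod(1.0 - v, axis=1)[:, :-1]), axis=1
)
rho = v * remaining                           # weights of each draw
phi = rng.normal(0.0, 1.0, size=(draws, K))   # atoms from the assumed base measure

# Mass that each draw assigns to A = (-inf, 0]: sum of weights of atoms in A.
G_A = (rho * (phi <= 0.0)).sum(axis=1)
```

Here $G_0(A) = 0.5$, so `G_A.mean()` should be near 0.5 and `G_A.var()` near $0.25 / 3$.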

There's a very interesting property of the $\mathrm{GEM}$ distribution.

Suppose $B_t$ is Brownian motion. Then consider the maximal points (i.e. each new "highest" or "lowest" peak); the *times* between these new peaks follow a $\mathrm{GEM}$ distribution!

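We say that a sequence of random variables $X_1, X_2, \dots$ is **infinitely exchangeable** if and only if there exists a *unique* random measure $G$ such that, for every $N$ (this characterization is de Finetti's theorem),

$$p(X_1, \dots, X_N) = \int \prod_{n=1}^{N} G(X_n) \, P(\mathrm{d}G)$$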

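Then observe that what's known as the *Chinese restaurant process* is just our previous cluster assignments $z_n \sim \mathrm{Categorical}(\rho)$ with $\rho \sim \mathrm{GEM}(\alpha)$, where we've *marginalized* over all the $\rho_k$!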
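
The Chinese restaurant process can be simulated directly, without ever drawing the weights. The following is a minimal sketch assuming NumPy; the function name `crp` and the values of `N` and `alpha` are hypothetical. Customer $n$ joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to $\alpha$.

```python
import numpy as np


def crp(N, alpha, rng):
    """Sample table assignments for N customers from a CRP(alpha)."""
    assignments = np.empty(N, dtype=int)
    counts = []  # counts[t] = number of customers currently at table t
    for n in range(N):
        # Existing tables are weighted by size, a new table by alpha.
        weights = np.array(counts + [alpha], dtype=float)
        table = rng.choice(len(weights), p=weights / weights.sum())
        if table == len(counts):
            counts.append(1)    # open a new table
        else:
            counts[table] += 1  # join an existing table
        assignments[n] = table
    return assignments, counts


rng = np.random.default_rng(0)
assignments, counts = crp(N=1000, alpha=2.0, rng=rng)
```

The number of occupied tables, `len(counts)`, grows roughly logarithmically in `N`, which is the "can grow with the data" behaviour of the Dirichlet process.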

### Dirichlet as a GEM

Suppose we have a *finite* number of samples from a *GEM* distribution.

Then,

Stochastic process on a σ-algebra.

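A **completely random measure** is a random measure $G$ such that the draws on disjoint sets are *independent*: for disjoint measurable sets $A_1, \dots, A_K$, the random variables $G(A_1), \dots, G(A_K)$ are independent.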

## Appendix A: Vocabulary

- categorical distribution
  - distribution with some probability $\rho_k$ for the class/label indexed by $k$. So a multinomial distribution with a single trial.
- random measure