Nonparametric Bayes

Concepts

Infinitely exchangeable
The order of the data does not matter for the joint distribution, and this holds for every finite subset of the sequence.

Beta distribution

nonparameteric_bayes_95c4da9245329a0bad6bd11b2b7d4797ec2a71c8.png

Overview

  • Distribution over the parameter of a binomial distribution!
    • So in a sense you're "drawing distributions"
    • I like to think of it as simply treating the model's parameter as a random variable, instead of going straight for estimating nonparameteric_bayes_f7bd8817ac01e2129d060dceac597e5232346933.png in a binomial distribution.
  • Remember the nonparameteric_bayes_12b2a1240e0f4c77cdd07ea0cd6ddc2b4e98482a.png function is a nonparameteric_bayes_f72d234611e0ec17510061c4846f2a73ffafe11a.png when nonparameteric_bayes_1dbee3eb7b7bd0453a691130ee5d0d2bc8d46f23.png is an integer.
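
To make the "drawing distributions" idea concrete, here is a minimal sketch: first draw a success probability from a Beta, then draw binomial data using it. The function name and the hyperparameters `a`, `b` are illustrative assumptions, not taken from the notes.

```python
import random


def sample_binomial_with_beta_prior(a, b, n_trials, rng):
    """Draw theta ~ Beta(a, b), then count successes in n_trials Bernoulli(theta) trials."""
    theta = rng.betavariate(a, b)  # the binomial parameter is itself random
    successes = sum(rng.random() < theta for _ in range(n_trials))
    return theta, successes


rng = random.Random(0)
for _ in range(5):
    theta, successes = sample_binomial_with_beta_prior(2.0, 5.0, 20, rng)
    print(f"theta={theta:.3f}  successes={successes}/20")
```

Each loop iteration corresponds to one "drawn" binomial distribution, followed by one observation from it.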

Dirichlet distribution

nonparameteric_bayes_471b3e92649bf30259baffeff9223dcd14e84280.png

Overview

  • Generalization of the Beta distribution to more than two categories, i.e. a distribution over the parameters of a multinomial distribution.
  • So if you were to plot the Dirichlet distribution of some parameters nonparameteric_bayes_00ca5ebcaace663a08aa91a68d44ab6cb399608c.png, you would obtain the simplex/surface of allowed values for these parameters
    • "Allowed" meaning that they satisfy being a probability within the multinomial model, i.e. nonparameteric_bayes_029ace913cf1916f5ee562a01538ef378cb4569e.png
  • Has nice conjugacy properties: it is the conjugate prior of the multinomial distribution, so the posterior is again a Dirichlet
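
A small sketch of both points above, assuming the standard facts that a Dirichlet draw can be obtained by normalizing independent Gamma draws and that the conjugate update adds observed counts to the concentration parameters. The function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing Gamma(alpha_k, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet prior + multinomial counts -> Dirichlet posterior."""
    return [a + c for a, c in zip(alpha, counts)]


rng = random.Random(0)
rho = sample_dirichlet([1.0, 2.0, 3.0], rng)
print(rho, sum(rho))  # non-negative entries summing to (approximately) 1
print(dirichlet_posterior([1.0, 2.0, 3.0], [3, 0, 2]))
```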

Generating Dirichlet from Beta

We can draw nonparameteric_bayes_f7bd8817ac01e2129d060dceac597e5232346933.png from a Beta by marginalizing over nonparameteric_bayes_e321ad2201235b9ae2b33f9ebaaecde8d0e6acb5.png

nonparameteric_bayes_4b2f66d5edd9feb52d61b407a3c53255ca10072f.png

This is what we call stick breaking.

nonparameteric_bayes_17aacfe6484112f0071478274c2a2d6c6f567be9.png
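
As a hedged illustration, the sketch below uses the standard finite stick-breaking construction of a Dirichlet: the k-th break fraction is drawn from Beta(alpha_k, sum of the remaining alphas). It is assumed, not confirmed, that this matches the formulas in the figures above; the function name is hypothetical.

```python
import random


def dirichlet_via_stick_breaking(alpha, rng):
    """Sample Dirichlet(alpha) by repeatedly breaking off pieces of a unit stick."""
    weights = []
    remaining = 1.0      # length of stick not yet assigned
    tail = sum(alpha)
    for a in alpha[:-1]:
        tail -= a                      # sum of concentration parameters after this one
        v = rng.betavariate(a, tail)   # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)          # the last component takes what is left
    return weights


rng = random.Random(0)
rho = dirichlet_via_stick_breaking([1.0, 2.0, 3.0], rng)
print(rho, sum(rho))
```

Averaging the first component over many draws should approach alpha_1 / sum(alpha), which is a quick sanity check on the construction.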

Dirichlet process

Overview

  • Taking the number of parameters nonparameteric_bayes_ad63c81b09a45c2afd6c497ff38a542e26c1423a.png to go to nonparameteric_bayes_f6f02b10995dc1353548c5ff27838dfc549df734.png.
  • Allows arbitrary number of clusters => nonparameteric_bayes_ad63c81b09a45c2afd6c497ff38a542e26c1423a.png can grow with the data

Taking nonparameteric_bayes_7cefbe24f3eb2168f4d11038d91b33157f30c2a9.png

We do what we do in Generating Dirichlet from Beta, the "stick breaking". But in the Dirichlet process version of stick breaking we do

nonparameteric_bayes_f1964a65b9340825053fd04d8c706139dff47f5c.png

And then we just continue doing this, drawing nonparameteric_bayes_41d0b247aac64c6f032d572447c22d4f142669e1.png as follows:

nonparameteric_bayes_f3b4b1f5e50908fe2417075c46f2b371be41618b.png

Resulting distribution of nonparameteric_bayes_41d0b247aac64c6f032d572447c22d4f142669e1.png is then

nonparameteric_bayes_36d7b699f9f1c01eedcdda09fb6a2bd8aa8c4ce3.png

where nonparameteric_bayes_50ac2a765917a1c8cbc2cbc054d3bd666de869cb.png is called the Griffiths-Engen-McCloskey (GEM) distribution.

To obtain a Dirichlet process we then do:

nonparameteric_bayes_be6fb715828f15ab75f1091227b304c9ccea21f0.png

where nonparameteric_bayes_8b46b89ea0fbe16704b2d053ca55ad40d48c8b33.png can be any probability measure.
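
A minimal sketch of this construction, truncated to finitely many atoms so that it can run. It assumes the usual convention that each break fraction is Beta(1, alpha) with concentration parameter `alpha`; the names `alpha`, `n_atoms` and `base_sampler` are assumptions introduced here.

```python
import random


def sample_gem_truncated(alpha, n_atoms, rng):
    """First n_atoms stick-breaking weights, plus the stick length left over."""
    weights = []
    remaining = 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


def sample_dp_truncated(alpha, base_sampler, n_atoms, rng):
    """Truncated DP draw: atoms from the base measure paired with GEM weights."""
    weights, leftover = sample_gem_truncated(alpha, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return atoms, weights, leftover


rng = random.Random(0)
# Base measure assumed to be a standard normal, purely for illustration.
atoms, weights, leftover = sample_dp_truncated(2.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
print(f"mass covered by 50 atoms: {sum(weights):.6f}, leftover: {leftover:.6f}")
```

The leftover mass shrinks geometrically in expectation, which is why a moderate truncation level is often a reasonable approximation.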

Dirichlet process mixture model

Start out with a Gaussian mixture model

nonparameteric_bayes_202438fd3cf875bbb73f5f3dd1269eee05813814.png

where nonparameteric_bayes_dc3dda5fbf9d111629db86b3e3690b394b7d7d32.png and nonparameteric_bayes_f992e42c2c120dd425fbadf62f2573acb2187637.png are our priors for the Gaussian clusters, which is the same as saying nonparameteric_bayes_81ffacf76c4714070d883619618843c4a87321f0.png.

So, nonparameteric_bayes_8b46b89ea0fbe16704b2d053ca55ad40d48c8b33.png is a sum over Dirac deltas and so will only take non-zero values where nonparameteric_bayes_cad26af896131c012a95632661192eddbf452a2c.png corresponds to some nonparameteric_bayes_41d0b247aac64c6f032d572447c22d4f142669e1.png. That is, it just indexes the probabilities somehow. Or rather, it describes the probability of a data point being assigned to each cluster nonparameteric_bayes_ad63c81b09a45c2afd6c497ff38a542e26c1423a.png.

nonparameteric_bayes_05a0198e5e5d0f3664f9b9fbc3da23f1ad59f18c.png

i.e. nonparameteric_bayes_b1a3023b6a6535b340ac3b81b276890954e5a1f1.png, which means that drawing a cluster assignment for our nth data point, where the drawn cluster has mean nonparameteric_bayes_6fb77db428aab9095bbd47a71c5dbc7ee332a78c.png, is equivalent to drawing the mean itself from nonparameteric_bayes_8b46b89ea0fbe16704b2d053ca55ad40d48c8b33.png.

nonparameteric_bayes_7c348eff13705a422c4d364981202ab4c1012318.png

i.e. the nth data point is then drawn from a normal distribution with the sampled mean nonparameteric_bayes_1b596d1319beec296201260a1edac92d4f2ef4b0.png and some variance nonparameteric_bayes_017948b866be67b1a8e56a6b5f8848f823410c34.png.

The shape / variance nonparameteric_bayes_017948b866be67b1a8e56a6b5f8848f823410c34.png could also depend on the cluster if we wanted to make the model a bit more complex. We would just have to add a draw for nonparameteric_bayes_017948b866be67b1a8e56a6b5f8848f823410c34.png to our model.
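
The generative story above can be sketched as follows, under several assumptions that are not in the notes: the stick-breaking weights are truncated at `n_atoms`, the cluster means come from a Gaussian base measure with hypothetical parameters `prior_mean` and `prior_std`, and all clusters share one standard deviation `cluster_std`.

```python
import random


def sample_dp_gmm(n_points, alpha, prior_mean, prior_std, cluster_std, n_atoms, rng):
    """Generate data from a (truncated) Dirichlet process Gaussian mixture."""
    # Stick-breaking weights for the clusters.
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights[-1] += remaining  # fold the truncated mass into the last atom

    # One mean per cluster, drawn from the base measure.
    means = [rng.gauss(prior_mean, prior_std) for _ in range(n_atoms)]

    data = []
    for _ in range(n_points):
        k = rng.choices(range(n_atoms), weights=weights)[0]  # cluster assignment
        x = rng.gauss(means[k], cluster_std)                 # observation
        data.append((k, x))
    return means, data


rng = random.Random(0)
means, data = sample_dp_gmm(200, 1.0, 0.0, 5.0, 0.5, 50, rng)
print("distinct clusters used:", len({k for k, _ in data}))
```

Although 50 atoms are available, only a handful are typically used by 200 points, which illustrates how the number of occupied clusters grows with the data rather than being fixed in advance.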

Lecture 2

Notation

  • nonparameteric_bayes_5c40f22e2d9b77644ffdad199cf6e1235b80f558.png which sums to 1 with probability one.
  • nonparameteric_bayes_28ad3dedabcf1d85a322dde0cbc8974e3a8436c0.png
  • nonparameteric_bayes_051b45afe1e7d245f86a502141390b40e094843c.png is the Dirac delta for the element nonparameteric_bayes_7704c33b2fbd3b93ee8f13ea48747b917fe49053.png

Stuff

  • nonparameteric_bayes_50ac2a765917a1c8cbc2cbc054d3bd666de869cb.png can be described as follows:
    • Take a stick of length nonparameteric_bayes_3be1dc2a46fc7eaac476e904328db4f8c1c7a1ba.png
    • nonparameteric_bayes_5de5b707c1694a82ca63d7809e83e143780c2835.png
    • "Break" stick at the point corresponding to nonparameteric_bayes_33f9efbb525c53462308678777f108558cef40f5.png:
      • nonparameteric_bayes_f0a97f527355a9a75d457ada71220e7ed4377821.png
    • nonparameteric_bayes_f9595f85c74026afdaf0e738a4f3bf4637b39276.png
    • "Break" the rest of the stick by nonparameteric_bayes_913efb8aa2df67d16a1136df65ae3feb8747e7a5.png:
      • nonparameteric_bayes_23b288cc43e43c93aa83be633cebbbd29074974a.png
    • nonparameteric_bayes_d715470541367b3db20ade2bbac600f24a0c1155.png
    • "Break" the rest of the stick:
      • nonparameteric_bayes_fd1becc866da716e0813d3429e52688f519fe035.png
    • Then

      nonparameteric_bayes_83049c3a0b50eb97fbdd69476e82503bf480f110.png

  • We let

    nonparameteric_bayes_3529c7576f21ef738798620115b8b709fcd4415a.png

    where nonparameteric_bayes_79913c70fd907334e4e91f062c27c809c0554248.png is some underlying distribution

  • Then we define the random variable

    nonparameteric_bayes_08892338e8590a3a81234b856e8d550f16601d56.png

    where nonparameteric_bayes_051b45afe1e7d245f86a502141390b40e094843c.png is the Dirac delta for the element nonparameteric_bayes_7704c33b2fbd3b93ee8f13ea48747b917fe49053.png

    • The nonparameteric_bayes_7704c33b2fbd3b93ee8f13ea48747b917fe49053.png can even be functions, if nonparameteric_bayes_79913c70fd907334e4e91f062c27c809c0554248.png is a distribution on a separable Banach space!
  • Then

    nonparameteric_bayes_73f7b366edc7fd1ee2c799c549072cc32104b27d.png

    where nonparameteric_bayes_4d750f12406c7332e4823811167cf1d7a53e9fed.png denotes a Dirichlet process

  • Observe that nonparameteric_bayes_8b46b89ea0fbe16704b2d053ca55ad40d48c8b33.png defines a measure!

    nonparameteric_bayes_5e9160cee783e5a901d7627c5a699a78e66e8423.png

    hence a nonparameteric_bayes_4d750f12406c7332e4823811167cf1d7a53e9fed.png is basically a distribution over measures!

  • So we have a random measure where the σ-algebra is defined by

    nonparameteric_bayes_fc3217f31b5e1b5bef74c51ba75158aa2466bcd4.png

    where nonparameteric_bayes_37d3d2a40172b2cf7b0e8ea91ec033c3e1f9ecbd.png is the original σ-algebra
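
To see concretely why such a draw is a measure, the sketch below evaluates a truncated draw on intervals: the measure of a set is the total weight of the atoms that fall in it. The base distribution is assumed to be a standard normal and the weights are assumed to come from Beta(1, alpha) stick breaking; both choices, and the function names, are illustrative.

```python
import random


def draw_random_measure(alpha, n_atoms, rng):
    """Truncated draw: atom locations and their stick-breaking weights."""
    atoms, weights, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
        atoms.append(rng.gauss(0.0, 1.0))  # location drawn from the base distribution
    return atoms, weights


def measure_of_interval(atoms, weights, lo, hi):
    """Mass assigned to [lo, hi): sum of the weights of the atoms inside it."""
    return sum(w for a, w in zip(atoms, weights) if lo <= a < hi)


rng = random.Random(0)
atoms, weights = draw_random_measure(2.0, 100, rng)
left = measure_of_interval(atoms, weights, float("-inf"), 0.0)
right = measure_of_interval(atoms, weights, 0.0, float("inf"))
print(left, right, left + right)  # disjoint sets: the masses add up
```

Running it with a different seed gives a different split of mass, which is the sense in which the measure itself is random.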

There's a very interesting property of the nonparameteric_bayes_50ac2a765917a1c8cbc2cbc054d3bd666de869cb.png distribution.

Suppose nonparameteric_bayes_088d013d20e3c51e084d70450ce75d794cd3da91.png is Brownian motion. Consider its maximal points (i.e. each new "highest" or "lowest" peak); the times between these new peaks then follow a nonparameteric_bayes_50ac2a765917a1c8cbc2cbc054d3bd666de869cb.png!

We say that a sequence of random variables nonparameteric_bayes_f17054973cc824591823c22fce7eedb3da2564bf.png is infinitely exchangeable if and only if there exists a unique random measure nonparameteric_bayes_8b46b89ea0fbe16704b2d053ca55ad40d48c8b33.png such that

nonparameteric_bayes_dc6a780d7ddd97f2487528784a60e041cae5709b.png

Then observe that what's known as the Chinese restaurant process is just our previous nonparameteric_bayes_0000e3ca1a46cf75809c03a15c6e7fb8659b0c9e.png where we've marginalized over all the nonparameteric_bayes_fca38fea0e5afc5e59a249d534e11e7bf1c9beb6.png!
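
A minimal simulation of the Chinese restaurant process, using its standard seating rule: customer n+1 joins an existing table with probability proportional to the number of people already there, or opens a new table with probability proportional to a concentration parameter. The parameter name `alpha` and the function name are assumptions; note that no weights are ever sampled, since they have been marginalized out.

```python
import random


def chinese_restaurant_process(n_customers, alpha, rng):
    """Seat customers one at a time; return table assignments and table sizes."""
    assignments = []
    table_sizes = []
    for _ in range(n_customers):
        # Existing tables are weighted by their size, a new table by alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


rng = random.Random(0)
assignments, table_sizes = chinese_restaurant_process(100, 1.0, rng)
print("number of tables:", len(table_sizes))
print("table sizes:", table_sizes)
```

The number of occupied tables grows slowly with the number of customers, and a few tables tend to hold most of them (the "rich get richer" behaviour).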

Dirichlet as a GEM

Suppose we have a finite number of samples from a GEM distribution nonparameteric_bayes_66e223945b1e9984936660af08de144689e6a3c9.png.

Then,

nonparameteric_bayes_aeff291f52ec3d8d47662b84f357798966115826.png

Stochastic process on a σ-algebra.

A completely random measure is a random measure whose values on disjoint sets are independent:

nonparameteric_bayes_b881b1a94e3ba9de51f1a63598094d57cb9c118c.png

Appendix A: Vocabulary

categorical distribution
distribution with some probability nonparameteric_bayes_41d0b247aac64c6f032d572447c22d4f142669e1.png for the class/label indexed by nonparameteric_bayes_ad63c81b09a45c2afd6c497ff38a542e26c1423a.png. So a multinomial distribution with a single trial.
random measure