Notes on: Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012): A kernel two-sample test

Notation

  • $\mathcal{F}$ denotes a space of functions
  • $\mathcal{X}$ denotes a metric space
  • $p, q$ are Borel probability measures defined on $\mathcal{X}$
  • $\mathcal{H}$ denotes a reproducing kernel Hilbert space (RKHS) of functions on $\mathcal{X}$
  • $X := \{x_1, \dots, x_m\}$ and $Y := \{y_1, \dots, y_n\}$ are i.i.d. observations from $p$ and $q$, respectively
  • gretton_2012_1fa9ed5ff73b5eb87d8907c8ca94690dcbfee6ee.png and gretton_2012_6c1dd541564ce9815dda28d6cfb5eec4a580bcf6.png
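  • $\phi: \mathcal{X} \to \mathcal{H}$ denotes the "feature mapping", where

    $$\phi(x) = k(x, \cdot), \qquad k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$$

    and $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is positive-definite and is the reproducing kernel of the RKHS. Its existence is guaranteed by the Riesz representation theorem: point evaluation $f \mapsto f(x)$ is a bounded linear functional on $\mathcal{H}$, so it has a representer $k(x, \cdot) \in \mathcal{H}$ with $f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}$.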
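A small illustration (not from the paper) of how a kernel is an inner product of features: for the degree-2 polynomial kernel $k(x, x') = \langle x, x' \rangle^2$ on $\mathbb{R}^2$, the explicit map $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$ into $\mathbb{R}^3$ satisfies $k(x, x') = \langle \phi(x), \phi(x') \rangle$. This is a different feature map from the canonical choice $k(x, \cdot)$, which shows that feature maps are not unique even though the kernel is fixed. The kernel, map, and test points are assumptions chosen for illustration.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel on R^2: k(x, y) = <x, y>^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_feature_map(x):
    """One explicit feature map phi: R^2 -> R^3 for this kernel (not the canonical k(x, .))."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Euclidean inner product, playing the role of <., .>_H in the feature space R^3."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))                               # 1.0, since <x, y> = 3 - 2 = 1
print(dot(poly2_feature_map(x), poly2_feature_map(y)))  # 1.0 up to floating-point rounding
```

Expanding $\langle \phi(x), \phi(x') \rangle = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (x_1 x_1' + x_2 x_2')^2$ confirms the identity the code checks numerically.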

Maximum Mean Discrepancy (MMD)

Let $\mathcal{F}$ be a class of functions $f: \mathcal{X} \to \mathbb{R}$, and let:

  • $p, q$ be Borel probability measures on $\mathcal{X}$
  • $X$ and $Y$ be i.i.d. samples of sizes $m$ and $n$ from $p$ and $q$, respectively

We define the maximum mean discrepancy (MMD) as

$$\mathrm{MMD}[\mathcal{F}, p, q] := \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim p}[f(x)] - \mathbb{E}_{y \sim q}[f(y)] \right)$$

In the statistics literature, this is known as an integral probability metric.

A biased empirical estimate of the MMD is obtained by replacing the population expectations with empirical expectations computed on the samples $X$ and $Y$:

$$\mathrm{MMD}_b[\mathcal{F}, X, Y] := \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{i=1}^{n} f(y_i) \right)$$
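A minimal sketch of the biased estimate, assuming a hypothetical *finite* function class so that the supremum reduces to a maximum. Each candidate $f$ is scored by the difference between its sample mean on $X$ and on $Y$. The two classes, `linear` and `quadratic`, and the samples are made up for illustration; they are not from the paper.

```python
from typing import Callable, Sequence

def mmd_biased(funcs: Sequence[Callable[[float], float]],
               xs: Sequence[float], ys: Sequence[float]) -> float:
    """Biased empirical MMD over a finite function class (hypothetical setup).

    For each candidate f, compare the sample mean of f on X with that on Y,
    and return the largest difference (the sup over a finite class is a max).
    """
    def mean(f, samples):
        return sum(f(s) for s in samples) / len(samples)
    return max(mean(f, xs) - mean(f, ys) for f in funcs)

# Including both f and -f makes the estimate non-negative.
linear = [lambda t: t, lambda t: -t]
quadratic = linear + [lambda t: t * t, lambda t: -t * t]

X = [-1.0, 0.0, 1.0]
Y = [-2.0, 0.0, 2.0]
print(mmd_biased(linear, X, Y))     # 0.0: both samples have mean 0
print(mmd_biased(quadratic, X, Y))  # ~2.0: the second moments differ (2/3 vs 8/3)
```

The same two samples look identical to the linear class and clearly different to the quadratic one, so the estimate is only as discriminative as the function class it maximises over.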

The value of the MMD depends on the choice of $\mathcal{F}$. We must therefore identify a function class that is rich enough that $\mathrm{MMD}[\mathcal{F}, p, q] = 0$ if and only if $p = q$, yet restrictive enough to provide useful finite-sample estimates.
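To see why richness matters, consider a deliberately weak class, chosen here purely for illustration: bounded linear functions on $\mathcal{X} = \mathbb{R}^d$, $\mathcal{F}_{\mathrm{lin}} = \{ f(x) = \langle w, x \rangle : \|w\| \le 1 \}$. Assuming $p$ and $q$ have finite means, linearity of expectation and the Cauchy-Schwarz inequality give

$$
\mathrm{MMD}[\mathcal{F}_{\mathrm{lin}}, p, q]
  = \sup_{\|w\| \le 1} \left\langle w,\; \mathbb{E}_{x \sim p}[x] - \mathbb{E}_{y \sim q}[y] \right\rangle
  = \left\| \mathbb{E}_{x \sim p}[x] - \mathbb{E}_{y \sim q}[y] \right\|,
$$

with the supremum attained at the unit vector in the direction of the mean difference (when that difference is nonzero). This class therefore only detects differences in means: for $p = \mathcal{N}(0, 1)$ and $q = \mathrm{Uniform}(-\sqrt{3}, \sqrt{3})$, which both have mean $0$ and variance $1$, it gives $\mathrm{MMD} = 0$ even though $p \neq q$.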

MMD in Reproducing Kernel Hilbert Spaces

We extend the notion of a feature map to the embedding of a probability distribution: we define an element $\mu_p \in \mathcal{H}$ such that

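$$\mathbb{E}_{x \sim p}[f(x)] = \langle f, \mu_p \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},$$

which we call the mean embedding of $p$. Such an element exists, for example, whenever $k$ is measurable and $\mathbb{E}_{x \sim p}\left[\sqrt{k(x, x)}\right] < \infty$.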
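A minimal sketch (not from the paper) of the empirical analogue of this definition. Replacing $p$ by the empirical distribution of $X$ gives $\hat{\mu}_p = \frac{1}{m} \sum_{i=1}^{m} k(x_i, \cdot)$, and for any $f$ written as a finite kernel expansion, the RKHS inner product $\langle f, \hat{\mu}_p \rangle_{\mathcal{H}}$, computed purely from kernel evaluations, should equal the sample mean of $f$ over $X$. The Gaussian kernel with bandwidth 1, the sample, and the test function `f` are all assumptions for illustration; the Gaussian kernel is bounded, so the population embedding is well defined.

```python
import math
from typing import Sequence

def gaussian_kernel(x: float, y: float, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel on the real line (assumed choice): exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def rkhs_inner(alphas: Sequence[float], cs: Sequence[float],
               betas: Sequence[float], ds: Sequence[float]) -> float:
    """Inner product of two finite kernel expansions in the RKHS:
    < sum_j a_j k(c_j, .), sum_i b_i k(d_i, .) >_H = sum_{j,i} a_j b_i k(c_j, d_i).
    """
    return sum(a * b * gaussian_kernel(c, d)
               for a, c in zip(alphas, cs)
               for b, d in zip(betas, ds))

# Illustrative sample from p (made-up numbers, not data from the paper).
X = [-1.0, 0.0, 1.0]
m = len(X)

# Empirical mean embedding mu_hat = (1/m) sum_i k(x_i, .):
# a kernel expansion with centres x_i and equal weights 1/m.
mu_weights = [1.0 / m] * m

# Hypothetical test function f = 2 k(0.5, .) - k(-0.3, .), an element of the RKHS.
f_weights, f_centres = [2.0, -1.0], [0.5, -0.3]

def f(t: float) -> float:
    """Evaluate the kernel expansion f at the point t."""
    return sum(a * gaussian_kernel(c, t) for a, c in zip(f_weights, f_centres))

lhs = rkhs_inner(f_weights, f_centres, mu_weights, X)  # <f, mu_hat>_H
rhs = sum(f(x) for x in X) / m                         # sample mean of f over X
print(lhs, rhs)  # equal up to floating-point rounding
```

Representing RKHS elements as (weights, centres) pairs means the inner product only ever needs kernel evaluations, never an explicit feature vector.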