# Notes on: Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012): A kernel two-sample test

## Notation

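• $\mathcal{F}$ denotes a space of functions $f : \mathcal{X} \to \mathbb{R}$
• $(\mathcal{X}, d)$ denotes a metric space
• $p, q$ are Borel probability measures defined on $\mathcal{X}$
• $\mathcal{H}$ denotes a Hilbert space
• $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$ are i.i.d. observations from $p$ and $q$, respectively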
• $x \sim p$ and $y \sim q$
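• $\phi : \mathcal{X} \to \mathcal{H}$ denotes the "feature mapping", where $\phi(x) := k(x, \cdot)$ and hence $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$

Here $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is positive-definite and is the reproducing kernel of the RKHS $\mathcal{H}$, whose existence is guaranteed by the Riesz representation theorem, since point evaluation $f \mapsto f(x)$ is a continuous linear functional on an RKHS.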
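
To make the identity $k(x, x') = \langle \phi(x), \phi(x') \rangle$ concrete, here is a minimal sketch using a kernel whose feature map can be written down explicitly. The quadratic kernel on $\mathbb{R}^2$ and the names `poly_kernel`, `feature_map`, and `inner` are illustrative assumptions, not taken from the paper; the canonical map $\phi(x) = k(x, \cdot)$ used in these notes is a different (function-valued) representation of the same kernel.

```python
import math

def poly_kernel(x, y):
    """Homogeneous quadratic kernel k(x, y) = <x, y>^2 on R^2 (illustrative choice)."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit feature map phi: R^2 -> R^3 satisfying <phi(x), phi(y)> = k(x, y)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def inner(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
k_xy = poly_kernel(x, y)                          # kernel evaluated directly
phi_xy = inner(feature_map(x), feature_map(y))    # same value via the feature map
print(k_xy, phi_xy)
```

Both quantities agree up to floating-point rounding, which is all the kernel trick requires: inner products in feature space can be computed without ever forming $\phi(x)$.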

## Maximum Mean Discrepancy (MMD)

Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to \mathbb{R}$ and let:

• $p, q$ be Borel probability measures
• $X$ and $Y$ be i.i.d. observations from $p$ and $q$, respectively

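We define the maximum mean discrepancy (MMD) as

$$\mathrm{MMD}[\mathcal{F}, p, q] := \sup_{f \in \mathcal{F}} \left( \mathbf{E}_{x \sim p}[f(x)] - \mathbf{E}_{y \sim q}[f(y)] \right).$$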

In the statistics literature, this is known as an integral probability metric.

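A biased empirical estimate of the $\mathrm{MMD}$ is obtained by replacing the population expectations with empirical expectations computed on the samples $X$ and $Y$, of sizes $m$ and $n$:

$$\mathrm{MMD}_b[\mathcal{F}, X, Y] := \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{i=1}^{n} f(y_i) \right).$$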

We must therefore identify a function class $\mathcal{F}$ that is rich enough to uniquely identify whether $p = q$, yet restrictive enough to provide useful finite sample estimates.
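
The trade-off can be illustrated with a small sketch that evaluates the biased estimate over a finite function class, where the supremum is just a maximum. The two classes `F_mean` and `F_richer`, the helper `mmd_biased`, and the Gaussian samples are hypothetical choices made only for this illustration; they are not the function classes studied in the paper.

```python
import random

def mmd_biased(function_class, X, Y):
    """Biased MMD estimate over a finite class: max over f of mean f(X) - mean f(Y)."""
    def mean(f, sample):
        return sum(f(s) for s in sample) / len(sample)
    return max(mean(f, X) - mean(f, Y) for f in function_class)

# Hypothetical function classes, closed under negation so the estimate is non-negative.
F_mean = [lambda x: x, lambda x: -x]                       # only sees the mean
F_richer = F_mean + [lambda x: x * x, lambda x: -x * x]    # also sees the second moment

rng = random.Random(0)
X = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # sample from p = N(0, 1)
Y = [rng.gauss(0.0, 3.0) for _ in range(2000)]   # sample from q = N(0, 9), same mean as p

small = mmd_biased(F_mean, X, Y)     # near zero: this class cannot tell p from q
large = mmd_biased(F_richer, X, Y)   # clearly positive: the variances differ
print(small, large)
```

With `F_mean` the estimate stays close to zero even though $p \neq q$, because the class is too poor to detect a difference in variance. Enlarging the class fixes this particular case, but a class that is too large would make the finite-sample estimate uninformative.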

### MMD in Reproducing Kernel Hilbert Spaces

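We extend the notion of feature map to the embedding of a probability distribution: we define an element $\mu_p \in \mathcal{H}$ such that

$$\mathbf{E}_{x \sim p}[f(x)] = \langle f, \mu_p \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},$$

which we call the mean embedding of $p$.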
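
As a sanity check on this definition, the sketch below verifies the empirical analogue: replacing $p$ by the empirical measure of a sample gives $\hat{\mu}_X = \frac{1}{m} \sum_i k(x_i, \cdot)$, and for any $f = \sum_j \alpha_j k(z_j, \cdot)$ the inner product $\langle f, \hat{\mu}_X \rangle_{\mathcal{H}}$ should equal the sample mean of $f$. The Gaussian kernel, the bandwidth, and the values in `centers` and `alphas` are assumptions chosen for illustration.

```python
import math
import random

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel on the real line (bandwidth sigma is an assumed value)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

rng = random.Random(1)
X = [rng.gauss(0.0, 1.0) for _ in range(500)]
m = len(X)

# An arbitrary RKHS element f = sum_j alpha_j k(z_j, .), with made-up coefficients.
centers = [-1.0, 0.5, 2.0]
alphas = [0.7, -1.2, 0.4]

def f(x):
    return sum(a * gauss_kernel(z, x) for a, z in zip(alphas, centers))

# Left-hand side: empirical expectation of f over the sample.
lhs = sum(f(x) for x in X) / m

# Right-hand side: <f, mu_hat>_H, expanded by bilinearity and the
# reproducing property <k(z, .), k(x, .)>_H = k(z, x).
rhs = sum(a * sum(gauss_kernel(z, x) for x in X) / m
          for a, z in zip(alphas, centers))

print(lhs, rhs)
```

The two numbers coincide up to rounding. The point of the mean embedding is that expectations of every $f \in \mathcal{H}$ become inner products with a single element of $\mathcal{H}$.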