# Notes on: Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012): A kernel two-sample test

## Notation

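- $\mathcal{F}$ denotes a space of functions
- $\mathcal{X}$ denotes a metric space
- $p, q$ are Borel probability measures defined on $\mathcal{X}$
- $\mathcal{H}$ denotes a Hilbert space
- $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$ are i.i.d. observations from $p$ and $q$, respectively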
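- $\mathbb{E}_x[f(x)] := \mathbb{E}_{x \sim p}[f(x)]$ and $\mathbb{E}_y[f(y)] := \mathbb{E}_{y \sim q}[f(y)]$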
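- $\phi: \mathcal{X} \to \mathcal{H}$ denotes the "feature mapping", which satisfies

$$k(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}},$$

where $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is positive-definite and is the reproducing kernel on the RKHS $\mathcal{H}$, whose existence is guaranteed by the Riesz representation theorem (the evaluation functionals on an RKHS are continuous).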

## Maximum Mean Discrepancy (MMD)

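Let $\mathcal{F}$ be a class of functions $f: \mathcal{X} \to \mathbb{R}$ and let:

- $p, q$ be Borel probability measures on $\mathcal{X}$
- $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$ be i.i.d. observations from $p$ and $q$, respectively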

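We define the **maximum mean discrepancy (MMD)** as

$$\mathrm{MMD}[\mathcal{F}, p, q] := \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim p}[f(x)] - \mathbb{E}_{y \sim q}[f(y)] \right).$$

In the statistics literature, this is known as an integral probability metric.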

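A *biased* empirical estimate of the MMD is obtained by replacing the population expectations with empirical expectations computed on the samples $X$ and $Y$:

$$\mathrm{MMD}_b[\mathcal{F}, X, Y] := \sup_{f \in \mathcal{F}} \left( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{i=1}^{n} f(y_i) \right).$$

Both quantities depend on the choice of $\mathcal{F}$. We must therefore identify a function class that is rich enough to uniquely identify whether $p = q$, yet restrictive enough to provide useful finite sample estimates.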
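To make the biased estimate concrete, here is a minimal Python sketch that evaluates the supremum over a small *finite* function class. The class used (sine, cosine, tanh and their negations), the helper name `mmd_biased`, and the sample values are all assumptions chosen for illustration; they are not taken from the paper, which works with much richer classes.

```python
import math


def mmd_biased(function_class, xs, ys):
    """Biased MMD estimate: max over f of (mean of f on xs) - (mean of f on ys)."""
    def mean_difference(f):
        mean_x = sum(f(x) for x in xs) / len(xs)
        mean_y = sum(f(y) for y in ys) / len(ys)
        return mean_x - mean_y

    # For a finite class the supremum is simply a maximum.
    return max(mean_difference(f) for f in function_class)


# Hypothetical toy class, closed under negation so the estimate is non-negative.
base_functions = [math.sin, math.cos, math.tanh]
function_class = base_functions + [lambda t, f=f: -f(t) for f in base_functions]

# Made-up one-dimensional samples.
xs = [0.1, 0.4, -0.3, 0.8]
ys = [1.1, 1.6, 0.9, 2.0]

print(mmd_biased(function_class, xs, xs))  # same sample on both sides
print(mmd_biased(function_class, xs, ys))  # two different samples
```

A class this small illustrates the trade-off: it is easy to estimate over, but two different distributions can share the same means under all six functions, so it cannot tell every pair $p \neq q$ apart.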

### MMD in Reproducing Kernel Hilbert Spaces

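We extend the notion of the feature map $\phi$ to the embedding of a probability distribution: we will define an element $\mu_p \in \mathcal{H}$ such that

$$\mathbb{E}_{x \sim p}[f(x)] = \langle f, \mu_p \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},$$

which we call the **mean embedding** of $p$.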
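As a rough illustration, the empirical counterpart of the mean embedding is the function $t \mapsto \frac{1}{m} \sum_{i=1}^{m} k(x_i, t)$. The sketch below assumes a Gaussian kernel with bandwidth `sigma` on one-dimensional data; the kernel choice and all function names are assumptions made for this example. It also computes the RKHS distance between two empirical embeddings by expanding the squared norm with $\langle k(x, \cdot), k(y, \cdot) \rangle_{\mathcal{H}} = k(x, y)$. The paper shows that this distance equals the MMD when $\mathcal{F}$ is the unit ball of $\mathcal{H}$.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth sigma is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def empirical_mean_embedding(sample, kernel=gaussian_kernel):
    """Return the function t -> (1/m) * sum_i k(x_i, t)."""
    def mu_hat(t):
        return sum(kernel(x, t) for x in sample) / len(sample)
    return mu_hat


def embedding_distance(xs, ys, kernel=gaussian_kernel):
    """RKHS norm of the difference between the empirical embeddings of xs and ys."""
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    squared = k_xx + k_yy - 2.0 * k_xy
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Made-up one-dimensional samples.
xs = [0.1, 0.4, -0.3, 0.8]
ys = [1.1, 1.6, 0.9, 2.0]

mu_x = empirical_mean_embedding(xs)
print(mu_x(0.0))                   # embedding of xs evaluated at t = 0
print(embedding_distance(xs, ys))  # distance between the two embeddings
```

The double sums cost $O(m^2 + mn + n^2)$ kernel evaluations, which is fine for a sketch but worth keeping in mind for large samples.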