# Notes on: Jitkrittum, W., Xu, W., Szabo, Z., Fukumizu, K., & Gretton, A. (2017): A Linear-Time Kernel Goodness-Of-Fit Test

## TODO Check out notebook

## DONE Prerequisites

- `[X]` Stein's method
- `[X]` Sobolev space
- `[X]` U-statistic
- `[X]` Mean-shift
- `[ ]` Bahadur efficiency

Also, this is the "final" part of a series of three papers, hence you also ought to read the first two papers.

## Overview

- Linear time (wrt. the number of data points)
- Learns *test features* that best indicate differences between observed samples and a reference model, by minimizing the *false negative rate*
- Features are constructed via Stein's method, so it is not necessary to compute the normalization constant of the model density
- Proves that, under a mean-shift alternative, their test always has greater relative Bahadur efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test
- For a lot of useful asymptotic theorems in probability theory and statistics, check out [serfling2009approximation]; it's used a ton in this paper

## Notation

- $p$ is the density of the model, which is fitted to an observed sample $\{x_i\}_{i=1}^n$ from an unknown distribution $q$
- $H_0 : p = q$ is tested against $H_1 : p \neq q$
- $\mathcal{F} = \{ f \in \mathcal{H} : \|f\|_{\mathcal{H}} \leq 1 \}$ is the unit-norm ball in an RKHS^{1}
- $\mathcal{H}$ is an RKHS associated with a positive-definite kernel $k$
- $\phi(x) = k(x, \cdot)$ denotes a *feature map* such that $k(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}$
- Assume that $\lim_{|x| \to \infty} f(x) p(x) = 0$ so that integration by parts gives $\mathbb{E}_{x \sim p}\left[ (T_p f)(x) \right] = 0$

Kernelized Stein operator:

$$(T_p f)(x) = f(x) \frac{\partial}{\partial x} \log p(x) + \frac{\partial}{\partial x} f(x) = \left\langle f, \xi_p(x, \cdot) \right\rangle_{\mathcal{H}}$$

where

$$\xi_p(x, \cdot) := \left( \frac{\partial}{\partial x} \log p(x) \right) k(x, \cdot) + \frac{\partial}{\partial x} k(x, \cdot)$$

and the last equality is due to the *reproducing property* of $\mathcal{H}$.

**Stein witness function**:

$$g(y) := \mathbb{E}_{x \sim q}\left[ \xi_p(x, y) \right]$$

## Measuring discrepancy wrt. a model

- The Stein operator for $p$ may be applied to a class of test functions, yielding functions with zero expectation under $p$
- Classes of test functions can include the Sobolev space and an RKHS
- The minimal-variance unbiased estimate of the KSD is a U-statistic, with computational cost quadratic in the number of samples from $q$

### Kernel Stein Discrepancy (KSD) Test

- Assume the data lie in $\mathcal{X}$, a connected open set in $\mathbb{R}$

Consider Stein's operator $T_p$, which takes in a function $f$ and constructs the function

$$(T_p f)(x) = f(x) \frac{\partial}{\partial x} \log p(x) + \frac{\partial}{\partial x} f(x)$$

The constructed function has the key property that

$$\mathbb{E}_{x \sim q}\left[ (T_p f)(x) \right] = 0 \text{ for all } f \in \mathcal{F}$$

if and only if $p = q$.

- Use this expectation to construct a statistic for goodness of fit
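A quick Monte Carlo sanity check of the $q = p$ direction of the property above, assuming (my choices for illustration, not from the paper) $p = \mathcal{N}(0, 1)$ and an arbitrary smooth test function $f$ with $f(x)p(x) \to 0$ at infinity:

```python
import numpy as np

# Check the Stein identity E_{x~p}[(T_p f)(x)] = 0 numerically,
# where (T_p f)(x) = f(x) s_p(x) + f'(x) and s_p = d/dx log p.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)   # samples from p = N(0, 1)

f = np.sin(x) + x**2             # an arbitrary smooth test function
df = np.cos(x) + 2 * x           # its derivative
s_p = -x                         # score of N(0, 1): d/dx log p(x) = -x

stein = f * s_p + df             # (T_p f)(x_i) at each sample
print(stein.mean())              # close to 0, up to Monte Carlo error
```

If instead the samples came from some $q \neq p$, this expectation would in general be nonzero, which is exactly what the test statistics below exploit.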

Statistical tests can be based on classes of Stein-transformed RKHS functions, where the test statistic is the norm of the *smoothness-constrained* function with largest expectation under $q$; this is referred to as the **Kernel Stein Discrepancy (KSD)**:

$$S_p(q) = \sup_{\|f\|_{\mathcal{H}} \leq 1} \mathbb{E}_{x \sim q}\left[ (T_p f)(x) \right] = \sup_{\|f\|_{\mathcal{H}} \leq 1} \langle f, g \rangle_{\mathcal{H}} = \|g\|_{\mathcal{H}}$$

where at the second equality we used that $\xi_p(x, \cdot)$ is Bochner integrable as long as $\mathbb{E}_{x \sim q} \|\xi_p(x, \cdot)\|_{\mathcal{H}} < \infty$, and $g(y) := \mathbb{E}_{x \sim q}\left[ \xi_p(x, y) \right]$ is what we refer to as the **Stein witness function**.

The **KSD** can also be written

$$S_p^2(q) = \mathbb{E}_{x, y \sim q}\left[ h_p(x, y) \right]$$

where

$$h_p(x, y) := s_p(x) s_p(y) k(x, y) + s_p(y) \frac{\partial}{\partial x} k(x, y) + s_p(x) \frac{\partial}{\partial y} k(x, y) + \frac{\partial^2}{\partial x \, \partial y} k(x, y)$$

and

$$s_p(x) := \frac{\partial}{\partial x} \log p(x)$$

An unbiased estimate of $S_p^2(q)$ is then

$$\widehat{S^2} = \frac{2}{n(n-1)} \sum_{i < j} h_p(x_i, x_j)$$

which is a *degenerate* U-statistic under $H_0$.
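A minimal sketch of the quadratic-time U-statistic, assuming (my illustrative choices) a standard normal model $p = \mathcal{N}(0, 1)$ and a Gaussian kernel $k(x, y) = \exp(-(x - y)^2 / (2\sigma^2))$; only the score $s_p$, not the normalizer of $p$, is needed:

```python
import numpy as np

def ksd_u_statistic(x, score_p, sigma=1.0):
    """Unbiased U-statistic estimate of the squared KSD, O(n^2) cost.

    x       : (n,) array of samples from q
    score_p : function returning s_p(x) = d/dx log p(x)
    sigma   : bandwidth of the Gaussian kernel
    """
    n = len(x)
    d = x[:, None] - x[None, :]                   # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * sigma**2))            # k(x_i, x_j)
    dk_dx = -d / sigma**2 * k                     # d/dx k(x, y)
    dk_dy = d / sigma**2 * k                      # d/dy k(x, y)
    d2k = (1 / sigma**2 - d**2 / sigma**4) * k    # d^2/(dx dy) k(x, y)
    s = score_p(x)
    h = (s[:, None] * s[None, :] * k              # s_p(x) s_p(y) k(x, y)
         + s[None, :] * dk_dx                     # s_p(y) d/dx k(x, y)
         + s[:, None] * dk_dy                     # s_p(x) d/dy k(x, y)
         + d2k)
    # Average h over distinct pairs only (drop the diagonal): U-statistic
    return (h.sum() - np.trace(h)) / (n * (n - 1))

rng = np.random.default_rng(0)
score_std_normal = lambda x: -x           # d/dx log N(0,1) = -x
x_h0 = rng.normal(0.0, 1.0, size=500)     # q = p
x_h1 = rng.normal(1.0, 1.0, size=500)     # mean-shifted alternative
print(ksd_u_statistic(x_h0, score_std_normal))   # close to 0 under H_0
print(ksd_u_statistic(x_h1, score_std_normal))   # clearly positive under H_1
```

The quadratic cost of forming the full $n \times n$ matrix of $h_p(x_i, x_j)$ values is exactly what the FSSD below avoids.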

### Finite Set Stein Discrepancy (FSSD)

Let $V = \{v_1, \ldots, v_J\}$ be random vectors ("test locations") drawn i.i.d. from a distribution $\eta$ which has a density.

Let $\mathcal{X}$ be a connected open set in $\mathbb{R}^d$.

Instead of taking the RKHS norm of the Stein witness $g$, the FSSD evaluates it at the $J$ test locations:

$$\mathrm{FSSD}^2(q) := \frac{1}{dJ} \sum_{m=1}^{J} \| g(v_m) \|_2^2$$

Assume that

- $k$ is $C_0$-universal (don't know what this means) and real analytic, i.e. for all $v \in \mathcal{X}$, $x \mapsto k(x, v)$ is a real analytic function on $\mathcal{X}$ (an example of such a $k$ is the Gaussian kernel)

Then, for any $J \geq 1$ and $\eta$-almost all test locations $V$, $\mathrm{FSSD}^2(q) = 0$ if and only if $p = q$.

Hence, we have a linear-time way of computing an approximation to the *Stein discrepancy*!
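A minimal sketch of the linear-time statistic, again assuming a standard normal model, a Gaussian kernel, and arbitrary fixed test locations (the paper optimizes $V$ for test power and uses an unbiased U-statistic form; this plug-in version only illustrates the $\mathcal{O}(nJ)$ cost):

```python
import numpy as np

def fssd_statistic(x, v, score_p, sigma=1.0):
    """Plug-in FSSD^2 estimate: (1/J) * sum_m ghat(v_m)^2, cost O(nJ).

    x       : (n,) samples from q
    v       : (J,) fixed test locations
    score_p : function returning s_p(x) = d/dx log p(x)
    """
    d = x[:, None] - v[None, :]             # (n, J) differences x_i - v_m
    k = np.exp(-d**2 / (2 * sigma**2))      # k(x_i, v_m)
    dk = -d / sigma**2 * k                  # d/dx k(x, v) at (x_i, v_m)
    xi = score_p(x)[:, None] * k + dk       # xi_p(x_i, v_m)
    g_hat = xi.mean(axis=0)                 # ghat(v_m) = (1/n) sum_i xi_p(x_i, v_m)
    return np.mean(g_hat**2)

rng = np.random.default_rng(0)
score_std_normal = lambda x: -x
v = np.array([-1.0, 0.0, 1.0])              # J = 3 arbitrary test locations
x_h0 = rng.normal(0.0, 1.0, size=2000)      # q = p
x_h1 = rng.normal(1.0, 1.0, size=2000)      # mean-shifted alternative
print(fssd_statistic(x_h0, v, score_std_normal))   # vanishes under H_0
print(fssd_statistic(x_h1, v, score_std_normal))   # positive under H_1
```

Note that each sample is touched only once per test location, so the cost is linear in $n$, versus quadratic for the KSD U-statistic.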

## Relative Efficiency and Bahadur Slope

### Notation

- Consider a hypothesis testing problem on a parameter $\theta$: $H_0 : \theta = \theta_0$ against $H_1 : \theta \neq \theta_0$
- $T_n$ is a test statistic computed from a sample of size $n$, such that large values of $T_n$ provide evidence to reject $H_0$

### Stuff

- Two given tests can be compared by computing the **Bahadur efficiency**, which is given by the ratio of the (approximate) Bahadur slopes of the two tests
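As a rough sketch of the standard definitions (the precise regularity conditions are in serfling2009approximation and the paper): the Bahadur slope measures how fast the p-value of a test decays under a fixed alternative $\theta$.

```latex
% Approximate Bahadur slope of test T: with F the asymptotic null CDF of T_n,
% the p-value behaves like exp(-n c^{(T)}(theta) / 2), i.e.
c^{(T)}(\theta) \;:=\; -2 \lim_{n \to \infty} \frac{1}{n} \log \left( 1 - F(T_n) \right)
\qquad \text{a.s. under } \theta .

% Relative (Bahadur) efficiency of test 1 w.r.t. test 2:
E_{1,2}(\theta) \;:=\; \frac{c^{(1)}(\theta)}{c^{(2)}(\theta)},
\qquad E_{1,2}(\theta) > 1 \;\Longleftrightarrow\; \text{test 1 is asymptotically more efficient at } \theta .
```

The paper's claim is that, under a mean-shift alternative, this ratio of the FSSD test against the earlier linear-time kernel test always exceeds one.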

## Bibliography

- [gorham15_measur_sampl_qualit_with_stein_method] Gorham & Mackey, Measuring Sample Quality With Stein's Method, *CoRR*, (2015).
- [chwialkowski16_kernel_test_goodn_fit] Chwialkowski, Strathmann, & Gretton, A Kernel Test of Goodness of Fit, *CoRR*, (2016).
- [serfling2009approximation] Serfling, Approximation theorems of mathematical statistics, John Wiley & Sons (2009).

## Footnotes:

^{1}

In an RKHS the evaluation functionals are *bounded*, hence any $f \in \mathcal{H}$ has $\|f\|_{\mathcal{H}} = c$ for some finite constant $c$; we can just define $\tilde{f} := f / c$ and have $\|\tilde{f}\|_{\mathcal{H}} \leq 1$, so $\tilde{f}$ lies in the unit ball.