Notes on: Jitkrittum, W., Xu, W., Szabo, Z., Fukumizu, K., & Gretton, A. (2017): A Linear-Time Kernel Goodness-Of-Fit Test

Table of Contents

TODO Checkout notebook

DONE Prequisites

Also, this is the "final" part of a series of three papers, hence you also ought to read the first two papers:

  1. [X] gorham15_measur_sampl_qualit_with_stein_method
  2. [ ] chwialkowski16_kernel_test_goodn_fit

Overview

  • Linear time (wrt. # of data points)
  • Learn test features that best indicate differences between observed samples and a reference model, by minimizing the false negative rate
  • Features constructed via "Stein's method" => not necessary to compute normalization constant
  • Proves that (under a mean-shift alternative), their test always has greater relative Bahadur efficiency than a previous lineratime kernel test, regardless of choice of parameters for that test
  • For a lot of useful asymptotic theorems in probability theory and statistics, checkout serfling2009approximation. It's used a ton in this paper.

Notation

  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_1ed25cf746e36b58c8d980488e7f49fe0ccb1b39.png fits an observed sample jitkrittum17_linear_time_kernel_goodn_of_fit_test_4074db09d56b9766003a20c59d9b016ba25d6d44.png from unknown distribution jitkrittum17_linear_time_kernel_goodn_of_fit_test_85682d79718bd8157314a7ad4906f89fa00dbe96.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_52f1b9bd97a8bdae4eb6361fa1f23994440995e7.png is tested against jitkrittum17_linear_time_kernel_goodn_of_fit_test_ed52b9c7d9369e54360c3ee5322a96e26334f388.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_09ce3636afb059b763243ddcd85077f4c36ce3d8.png for the function jitkrittum17_linear_time_kernel_goodn_of_fit_test_97c3cb721cd6d76849d88aff3353c3110783a9ae.png is the unit-norm ball in a RKHS1
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_f5137804281d3ed956eb8079171af344aa367cfc.png is an RKHS associated with a postive-definite kernel jitkrittum17_linear_time_kernel_goodn_of_fit_test_4824fc1e4be75bef4827a4f5bc60b7d58d5cdce3.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_6c24ec894d32d53ee318abb2cd998a50e122450f.png denotes a feature-map such that

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_e2353e3454ef333e53692795414e8d05cfc35a9a.png

  • Assume that jitkrittum17_linear_time_kernel_goodn_of_fit_test_3becaa79911af5d67f74ca356416f5aa9416cbae.png so that so that jitkrittum17_linear_time_kernel_goodn_of_fit_test_595fcc96559172cfee53a6613cac7870dba17739.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_697a42947dd514df987649b687a3b3ac93acfceb.png
  • Kernelized Stein operator

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_4e3a5dca1f3fe477333e3d74cbe504bd0a171096.png

    where

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_834b1506a138c54aac07a148c997e750e3a76455.png

    and jitkrittum17_linear_time_kernel_goodn_of_fit_test_7b0c13a8ba4b8ebda1f72501d43966c7cc499c13.png is due to the reproducing property of jitkrittum17_linear_time_kernel_goodn_of_fit_test_f5137804281d3ed956eb8079171af344aa367cfc.png.

  • Stein witness function:

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_0793f2825b087f4e81176e8395980d75d05ca5da.png

Measuring discrepancy wrt. a model

  • Stein operator for jitkrittum17_linear_time_kernel_goodn_of_fit_test_fefe9e556d399665a26a37824ec578cbffb0cabe.png may be applied to a class of test functions, yielding functions with zero expectations under jitkrittum17_linear_time_kernel_goodn_of_fit_test_fefe9e556d399665a26a37824ec578cbffb0cabe.png
  • Classes of test functions can include the jitkrittum17_linear_time_kernel_goodn_of_fit_test_432a501de53d0444385d37eef1627e38097aaa30.png Sobolev space and RKHS
  • Minimal variance unbiased estimate of KSD is a U-statistic, with computation cost quadratic in the number of jitkrittum17_linear_time_kernel_goodn_of_fit_test_f07b932b12c91bca424fd428d383495413b5b6a9.png samples from jitkrittum17_linear_time_kernel_goodn_of_fit_test_ab437e1f9b3376761b155efe111c9860607c4b86.png

Kernel Stein Discrepancy (KSD) Test

  • Assume data is connected open set jitkrittum17_linear_time_kernel_goodn_of_fit_test_c52ecff14791b684ecc66f437cb2bb86a52ab4ab.png
  • Consider Stein's operator jitkrittum17_linear_time_kernel_goodn_of_fit_test_21b0c8a5ebaf36d2a85aef6af6dea09b5bf25774.png that takes in the function

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_23989fb7c5a5d2d9b311d98ce355b22f9d6f0124.png

    and constructs function

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_e09a3be97fe3109150c216fa8083bf4010ac9109.png

  • Constructed function has key property that

    jitkrittum17_linear_time_kernel_goodn_of_fit_test_ebeee91d324ce978fe7f0b54423443861cc00591.png

    if and only if jitkrittum17_linear_time_kernel_goodn_of_fit_test_5c1debcdb3821e5c531e97e4ce3b247ddd2fe207.png.

  • Use this expectation to construct a statistic for goodness of fit

Statistical tests based on classes of Stein transformed RKHS functions, where jitkrittum17_linear_time_kernel_goodn_of_fit_test_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png is the norm of the smoothness-constrained function with largest expectation under jitkrittum17_linear_time_kernel_goodn_of_fit_test_ab437e1f9b3376761b155efe111c9860607c4b86.png; referred to as Kernel Stein Discrepancy (KSD).

jitkrittum17_linear_time_kernel_goodn_of_fit_test_80234e48678bead77e91371153fb9c320f645aa4.png

where at jitkrittum17_linear_time_kernel_goodn_of_fit_test_7b0c13a8ba4b8ebda1f72501d43966c7cc499c13.png we used jitkrittum17_linear_time_kernel_goodn_of_fit_test_66e944a994aa045a677aa58c4f497790b0d22899.png is Bochner integrable as long as jitkrittum17_linear_time_kernel_goodn_of_fit_test_d4ea8c5458ccd78a852b6cdd1386cc0f9b3d7573.png and jitkrittum17_linear_time_kernel_goodn_of_fit_test_abb6fbf28519d09eb422aec2d07f4aefca3f44b7.png is what we refer to as the Stein witness function.

The KSD jitkrittum17_linear_time_kernel_goodn_of_fit_test_18b3e63427439652f3861f019ed12bf94c3a6d7e.png can also be written

jitkrittum17_linear_time_kernel_goodn_of_fit_test_0e9dad0e5bd6bbe838f6520aade273f9a26aebe0.png

where

jitkrittum17_linear_time_kernel_goodn_of_fit_test_3350827608bf124c11660678440d4a3b156562ec.png

and

jitkrittum17_linear_time_kernel_goodn_of_fit_test_efd881e34a6d0ad73daab79b154d459428a91e83.png

And unbiased estimate of jitkrittum17_linear_time_kernel_goodn_of_fit_test_73d838832010b79373047e700c961e1f579625fd.png is then denoted

jitkrittum17_linear_time_kernel_goodn_of_fit_test_1893e2734889f2960fc545cb83b70f29d408a4d5.png

which is a degenerate U-statistic under jitkrittum17_linear_time_kernel_goodn_of_fit_test_8fcd1d67ba4f659e8d70b2a3d29954143abfc3aa.png.

Finite Set Stein Discrepancy (FSSD)

Let jitkrittum17_linear_time_kernel_goodn_of_fit_test_257e89fc3152e549a704cf6797e434d2602a0648.png be random vectors drawn i.i.d. from a distribution jitkrittum17_linear_time_kernel_goodn_of_fit_test_08a192acb8f7139212671c401a66093b728bdccb.png which has a density.

Let jitkrittum17_linear_time_kernel_goodn_of_fit_test_76879b948635123ecd29d7cd65a1060145ba040e.png be a connected open set in jitkrittum17_linear_time_kernel_goodn_of_fit_test_de0b57391ed8b0d9becf399c343de2bf83467ce1.png.

jitkrittum17_linear_time_kernel_goodn_of_fit_test_787af8f22ce3bf9ef12048001f7cfb818b067574.png

Assume that

  1. jitkrittum17_linear_time_kernel_goodn_of_fit_test_4824fc1e4be75bef4827a4f5bc60b7d58d5cdce3.png is jitkrittum17_linear_time_kernel_goodn_of_fit_test_2e9c147aaa6f40af3e0ad585e74eb88b02840705.png universal (don't know what this means) and real analytic, i.e. jitkrittum17_linear_time_kernel_goodn_of_fit_test_4467712b3035d5720677544d2b6da5ea275410c6.png, jitkrittum17_linear_time_kernel_goodn_of_fit_test_50982347e42e22290d6920368a510266fb59939e.png is a real analytic function on jitkrittum17_linear_time_kernel_goodn_of_fit_test_76879b948635123ecd29d7cd65a1060145ba040e.png (an example of such a jitkrittum17_linear_time_kernel_goodn_of_fit_test_094b02afce734f4ce51933d0093ef3d2da9f8123.png is the Gaussian kernel)
  2. jitkrittum17_linear_time_kernel_goodn_of_fit_test_ca7f025c5a7dbbee3586b77c198c0896c36cacc4.png
  3. jitkrittum17_linear_time_kernel_goodn_of_fit_test_43afbfbd77b0e552af15560e875ff00bc70d8df3.png
  4. jitkrittum17_linear_time_kernel_goodn_of_fit_test_34e0de333550fb66a781d9b5a7bfceeb34af5576.png

Then, for any jitkrittum17_linear_time_kernel_goodn_of_fit_test_23749ee875bad47f9bc30e575834135ce2b55d0f.png, jitkrittum17_linear_time_kernel_goodn_of_fit_test_08a192acb8f7139212671c401a66093b728bdccb.png almost surely jitkrittum17_linear_time_kernel_goodn_of_fit_test_a52ed1bc7351473b8ea2d93e69768702ef68976d.png if and only if jitkrittum17_linear_time_kernel_goodn_of_fit_test_5c1debcdb3821e5c531e97e4ce3b247ddd2fe207.png.

Hence, we have a linear-time way of computing an approximation to the Stein discrepancy!

Relative Efficiency and Bahadur Slope

Notation

  • Hypothesis testing problem on parameter jitkrittum17_linear_time_kernel_goodn_of_fit_test_e7a86163f39fa21d4a2ed66946369cdeb900ef42.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_007232100176ddbde2bfaef8230744380782546c.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_dfdce9004d5caac33aadf29f74f315e3b70c922a.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_64d5e863c69b92630868e9a837e007cb5e2ca162.png is a test statistic to be computed from sample of size jitkrittum17_linear_time_kernel_goodn_of_fit_test_f07b932b12c91bca424fd428d383495413b5b6a9.png, such that large vlaues of jitkrittum17_linear_time_kernel_goodn_of_fit_test_64d5e863c69b92630868e9a837e007cb5e2ca162.png provide an evidence to reject jitkrittum17_linear_time_kernel_goodn_of_fit_test_8fcd1d67ba4f659e8d70b2a3d29954143abfc3aa.png
  • jitkrittum17_linear_time_kernel_goodn_of_fit_test_c0ebb845805cbf66aea662534027913eda944360.png

Stuff

  • Two given tests can be compared by computing the Bahadur efficiency which is given by the ratio of the slopes of the two tests.

Bibliography

Footnotes:

1

RKHS are spaces of bounded linear operators, hence any jitkrittum17_linear_time_kernel_goodn_of_fit_test_433221f85495e34280e1c0d4a377575d82635cd9.png has jitkrittum17_linear_time_kernel_goodn_of_fit_test_6ddf68329cdd49b1f044ffc2fdaafc29fc9847da.png for some constant jitkrittum17_linear_time_kernel_goodn_of_fit_test_cdac8af94d987eed36834007cd80a0376b2c605e.png, we can just define jitkrittum17_linear_time_kernel_goodn_of_fit_test_cdd343eddacae3cd7a7c3f3ebfc3a07cc4422c27.png and have $ {~{}} ≤ 1$ which would be in the unit ball.