Gaussian Processes

IMPORTANT

As of right now, most of the notes on this topic can be found in the notes for the book Gaussian Processes for Machine Learning.

These will be moved in the future.

Automatic Relevance Determination

Consider the covariance function:

$$
k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left( -\frac{1}{2} \sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{\ell_d^2} \right)
$$

The parameter $\ell_d$ is the length scale of the function along input dimension $d$. As $\ell_d \to \infty$, the function $f$ varies less and less as a function of $x_d$; that is, the $d$-th dimension becomes irrelevant.
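To make that limiting behavior concrete, here is a minimal sketch of this covariance function in Python; the function name `ard_se_kernel` and the toy inputs are illustrative, not from the source.

```python
import numpy as np

def ard_se_kernel(X1, X2, signal_var, lengthscales):
    """ARD squared-exponential covariance:
    k(x, x') = signal_var * exp(-0.5 * sum_d (x_d - x'_d)^2 / ell_d^2)."""
    Z1 = X1 / lengthscales  # scale each input dimension by its length scale
    Z2 = X2 / lengthscales
    # Pairwise squared Euclidean distances in the rescaled space.
    sq_dists = (
        np.sum(Z1 ** 2, axis=1)[:, None]
        + np.sum(Z2 ** 2, axis=1)[None, :]
        - 2.0 * Z1 @ Z2.T
    )
    return signal_var * np.exp(-0.5 * sq_dists)

# As ell_d grows, the kernel stops depending on dimension d:
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K_both = ard_se_kernel(X, X, 1.0, np.array([1.0, 1.0]))   # both dims matter
K_one = ard_se_kernel(X, X, 1.0, np.array([1.0, 1e6]))    # dim 2 ~ irrelevant
```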

Hence, given data, by learning the length scales $\ell_1, \ldots, \ell_D$ it is possible to do automatic feature selection.
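As a concrete illustration of ARD-based feature selection, the sketch below fits a GP with an anisotropic RBF kernel, assuming scikit-learn is available; the synthetic dataset (where only the first input dimension is informative) and all parameter values are made up for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Synthetic data: only the first of three input dimensions is informative.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# An anisotropic RBF kernel carries one length scale per dimension (ARD).
kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(3))

# Fitting maximizes the log marginal likelihood: the length scales are
# learned from the data, with no cross-validation.
gp = GaussianProcessRegressor(kernel=kernel, alpha=0.1, normalize_y=True)
gp.fit(X, y)

# Irrelevant dimensions end up with large learned length scales.
print(gp.kernel_)
```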

Resources

  • A Tutorial on Gaussian Processes (or why I don't use SVMs) by Zoubin Ghahramani. A short presentation providing an overview and showing that the objective function of an SVM is quite similar to that of a GP, while the GP has additional nice properties. He makes the following points when comparing GPs with SVMs:
    • GP incorporates uncertainty
    • GP computes the predictive distribution $p(y_* \mid \mathbf{x}_*)$, not just a point prediction $y_*$ as the SVM does (see the sketch after this list)
    • GP can learn the kernel parameters automatically from data, no matter how flexible we make the kernel
    • GP can learn the regularization parameter $C$ without cross-validation
    • GPs can combine learning with automatic feature selection via automatic relevance determination (ARD)
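As a rough sketch of the first two points above (assuming scikit-learn; the toy data and query point are made up), compare a GP classifier's predictive probabilities with an SVM's hard decisions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.svm import SVC

# Made-up binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gpc = GaussianProcessClassifier().fit(X, y)
svm = SVC().fit(X, y)

x_star = np.array([[0.1, -0.05]])
print(gpc.predict_proba(x_star))  # predictive probability p(y | x)
print(svm.predict(x_star))        # a hard label: no uncertainty attached
```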