Herrick

Gaussian Processes For Data Discovery

  • Created: 18 Dec 2017
  • Modified: 19 Mar 2019
  • Status: deprecated
  • Importance: 2


Gaussian Processes are an excellent tool for pattern discovery!

    Prerequisites

    • Understand Multivariate Normals
    • Covariance Matrices and Kernel Functions

    Introduction

Here, I am taking some notes on Gaussian Processes from Gaussian Processes for Machine Learning by Rasmussen and Williams.

We can model the correlation of data using Gaussian Processes. Essentially, we assume that for some fixed inputs, we can estimate an output with a general mean and variance. R&W say that we can interpret Gaussian Processes (GP) from two views: a function-space view, a distribution over functions with the inference taking place directly in the space of functions, and a weight-space view, a distribution over the parameters of a regression model.

    Weight-Space

To interpret a GP in the weight-space view, we consider Bayesian linear regression, $f(x) = x^\top w$ with noisy observations $y = f(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$. We see that we can estimate the weights using Bayes' rule

$$p(w \mid y, X) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)},$$

where the likelihood, prior, and marginal likelihood (normalization constant) are given by

$$p(y \mid X, w) = \mathcal{N}(X^\top w,\ \sigma_n^2 I), \qquad w \sim \mathcal{N}(0, \Sigma_p), \qquad p(y \mid X) = \int p(y \mid X, w)\, p(w)\, dw.$$

As we can see, ignoring the marginal likelihood (it does not depend on $w$), this simplifies to a Gaussian posterior

$$p(w \mid X, y) \sim \mathcal{N}\!\left(\bar{w},\ A^{-1}\right), \qquad \bar{w} = \sigma_n^{-2} A^{-1} X y, \qquad A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}.$$

We can make point estimates on the data by MAP estimation, in which we predict the mode of the posterior (for a Gaussian, the mode coincides with the mean $\bar{w}$). Then, for inferring on a test case $x_*$, we can average over all parameter values, so that our predictive distribution is given by

$$p(f_* \mid x_*, X, y) = \int p(f_* \mid x_*, w)\, p(w \mid X, y)\, dw = \mathcal{N}\!\left(\sigma_n^{-2}\, x_*^\top A^{-1} X y,\ x_*^\top A^{-1} x_*\right).$$
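The weight-space derivation above can be sketched numerically. This is a minimal NumPy sketch (variable names and the toy data are my own choices, following R&W's convention that the design matrix $X$ is $d \times n$):

```python
# Sketch of the weight-space view: Bayesian linear regression posterior
# N(w_bar, A^{-1}) with A = sigma_n^{-2} X X^T + Sigma_p^{-1},
# and the predictive distribution at a test input.
import numpy as np

rng = np.random.default_rng(0)

d, n = 2, 20
X = rng.normal(size=(d, n))          # design matrix, d x n (R&W convention)
w_true = np.array([1.5, -0.7])       # toy ground-truth weights
sigma_n = 0.1                        # observation-noise std
y = X.T @ w_true + sigma_n * rng.normal(size=n)

Sigma_p = np.eye(d)                  # prior covariance on the weights

# Posterior over weights: p(w | X, y) = N(w_bar, A^{-1})
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
w_bar = np.linalg.solve(A, X @ y) / sigma_n**2   # MAP estimate = posterior mean

# Predictive distribution at a test point x_*:
#   mean = x_*^T w_bar,  var = x_*^T A^{-1} x_*
x_star = np.array([1.0, 2.0])
pred_mean = x_star @ w_bar
pred_var = x_star @ np.linalg.solve(A, x_star)
print(w_bar, pred_mean, pred_var)
```

With enough data relative to the noise level, the posterior mean concentrates near the true weights, and the predictive variance shrinks toward zero.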

To add non-linearity to our model, we can project our inputs into a feature space by applying some map $\phi(\cdot)$ to the input data. Since the features then enter the predictive equations only through inner products, this leads to the idea of the kernel trick: we rewrite our mapping such that we have

$$k(x, x') = \phi(x)^\top \Sigma_p\, \phi(x'),$$

which allows us to focus on the kernels themselves to produce our predictions.

    Function Space View

    Definition 1   A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

We typically write a Gaussian process as

$$f(x) \sim \mathcal{GP}\!\left(m(x),\ k(x, x')\right),$$

where the mean and the covariance function are denoted by

$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\!\left[(f(x) - m(x))(f(x') - m(x'))\right].$$

Normally, we want to move to joint Gaussian distributions over the training outputs $f$ and test outputs $f_*$ such that

$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right),$$

where the properties are determined by the kernel function. Hence, we use conditioning and marginalization of a Gaussian to obtain the desired parameters for improving our estimate.
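The conditioning step can be sketched directly: given the joint Gaussian over $f$ and $f_*$, the conditional $f_* \mid f$ has mean $K(X_*, X) K(X, X)^{-1} f$ and covariance $K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)$. The kernel, inputs, and target function below are illustrative choices of mine:

```python
# Conditioning a joint Gaussian: the GP posterior at a test point given
# noise-free observations of f at the training points.
import numpy as np

def kernel(a, b):
    # Squared-exponential kernel on scalar inputs, unit length-scale.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x_train = np.array([-1.0, 0.0, 1.0])
x_test = np.array([0.5])
f_train = np.sin(x_train)           # observed function values (noise-free)

K11 = kernel(x_train, x_train) + 1e-9 * np.eye(x_train.size)  # jitter
K12 = kernel(x_train, x_test)
K22 = kernel(x_test, x_test)

# Conditional mean and covariance of f_* given f:
post_mean = K12.T @ np.linalg.solve(K11, f_train)
post_cov = K22 - K12.T @ np.linalg.solve(K11, K12)
print(post_mean, post_cov)
```

Even with only three observations, the posterior mean at $x_* = 0.5$ lands close to $\sin(0.5)$, and the posterior variance is well below the prior variance of 1.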

RBF Kernel in Practice
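A minimal GP-regression sketch with the RBF (squared-exponential) kernel, $k(x, x') = s^2 \exp\!\left(-\lVert x - x' \rVert^2 / 2\ell^2\right)$, on noisy 1-D data. The hyperparameters $\ell$, $s$, the noise level, and the toy dataset are my own choices:

```python
# GP regression with the RBF kernel and noisy observations:
#   mean = K_*^T (K + sigma_n^2 I)^{-1} y
#   var  = diag( K_** - K_*^T (K + sigma_n^2 I)^{-1} K_* )
import numpy as np

def rbf(a, b, length_scale=1.0, signal_std=1.0):
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return signal_std**2 * np.exp(-0.5 * sq_dists / length_scale**2)

rng = np.random.default_rng(1)
sigma_n = 0.1                                    # observation-noise std

x_train = np.linspace(-3, 3, 15)
y_train = np.sin(x_train) + sigma_n * rng.normal(size=x_train.size)
x_grid = np.linspace(-3, 3, 100)

K = rbf(x_train, x_train) + sigma_n**2 * np.eye(x_train.size)
K_star = rbf(x_train, x_grid)

post_mean = K_star.T @ np.linalg.solve(K, y_train)
post_var = np.diag(rbf(x_grid, x_grid) - K_star.T @ np.linalg.solve(K, K_star))
print(np.max(np.abs(post_mean - np.sin(x_grid))))   # worst-case fit error
```

Adding $\sigma_n^2 I$ to the training kernel matrix both models the observation noise and regularizes the linear solve; for larger problems one would use a Cholesky factorization of $K$ instead of repeated solves.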

Sources: Gaussian Processes for Machine Learning by Rasmussen and Williams