
Gaussian Processes

1. Definition

A stochastic process is a set of random variables indexed by time: \(\{X_{t}; t\in T\}\).

A stochastic process is Gaussian if every finite set \(\mathbf{X}_{t_1,...,t_k} = \{X_{t_1},...,X_{t_k}\}\) has a multivariate Gaussian joint distribution.

1.1. Example

Brownian motion is a stochastic process in which the displacement over a time interval \(\Delta t\) is sampled according to \(\Delta d \sim \mathcal{N}(0, \Delta t)\).

(Note that samples that are closer together in time are more strongly correlated.)
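
As a minimal sketch (not from the original note; it only assumes numpy), one way to see this is to simulate Brownian paths by cumulatively summing \(\mathcal{N}(0, \Delta t)\) increments and then check empirically that positions at nearby times are more strongly correlated than positions far apart:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 0.01        # time step
n_steps = 1000   # steps per path, so each path runs over t in (0, 10]
n_paths = 5000   # number of sampled paths

# Each increment is drawn from N(0, dt); a path is the running sum of its increments.
increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

# Empirical correlation between the position at t = 5.0 and at a nearby / distant time.
i_ref, i_near, i_far = 499, 509, 999   # t = 5.0, 5.1, 10.0
print(np.corrcoef(paths[:, i_ref], paths[:, i_near])[0, 1])  # close to 1
print(np.corrcoef(paths[:, i_ref], paths[:, i_far])[0, 1])   # noticeably smaller
```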

It turns out that this is a Gaussian process. One thing that confused me at first was understanding how it could be possible that any finite set of random variables from the process is jointly Gaussian. Isn't that a very specific thing to hope for? It helped me to think about conditioning on the multivariate Gaussian. Conditioning on \(X_{t_1}\) gives the distribution \(P(\mathbf{X}_{t_2,...,t_k} \mid x_{t_1})\), which is a "slice", where \(x_{t_1}\) is part of the realization. This conditional distribution is also multivariate Gaussian, which makes some sense: each new point is sampled according to a Gaussian, but the mean of that Gaussian depends on where the points from the previous timesteps landed.
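
To make the "slice" concrete, here is a small sketch (again not from the note; it assumes a zero mean and the standard Brownian covariance \(k(s,t)=\min(s,t)\)) that conditions the joint Gaussian of three time points on an observed value at the first one, using the usual Gaussian conditioning formulas:

```python
import numpy as np

def brownian_cov(times):
    """Covariance matrix of Brownian motion at the given times: k(s, t) = min(s, t)."""
    t = np.asarray(times, dtype=float)
    return np.minimum.outer(t, t)

times = np.array([1.0, 2.0, 3.0])   # t1, t2, t3
Sigma = brownian_cov(times)         # joint covariance of (X_t1, X_t2, X_t3)
mu = np.zeros(3)                    # zero mean

# Partition: block "a" is X_t1 (what we condition on), block "b" is (X_t2, X_t3).
Saa, Sab = Sigma[:1, :1], Sigma[:1, 1:]
Sba, Sbb = Sigma[1:, :1], Sigma[1:, 1:]

x_t1 = np.array([0.7])              # an observed realization of X_t1

# Standard Gaussian conditioning: the slice P(X_t2, X_t3 | x_t1) is again Gaussian.
cond_mean = mu[1:] + Sba @ np.linalg.solve(Saa, x_t1 - mu[:1])
cond_cov = Sbb - Sba @ np.linalg.solve(Saa, Sab)

print(cond_mean)   # the conditional mean depends on where x_t1 landed
print(cond_cov)
```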

1.2. Gaussian Processes as distributions over functions

We can sample from the distribution of Brownian walks to get a realization. Each realization is really just a function from time to position. In this sense, sampling a Brownian walk should be thought of as sampling from a distribution of functions.

What defines one of these distributions?

  1. First, we need a mean function \(m(t)\)
  2. Second, we need a covariance function \(k(t, t')\) defined for every pair \((t,t')\)

For every finite set of random variables \(\mathbf{X}_{t_1,...,t_k}\), \[ \mathbf{X}_{t_1,...,t_k} \sim \mathcal{N}\big(\mathbf{m},\, K\big), \qquad \mathbf{m} = \big(m(t_1),...,m(t_k)\big), \quad K_{ij} = k(t_i, t_j) \]

where \(K\) is the covariance matrix between the variables at the chosen times.
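
Putting the two ingredients together: to sample a "function" you evaluate \(m\) and \(k\) on a grid of times, form the mean vector and covariance matrix, and draw from the resulting multivariate Gaussian. The sketch below (not from the note) uses the Brownian-motion pair \(m(t)=0\), \(k(s,t)=\min(s,t)\), but any valid mean/covariance pair can be plugged in the same way:

```python
import numpy as np

def sample_gp(m, k, times, n_samples=3, jitter=1e-9, seed=0):
    """Sample function values on `times` from the GP with mean function m and covariance function k."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    mean = np.array([m(ti) for ti in t])
    K = np.array([[k(ti, tj) for tj in t] for ti in t])
    K += jitter * np.eye(len(t))   # tiny jitter for numerical stability
    return rng.multivariate_normal(mean, K, size=n_samples)

# Brownian motion as a GP: zero mean and covariance k(s, t) = min(s, t).
m = lambda t: 0.0
k = lambda s, t: min(s, t)

times = np.linspace(0.0, 5.0, 100)
draws = sample_gp(m, k, times)     # each row is one realization: a sampled "function" on the grid
print(draws.shape)                 # (3, 100)
```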

2. As a way to do regression

  • Here are some thoughts now that I've seen GPs used in practice:
    • It's a very flexible, non-parametric form of regression (see also parametric vs non-parametric models).
    • Your input is a few fixed data points, and you want to draw all plausible curves through those data points.
    • What's your hypothesis space? We want all functions that do something reasonable, where reasonable is a little like asking for continuity: the functions should stay close to the fixed data points when evaluated near those points.
    • One way you could imagine describing this sort of distribution around a fixed point is by saying that, among all such reasonable functions, the values near a fixed point follow a normal distribution centered on that point. Or at least this is how I would motivate the usage of GPs (see the sketch below).
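
As a sketch of what this looks like in code (not the note's own method; it assumes a zero prior mean, a squared-exponential kernel, and a small observation-noise term, all standard but arbitrary choices here), GP regression conditions the prior over functions on the fixed data points and reads off a posterior mean and uncertainty at the test inputs:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and covariance at x_test for a zero-mean GP with an RBF kernel."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    K_ss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# A few fixed data points; the posterior "pinches" near them and widens in between.
x_train = np.array([0.0, 1.5, 3.0])
y_train = np.array([0.0, 1.0, -0.5])
x_test = np.linspace(-1.0, 4.0, 100)

mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.diag(cov))
print(mean[:5], std[:5])
```

Near the training points the posterior standard deviation shrinks, and in the gaps between them it widens, which matches the "stick close to the fixed points" intuition above.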


Created: 2025-11-02 Sun 18:55