Gaussian Process Regression
A conceptual guide
Gaussian processes (GPs) are a flexible class of nonparametric machine learning models commonly used for modeling spatial and time series data. A common application of GPs is regression. For example, given incomplete geographical weather data, such as temperature or humidity, how can one recover values at unobserved locations? If one has good reason to believe the data are jointly normally distributed, then a GP model could be a judicious choice. In what follows, we introduce the mechanics behind the GP model and then illustrate its use in recovering missing data.
The GP Model
Formally, a GP is a stochastic process, which can be viewed as a distribution over functions. The premise is that the function values are themselves random variables. When modeling a function as a Gaussian process, one assumes that the function values at any finite set of input points jointly follow a multivariate normal distribution.
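To make the finite-dimensional assumption concrete, here is a minimal sketch that draws a few random functions from a GP by sampling a multivariate normal at a finite grid of inputs. The zero mean, the squared-exponential covariance function, its hyperparameters, and the helper name `rbf_kernel` are all assumptions chosen for illustration; nothing so far in the text fixes a particular mean or covariance.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs.

    Hypothetical helper: the choice of kernel is an assumption for this sketch.
    """
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


rng = np.random.default_rng(0)

# A finite set of input points at which the function is evaluated.
x = np.linspace(0.0, 5.0, 50)

# Assumed zero mean; a small diagonal "jitter" keeps the covariance
# matrix numerically positive definite.
mean = np.zeros_like(x)
cov = rbf_kernel(x, x) + 1e-8 * np.eye(x.size)

# Each row is one sampled function, evaluated at the 50 inputs in x.
samples = rng.multivariate_normal(mean, cov, size=3)
```

Each row of `samples` is a single draw of the function values at the grid `x`. Plotting the rows against `x` gives smooth random curves, which is one way to picture a distribution over functions.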
Why is this assumption useful? It turns out that Gaussian distributions are nice to work with because of their tractability. Indeed, the Gaussian family is self-conjugate and enjoys a number of properties such as being closed under marginalization and closed under conditioning. As such, GPs naturally jibe with Bayesian machine learning, which usually involves specification of priors — asserting one’s prior belief…