Gaussian Processes

  • In Gaussian processes we treat each test point as a random variable
  • A multivariate Gaussian distribution has the same number of dimensions as the number of random variables
  • Can derive the posterior from the prior and our observations:

$$\begin{pmatrix} \mathbf{f} \\ \mathbf{f}_* \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu}_* \end{pmatrix}, \begin{pmatrix} K & K_* \\ K_*^T & K_{**} \end{pmatrix} \right)$$

$\mathbf{f}$, outcome values of the training data; $\mathbf{f}_*$, outcome values of the test data

$\boldsymbol{\mu}$, mean of the training data; $\boldsymbol{\mu}_*$, mean of the test data

$K$, covariance of the training data with itself; each kernel entry measures the similarity of two data points

$K_*$, covariance between training data and test data

$K_{**}$, covariance of the test data

To sample from a Gaussian with covariance $K$, take the Cholesky factor $LL^T = K$ and set $\mathbf{f} = \boldsymbol{\mu} + L\,\mathcal{N}(0, I)$
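
Conditioning this joint Gaussian on the observed training values gives the posterior in closed form (the standard Gaussian conditioning result; the code below implements it with Cholesky solves, taking both means to be zero):

$$\mathbf{f}_* \mid \mathbf{f} \sim \mathcal{N}\left( \boldsymbol{\mu}_* + K_*^T K^{-1} (\mathbf{f} - \boldsymbol{\mu}),\ K_{**} - K_*^T K^{-1} K_* \right)$$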
  • Pros
    • Non-parametric approach
    • Predictions can interpolate observations
    • Predictions are probabilistic so that one can compute confidence intervals
    • Flexibility: you can use different kernels
  • Cons
    • They lose efficiency in high-dimensional spaces, namely when the number of features exceeds a few dozen
In [134]:
import numpy as np
import matplotlib.pyplot as pl
In [135]:
def kernel(a, b, param):
    # Squared exponential (RBF) kernel: k(a, b) = exp(-||a - b||^2 / (2 * param))
    sqdist = np.sum(a**2,1).reshape(-1,1) + np.sum(b**2,1) - 2*np.dot(a, b.T)
    return np.exp(-.5 * (1/param) * sqdist)
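A quick illustration with made-up inputs (not from the original notebook): the kernel is 1 for identical points and decays with squared distance.

a = np.array([[0.0], [1.0]])
print(kernel(a, a, 0.1))
# [[1.     0.0067]
#  [0.0067 1.    ]]  approximately, since exp(-0.5 * (1/0.1) * 1) = exp(-5) ≈ 0.0067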
In [136]:
# Training data
Xtrain = np.array([-4, -3, -2, -1, 1]).reshape(5,1)
ytrain = np.sin(Xtrain) + 10

# Test data
n = 50
Xtest = np.linspace(-5, 5, n).reshape(-1,1) # 50*1
In [144]:
# Calculate Covariance Matrices
param = 0.1
K = kernel(Xtrain, Xtrain, param) # 5*5
K_s = kernel(Xtrain, Xtest, param) # 5*50
K_ss = kernel(Xtest, Xtest, param) # 50*50
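The shape comments can be checked directly (a quick verification, not in the original notebook):

print(K.shape, K_s.shape, K_ss.shape) # (5, 5) (5, 50) (50, 50)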
In [145]:
# Calculate mean and standard deviation of the posterior at the test points
L = np.linalg.cholesky(K + 0.00005*np.eye(len(Xtrain))) # 5*5, small jitter added for numerical stability
Lk = np.linalg.solve(L, K_s) # L, 5*5; Lk, 5*50
mu = np.dot(Lk.T, np.linalg.solve(L, ytrain)).reshape((n,)) # 50, posterior mean at the test points

stdv = np.sqrt(np.diag(K_ss) - np.sum(Lk**2, axis=0)) # 50, posterior std at the test points
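The two triangular solves compute $\mu_* = K_*^T K^{-1} \mathbf{y}$ without forming an explicit inverse. As a sketch (slower and less numerically stable, shown only to make the formula visible), the direct version gives the same result:

K_inv = np.linalg.inv(K + 0.00005*np.eye(len(Xtrain)))
mu_direct = np.dot(K_s.T, np.dot(K_inv, ytrain)).reshape((n,))
print(np.allclose(mu, mu_direct)) # True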
In [146]:
# Draw samples from the posterior f(X_test | X_train, y_train)
L = np.linalg.cholesky(K_ss + 1e-6*np.eye(n) - np.dot(Lk.T, Lk)) # 50*50, Cholesky of the posterior covariance
f_post = mu.reshape(-1,1) + np.dot(L, np.random.normal(size=(n,3))) # 50*3, three posterior samples
In [140]:
pl.plot(Xtrain, ytrain, 'bs', ms=8)
pl.plot(Xtest, f_post)
pl.gca().fill_between(Xtest.flat, mu-2*stdv, mu+2*stdv, color="#dddddd")
pl.plot(Xtest, mu, 'r--', lw=2)
pl.axis([-5, 5, -5, 15])
pl.title('Three samples from the GP posterior')
pl.show()
In [149]:
# Draw samples from the prior f(X_test)
L = np.linalg.cholesky(K_ss + 1e-15*np.eye(n)) # 50*50
f_prior = np.dot(L, np.random.normal(size=(n,3))) # 50*3, three prior samples (the prior mean is 0)
In [150]:
pl.plot(Xtest, f_prior)
pl.axis([-5, 5, -3, 3])
pl.title('Three samples from the GP prior')
pl.show()

Gaussian Process Regression (GPR)

In [153]:
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
In [154]:
X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
In [159]:
kernel = DotProduct() + WhiteKernel()
model = GaussianProcessRegressor(kernel=kernel, random_state=0)

model.fit(X, y)
Out[159]:
GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
                         kernel=DotProduct(sigma_0=1) + WhiteKernel(noise_level=1),
                         n_restarts_optimizer=0, normalize_y=False,
                         optimizer='fmin_l_bfgs_b', random_state=0)
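During fit, the kernel hyperparameters are optimized by maximizing the log marginal likelihood; the fitted kernel (as opposed to the one passed in) can be inspected afterwards:

print(model.kernel_) # e.g. DotProduct(sigma_0=...) + WhiteKernel(noise_level=...)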
In [160]:
model.predict(X[:2,:], return_std=True)
Out[160]:
(array([653.08792288, 592.16905327]), array([316.68016218, 316.65121679]))
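
Since predict returns a standard deviation alongside the mean, an approximate 95% confidence interval follows directly (a small sketch; assumes Gaussian predictive marginals):

y_mean, y_std = model.predict(X[:2,:], return_std=True)
lower = y_mean - 1.96 * y_std
upper = y_mean + 1.96 * y_std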

Gaussian Process Classification (GPC)

In [164]:
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
In [166]:
X, y = load_iris(return_X_y=True)
In [168]:
kernel = 1.0 * RBF(1.0)
model = GaussianProcessClassifier(kernel=kernel, random_state=0)

model.fit(X, y)
Out[168]:
GaussianProcessClassifier(copy_X_train=True, kernel=1**2 * RBF(length_scale=1),
                          max_iter_predict=100, multi_class='one_vs_rest',
                          n_jobs=None, n_restarts_optimizer=0,
                          optimizer='fmin_l_bfgs_b', random_state=0,
                          warm_start=False)
In [171]:
model.predict_proba(X[:2,:])
Out[171]:
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])
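
The class probabilities above come from the one-vs-rest fits shown in the model repr. A quick sanity check (training-set accuracy, so an optimistic estimate):

print(model.predict(X[:2,:])) # most probable class per sample
print(model.score(X, y)) # mean accuracy on the training data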