Lab 3: Gaussian process regression

Machine Learning 1, September 2015

  • The lab exercises should be done in groups of two, three or four people.
  • The deadline is October 25th (Sunday) 23:59.
  • Assignment should be sent to Philip Versteeg (p.j.j.p.versteeg@uva.nl). The subject line of your email should be "lab#_lastname1_lastname2_lastname3".
  • Put your and your teammates' names in the body of the email.
  • Attach the .IPYNB (IPython Notebook) file containing your code and answers. Naming of the file follows the same rule as the subject line. For example, if the subject line is "lab01_Kingma_Hu", the attached file should be "lab01_Kingma_Hu.ipynb". Only use underscores ("_") to connect names, otherwise the files cannot be parsed.

Notes on implementation:

  • You should write your code and answers in an IPython Notebook: http://ipython.org/notebook.html. If you have problems, please contact us.
  • Among the first lines of your notebook should be "%pylab inline". This imports all required modules, and your plots will appear inline.
  • NOTE: Make sure we can run your notebook / scripts! $\newcommand{\bx}{\mathbf{x}}$ $\newcommand{\bxp}{\mathbf{x}^{'}}$ $\newcommand{\bw}{\mathbf{w}}$ $\newcommand{\bt}{\mathbf{t}}$ $\newcommand{\by}{\mathbf{y}}$ $\newcommand{\bm}{\mathbf{m}}$ $\newcommand{\bb}{\mathbf{b}}$ $\newcommand{\bS}{\mathbf{S}}$ $\newcommand{\ba}{\mathbf{a}}$ $\newcommand{\bz}{\mathbf{z}}$ $\newcommand{\bv}{\mathbf{v}}$ $\newcommand{\bq}{\mathbf{q}}$ $\newcommand{\bp}{\mathbf{p}}$ $\newcommand{\bh}{\mathbf{h}}$ $\newcommand{\bI}{\mathbf{I}}$ $\newcommand{\bX}{\mathbf{X}}$ $\newcommand{\bT}{\mathbf{T}}$ $\newcommand{\bPhi}{\mathbf{\Phi}}$ $\newcommand{\bW}{\mathbf{W}}$ $\newcommand{\bV}{\mathbf{V}}$ $\newcommand{\xm}{\mathbf{x}_m}$ $\newcommand{\xn}{\mathbf{x}_n}$ $\newcommand{\y}{\mathbf{y}}$ $\newcommand{\K}{\mathbf{K}}$ $\newcommand{\zero}{\mathbf{0}}$ $\newcommand{\yi}{\y_i}$ $\newcommand{\thetav}{\mathbf{\theta}}$ $\newcommand{\t}{\mathbf{t}}$ $\newcommand{\x}{\mathbf{x}}$ $\newcommand{\tN}{\mathbf{t}_N}$ $\newcommand{\xN}{\mathbf{x}_N}$ $\newcommand{\k}{\mathbf{k}}$ $\newcommand{\C}{\mathbf{C}}$ $\newcommand{\CN}{\mathbf{C}_N}$ $\newcommand{\KN}{\mathbf{K}_N}$ $\newcommand{\eyeN}{\mathbf{I}_N}$

Submitters:

  • Arthur Bražinskas
  • Minh Ngo

Gaussian process regression

For this lab we will refer to Bishop sections 6.4.2 and 6.4.3. You may also want to refer to Rasmussen's Gaussian Process text, which is available online at http://www.gaussianprocess.org/gpml/chapters/, and especially to the project found at http://www.automaticstatistician.com/index.php by Ghahramani for some intuition about GPs. To understand Gaussian processes, it is highly recommended that you understand how marginal, partitioned Gaussian distributions can be converted into conditional Gaussian distributions. This is covered in Bishop 2.3 and summarized in Eqns 2.94-2.98.
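
As a quick reminder of that conditioning result: if $\begin{pmatrix}\x_a \\ \x_b\end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix}\boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b\end{pmatrix}, \begin{pmatrix}\boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb}\end{pmatrix}\right)$, then $p(\x_a \mid \x_b) = \mathcal{N}\left(\boldsymbol{\mu}_a + \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}(\x_b - \boldsymbol{\mu}_b),\ \boldsymbol{\Sigma}_{aa} - \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}\boldsymbol{\Sigma}_{ba}\right)$. The GP predictive distribution in part 2 is exactly this formula applied to the joint Gaussian over training and test targets.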

Sinusoidal Data

We will use the same data generating function that we used previously for regression. You can change sigma/beta, but keep it reasonable. Definitely play around once you have things working. Make use of these functions as you wish.


In [1]:
%pylab inline
import pylab as pp


Populating the interactive namespace from numpy and matplotlib

In [2]:
sigma = 0.5
beta  = 1.0 / pow(sigma,2) # this is the beta used in Bishop Eqn. 6.59
N_test = 100
x_test = np.linspace(-1,1,N_test)
mu_test = np.zeros( N_test )

In [3]:
def true_mean_function( x ):
    return np.sin( 2*pi*(x+1) )

def add_noise( y, sigma ):
    return y + sigma*np.random.randn(len(y))

def generate_t( x, sigma ):
    return add_noise( true_mean_function( x), sigma )

In [4]:
y_test = true_mean_function( x_test )
t_test = add_noise( y_test, sigma )
pp.plot( x_test, y_test, 'b-', lw=2)
pp.plot( x_test, t_test, 'go')
pp.plot()


Out[4]:
[]

1. Sampling from the Gaussian process prior (30 points)

We will implement Gaussian process regression using the kernel function in Bishop Eqn. 6.63.

1.1 k_n_m( xn, xm, thetas ) (10 points)

To start, implement a function "k_n_m( xn, xm, thetas )" that takes scalars $\xn$ and $\xm$, and a vector of $4$ thetas, and computes the kernel function of Bishop Eqn. 6.63 (10 points).


In [5]:
def k_n_m(xn, xm, thetas):
    return thetas[0] * np.exp(-thetas[1] * (xn - xm) ** 2/ 2) + thetas[2] + thetas[3] * xn * xm

1.2 computeK( X1, X2, thetas ) (5 points)

Eqn 6.60 is the marginal distribution of the mean output of $N$ data vectors: $p(\y) = \mathcal{N}(\zero, \K)$. Notice that the expected mean function is $0$ at all locations, and that the covariance is an $N$ by $N$ kernel matrix $\K$. Write a function "computeK( X1, X2, thetas )" that computes the kernel matrix. Hint: use k_n_m as part of an inner loop (of course, there are more efficient ways of computing the kernel function making better use of vectorization, but that is not necessary) (5 points).


In [6]:
def computeK(X1, X2, thetas):
    K = np.zeros((len(X1), len(X2)))
    for n in range(len(X1)):
        for m in range(len(X2)):
            K[n, m] = k_n_m(X1[n], X2[m], thetas)
    return K
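
Not required for the exercise, but for reference a vectorized version (a sketch equivalent to the double loop above, assuming 1-D numpy arrays as inputs) can use broadcasting:

def computeK_vectorized(X1, X2, thetas):
    # pairwise squared distances via broadcasting
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return thetas[0] * np.exp(-0.5 * thetas[1] * sqdist) + thetas[2] + thetas[3] * np.outer(X1, X2)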

1.3 Plot function samples (15 points)

Now sample mean functions at the x_test locations for the theta values in Bishop Figure 6.5, make a figure with a 2 by 3 subplot and make sure the title reflects the theta values (make sure everything is legible). In other words, sample $\yi \sim \mathcal{N}(\zero, \K(\thetav))$. Make use of numpy.random.multivariate_normal(). On your plots include the expected value of $\y$ with a dashed line and use fill_between to show 2 standard deviations of the uncertainty due to $\K$ (the diagonal of $\K$ is the variance of the model uncertainty) (15 points).


In [7]:
from itertools import product
thetas = np.array([(1, 4, 0, 0), (9, 4, 0, 0), (1, 64, 0, 0), (1, 0.25, 0, 0), (1, 4, 10, 0), (1, 4, 0, 5)])

fig = plt.figure()
fig.set_size_inches(14, 8)
for row, col in product(range(2), range(3)):
    idx = row * 3 + col
    K_theta = computeK(x_test, x_test, thetas[idx])
    for i in range(5):
        y = np.random.multivariate_normal(np.zeros(len(K_theta)), K_theta)
        plt.subplot(2, 3, idx + 1)
        plt.plot(x_test, y)
    plt.plot(x_test, [0] * len(x_test), '--')
    std = np.sqrt(np.diag(K_theta))
    plt.fill_between(x_test, -2 * std, 2 * std, color='r', alpha=0.25 )
    plt.title(str(thetas[idx]))
        
plt.show()


/home/ignotus/.virtualenvs/ml/lib/python2.7/site-packages/ipykernel/__main__.py:10: RuntimeWarning: covariance is not positive-semidefinite.
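
This warning is expected for some $\thetav$, where $\K$ is numerically close to singular. A common remedy (not required here) is to add a small "jitter" term to the diagonal before sampling, e.g. K_theta = K_theta + 1e-8 * np.identity(len(K_theta)).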

2. Predictive distribution (35 points)

So far we have sampled mean functions from the prior. We can draw actual data $\t$ two ways. The first way is generatively, by first sampling $\y | \K$, then sampling $\t | \y, \beta$ (Eqns 6.60 followed by 6.59). The second way is to integrate over $\y$ (the mean draw) and directly sample $\t | \K, \beta$ using Eqn 6.61. This is the generative process for $\t$. Note that we have not specified a distribution over inputs $\x$; this is because Gaussian processes are conditional models. Because of this we are free to generate locations $\x$ when playing around with the GP; obviously a dataset will give us input-output pairs.
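
A minimal sketch of the two sampling routes (not part of the assignment), reusing computeK and beta from above; theta_example is just a hypothetical choice of kernel parameters:

theta_example = np.array([1.0, 4.0, 0.0, 0.0])   # hypothetical kernel parameters
K = computeK(x_test, x_test, theta_example)

# route 1: sample the mean function y | K, then add target noise t | y, beta (Eqns 6.60 and 6.59)
y_sample = np.random.multivariate_normal(np.zeros(len(x_test)), K)
t_sample_1 = y_sample + np.random.randn(len(x_test)) / np.sqrt(beta)

# route 2: integrate out y and sample t | K, beta directly (Eqn 6.61), with C = K + I / beta
C = K + np.identity(len(x_test)) / beta
t_sample_2 = np.random.multivariate_normal(np.zeros(len(x_test)), C)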

Once we have data, we are interested in the predictive distribution (note: the prior is the predictive distribution when there is no data). Consider the joint distribution for $N+1$ targets, given by Eqn 6.64. Its covariance matrix is composed of block components $\CN$, $\k$, and $c$. The covariance matrix $\CN$ for $\tN$ is $\CN = \KN + \eyeN / \beta$. We have just made explicit the size $N$ of the matrix; $N$ is the number of training points. The kernel vector $\k$ is an $N$ by $1$ vector of kernel function evaluations between the training input data and the test input vector. The scalar $c$ is a kernel evaluation at the test input.
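
In block form (Bishop Eqn 6.65) the joint covariance is $\C_{N+1} = \begin{pmatrix} \CN & \k \\ \k^T & c \end{pmatrix}$, and conditioning this joint Gaussian on the observed $\tN$ (using the result from Bishop 2.3) gives the predictive mean and variance of Eqns 6.66 and 6.67.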

2.1 gp_predictive_distribution(...) (10 points)

Write a function "gp_predictive_distribution(x_train, t_train, x_test, theta, beta, C = None)" that computes Eqns 6.66 and 6.67, except allow for an arbitrary number of test points (not just one) and now the kernel matrix is for training data. By having C as an optional parameter, we can avoid computing it more than once (for this problem it is unimportant, but for real problems this is an issue). The function should compute $\C$, $\k$, and $c$ and the mean and noise functions. Do not forget: the computeK function computes $\K$, not $\C$! (10 points)


In [8]:
def computeC(x_train, theta, beta):
    K_t = computeK(x_train, x_train, theta)
    return K_t + np.identity(len(x_train)) / beta

def gp_predictive_distribution(x_train, t_train, x_test, theta, beta, C = None):
    if C is None:
        C = computeC(x_train, theta, beta)
    k = computeK(x_train, x_test, theta)
    Cinv = np.linalg.inv(C)
    means = k.T.dot(Cinv).dot(t_train)
    c = computeK(x_test, x_test, theta) + np.identity(len(x_test)) / beta  # target noise enters only on the diagonal (cf. Eqn 6.67)
    covs = c - k.T.dot(Cinv).dot(k)
    return means, covs
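
Optional side note: forming $\C^{-1}$ explicitly can be numerically less stable than solving linear systems. A hypothetical equivalent, reusing the variable names inside the function above:

# avoids the explicit inverse; same result as the Cinv-based lines above
means = k.T.dot(np.linalg.solve(C, t_train))
covs = computeK(x_test, x_test, theta) + np.identity(len(x_test)) / beta - k.T.dot(np.linalg.solve(C, k))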

2.2 gp_log_likelihood(...) (10 points)

Later, to learn the hyperparameters, we will need to compute the log-likelihood of the training data. Implicitly, this is conditioned on the value setting for $\thetav$. Write a function "gp_log_likelihood( x_train, t_train, theta, C = None, invC = None )", where C and invC can be stored and reused. (10 points)
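
For reference, the log marginal likelihood (Bishop Eqn 6.69) is $\ln p(\t \mid \thetav) = -\frac{1}{2}\ln|\CN| - \frac{1}{2}\t^T \CN^{-1} \t - \frac{N}{2}\ln(2\pi)$.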


In [9]:
def gp_log_likelihood(x_train, t_train, theta, beta, C = None, invC = None):
    if C is None:
        C = computeC(x_train, theta, beta)
    if invC is None:
        invC = np.linalg.inv(C)

    return -(np.log(np.linalg.det(C)) + t_train.T.dot(invC).dot(t_train) + len(t_train) * np.log(2 * np.pi)) / 2
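
Optional robustness note: np.linalg.det(C) can underflow or overflow as $N$ grows; np.linalg.slogdet computes the log-determinant directly, e.g. sign, logdetC = np.linalg.slogdet(C), after which logdetC can replace np.log(np.linalg.det(C)) above.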

2.3 Plotting (10 points)

Repeat the 6 plots above, but this time conditioned on the training points. Use the sinusoidal data generator to create 2 training points where x is sampled uniformly between $-1$ and $1$. For these plots, feel free to use the provided function "gp_plot". Make sure you put the parameters in the title and this time also the log-likelihood. (10 points) Try to understand the two types of uncertainty! If you do not use "gp_plot", please add a fill_between for the model and target noise.


In [10]:
def gp_plot( x_test, y_test, mu_test, var_test, x_train, t_train, theta, beta ):
    # x_test:   the test data
    # y_test:   the true function at x_test
    # mu_test:  predictive mean at x_test
    # var_test: predictive covariance at x_test 
    # t_train:  the training values
    # theta:    the kernel parameters
    # beta:     the precision (known)
    
    # the reason for the manipulation is to allow plots separating model and data stddevs.
    std_total = np.sqrt(np.diag(var_test))         # includes all uncertainty, model and target noise
    std_model = np.sqrt( std_total**2 - 1.0/beta ) # remove data noise to get model uncertainty in stddev
    std_combo = std_model + np.sqrt( 1.0/beta )    # add stddev (note: not the same as full)
    
    pp.plot( x_test, y_test, 'b', lw=3)
    pp.plot( x_test, mu_test, 'k--', lw=2 )
    pp.fill_between( x_test, mu_test+2*std_combo,mu_test-2*std_combo, color='k', alpha=0.25 )
    pp.fill_between( x_test, mu_test+2*std_model,mu_test-2*std_model, color='r', alpha=0.25 )
    pp.plot( x_train, t_train, 'ro', ms=10 )

In [11]:
beta  = 1.0 / pow(sigma,2)

In [12]:
def plot1(x_train):
    fig = plt.figure()
    y_train = true_mean_function(x_train)
    t_train = add_noise(y_train, sigma)
    x_test = np.linspace(-1, 1, N_test)
    y_test = true_mean_function(x_test)
    
    fig.set_size_inches(14, 8)
    for row, col in product(range(2), range(3)):
        idx = row * 3 + col
        plt.subplot(2, 3, idx + 1)
        means, sigmas = gp_predictive_distribution(x_train, t_train, x_test, thetas[idx], beta)
        gp_plot(x_test, y_test, means, sigmas, x_train, t_train, thetas[idx], beta)
        plt.title(str(thetas[idx]))

    plt.show()

x_train = np.random.uniform(-1, 1, 2)
plot1(x_train)


2.4 More plotting (5 points)

Repeat the 6 plots above, but this time conditioned on a new set of 10 training points. (5 points)


In [13]:
x_train = np.random.uniform(-1, 1, 10)
plot1(x_train)


3. Learning the hyperparameters (45 points)

Learning the values of the parameter $\thetav$ can be very tricky for Gaussian processes in general, but when the data is univariate like ours, we can visualize the fit and see how plausible it looks.

3.1 Derivatives (5 points)

Maximum likelihood or MAP learning is the most common way of setting the parameters, though a fully Bayesian approach is possible too. We will look at ML today. For this, we start with the derivative of the log-likelihood with respect to the parameters $\thetav$; this is Eqn 6.70. This, in turn, requires the derivative of the kernel matrix $\CN$ with respect to $\thetav$. This is the matrix of element-wise derivatives of the kernel function. Write the derivatives for $\theta_0$ to $\theta_3$ for our kernel function (5 points).
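
For reference, Eqn 6.70 is $\dfrac{\partial}{\partial \theta_i} \ln p(\t \mid \thetav) = -\dfrac{1}{2}\mathrm{Tr}\left(\CN^{-1}\dfrac{\partial \CN}{\partial \theta_i}\right) + \dfrac{1}{2}\t^T \CN^{-1} \dfrac{\partial \CN}{\partial \theta_i} \CN^{-1} \t$.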

Answer

$\dfrac{\partial k(x_n,x_m)}{\partial\theta_0} = \exp(-\dfrac{\theta_1}{2}||x_n - x_m||^2)$

$\dfrac{\partial k(x_n,x_m)}{\partial\theta_1} = -\dfrac{1}{2}||x_n - x_m||^2 \theta_0 \exp(-\dfrac{\theta_1}{2}||x_n - x_m||^2)$

$\dfrac{\partial k(x_n,x_m)}{\partial\theta_2} = 1$

$\dfrac{\partial k(x_n,x_m)}{\partial\theta_3} = x_n^Tx_m$

3.2 Questions (5 points)

Which parameters in $\thetav$ are unconstrained, that is, where any positive/ negative values are valid? (5 points)

Answer

We start by decomposing our kernel function into a sum of 3 kernel functions:

$k(x_n,x_m)= \theta_0 k_1(x_n,x_m) + \theta_2 k_2(x_n,x_m) + \theta_3 k_3(x_n,x_m)$

As we can see, $\theta_0$, $\theta_2$, and $\theta_3$ are multiplicative constants and therefore should be $>0$ for the kernel to be valid. On the other hand, $\theta_1$ plays the role of a bandwidth in $k_1$ and is not constrained.

3.3 More derivatives (5 points)

For parameters that are constrained to be positive, the usual approach is to use the exponential of the free-parameter in the kernel function, but perform gradient ascent on the unconstrained values. Consider the case $\theta_i = \exp( \phi_i)$, where $\phi_i$ is unconstrained. Write the derivative for $\phi_i$ in terms of the derivatives you already computed (5 points). Hint: use the chain rule and do not repeat the full derivation.

Answer
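
By the chain rule, $\dfrac{\partial k(x_n,x_m)}{\partial \phi_i} = \dfrac{\partial k(x_n,x_m)}{\partial \theta_i}\dfrac{\partial \theta_i}{\partial \phi_i} = \exp(\phi_i)\dfrac{\partial k(x_n,x_m)}{\partial \theta_i}$, i.e. each derivative from 3.1 is simply scaled by $\theta_i = \exp(\phi_i)$. This gives: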

$\dfrac{\partial k(x_n,x_m)}{\partial \phi_0}= \exp(-\dfrac{\theta_1}{2} ||x_n-x_m||^2 + \phi_0)$

$\dfrac{\partial k(x_n,x_m)}{\partial \phi_2}= \exp(\phi_2)$

$\dfrac{\partial k(x_n,x_m)}{\partial \phi_3}= x_n^Tx_m \exp(\phi_3)$

3.4 Grid search (10 points)

Grid-search: for the same training set you have above, perform a small grid search over $\thetav$ (try at least 20 combinations). Have your grid-search loop or function print out rows of log-likelihood + $\thetav$ sorted from best to worst. Use the log-likelihood to select the best $\thetav$ and the worst. Plot both the same way as the subplots above (i.e. a 1 by 2 subplot of best and worst). (10 points)


In [14]:
from operator import itemgetter

theta = [0, 1, 2, 4, 8]

x_train = np.random.uniform(-1, 1, 10)
y_train = true_mean_function(x_train)
t_train = add_noise(y_train, sigma)

results = []
# 5^4 = 625 combinations
for t in product(theta, repeat=4):
    results.append(tuple([gp_log_likelihood(x_train, t_train, t, beta)]) + t)

results = sorted(results, key=itemgetter(0))
for loglkl, t0, t1, t2, t3 in results:
    print('log-likelihood: %f thetas: %s' % (loglkl, (t0, t1, t2, t3)))
print 'Best Log Likelihood: %f for thetas: (%d,%d,%d,%d)' % results[-1]
print 'Worst Log Likelihood: %f for thetas: (%d,%d,%d,%d)' % results[0]


log-likelihood: -15.977750 thetas: (8, 1, 8, 8)
log-likelihood: -15.935507 thetas: (8, 8, 8, 8)
log-likelihood: -15.856999 thetas: (8, 1, 8, 4)
log-likelihood: -15.800903 thetas: (8, 1, 8, 2)
log-likelihood: -15.793389 thetas: (8, 1, 8, 0)
log-likelihood: -15.789301 thetas: (8, 1, 4, 8)
log-likelihood: -15.787475 thetas: (4, 1, 8, 8)
log-likelihood: -15.784481 thetas: (8, 1, 8, 1)
log-likelihood: -15.743949 thetas: (8, 8, 8, 4)
log-likelihood: -15.703182 thetas: (8, 8, 4, 8)
log-likelihood: -15.696190 thetas: (8, 0, 8, 0)
log-likelihood: -15.668499 thetas: (8, 1, 4, 4)
log-likelihood: -15.659348 thetas: (8, 1, 2, 8)
log-likelihood: -15.656905 thetas: (2, 1, 8, 8)
log-likelihood: -15.612351 thetas: (8, 1, 4, 2)
log-likelihood: -15.611039 thetas: (8, 8, 8, 2)
log-likelihood: -15.608773 thetas: (4, 1, 8, 4)
log-likelihood: -15.604734 thetas: (8, 1, 4, 0)
log-likelihood: -15.604522 thetas: (8, 0, 8, 8)
log-likelihood: -15.595887 thetas: (8, 1, 4, 1)
log-likelihood: -15.579026 thetas: (8, 1, 1, 8)
log-likelihood: -15.552609 thetas: (4, 0, 8, 0)
log-likelihood: -15.552609 thetas: (8, 0, 4, 0)
log-likelihood: -15.544891 thetas: (4, 1, 4, 8)
log-likelihood: -15.542296 thetas: (1, 1, 8, 8)
log-likelihood: -15.538496 thetas: (8, 1, 2, 4)
log-likelihood: -15.528453 thetas: (8, 8, 8, 1)
log-likelihood: -15.527903 thetas: (8, 8, 2, 8)
log-likelihood: -15.511591 thetas: (8, 8, 4, 4)
log-likelihood: -15.504525 thetas: (4, 1, 8, 2)
log-likelihood: -15.483285 thetas: (8, 1, 0, 8)
log-likelihood: -15.482299 thetas: (8, 1, 2, 2)
log-likelihood: -15.474586 thetas: (8, 1, 2, 0)
log-likelihood: -15.471986 thetas: (4, 1, 8, 0)
log-likelihood: -15.465796 thetas: (8, 1, 2, 1)
log-likelihood: -15.462174 thetas: (4, 1, 8, 1)
log-likelihood: -15.461656 thetas: (2, 0, 8, 0)
log-likelihood: -15.461656 thetas: (8, 0, 2, 0)
log-likelihood: -15.460977 thetas: (4, 0, 8, 8)
log-likelihood: -15.460977 thetas: (8, 0, 4, 8)
log-likelihood: -15.458137 thetas: (8, 1, 1, 4)
log-likelihood: -15.429476 thetas: (8, 8, 8, 0)
log-likelihood: -15.421338 thetas: (2, 1, 8, 4)
log-likelihood: -15.410100 thetas: (8, 8, 1, 8)
log-likelihood: -15.409115 thetas: (1, 0, 8, 0)
log-likelihood: -15.409115 thetas: (8, 0, 1, 0)
log-likelihood: -15.408529 thetas: (8, 2, 8, 8)
log-likelihood: -15.401902 thetas: (8, 1, 1, 2)
log-likelihood: -15.394116 thetas: (8, 1, 1, 0)
log-likelihood: -15.385369 thetas: (8, 1, 1, 1)
log-likelihood: -15.378648 thetas: (8, 8, 4, 2)
log-likelihood: -15.373305 thetas: (2, 1, 4, 8)
log-likelihood: -15.370052 thetas: (2, 0, 8, 8)
log-likelihood: -15.370052 thetas: (8, 0, 2, 8)
log-likelihood: -15.366172 thetas: (4, 1, 4, 4)
log-likelihood: -15.362342 thetas: (8, 1, 0, 4)
log-likelihood: -15.357745 thetas: (4, 1, 2, 8)
log-likelihood: -15.350396 thetas: (0, 0, 8, 0)
log-likelihood: -15.350396 thetas: (0, 1, 8, 0)
log-likelihood: -15.350396 thetas: (0, 2, 8, 0)
log-likelihood: -15.350396 thetas: (0, 4, 8, 0)
log-likelihood: -15.350396 thetas: (0, 8, 8, 0)
log-likelihood: -15.350396 thetas: (4, 0, 4, 0)
log-likelihood: -15.350396 thetas: (8, 0, 0, 0)
log-likelihood: -15.336273 thetas: (8, 8, 2, 4)
log-likelihood: -15.317529 thetas: (1, 0, 8, 8)
log-likelihood: -15.317529 thetas: (8, 0, 1, 8)
log-likelihood: -15.306084 thetas: (8, 4, 8, 8)
log-likelihood: -15.306053 thetas: (8, 1, 0, 2)
log-likelihood: -15.298163 thetas: (8, 1, 0, 0)
log-likelihood: -15.296038 thetas: (8, 8, 4, 1)
log-likelihood: -15.289479 thetas: (8, 1, 0, 1)
log-likelihood: -15.280245 thetas: (8, 0, 8, 4)
log-likelihood: -15.266613 thetas: (1, 1, 8, 4)
log-likelihood: -15.261905 thetas: (4, 1, 4, 2)
log-likelihood: -15.258835 thetas: (0, 0, 8, 8)
log-likelihood: -15.258835 thetas: (0, 1, 8, 8)
log-likelihood: -15.258835 thetas: (0, 2, 8, 8)
log-likelihood: -15.258835 thetas: (0, 4, 8, 8)
log-likelihood: -15.258835 thetas: (0, 8, 8, 8)
log-likelihood: -15.258835 thetas: (4, 0, 4, 8)
log-likelihood: -15.258835 thetas: (8, 0, 0, 8)
log-likelihood: -15.255732 thetas: (8, 8, 0, 8)
log-likelihood: -15.253318 thetas: (2, 1, 8, 2)
log-likelihood: -15.237579 thetas: (8, 2, 8, 4)
log-likelihood: -15.232045 thetas: (1, 1, 4, 8)
log-likelihood: -15.229329 thetas: (4, 1, 4, 0)
log-likelihood: -15.229019 thetas: (4, 1, 1, 8)
log-likelihood: -15.219538 thetas: (4, 1, 4, 1)
log-likelihood: -15.218437 thetas: (8, 8, 1, 4)
log-likelihood: -15.211064 thetas: (8, 2, 4, 8)
log-likelihood: -15.207075 thetas: (2, 0, 4, 0)
log-likelihood: -15.207075 thetas: (4, 0, 2, 0)
log-likelihood: -15.203294 thetas: (8, 8, 2, 2)
log-likelihood: -15.197024 thetas: (8, 8, 4, 0)
log-likelihood: -15.179005 thetas: (4, 1, 2, 4)
log-likelihood: -15.159662 thetas: (2, 1, 8, 1)
log-likelihood: -15.139232 thetas: (2, 1, 8, 0)
log-likelihood: -15.137737 thetas: (2, 1, 4, 4)
log-likelihood: -15.136699 thetas: (4, 0, 8, 4)
log-likelihood: -15.136699 thetas: (8, 0, 4, 4)
log-likelihood: -15.132944 thetas: (2, 1, 2, 8)
log-likelihood: -15.125559 thetas: (8, 2, 8, 2)
log-likelihood: -15.122468 thetas: (4, 2, 8, 8)
log-likelihood: -15.120656 thetas: (8, 8, 2, 1)
log-likelihood: -15.116329 thetas: (1, 0, 4, 0)
log-likelihood: -15.116329 thetas: (4, 0, 1, 0)
log-likelihood: -15.115659 thetas: (8, 4, 8, 4)
log-likelihood: -15.115583 thetas: (2, 0, 4, 8)
log-likelihood: -15.115583 thetas: (4, 0, 2, 8)
log-likelihood: -15.101902 thetas: (1, 2, 8, 8)
log-likelihood: -15.092221 thetas: (8, 4, 4, 8)
log-likelihood: -15.085425 thetas: (8, 8, 1, 2)
log-likelihood: -15.074717 thetas: (4, 1, 2, 2)
log-likelihood: -15.072461 thetas: (8, 2, 2, 8)
log-likelihood: -15.064010 thetas: (8, 8, 0, 4)
log-likelihood: -15.059882 thetas: (8, 2, 8, 1)
log-likelihood: -15.055211 thetas: (4, 1, 0, 8)
log-likelihood: -15.050260 thetas: (4, 1, 1, 4)
log-likelihood: -15.047287 thetas: (2, 2, 8, 8)
log-likelihood: -15.045774 thetas: (2, 0, 8, 4)
log-likelihood: -15.045774 thetas: (8, 0, 2, 4)
log-likelihood: -15.043644 thetas: (1, 1, 8, 2)
log-likelihood: -15.042097 thetas: (4, 1, 2, 0)
log-likelihood: -15.039973 thetas: (8, 2, 4, 4)
log-likelihood: -15.032330 thetas: (4, 1, 2, 1)
log-likelihood: -15.024894 thetas: (1, 0, 4, 8)
log-likelihood: -15.024894 thetas: (4, 0, 1, 8)
log-likelihood: -15.021602 thetas: (8, 8, 2, 0)
log-likelihood: -15.005379 thetas: (0, 0, 4, 0)
log-likelihood: -15.005379 thetas: (0, 1, 4, 0)
log-likelihood: -15.005379 thetas: (0, 2, 4, 0)
log-likelihood: -15.005379 thetas: (0, 4, 4, 0)
log-likelihood: -15.005379 thetas: (0, 8, 4, 0)
log-likelihood: -15.005379 thetas: (2, 0, 2, 0)
log-likelihood: -15.005379 thetas: (4, 0, 0, 0)
log-likelihood: -15.002762 thetas: (8, 8, 1, 1)
log-likelihood: -14.993251 thetas: (1, 0, 8, 4)
log-likelihood: -14.993251 thetas: (8, 0, 1, 4)
log-likelihood: -14.986581 thetas: (8, 2, 8, 0)
log-likelihood: -14.985470 thetas: (8, 2, 1, 8)
log-likelihood: -14.983831 thetas: (8, 4, 8, 2)
log-likelihood: -14.977378 thetas: (8, 0, 8, 2)
log-likelihood: -14.969717 thetas: (2, 1, 4, 2)
log-likelihood: -14.956363 thetas: (1, 1, 4, 4)
log-likelihood: -14.956349 thetas: (4, 8, 8, 8)
log-likelihood: -14.951046 thetas: (1, 1, 2, 8)
log-likelihood: -14.948411 thetas: (2, 1, 1, 8)
log-likelihood: -14.945951 thetas: (4, 1, 1, 2)
log-likelihood: -14.937025 thetas: (8, 4, 2, 8)
log-likelihood: -14.934556 thetas: (0, 0, 8, 4)
log-likelihood: -14.934556 thetas: (0, 1, 8, 4)
log-likelihood: -14.934556 thetas: (0, 2, 8, 4)
log-likelihood: -14.934556 thetas: (0, 4, 8, 4)
log-likelihood: -14.934556 thetas: (0, 8, 8, 4)
log-likelihood: -14.934556 thetas: (4, 0, 4, 4)
log-likelihood: -14.934556 thetas: (8, 0, 0, 4)
log-likelihood: -14.930943 thetas: (8, 8, 0, 2)
log-likelihood: -14.927813 thetas: (8, 2, 4, 2)
log-likelihood: -14.922497 thetas: (4, 2, 8, 4)
log-likelihood: -14.914028 thetas: (0, 0, 4, 8)
log-likelihood: -14.914028 thetas: (0, 1, 4, 8)
log-likelihood: -14.914028 thetas: (0, 2, 4, 8)
log-likelihood: -14.914028 thetas: (0, 4, 4, 8)
log-likelihood: -14.914028 thetas: (0, 8, 4, 8)
log-likelihood: -14.914028 thetas: (2, 0, 2, 8)
log-likelihood: -14.914028 thetas: (4, 0, 0, 8)
log-likelihood: -14.913291 thetas: (4, 1, 1, 0)
log-likelihood: -14.903672 thetas: (8, 8, 1, 0)
log-likelihood: -14.903547 thetas: (4, 1, 1, 1)
log-likelihood: -14.902077 thetas: (8, 4, 8, 1)
log-likelihood: -14.901731 thetas: (8, 4, 4, 4)
log-likelihood: -14.901229 thetas: (8, 2, 2, 4)
log-likelihood: -14.897375 thetas: (2, 1, 2, 4)
log-likelihood: -14.892314 thetas: (1, 1, 8, 1)
log-likelihood: -14.880118 thetas: (8, 2, 0, 8)
log-likelihood: -14.876418 thetas: (4, 1, 0, 4)
log-likelihood: -14.876065 thetas: (2, 1, 4, 1)
log-likelihood: -14.871243 thetas: (4, 2, 4, 8)
log-likelihood: -14.862573 thetas: (1, 0, 2, 0)
log-likelihood: -14.862573 thetas: (2, 0, 1, 0)
log-likelihood: -14.862025 thetas: (8, 2, 4, 1)
log-likelihood: -14.855667 thetas: (2, 1, 4, 0)
log-likelihood: -14.848237 thetas: (8, 8, 0, 1)
log-likelihood: -14.838284 thetas: (1, 2, 8, 4)
log-likelihood: -14.836615 thetas: (8, 4, 1, 8)
log-likelihood: -14.833831 thetas: (4, 0, 8, 2)
log-likelihood: -14.833831 thetas: (8, 0, 4, 2)
log-likelihood: -14.825373 thetas: (1, 1, 8, 0)
log-likelihood: -14.815913 thetas: (2, 2, 8, 4)
log-likelihood: -14.814127 thetas: (8, 2, 1, 4)
log-likelihood: -14.804293 thetas: (8, 4, 8, 0)
log-likelihood: -14.791303 thetas: (2, 0, 4, 4)
log-likelihood: -14.791303 thetas: (4, 0, 2, 4)
log-likelihood: -14.788931 thetas: (8, 2, 2, 2)
log-likelihood: -14.788562 thetas: (8, 2, 4, 0)
log-likelihood: -14.787960 thetas: (4, 2, 8, 2)
log-likelihood: -14.786736 thetas: (1, 2, 4, 8)
log-likelihood: -14.772070 thetas: (4, 1, 0, 2)
log-likelihood: -14.771361 thetas: (1, 0, 2, 8)
log-likelihood: -14.771361 thetas: (2, 0, 1, 8)
log-likelihood: -14.769842 thetas: (8, 4, 4, 2)
log-likelihood: -14.756724 thetas: (2, 2, 4, 8)
log-likelihood: -14.749085 thetas: (8, 8, 0, 0)
log-likelihood: -14.746468 thetas: (8, 4, 2, 4)
log-likelihood: -14.742906 thetas: (2, 0, 8, 2)
log-likelihood: -14.742906 thetas: (8, 0, 2, 2)
log-likelihood: -14.739335 thetas: (4, 1, 0, 0)
log-likelihood: -14.733398 thetas: (1, 1, 4, 2)
log-likelihood: -14.729634 thetas: (4, 1, 0, 1)
log-likelihood: -14.729356 thetas: (2, 1, 2, 2)
log-likelihood: -14.723034 thetas: (8, 2, 2, 1)
log-likelihood: -14.714846 thetas: (8, 0, 8, 1)
log-likelihood: -14.714372 thetas: (1, 1, 1, 8)
log-likelihood: -14.713002 thetas: (4, 8, 8, 4)
log-likelihood: -14.712841 thetas: (2, 1, 1, 4)
log-likelihood: -14.711116 thetas: (4, 2, 8, 1)
log-likelihood: -14.710885 thetas: (8, 4, 0, 8)
log-likelihood: -14.708609 thetas: (8, 2, 0, 4)
log-likelihood: -14.701718 thetas: (8, 2, 1, 2)
log-likelihood: -14.700613 thetas: (1, 0, 4, 4)
log-likelihood: -14.700613 thetas: (4, 0, 1, 4)
log-likelihood: -14.692186 thetas: (4, 4, 8, 8)
log-likelihood: -14.690382 thetas: (1, 0, 8, 2)
log-likelihood: -14.690382 thetas: (8, 0, 1, 2)
log-likelihood: -14.688041 thetas: (8, 4, 4, 1)
log-likelihood: -14.679385 thetas: (4, 8, 4, 8)
log-likelihood: -14.675367 thetas: (1, 1, 2, 4)
log-likelihood: -14.673704 thetas: (4, 2, 2, 8)
log-likelihood: -14.671179 thetas: (4, 2, 4, 4)
log-likelihood: -14.661905 thetas: (0, 0, 2, 0)
log-likelihood: -14.661905 thetas: (0, 1, 2, 0)
log-likelihood: -14.661905 thetas: (0, 2, 2, 0)
log-likelihood: -14.661905 thetas: (0, 4, 2, 0)
log-likelihood: -14.661905 thetas: (0, 8, 2, 0)
log-likelihood: -14.661905 thetas: (1, 0, 1, 0)
log-likelihood: -14.661905 thetas: (2, 0, 0, 0)
log-likelihood: -14.652771 thetas: (2, 1, 0, 8)
log-likelihood: -14.651585 thetas: (2, 2, 8, 2)
log-likelihood: -14.649409 thetas: (8, 2, 2, 0)
log-likelihood: -14.646001 thetas: (8, 4, 1, 4)
log-likelihood: -14.638910 thetas: (4, 2, 8, 0)
log-likelihood: -14.635734 thetas: (8, 2, 1, 1)
log-likelihood: -14.635709 thetas: (2, 1, 2, 1)
log-likelihood: -14.633797 thetas: (1, 2, 8, 2)
log-likelihood: -14.631687 thetas: (0, 0, 8, 2)
log-likelihood: -14.631687 thetas: (0, 1, 8, 2)
log-likelihood: -14.631687 thetas: (0, 2, 8, 2)
log-likelihood: -14.631687 thetas: (0, 4, 8, 2)
log-likelihood: -14.631687 thetas: (0, 8, 8, 2)
log-likelihood: -14.631687 thetas: (4, 0, 4, 2)
log-likelihood: -14.631687 thetas: (8, 0, 0, 2)
log-likelihood: -14.615358 thetas: (2, 1, 2, 0)
log-likelihood: -14.614513 thetas: (8, 4, 2, 2)
log-likelihood: -14.596036 thetas: (8, 2, 0, 2)
log-likelihood: -14.590189 thetas: (8, 4, 4, 0)
log-likelihood: -14.589746 thetas: (0, 0, 4, 4)
log-likelihood: -14.589746 thetas: (0, 1, 4, 4)
log-likelihood: -14.589746 thetas: (0, 2, 4, 4)
log-likelihood: -14.589746 thetas: (0, 4, 4, 4)
log-likelihood: -14.589746 thetas: (0, 8, 4, 4)
log-likelihood: -14.589746 thetas: (2, 0, 2, 4)
log-likelihood: -14.589746 thetas: (4, 0, 0, 4)
log-likelihood: -14.582075 thetas: (1, 1, 4, 1)
log-likelihood: -14.571297 thetas: (4, 0, 8, 1)
log-likelihood: -14.571297 thetas: (8, 0, 4, 1)
log-likelihood: -14.570970 thetas: (0, 0, 2, 8)
log-likelihood: -14.570970 thetas: (0, 1, 2, 8)
log-likelihood: -14.570970 thetas: (0, 2, 2, 8)
log-likelihood: -14.570970 thetas: (0, 4, 2, 8)
log-likelihood: -14.570970 thetas: (0, 8, 2, 8)
log-likelihood: -14.570970 thetas: (1, 0, 1, 8)
log-likelihood: -14.570970 thetas: (2, 0, 0, 8)
log-likelihood: -14.561981 thetas: (8, 2, 1, 0)
log-likelihood: -14.558392 thetas: (2, 2, 8, 1)
log-likelihood: -14.544822 thetas: (2, 1, 1, 2)
log-likelihood: -14.536531 thetas: (4, 2, 4, 2)
log-likelihood: -14.535011 thetas: (4, 2, 1, 8)
log-likelihood: -14.532663 thetas: (8, 4, 2, 1)
log-likelihood: -14.529924 thetas: (8, 2, 0, 1)
log-likelihood: -14.525308 thetas: (2, 2, 4, 4)
log-likelihood: -14.524954 thetas: (4, 8, 8, 2)
log-likelihood: -14.523101 thetas: (1, 2, 4, 4)
log-likelihood: -14.520182 thetas: (8, 4, 0, 4)
log-likelihood: -14.515199 thetas: (1, 1, 4, 0)
log-likelihood: -14.514482 thetas: (2, 2, 8, 0)
log-likelihood: -14.513992 thetas: (8, 4, 1, 2)
log-likelihood: -14.506268 thetas: (2, 2, 2, 8)
log-likelihood: -14.505793 thetas: (1, 2, 8, 1)
log-likelihood: -14.497616 thetas: (1, 2, 2, 8)
log-likelihood: -14.494613 thetas: (1, 2, 8, 0)
log-likelihood: -14.488433 thetas: (2, 0, 4, 2)
log-likelihood: -14.488433 thetas: (4, 0, 2, 2)
log-likelihood: -14.480370 thetas: (2, 0, 8, 1)
log-likelihood: -14.480370 thetas: (8, 0, 2, 1)
log-likelihood: -14.473526 thetas: (4, 2, 2, 4)
log-likelihood: -14.459582 thetas: (4, 2, 4, 1)
log-likelihood: -14.456034 thetas: (4, 4, 8, 4)
log-likelihood: -14.455979 thetas: (8, 2, 0, 0)
log-likelihood: -14.452407 thetas: (1, 1, 2, 2)
log-likelihood: -14.451182 thetas: (2, 1, 1, 1)
log-likelihood: -14.448351 thetas: (4, 8, 2, 8)
log-likelihood: -14.447078 thetas: (1, 0, 2, 4)
log-likelihood: -14.447078 thetas: (2, 0, 1, 4)
log-likelihood: -14.438696 thetas: (1, 1, 1, 4)
log-likelihood: -14.436007 thetas: (4, 8, 4, 4)
log-likelihood: -14.434740 thetas: (8, 4, 2, 0)
log-likelihood: -14.432100 thetas: (8, 4, 1, 1)
log-likelihood: -14.430885 thetas: (2, 1, 1, 0)
log-likelihood: -14.428763 thetas: (4, 4, 4, 8)
log-likelihood: -14.427846 thetas: (1, 0, 8, 1)
log-likelihood: -14.427846 thetas: (8, 0, 1, 1)
log-likelihood: -14.417196 thetas: (2, 1, 0, 4)
log-likelihood: -14.397741 thetas: (1, 0, 4, 2)
log-likelihood: -14.397741 thetas: (4, 0, 1, 2)
log-likelihood: -14.395379 thetas: (4, 8, 8, 1)
log-likelihood: -14.388087 thetas: (8, 4, 0, 2)
log-likelihood: -14.387178 thetas: (4, 2, 4, 0)
log-likelihood: -14.369149 thetas: (0, 0, 8, 1)
log-likelihood: -14.369149 thetas: (0, 1, 8, 1)
log-likelihood: -14.369149 thetas: (0, 2, 8, 1)
log-likelihood: -14.369149 thetas: (0, 4, 8, 1)
log-likelihood: -14.369149 thetas: (0, 8, 8, 1)
log-likelihood: -14.369149 thetas: (4, 0, 4, 1)
log-likelihood: -14.369149 thetas: (8, 0, 0, 1)
log-likelihood: -14.360920 thetas: (2, 2, 4, 2)
log-likelihood: -14.358124 thetas: (2, 4, 8, 8)
log-likelihood: -14.342443 thetas: (4, 2, 0, 8)
log-likelihood: -14.338741 thetas: (4, 2, 2, 2)
log-likelihood: -14.334720 thetas: (4, 2, 1, 4)
log-likelihood: -14.334118 thetas: (8, 4, 1, 0)
log-likelihood: -14.321472 thetas: (0, 0, 1, 0)
log-likelihood: -14.321472 thetas: (0, 1, 1, 0)
log-likelihood: -14.321472 thetas: (0, 2, 1, 0)
log-likelihood: -14.321472 thetas: (0, 4, 1, 0)
log-likelihood: -14.321472 thetas: (0, 8, 1, 0)
log-likelihood: -14.321472 thetas: (1, 0, 0, 0)
log-likelihood: -14.318588 thetas: (1, 2, 4, 2)
log-likelihood: -14.309664 thetas: (2, 2, 1, 8)
log-likelihood: -14.306130 thetas: (8, 4, 0, 1)
log-likelihood: -14.301096 thetas: (1, 1, 2, 1)
log-likelihood: -14.292369 thetas: (1, 4, 8, 8)
log-likelihood: -14.286872 thetas: (0, 0, 4, 2)
log-likelihood: -14.286872 thetas: (0, 1, 4, 2)
log-likelihood: -14.286872 thetas: (0, 2, 4, 2)
log-likelihood: -14.286872 thetas: (0, 4, 4, 2)
log-likelihood: -14.286872 thetas: (0, 8, 4, 2)
log-likelihood: -14.286872 thetas: (2, 0, 2, 2)
log-likelihood: -14.286872 thetas: (4, 0, 0, 2)
log-likelihood: -14.276500 thetas: (4, 4, 8, 2)
log-likelihood: -14.274788 thetas: (2, 2, 2, 4)
log-likelihood: -14.274526 thetas: (4, 8, 1, 8)
log-likelihood: -14.267658 thetas: (2, 2, 4, 1)
log-likelihood: -14.266652 thetas: (2, 8, 8, 8)
log-likelihood: -14.261660 thetas: (4, 2, 2, 1)
log-likelihood: -14.249482 thetas: (1, 1, 0, 8)
log-likelihood: -14.249278 thetas: (1, 2, 1, 8)
log-likelihood: -14.249180 thetas: (2, 1, 0, 2)
log-likelihood: -14.247921 thetas: (4, 8, 4, 2)
log-likelihood: -14.246683 thetas: (0, 0, 2, 4)
log-likelihood: -14.246683 thetas: (0, 1, 2, 4)
log-likelihood: -14.246683 thetas: (0, 2, 2, 4)
log-likelihood: -14.246683 thetas: (0, 4, 2, 4)
log-likelihood: -14.246683 thetas: (0, 8, 2, 4)
log-likelihood: -14.246683 thetas: (1, 0, 1, 4)
log-likelihood: -14.246683 thetas: (2, 0, 0, 4)
log-likelihood: -14.234327 thetas: (1, 1, 2, 0)
log-likelihood: -14.233952 thetas: (1, 2, 2, 4)
log-likelihood: -14.231354 thetas: (0, 0, 1, 8)
log-likelihood: -14.231354 thetas: (0, 1, 1, 8)
log-likelihood: -14.231354 thetas: (0, 2, 1, 8)
log-likelihood: -14.231354 thetas: (0, 4, 1, 8)
log-likelihood: -14.231354 thetas: (0, 8, 1, 8)
log-likelihood: -14.231354 thetas: (1, 0, 0, 8)
log-likelihood: -14.225892 thetas: (2, 0, 4, 1)
log-likelihood: -14.225892 thetas: (4, 0, 2, 1)
log-likelihood: -14.223577 thetas: (2, 2, 4, 0)
log-likelihood: -14.220015 thetas: (4, 8, 8, 0)
log-likelihood: -14.215854 thetas: (4, 4, 2, 8)
log-likelihood: -14.215745 thetas: (1, 1, 1, 2)
log-likelihood: -14.208053 thetas: (8, 4, 0, 0)
log-likelihood: -14.204928 thetas: (4, 8, 2, 4)
log-likelihood: -14.199800 thetas: (4, 2, 1, 2)
log-likelihood: -14.192545 thetas: (4, 4, 4, 4)
log-likelihood: -14.190549 thetas: (1, 2, 4, 1)
log-likelihood: -14.189013 thetas: (4, 2, 2, 0)
log-likelihood: -14.179254 thetas: (1, 2, 4, 0)
log-likelihood: -14.155557 thetas: (2, 1, 0, 1)
log-likelihood: -14.154948 thetas: (4, 4, 8, 1)
log-likelihood: -14.144200 thetas: (1, 0, 2, 2)
log-likelihood: -14.144200 thetas: (2, 0, 1, 2)
log-likelihood: -14.141931 thetas: (4, 2, 0, 4)
log-likelihood: -14.135400 thetas: (2, 1, 0, 0)
log-likelihood: -14.135198 thetas: (1, 0, 4, 1)
log-likelihood: -14.135198 thetas: (4, 0, 1, 1)
log-likelihood: -14.122591 thetas: (4, 2, 1, 1)
log-likelihood: -14.118311 thetas: (4, 8, 4, 1)
log-likelihood: -14.110311 thetas: (2, 2, 2, 2)
log-likelihood: -14.091802 thetas: (2, 4, 8, 4)
log-likelihood: -14.078107 thetas: (2, 2, 1, 4)
log-likelihood: -14.064450 thetas: (1, 1, 1, 1)
log-likelihood: -14.061648 thetas: (4, 4, 1, 8)
log-likelihood: -14.059634 thetas: (2, 4, 4, 8)
log-likelihood: -14.049703 thetas: (4, 2, 1, 0)
log-likelihood: -14.031054 thetas: (4, 8, 1, 4)
log-likelihood: -14.029394 thetas: (1, 2, 2, 2)
log-likelihood: -14.024325 thetas: (0, 0, 4, 1)
log-likelihood: -14.024325 thetas: (0, 1, 4, 1)
log-likelihood: -14.024325 thetas: (0, 2, 4, 1)
log-likelihood: -14.024325 thetas: (0, 4, 4, 1)
log-likelihood: -14.024325 thetas: (0, 8, 4, 1)
log-likelihood: -14.024325 thetas: (2, 0, 2, 1)
log-likelihood: -14.024325 thetas: (4, 0, 0, 1)
log-likelihood: -14.016948 thetas: (2, 2, 2, 1)
log-likelihood: -14.016791 thetas: (4, 8, 2, 2)
log-likelihood: -14.012932 thetas: (4, 4, 4, 2)
log-likelihood: -14.006887 thetas: (1, 4, 8, 4)
log-likelihood: -14.006746 thetas: (4, 2, 0, 2)
log-likelihood: -14.005901 thetas: (4, 8, 0, 8)
log-likelihood: -13.997832 thetas: (1, 1, 1, 0)
log-likelihood: -13.994319 thetas: (4, 4, 8, 0)
log-likelihood: -13.986573 thetas: (2, 8, 8, 4)
log-likelihood: -13.985573 thetas: (1, 2, 1, 4)
log-likelihood: -13.981054 thetas: (2, 2, 0, 8)
log-likelihood: -13.979548 thetas: (4, 4, 2, 4)
log-likelihood: -13.973820 thetas: (1, 1, 0, 4)
log-likelihood: -13.972615 thetas: (2, 2, 2, 0)
log-likelihood: -13.972433 thetas: (1, 4, 4, 8)
log-likelihood: -13.959722 thetas: (2, 8, 4, 8)
log-likelihood: -13.943798 thetas: (0, 0, 2, 2)
log-likelihood: -13.943798 thetas: (0, 1, 2, 2)
log-likelihood: -13.943798 thetas: (0, 2, 2, 2)
log-likelihood: -13.943798 thetas: (0, 4, 2, 2)
log-likelihood: -13.943798 thetas: (0, 8, 2, 2)
log-likelihood: -13.943798 thetas: (1, 0, 1, 2)
log-likelihood: -13.943798 thetas: (2, 0, 0, 2)
log-likelihood: -13.942882 thetas: (4, 8, 4, 0)
log-likelihood: -13.929284 thetas: (4, 2, 0, 1)
log-likelihood: -13.913521 thetas: (2, 2, 1, 2)
log-likelihood: -13.907057 thetas: (0, 0, 1, 4)
log-likelihood: -13.907057 thetas: (0, 1, 1, 4)
log-likelihood: -13.907057 thetas: (0, 2, 1, 4)
log-likelihood: -13.907057 thetas: (0, 4, 1, 4)
log-likelihood: -13.907057 thetas: (0, 8, 1, 4)
log-likelihood: -13.907057 thetas: (1, 0, 0, 4)
log-likelihood: -13.901297 thetas: (1, 2, 2, 1)
log-likelihood: -13.891307 thetas: (4, 4, 4, 1)
log-likelihood: -13.889809 thetas: (1, 2, 2, 0)
log-likelihood: -13.887131 thetas: (4, 8, 2, 1)
log-likelihood: -13.881884 thetas: (1, 8, 8, 8)
log-likelihood: -13.881647 thetas: (1, 0, 2, 1)
log-likelihood: -13.881647 thetas: (2, 0, 1, 1)
log-likelihood: -13.876021 thetas: (2, 4, 8, 2)
log-likelihood: -13.855924 thetas: (4, 2, 0, 0)
log-likelihood: -13.842857 thetas: (4, 8, 1, 2)
log-likelihood: -13.837604 thetas: (4, 4, 0, 8)
log-likelihood: -13.825251 thetas: (4, 4, 1, 4)
log-likelihood: -13.820033 thetas: (2, 2, 1, 1)
log-likelihood: -13.799833 thetas: (4, 4, 2, 2)
log-likelihood: -13.797254 thetas: (2, 4, 2, 8)
log-likelihood: -13.793269 thetas: (2, 4, 4, 4)
log-likelihood: -13.780950 thetas: (1, 2, 1, 2)
log-likelihood: -13.775391 thetas: (2, 2, 1, 0)
log-likelihood: -13.766120 thetas: (1, 4, 8, 2)
log-likelihood: -13.762308 thetas: (4, 8, 0, 4)
log-likelihood: -13.751240 thetas: (2, 8, 8, 2)
log-likelihood: -13.750901 thetas: (1, 1, 0, 2)
log-likelihood: -13.749273 thetas: (2, 2, 0, 4)
log-likelihood: -13.733923 thetas: (1, 2, 0, 8)
log-likelihood: -13.730545 thetas: (4, 4, 4, 0)
log-likelihood: -13.720152 thetas: (2, 4, 8, 1)
log-likelihood: -13.713142 thetas: (4, 8, 1, 1)
log-likelihood: -13.711611 thetas: (4, 8, 2, 0)
log-likelihood: -13.686929 thetas: (1, 4, 4, 4)
log-likelihood: -13.684100 thetas: (2, 8, 2, 8)
log-likelihood: -13.681233 thetas: (0, 0, 2, 1)
log-likelihood: -13.681233 thetas: (0, 1, 2, 1)
log-likelihood: -13.681233 thetas: (0, 2, 2, 1)
log-likelihood: -13.681233 thetas: (0, 4, 2, 1)
log-likelihood: -13.681233 thetas: (0, 8, 2, 1)
log-likelihood: -13.681233 thetas: (1, 0, 1, 1)
log-likelihood: -13.681233 thetas: (2, 0, 0, 1)
log-likelihood: -13.679622 thetas: (2, 8, 4, 4)
log-likelihood: -13.678112 thetas: (4, 4, 2, 1)
log-likelihood: -13.675225 thetas: (1, 4, 2, 8)
log-likelihood: -13.652766 thetas: (1, 2, 1, 1)
log-likelihood: -13.645428 thetas: (4, 4, 1, 2)
log-likelihood: -13.640996 thetas: (1, 2, 1, 0)
log-likelihood: -13.604153 thetas: (0, 0, 1, 2)
log-likelihood: -13.604153 thetas: (0, 1, 1, 2)
log-likelihood: -13.604153 thetas: (0, 2, 1, 2)
log-likelihood: -13.604153 thetas: (0, 4, 1, 2)
log-likelihood: -13.604153 thetas: (0, 8, 1, 2)
log-likelihood: -13.604153 thetas: (1, 0, 0, 2)
log-likelihood: -13.601008 thetas: (4, 4, 0, 4)
log-likelihood: -13.599673 thetas: (1, 1, 0, 1)
log-likelihood: -13.585734 thetas: (1, 4, 8, 1)
log-likelihood: -13.585688 thetas: (2, 4, 1, 8)
log-likelihood: -13.584373 thetas: (2, 2, 0, 2)
log-likelihood: -13.580470 thetas: (1, 8, 8, 4)
log-likelihood: -13.577429 thetas: (2, 4, 4, 2)
log-likelihood: -13.573968 thetas: (4, 8, 0, 2)
log-likelihood: -13.572596 thetas: (2, 8, 8, 1)
log-likelihood: -13.557206 thetas: (1, 8, 4, 8)
log-likelihood: -13.537520 thetas: (4, 8, 1, 0)
log-likelihood: -13.533671 thetas: (1, 1, 0, 0)
log-likelihood: -13.530824 thetas: (2, 4, 2, 4)
log-likelihood: -13.523607 thetas: (4, 4, 1, 1)
log-likelihood: -13.517177 thetas: (4, 4, 2, 0)
log-likelihood: -13.498972 thetas: (2, 4, 8, 0)
log-likelihood: -13.490525 thetas: (2, 2, 0, 1)
log-likelihood: -13.470027 thetas: (1, 2, 0, 4)
log-likelihood: -13.454912 thetas: (2, 8, 1, 8)
log-likelihood: -13.446127 thetas: (1, 4, 4, 2)
log-likelihood: -13.444992 thetas: (2, 2, 0, 0)
log-likelihood: -13.444257 thetas: (2, 8, 4, 2)
log-likelihood: -13.444117 thetas: (4, 8, 0, 1)
log-likelihood: -13.421492 thetas: (2, 4, 4, 1)
log-likelihood: -13.420952 thetas: (4, 4, 0, 2)
log-likelihood: -13.414803 thetas: (1, 4, 1, 8)
log-likelihood: -13.403964 thetas: (2, 8, 2, 4)
log-likelihood: -13.389683 thetas: (1, 4, 2, 4)
log-likelihood: -13.362488 thetas: (4, 4, 1, 0)
log-likelihood: -13.345578 thetas: (1, 4, 8, 0)
log-likelihood: -13.341550 thetas: (0, 0, 1, 1)
log-likelihood: -13.341550 thetas: (0, 1, 1, 1)
log-likelihood: -13.341550 thetas: (0, 2, 1, 1)
log-likelihood: -13.341550 thetas: (0, 4, 1, 1)
log-likelihood: -13.341550 thetas: (0, 8, 1, 1)
log-likelihood: -13.341550 thetas: (1, 0, 0, 1)
log-likelihood: -13.319172 thetas: (2, 4, 1, 4)
log-likelihood: -13.314891 thetas: (2, 4, 2, 2)
log-likelihood: -13.313870 thetas: (1, 8, 8, 2)
log-likelihood: -13.298913 thetas: (4, 4, 0, 1)
log-likelihood: -13.292380 thetas: (2, 8, 8, 0)
log-likelihood: -13.268245 thetas: (4, 8, 0, 0)
log-likelihood: -13.265694 thetas: (1, 4, 4, 1)
log-likelihood: -13.265577 thetas: (2, 8, 4, 1)
log-likelihood: -13.265103 thetas: (1, 2, 0, 2)
log-likelihood: -13.255779 thetas: (1, 8, 4, 4)
log-likelihood: -13.251744 thetas: (1, 8, 2, 8)
log-likelihood: -13.211747 thetas: (2, 4, 0, 8)
log-likelihood: -13.200141 thetas: (2, 4, 4, 0)
log-likelihood: -13.174728 thetas: (2, 8, 1, 4)
log-likelihood: -13.168550 thetas: (2, 8, 2, 2)
log-likelihood: -13.158849 thetas: (2, 4, 2, 1)
log-likelihood: -13.148821 thetas: (1, 4, 2, 2)
log-likelihood: -13.137398 thetas: (4, 4, 0, 0)
log-likelihood: -13.136519 thetas: (1, 2, 0, 1)
log-likelihood: -13.129204 thetas: (1, 4, 1, 4)
log-likelihood: -13.123441 thetas: (1, 2, 0, 0)
log-likelihood: -13.103120 thetas: (2, 4, 1, 2)
log-likelihood: -13.097434 thetas: (1, 8, 8, 1)
log-likelihood: -13.025361 thetas: (1, 4, 4, 0)
log-likelihood: -13.019499 thetas: (2, 8, 0, 8)
log-likelihood: -12.989811 thetas: (2, 8, 2, 1)
log-likelihood: -12.989158 thetas: (1, 8, 4, 2)
log-likelihood: -12.985268 thetas: (2, 8, 4, 0)
log-likelihood: -12.978467 thetas: (1, 8, 1, 8)
log-likelihood: -12.968306 thetas: (1, 4, 2, 1)
log-likelihood: -12.950294 thetas: (1, 8, 2, 4)
log-likelihood: -12.946940 thetas: (2, 4, 1, 1)
log-likelihood: -12.944951 thetas: (2, 4, 0, 4)
log-likelihood: -12.939244 thetas: (2, 8, 1, 2)
log-likelihood: -12.937233 thetas: (2, 4, 2, 0)
log-likelihood: -12.888249 thetas: (1, 4, 1, 2)
log-likelihood: -12.839782 thetas: (1, 4, 0, 8)
log-likelihood: -12.772693 thetas: (1, 8, 4, 1)
log-likelihood: -12.760425 thetas: (2, 8, 1, 1)
log-likelihood: -12.739128 thetas: (2, 8, 0, 4)
log-likelihood: -12.728505 thetas: (2, 4, 0, 2)
log-likelihood: -12.727669 thetas: (1, 4, 2, 0)
log-likelihood: -12.724980 thetas: (2, 4, 1, 0)
log-likelihood: -12.714444 thetas: (1, 8, 8, 0)
log-likelihood: -12.709352 thetas: (2, 8, 2, 0)
log-likelihood: -12.707608 thetas: (1, 4, 1, 1)
log-likelihood: -12.683636 thetas: (1, 8, 2, 2)
log-likelihood: -12.676980 thetas: (1, 8, 1, 4)
log-likelihood: -12.571873 thetas: (2, 4, 0, 1)
log-likelihood: -12.553869 thetas: (1, 4, 0, 4)
log-likelihood: -12.503378 thetas: (2, 8, 0, 2)
log-likelihood: -12.479758 thetas: (2, 8, 1, 0)
log-likelihood: -12.467120 thetas: (1, 8, 2, 1)
log-likelihood: -12.466504 thetas: (1, 4, 1, 0)
log-likelihood: -12.465147 thetas: (0, 0, 0, 0)
log-likelihood: -12.465147 thetas: (0, 1, 0, 0)
log-likelihood: -12.465147 thetas: (0, 2, 0, 0)
log-likelihood: -12.465147 thetas: (0, 4, 0, 0)
log-likelihood: -12.465147 thetas: (0, 8, 0, 0)
log-likelihood: -12.440086 thetas: (0, 0, 0, 8)
log-likelihood: -12.440086 thetas: (0, 1, 0, 8)
log-likelihood: -12.440086 thetas: (0, 2, 0, 8)
log-likelihood: -12.440086 thetas: (0, 4, 0, 8)
log-likelihood: -12.440086 thetas: (0, 8, 0, 8)
log-likelihood: -12.410263 thetas: (1, 8, 1, 2)
log-likelihood: -12.389590 thetas: (1, 8, 4, 0)
log-likelihood: -12.348776 thetas: (2, 4, 0, 0)
log-likelihood: -12.328976 thetas: (1, 8, 0, 8)
log-likelihood: -12.324249 thetas: (2, 8, 0, 1)
log-likelihood: -12.312413 thetas: (1, 4, 0, 2)
log-likelihood: -12.193664 thetas: (1, 8, 1, 1)
log-likelihood: -12.131088 thetas: (1, 4, 0, 1)
log-likelihood: -12.114983 thetas: (0, 0, 0, 4)
log-likelihood: -12.114983 thetas: (0, 1, 0, 4)
log-likelihood: -12.114983 thetas: (0, 2, 0, 4)
log-likelihood: -12.114983 thetas: (0, 4, 0, 4)
log-likelihood: -12.114983 thetas: (0, 8, 0, 4)
log-likelihood: -12.083815 thetas: (1, 8, 2, 0)
log-likelihood: -12.042781 thetas: (2, 8, 0, 0)
log-likelihood: -12.027251 thetas: (1, 8, 0, 4)
log-likelihood: -11.887450 thetas: (1, 4, 0, 0)
log-likelihood: -11.810516 thetas: (0, 0, 0, 2)
log-likelihood: -11.810516 thetas: (0, 1, 0, 2)
log-likelihood: -11.810516 thetas: (0, 2, 0, 2)
log-likelihood: -11.810516 thetas: (0, 4, 0, 2)
log-likelihood: -11.810516 thetas: (0, 8, 0, 2)
log-likelihood: -11.810038 thetas: (1, 8, 1, 0)
log-likelihood: -11.760148 thetas: (1, 8, 0, 2)
log-likelihood: -11.544969 thetas: (0, 0, 0, 1)
log-likelihood: -11.544969 thetas: (0, 1, 0, 1)
log-likelihood: -11.544969 thetas: (0, 2, 0, 1)
log-likelihood: -11.544969 thetas: (0, 4, 0, 1)
log-likelihood: -11.544969 thetas: (0, 8, 0, 1)
log-likelihood: -11.543014 thetas: (1, 8, 0, 1)
log-likelihood: -11.157299 thetas: (1, 8, 0, 0)
Best Log Likelihood: -11.157299 for thetas: (1,8,0,0)
Worst Log Likelihood: -15.977750 for thetas: (8,1,8,8)

In [15]:
fig = plt.figure()
fig.set_size_inches(14, 8)

x_test = np.linspace(-1, 1, N_test)
y_test = true_mean_function(x_test)

plt.subplot(1, 2, 1)
means, sigmas = gp_predictive_distribution(x_train, t_train, x_test, np.array(results[-1][1:]), beta)
gp_plot(x_test, y_test, means, sigmas, x_train, t_train, np.array(results[-1]), beta)
plt.title('Best likelihood %.2f, theta = (%d, %d, %d, %d)' % results[-1])

plt.subplot(1, 2, 2)
means, sigmas = gp_predictive_distribution(x_train, t_train, x_test, np.array(results[0][1:]), beta)
gp_plot(x_test, y_test, means, sigmas, x_train, t_train, np.array(results[0]), beta)
plt.title('Worst likelihood %.2f, theta = (%d, %d, %d, %d)' % results[0])
plt.show()


3.5 Questions (10 points)

Selecting kernel functions can be somewhat of an art. There are characteristics of kernel functions that are useful for some data sets, but not others. Complicating the matter is the ability to combine kernels with different characteristics (long term trends + seasonal fluctuations). Describe the characteristics of the kernel function we are using in terms of (signal, scale, offsets, etc). You may want to play around with $\thetav$ and see what each parameter does/affects/etc. (5 points) Describe why the best parameters work well for the training data and explain why the bad parameter settings perform poorly (in terms of the first part of the question). (5 points)

Answer

As we showed in 3.2, the kernel function can be decomposed into a combination of simpler kernel functions; we continue the discussion from that point: $k(x_n,x_m)= \theta_0 k_1(x_n,x_m) + \theta_2 k_2(x_n,x_m) + \theta_3 k_3(x_n,x_m)$

The kernel $k_1$ acts as a similarity measure between points (it captures the signal). $k_2$ and $k_3$ act as offsets, and we observed that a good fit to the true function can be achieved without the offset terms. The theta parameters also have an interpretation: $\theta_1$ is the bandwidth of the unnormalised Gaussian in $k_1$, and $\theta_0$ is its scaling factor. Since $k_2(x_n,x_m)=x_n^0 x_m^0 = 1$, $\theta_2$ plays the role of a constant offset, and $\theta_3$ is the scaling factor of the linear kernel.

As stated above, it is possible to achieve a good data fit without the offset terms (i.e. with only the similarity kernel and its two thetas); the opposite is not true: a good fit cannot be achieved without the similarity kernel and its corresponding thetas. The worst-performing models were observed to have low values for $\theta_0$ and $\theta_1$, which implies that they cannot capture the signal. The key component of the kernel is clearly the similarity measure between two data points: if both $\theta_0$ and $\theta_1$ are set close to 0, the kernel loses its power and the predictive mean becomes close to a constant function.

3.6 Bonus: Implementation (20 points)

Implement gradient-ascent (or descent if you wish) using the combination of a) the log-likelihood objective function and b) the gradients you calculated above. Run on the training data above and show the log-likelihood curve as it learns and a plot of the final model. Feel free to use available software (eg search for "minimize.py" which uses conjugate gradient descent, or something in scipy). NB: log-likelihood should be monotonically increasing. You are encouraged to also search and use "checkgrad". (20 points)


In [16]:
from scipy import optimize

# Overloading k_n_m to accept phis instead of thetas
# The implementation of computeK, computeC will be the same.
def k_n_m(xn, xm, phis):
    return np.exp(phis[0]) * np.exp(-np.exp(phis[1]) * (xn - xm) ** 2 / 2) + np.exp(phis[2]) + np.exp(phis[3]) * xn * xm

log_likelihoods = []
# Log Likelihood function should be changed a bit to accept phis first
def log_likelihood(phis, x_train, t_train, beta):
    loglik = gp_log_likelihood(x_train, t_train, phis, beta)
    log_likelihoods.append(loglik)
    return -loglik

def grad_log_lklh(phis, x_train, t_train, beta):
    thetas = np.exp(phis)
    diff_xn_xm2 = (np.tile(x_train[np.newaxis].T, (1, len(x_train))) - np.tile(x_train, (len(x_train), 1))) ** 2
    derivP = [0] * 4
    derivP[0] = np.exp(-thetas[1] * diff_xn_xm2 / 2 + phis[0])
    derivP[1] = -derivP[0] * diff_xn_xm2 * thetas[1] / 2
    derivP[2] = np.tile(thetas[2], (len(x_train), len(x_train)))
    derivP[3] = x_train[np.newaxis].T.dot(x_train[np.newaxis]) * thetas[3]
    derivLogLklh = np.zeros(4)
    C = computeC(x_train, phis, beta)
    invC = np.linalg.inv(C)
    for i in range(4):
        derivLogLklh[i] = -0.5 * (-np.trace(invC.dot(derivP[i])) +
                                  t_train.T.dot(invC).dot(derivP[i]).dot(invC).dot(t_train))
    return derivLogLklh

phis = np.array([2, 6, 0, 0])
error = optimize.check_grad(log_likelihood, grad_log_lklh, phis, x_train, t_train, beta)
print error
result = optimize.minimize(log_likelihood, phis, jac=grad_log_lklh, method='CG', args=(x_train, t_train, beta))

print result.success
print 'Best grid search likelihood %d' % results[-1][0]
# Log likelihood is higher (better) than the best grid-search result
print 'Obtained likelihood:', -log_likelihood(result.x, x_train, t_train, beta)


3.96141850722e-07
False
Best grid search likelihood -11
Obtained likelihood: -9.87738138932

The obtained likelihood turns out to be better than the one from the grid search.


In [17]:
fig = plt.figure()
fig.set_size_inches(14, 8)

means, sigmas = gp_predictive_distribution(x_train, t_train, x_test, result.x, beta)
gp_plot(x_test, y_test, means, sigmas, x_train, t_train, result.x, beta)
plt.show()



In [18]:
plt.plot(np.arange(len(log_likelihoods)), log_likelihoods)
plt.ylabel('Log likelihood')
plt.xlabel('Iteration')
plt.show()



In [ ]: