Assume we have $n$ vectors $\vec{x}_1, \cdots, \vec{x}_n$. We also assume that for each $\vec{x}_i$ we have observed a target value $t_i$, where $$ \begin{gather} t_i = \vec{w}^T \vec{x}_i + \epsilon \\ \epsilon \sim \mathcal{N}(0, \beta^{-1}) \end{gather} $$ and $\epsilon$ is the "noise term".
(a) Quick quiz: what is the likelihood given $\vec{w}$? That is, what's $p(t_i | \vec{x}_i, \vec{w})$?
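One way to see the answer: conditioned on $\vec{x}_i$ and $\vec{w}$, the only remaining randomness in $t_i$ is the Gaussian noise $\epsilon$, so the likelihood is a Gaussian centered at $\vec{w}^T \vec{x}_i$: $$ p(t_i | \vec{x}_i, \vec{w}) = \mathcal{N}(t_i \,|\, \vec{w}^T \vec{x}_i, \beta^{-1}) = \sqrt{\frac{\beta}{2\pi}} \exp\!\left( -\frac{\beta}{2} \left( t_i - \vec{w}^T \vec{x}_i \right)^2 \right) $$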
Assume we have $n$ vectors $\vec{x}_1, \cdots, \vec{x}_n$. We also assume that for each $\vec{x}_i$ we have observed a target value $t_i$, sampled IID. We will also put a prior on $\vec{w}$, using a PSD matrix $\Sigma$. $$ \begin{gather} t_i = \vec{w}^T \vec{x}_i + \epsilon \\ \epsilon \sim \mathcal{N}(0, \beta^{-1}) \\ \vec{w} \sim \mathcal{N}(0, \Sigma) \end{gather} $$ Note: the difference here is that our prior is a multivariate Gaussian with non-identity covariance! We also let $\mathcal{X} = \{\vec{x}_1, \cdots, \vec{x}_n\}$.
(a) Compute the log posterior function, $\log p(\vec{w}|\vec{t}, \mathcal{X},\beta)$
Hint: use Bayes' Rule
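A sketch of where the hint leads (the normalizer $p(\vec{t} | \mathcal{X}, \beta)$ does not depend on $\vec{w}$, so it is absorbed into the constant): $$ \begin{align} \log p(\vec{w}|\vec{t}, \mathcal{X}, \beta) &= \log p(\vec{t} | \mathcal{X}, \vec{w}, \beta) + \log p(\vec{w}) - \log p(\vec{t} | \mathcal{X}, \beta) \\ &= -\frac{\beta}{2} \sum_{i=1}^n \left( t_i - \vec{w}^T \vec{x}_i \right)^2 - \frac{1}{2} \vec{w}^T \Sigma^{-1} \vec{w} + \text{const} \end{align} $$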
(b) Compute the MAP estimate of $\vec{w}$ for this model
Hint: the solution is very similar to the MAP estimate for a gaussian prior with identity covariance
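A sketch of the resulting estimate, writing $X$ for the $n \times d$ design matrix whose rows are $\vec{x}_i^T$ ($X$ is notation introduced here for compactness, not part of the problem statement): setting the gradient of the log posterior to zero gives $$ \begin{gather} \beta X^T (\vec{t} - X \vec{w}) - \Sigma^{-1} \vec{w} = 0 \\ \vec{w}_{\text{MAP}} = \left( \beta X^T X + \Sigma^{-1} \right)^{-1} \beta X^T \vec{t} \end{gather} $$ The only change from the identity-covariance case is that $\Sigma^{-1}$ takes the place of $I$ in the regularization term.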
Download the file mnist_49_3000.mat from Canvas. This is a subset of the MNIST handwritten digit database, which is a well-known benchmark database for classification algorithms. This subset contains examples of the digits 4 and 9. The data file contains variables x and y, with the former containing patterns and the latter labels. The images are stored as column vectors.
Exercise:
In [ ]:
# Enable inline plotting inside the Jupyter notebook
%matplotlib inline
# All the packages you need
import numpy as np
import matplotlib.pyplot as plt
import scipy.io
from numpy.linalg import inv
# Load the data from the .mat file
mat = scipy.io.loadmat('mnist_49_3000.mat')
x, y = mat['x'], mat['y']  # x: images stored as column vectors, y: labels
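As a quick optional check that the load worked, the sketch below assumes each column of x is a flattened square MNIST image (28×28 = 784 pixels); verify this against x.shape before reshaping.
In [ ]:
# Sanity check (assumes 28x28 images stored as columns of x; confirm via x.shape)
print(x.shape, y.shape)
plt.imshow(x[:, 0].reshape(28, 28), cmap='gray')
plt.show()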
Implement Newton’s method to find a minimizer of the regularized negative log likelihood. Try setting $\lambda = 10$. Use the first 2000 examples as training data, and the last 1000 as test data.
Exercise
where $\nabla^2 f(\vx_n)$ is the Hessian matrix of second-order partial derivatives $$ \nabla^2 f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n\partial x_n} \end{bmatrix} $$
$$ \begin{align} \nabla^2 E(\vw) &= \nabla_\vw \nabla_\vw E(\vw) \\ &= \sum \nolimits_{n=1}^N \phi(\vx_n) r_n(\vw) \phi(\vx_n)^T + \lambda I \end{align} $$ where $r_n(\vw) = \sigma(\vw^T \phi(\vx_n)) \cdot ( 1 - \sigma(\vw^T \phi(\vx_n)) )$.
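Putting the pieces together, here is a minimal sketch of the Newton update for this problem, not the required solution. It assumes the regularized negative log likelihood $E(\vw) = -\sum_{n=1}^N \left[ y_n \log \sigma(\vw^T \phi(\vx_n)) + (1 - y_n) \log(1 - \sigma(\vw^T \phi(\vx_n))) \right] + \frac{\lambda}{2} \|\vw\|^2$ with labels mapped to $\{0, 1\}$, whose gradient is $\sum_{n=1}^N \left( \sigma(\vw^T \phi(\vx_n)) - y_n \right) \phi(\vx_n) + \lambda \vw$ and whose Hessian is the expression above; the helper names (sigmoid, newton_logistic) and the choice of $\phi$ as raw pixels plus a bias term are placeholders, not given by the assignment.
In [ ]:
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def newton_logistic(Phi, y, lam=10.0, n_iters=20):
    # Phi: (N, d) matrix with rows phi(x_n)^T; y: (N,) labels in {0, 1}
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Phi @ w)                               # sigma(w^T phi(x_n)) for all n
        grad = Phi.T @ (p - y) + lam * w                   # gradient of E(w)
        H = Phi.T @ ((p * (1 - p))[:, None] * Phi) + lam * np.eye(Phi.shape[1])  # Hessian of E(w)
        w = w - np.linalg.solve(H, grad)                   # Newton step
    return w

# Example use with the assignment's split (assumes x, y were loaded as above):
# Phi = np.vstack([np.ones((1, x.shape[1])), x]).T        # bias feature + raw pixels
# y01 = (y.ravel() > 0).astype(float)                      # map labels to {0, 1}; check the label coding first
# w_hat = newton_logistic(Phi[:2000], y01[:2000], lam=10.0)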
Exercise
Exercise