In [1]:
# Import some libraries that will be necessary for working with data and displaying plots
# To visualize plots in the notebook
%matplotlib inline
#import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy.io # To read matlab files
from sklearn.preprocessing import PolynomialFeatures
from sklearn import svm
from sklearn import model_selection
import pylab
pylab.rcParams['figure.figsize'] = 9, 7
In [2]:
# Load dataset
matvar = scipy.io.loadmat('Dataset2D.mat')
Xtrain = matvar['xTrain']
Xtest = matvar['xTest']
Xval = matvar['xVal']
# We must use astype(int) to convert the original target values (which are unsigned integers) to int.
Ytrain = matvar['yTrain'].astype(int)
Ytest = matvar['yTest'].astype(int)
Yval = matvar['yVal'].astype(int)
In [3]:
# <SOL>
# </SOL>
# Check normalization
print(np.mean(Xtrain, axis=0))
print(np.mean(Xval, axis=0))
print(np.mean(Xtest, axis=0))
print(np.std(Xtrain, axis=0))
print(np.std(Xval, axis=0))
print(np.std(Xtest, axis=0))
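The checks above assume the inputs have been normalized to zero mean and unit standard deviation using statistics computed on the training set. A minimal sketch of such a step (the helper normalize and its signature are assumptions, chosen to match how it is used later in this notebook):
def normalize(X, mx=None, sx=None):
    # Normalize each column to zero mean and unit std.
    # If mx and sx are given (e.g. training statistics), reuse them.
    if mx is None:
        mx = np.mean(X, axis=0)
    if sx is None:
        sx = np.std(X, axis=0)
    return (X - mx) / sx, mx, sx

# Fit the statistics on the training set and reuse them for validation and test
Xtrain, mx, sx = normalize(Xtrain)
Xval, _, _ = normalize(Xval, mx, sx)
Xtest, _, _ = normalize(Xtest, mx, sx)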
Visualize the input variables from the training set in a 2-dimensional plot.
In [4]:
# Data visualization. This works for dimension 2 only.
if Xtrain.shape[1] == 2:
    plt.scatter(Xtrain[:, 0], Xtrain[:, 1], c=Ytrain.flatten(), s=50, cmap='copper')
    plt.xlabel("$x_0$", fontsize=14)
    plt.ylabel("$x_1$", fontsize=14)
    plt.show()
First we will analyze the behavior of logistic regression for this dataset.
Implement a function to compute the MAP estimate of the parameters of a linear logistic regression model with a Gaussian prior and a given value of the inverse regularization parameter $C$. The method should return the estimated parameter vector and the negative log-likelihood, $\text{NLL}({\bf w})$. The syntax must be
w, NLL = logregFitR(Z_tr, Y_tr, rho, C, n_it)
where
Z_tr is the input training data matrix (one instance per row)
Y_tr contains the labels corresponding to each row in the data matrix
rho is the learning step
C is the inverse regularizer
n_it is the number of iterations
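A minimal sketch of one possible implementation, assuming labels in {0, 1}, batch gradient descent, and a Gaussian prior contributing a w/C term to the gradient; the exact update rule may differ from the one prescribed in your course materials:
def logregFitR(Z_tr, Y_tr, rho, C, n_it):
    # Gradient descent on the penalized cost NLL(w) + ||w||^2 / (2*C)
    Y_tr = Y_tr.reshape(-1, 1)
    w = np.random.randn(Z_tr.shape[1], 1)     # random initialization
    for _ in range(n_it):
        p = 1 / (1 + np.exp(-Z_tr @ w))       # posterior P(y=1|z)
        grad = Z_tr.T @ (p - Y_tr) + w / C    # gradient of the penalized cost
        w = w - rho * grad
    # Negative log-likelihood of the final parameters (without the prior term)
    p = 1 / (1 + np.exp(-Z_tr @ w))
    eps = 1e-10                               # avoid log(0)
    NLL = -np.sum(Y_tr * np.log(p + eps) + (1 - Y_tr) * np.log(1 - p + eps))
    return w, NLL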
In [5]:
# <SOL>
# </SOL>
Compute the MAP estimate for a polynomial logistic regression of degree 5, for $C$ ranging from $0.01$ to $100$. Sample $C$ uniformly in a log scale, and plot using plt.semilogx.
Plot the final value of $\text{NLL}$ as a function of $C$. Can you explain the qualitative behavior of $\text{NLL}$ as $C$ grows?
The plot may show some oscillation because of the noise introduced by the random initialization of the learning algorithm. In order to smooth the results, you can initialize the random seed right before calling the logregFitR method, using
np.random.seed(3)
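A possible sketch of the whole sweep (the learning step rho, the number of iterations n_it, and the number of sampled values of $C$ are assumptions):
poly = PolynomialFeatures(degree=5)
Z = poly.fit_transform(Xtrain)
Zn, mz, sz = normalize(Z[:, 1:])    # keep the all-ones column unscaled
Z_tr = np.concatenate((np.ones((Zn.shape[0], 1)), Zn), axis=1)

rho = 1e-3                          # learning step (assumption)
n_it = 1000                         # number of iterations (assumption)
C_values = np.logspace(-2, 2, 20)   # C from 0.01 to 100, uniform in log scale
NLL_all = []
for C in C_values:
    np.random.seed(3)               # fix the seed to smooth the curve
    w, NLL = logregFitR(Z_tr, Ytrain, rho, C, n_it)
    NLL_all.append(NLL)

plt.semilogx(C_values, NLL_all)
plt.xlabel('$C$')
plt.ylabel('NLL')
plt.show()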
In [6]:
# <SOL>
# </SOL>
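The plotting cell below calls a helper logregPredict that is not defined in this section. A minimal sketch consistent with its use there, returning the posterior probabilities and the hard decisions, could be:
def logregPredict(Z, w):
    # Posterior P(y=1|z) and hard decisions for a logistic regression model
    p = (1 / (1 + np.exp(-Z @ w))).flatten()
    D = (p > 0.5).astype(int)
    return p, D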
In [7]:
# This is a plot for the last value of C used in the code above.
if Xtrain.shape[1] == 2:
    # Create a rectangular grid.
    x_min, x_max = Xtrain[:, 0].min(), Xtrain[:, 0].max()
    y_min, y_max = Xtrain[:, 1].min(), Xtrain[:, 1].max()
    dx = x_max - x_min
    dy = y_max - y_min
    h = dy / 400
    xx, yy = np.meshgrid(np.arange(x_min - 0.1 * dx, x_max + 0.1 * dx, h),
                         np.arange(y_min - 0.1 * dy, y_max + 0.1 * dy, h))
    X_grid = np.array([xx.ravel(), yy.ravel()]).T

    # Compute Z_grid
    Z_grid = poly.fit_transform(X_grid)
    n_grid = Z_grid.shape[0]
    Zn, mz, sz = normalize(Z_grid[:, 1:], mz, sz)
    Z_grid = np.concatenate((np.ones((n_grid, 1)), Zn), axis=1)

    # Compute the classifier output for all samples in the grid.
    pp, dd = logregPredict(Z_grid, w)
    pp = pp.reshape(xx.shape)

    # Paint output maps
    plt.figure()
    pylab.rcParams['figure.figsize'] = 8, 4  # Set figure size
    for i in [1, 2]:
        ax = plt.subplot(1, 2, i)
        ax.set_xlabel('$x_0$')
        ax.set_ylabel('$x_1$')
        ax.axis('equal')
        if i == 1:
            # Left panel: soft (posterior probability) map
            ax.contourf(xx, yy, pp, cmap=plt.cm.copper)
        else:
            # Right panel: hard decision map
            ax.contourf(xx, yy, np.round(pp), cmap=plt.cm.copper)
        ax.scatter(Xtrain[:, 0], Xtrain[:, 1], c=Ytrain.flatten(), s=4, cmap='summer')
    plt.show()
In [8]:
# <SOL>
# </SOL>
In this section we will train an SVM with Gaussian kernels. In this case, we will select parameter $C$ of the SVM by cross-validation.
Join the training and validation datasets into a single input matrix X_tr2 and a single label vector Y_tr2.
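A minimal sketch (np.vstack keeps one instance per row; the final shape of Y_tr2 depends on what the fitting code expects):
X_tr2 = np.vstack((Xtrain, Xval))
Y_tr2 = np.vstack((Ytrain, Yval)).flatten()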
In [9]:
# <SOL>
# </SOL>
In [10]:
# <SOL>
# </SOL>
Repeat exercise 3.2 for $\gamma=5$ and different values of $C$, ranging from $10^{-3}$ to $10^{4}$, obtained by uniform sampling on a logarithmic scale. Plot the average number of errors as a function of $C$.
Note that fitting the SVM may take some time, especially for the largest values of $C$.
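A possible sketch of this sweep using sklearn's model_selection.cross_val_score (the number of folds and the number of sampled values of $C$ are assumptions):
C_values = np.logspace(-3, 4, 8)   # C from 1e-3 to 1e4, uniform in log scale
n_folds = 10                       # number of cross-validation folds (assumption)
mean_errors = []
for C in C_values:
    clf = svm.SVC(kernel='rbf', gamma=5, C=C)
    scores = model_selection.cross_val_score(clf, X_tr2, Y_tr2, cv=n_folds)
    # Turn the mean accuracy into an average number of errors per fold
    mean_errors.append((1 - scores.mean()) * X_tr2.shape[0] / n_folds)

plt.semilogx(C_values, mean_errors)
plt.xlabel('$C$')
plt.ylabel('Average number of errors')
plt.show()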
In [11]:
# <SOL>
# </SOL>
In [12]:
# <SOL>
# </SOL>
In [13]:
# <SOL>
# </SOL>