Seminar 0 (Linear models, Optimization)

In this seminar you will implement a simple linear classifier using numpy and your brain.

Two-dimensional classification



In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
from IPython import display
from sklearn import datasets, preprocessing

(X, y) = datasets.make_circles(n_samples=1024, shuffle=True, noise=0.2, factor=0.4)
ind = np.logical_or(y==1, X[:,1] > X[:,0] - 0.5)
X = X[ind,:]
m = np.array([[1, 1], [-2, 1]])
X = preprocessing.scale(X)
y = y[ind]
y = 2*y - 1
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.show()



In [ ]:

    
h = 0.01
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
def visualize(X, y, w, loss, n_iter):
    plt.clf()
    Z = classify(np.c_[xx.ravel(), yy.ravel()], w)
    Z = Z.reshape(xx.shape)
    plt.subplot(1,2,1)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.subplot(1,2,2)
    plt.plot(loss)
    ymin, ymax = plt.ylim()
    plt.ylim(0, ymax)
    display.clear_output(wait=True)
    display.display(plt.gcf())

Your task starts here

First, let's write function that predicts class given X.

Since the problem above isn't linearly separable, we add quadratic features to the classifier. This transformation is implemented in the expand function.

don't forget to expand X inside classify and other functions

Classifying sample should not be much harder that computing sign of dot product.



In [ ]:

    
def expand(X):
    X_ = np.zeros((X.shape[0], 6))
    X_[:,0:2] = X
    X_[:,2:4] = X**2
    X_[:,4] = X[:,0] * X[:,1]
    X_[:,5] = 1
    return X_

def classify(X, w):
    """
    Given feature matrix X [n_samples,2] and weight vector w [6],
    return an array of +1 or -1 predictions"""
    <your code here>

The loss you should try to minimize is the Hinge Loss.

$$ L = {1 \over N} \sum_i max(0,1-y_i \cdot \vec w \vec x_i) $$



In [2]:

    
def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,2], target vector [n_samples] of +1/-1,
    and weight vector w [6], compute scalar loss function using formula above.
    """
    <your code here>

def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,2], target vector [n_samples] of +1/-1,
    and weight vector w [6], compute vector [6] of derivatives of L over each weights.
    """
    <your code here>

Training

Find an optimal learning rate for gradient descent for given batch size.

You can see the example of correct output below this cell before you run it.

Don't change the batch size!



In [4]:

    
w = np.array([1,0,0,0,0,0])

alpha = 0.0 # learning rate

n_iter = 50
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = random.sample(range(X.shape[0]), batch_size)
    loss[i] = compute_loss(X, y, w)
    visualize(X[ind,:], y[ind], w, loss, n_iter)
    
    w = w - alpha * compute_grad(X[ind,:], y[ind], w)

visualize(X, y, w, loss, n_iter)
plt.clf()









    












    





<matplotlib.figure.Figure at 0x7fb0e06c7650>

Implement gradient descent with momentum and test it's performance for different learning rate and momentum values.



In [ ]:

    
w = np.array([1,0,0,0,0,0])

alpha = 0.0 # learning rate
mu    = 0.0 # momentum

n_iter = 50
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = random.sample(range(X.shape[0]), batch_size)
    loss[i] = compute_loss(X, y, w)
    visualize(X[ind,:], y[ind], w, loss, n_iter)
    
    <update w and anything else here>

visualize(X, y, w, loss, n_iter)
plt.clf()

Implement RMSPROP algorithm



In [ ]:

    
w = np.array([1,0,0,0,0,0])

alpha = 0.0 # learning rate
mean_squared_norm = 0.0 #moving average of gradient norm squared

n_iter = 50
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = random.sample(range(X.shape[0]), batch_size)
    loss[i] = compute_loss(X, y, w)
    visualize(X[ind,:], y[ind], w, loss, n_iter)
    
    <update w and anything else here>


visualize(X, y, w, loss, n_iter)
plt.clf()

Which optimization method you consider the best? Type your answer in the cell below

Bonus quest

try the same thing for Adagrad, Adam and anything else you find suitable



In [ ]: