In this lab, you will deal with several problems associated with optimization and see how momentum can improve your results.
Estimated Time Needed: 25 min
Import the following libraries that you'll use for this lab:
In [ ]:
# These are the libraries that will be used for this lab.
import torch
import torch.nn as nn
import matplotlib.pylab as plt
import numpy as np
torch.manual_seed(0)
This function will plot a cubic function and the parameter values obtained via Gradient Descent.
In [ ]:
# Plot the cubic
def plot_cubic(w, optimizer):
    LOSS = []
    W = torch.arange(-4, 4, 0.1)
    # Sweep the weight over W to evaluate the objective function
    for w.state_dict()['linear.weight'][0] in W:
        LOSS.append(cubic(w(torch.tensor([[1.0]]))).item())
    # Reset the weight to its starting value
    w.state_dict()['linear.weight'][0] = 4.0
    n_epochs = 10
    parameter = []
    loss_list = []
    # Run n_epochs iterations of gradient descent
    for n in range(n_epochs):
        optimizer.zero_grad()
        loss = cubic(w(torch.tensor([[1.0]])))
        loss_list.append(loss.item())
        parameter.append(w.state_dict()['linear.weight'][0].item())
        loss.backward()
        optimizer.step()
    # Plot the objective function and the parameter values visited by the optimizer
    plt.plot(parameter, loss_list, 'ro', label = 'parameter values')
    plt.plot(W.numpy(), LOSS, label = 'objective function')
    plt.xlabel('w')
    plt.ylabel('l(w)')
    plt.legend()
This function will plot a fourth-order polynomial and the parameter values obtained via Gradient Descent. You can also add Gaussian noise with a standard deviation determined by the parameter std.
In [ ]:
# Plot the fourth order function and the parameter values
def plot_fourth_order(w, optimizer, std = 0, color = 'r', paramlabel = 'parameter values', objfun = True):
    W = torch.arange(-4, 6, 0.1)
    LOSS = []
    # Sweep the weight over W to evaluate the objective function
    for w.state_dict()['linear.weight'][0] in W:
        LOSS.append(fourth_order(w(torch.tensor([[1.0]]))).item())
    # Reset the weight to its starting value
    w.state_dict()['linear.weight'][0] = 6.0
    n_epochs = 100
    parameter = []
    loss_list = []
    # Run n_epochs iterations of gradient descent, optionally adding Gaussian noise to the loss
    for n in range(n_epochs):
        optimizer.zero_grad()
        loss = fourth_order(w(torch.tensor([[1.0]]))) + std * torch.randn(1, 1)
        loss_list.append(loss.item())
        parameter.append(w.state_dict()['linear.weight'][0].item())
        loss.backward()
        optimizer.step()
    # Plotting
    if objfun:
        plt.plot(W.numpy(), LOSS, label = 'objective function')
    plt.plot(parameter, loss_list, 'ro', label = paramlabel, color = color)
    plt.xlabel('w')
    plt.ylabel('l(w)')
    plt.legend()
This is a custom module. It behaves like a single parameter value. We do it this way so we can use PyTorch's built-in optimizers.
In [ ]:
# Create a linear model
class one_param(nn.Module):
    
    # Constructor
    def __init__(self, input_size, output_size):
        super(one_param, self).__init__()
        self.linear = nn.Linear(input_size, output_size, bias = False)
    
    # Prediction
    def forward(self, x):
        yhat = self.linear(x)
        return yhat
We create an object w; when we call the object with an input of one, it behaves like an individual parameter value, i.e. w(1) is analogous to $w$.
In [ ]:
# Create a one_param object
w = one_param(1, 1)
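As a quick, optional sanity check (not part of the original lab code), you can verify that calling w with an input of one returns the value of its single weight:
In [ ]:
# Optional check: the prediction for an input of 1 equals the single weight value
x = torch.tensor([[1.0]])
print(w(x))                               # forward pass: weight * 1
print(w.state_dict()['linear.weight'])    # the underlying weight tensor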
Let's create a cubic function with a saddle point.
In [ ]:
# Define a function to output a cubic
def cubic(yhat):
    out = yhat ** 3
    return out
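For reference, the objective here is $l(w) = w^3$, whose derivative $l'(w) = 3w^2$ is zero at $w = 0$ but never changes sign, so $w = 0$ is a saddle (inflection) point rather than a minimum, and plain gradient descent can stall there.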
We create an optimizer with no momentum term.
In [ ]:
# Create an optimizer without momentum
optimizer = torch.optim.SGD(w.parameters(), lr = 0.01, momentum = 0)
We run several iterations of stochastic gradient descent and plot the results. We see the parameter values get stuck in the saddle point.
In [ ]:
# Plot the model
plot_cubic(w, optimizer)
We create an optimizer with a momentum term of 0.9.
In [ ]:
# Create an optimizer with momentum
optimizer = torch.optim.SGD(w.parameters(), lr = 0.01, momentum = 0.90)
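For reference, with momentum $\mu$ and learning rate $\eta$, PyTorch's SGD keeps a velocity buffer and updates the parameter roughly as $v_{t+1} = \mu v_t + g_{t+1}$ and $w_{t+1} = w_t - \eta v_{t+1}$, so each step is driven by an accumulation of recent gradients rather than the current gradient alone.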
We run several iterations of stochastic gradient descent with momentum and plot the results. We see the parameter values do not get stuck in the saddle point.
In [ ]:
# Plot the model
plot_cubic(w, optimizer)
In this section, we will create a fourth-order polynomial with a local minimum at 4 and a global minimum at -2. We will then see how the momentum parameter affects convergence to the global minimum. The fourth-order polynomial is given by:
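$l(w) = 2w^4 - 9w^3 - 21w^2 + 88w + 48$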
In [ ]:
# Create a function to calculate the fourth order polynomial
def fourth_order(yhat):
    out = torch.mean(2 * (yhat ** 4) - 9 * (yhat ** 3) - 21 * (yhat ** 2) + 88 * yhat + 48)
    return out
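As a quick, optional check (not part of the original lab code), you can evaluate the polynomial at the two minima to confirm that $w = -2$ gives the lower value:
In [ ]:
# Optional check: compare the objective at the local and the global minimum
print(fourth_order(torch.tensor([4.0])).item())     # local minimum: l(4) = 0
print(fourth_order(torch.tensor([-2.0])).item())    # global minimum: l(-2) = -108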
We create an optimizer with no momentum term. We run several iterations of stochastic gradient descent and plot the results. We see the parameter values get stuck in the local minimum.
In [ ]:
# Make the prediction without momentum
optimizer = torch.optim.SGD(w.parameters(), lr = 0.001)
plot_fourth_order(w, optimizer)
We create an optimizer with a momentum term of 0.9. We run several iterations of stochastic gradient descent and plot the results. We see the parameter values reach a global minimum.
In [ ]:
# Make the prediction with momentum
optimizer = torch.optim.SGD(w.parameters(), lr = 0.001, momentum = 0.9)
plot_fourth_order(w, optimizer)
In this section, we will create a fourth-order polynomial with a local minimum at 4 and a global minimum at -2, but we will add noise to the function when the gradient is calculated. We will then see how the momentum parameter affects convergence to the global minimum.
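One way to see why momentum helps here: since the velocity buffer satisfies $v_t = \mu v_{t-1} + g_t$, it is an exponentially weighted sum of past gradients, $v_t = \sum_{k=1}^{t} \mu^{t-k} g_k$, so zero-mean noise in the individual gradient estimates tends to cancel out while the consistent descent direction accumulates.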
With no momentum, we get stuck in the local minimum.
In [ ]:
# Make the prediction without momentum when there is noise
optimizer = torch.optim.SGD(w.parameters(), lr = 0.001)
plot_fourth_order(w, optimizer, std = 10)
With momentum, we reach the global minimum.
In [ ]:
# Make the prediction with momentum when there is noise
optimizer = torch.optim.SGD(w.parameters(), lr = 0.001, momentum = 0.9)
plot_fourth_order(w, optimizer, std = 10)
Create two SGD objects with a learning rate of 0.001. Use the default momentum parameter value for one and a value of 0.9 for the second. Use the function plot_fourth_order with std=100 to plot the steps of each. Make sure you run the function in two independent cells.
In [ ]:
# Practice: Create two SGD optimizers with lr = 0.001, one without momentum and the other with momentum = 0.9. Plot the results.
# Type your code here
Double-click here for the solution.
<!--
optimizer1 = torch.optim.SGD(w.parameters(), lr = 0.001)
plot_fourth_order(w, optimizer1, std = 100, color = 'black', paramlabel = 'parameter values with optimizer 1')

optimizer2 = torch.optim.SGD(w.parameters(), lr = 0.001, momentum = 0.9)
plot_fourth_order(w, optimizer2, std = 100, color = 'red', paramlabel = 'parameter values with optimizer 2', objfun = False)
-->
Joseph Santarcangelo has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.
Other contributors: Michelle Carey, Mavis Zhou
Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.