In this lab, you will perform early stopping: at every iteration, you will check the total loss on the validation data and save the model that minimizes it.
( Note: early stopping is a general term. We will focus on the variant that uses the validation data; you can also stop after a pre-determined number of iterations. )
Estimated Time Needed: 15 min
We'll need the following libraries, and set the random seed.
In [ ]:
# Import the libraries and set random seed
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
torch.manual_seed(1)
First, let's create some artificial data in a dataset class. The class has an option to produce either training data or validation data; the training data includes outliers.
In [ ]:
# Create Data Class
class Data(Dataset):

    # Constructor
    def __init__(self, train=True):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.f = -3 * self.x + 1
        self.len = self.x.shape[0]
        if train:
            # Training data: noisy samples of f
            self.y = self.f + 0.1 * torch.randn(self.x.size())
            # Insert outliers at x = -3 and around x = 2
            self.y[0] = 0
            self.y[50:55] = 20
        else:
            # Validation data: the noiseless line
            self.y = self.f

    # Getter
    def __getitem__(self, index):
        return self.x[index], self.y[index]

    # Get Length
    def __len__(self):
        return self.len
We create two objects: one containing the training data and a second containing the validation data. Only the training data contains the outliers.
In [ ]:
# Create train_data object and val_data object
train_data = Data()
val_data = Data(train = False)
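If you'd like, you can inspect a sample before plotting. This quick check is not part of the original lab, but it confirms that indexing a Data object returns an (x, y) pair of tensors.
In [ ]:
# Optional check: each item is an (x, y) pair of tensors
x_sample, y_sample = train_data[0]
print(x_sample, y_sample)
print(len(train_data), len(val_data))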
We overlay the training points in red on the function that generated the data. Since the x values start at -3 in steps of 0.1, index 0 corresponds to x = -3 and indices 50 to 54 to values around x = 2; notice the outliers at those locations.
In [ ]:
# Plot the training data points
plt.plot(train_data.x.numpy(), train_data.y.numpy(), 'xr')
plt.plot(train_data.x.numpy(), train_data.f.numpy())
plt.show()
Create a linear regression model class.
In [ ]:
# Create linear regression model class
class linear_regression(nn.Module):

    # Constructor
    def __init__(self, input_size, output_size):
        super(linear_regression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    # Prediction
    def forward(self, x):
        yhat = self.linear(x)
        return yhat
Create the model object
In [ ]:
# Create the model object
model = linear_regression(1, 1)
We create the optimizer, the criterion (cost) function, and a DataLoader object.
In [ ]:
# Create optimizer, cost function and data loader object
optimizer = optim.SGD(model.parameters(), lr = 0.1)
criterion = nn.MSELoss()
trainloader = DataLoader(dataset = train_data, batch_size = 1)
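As an optional check (not part of the original lab), you can peek at one batch: with batch_size = 1, each batch is a single (x, y) pair.
In [ ]:
# Optional check: inspect the shape of one batch
for x, y in trainloader:
    print(x.shape, y.shape)  # both torch.Size([1, 1])
    break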
Run several epochs of gradient descent and save the model that performs best on the validation data.
In [ ]:
# Train the model
LOSS_TRAIN = []
LOSS_VAL = []
min_loss = 1000

def train_model_early_stopping(epochs, min_loss):
    for epoch in range(epochs):
        for x, y in trainloader:
            yhat = model(x)
            loss = criterion(yhat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Record the loss on the full training and validation sets
            loss_train = criterion(model(train_data.x), train_data.y).item()
            loss_val = criterion(model(val_data.x), val_data.y).item()
            LOSS_TRAIN.append(loss_train)
            LOSS_VAL.append(loss_val)
            # Save the parameters whenever the validation loss improves
            if loss_val < min_loss:
                best_epoch = epoch  # record the epoch of the current best model
                min_loss = loss_val
                torch.save(model.state_dict(), 'best_model.pt')

train_model_early_stopping(20, min_loss)
View the loss for every iteration on the training set and validation set.
In [ ]:
# Plot the loss
plt.plot(LOSS_TRAIN, label = 'training loss')
plt.plot(LOSS_VAL, label = 'validation loss')
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend(loc = 'upper right')
plt.show()
We will create a new linear regression object and load the parameters saved during early stopping. The new model must have the same input and output dimensions as the original model.
In [ ]:
# Create a new linear regression model object
model_best = linear_regression(1, 1)
Load the model parameters with torch.load(), then assign them to the model_best object using the load_state_dict() method.
In [ ]:
# Assign the best model to model_best
model_best.load_state_dict(torch.load('best_model.pt'))
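As an optional sanity check (not part of the original lab), you can print both state dictionaries to confirm that the loaded parameters differ from those of the fully trained model.
In [ ]:
# Optional check: compare the saved best parameters with the final ones
print("best model: ", model_best.state_dict())
print("final model:", model.state_dict())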
Let's compare the predictions from the model obtained using early stopping with those from the model trained for the maximum number of iterations.
In [ ]:
# Plot the predictions of both models on the validation data
plt.plot(val_data.x.numpy(), model_best(val_data.x).data.numpy(), label = 'best model')
plt.plot(val_data.x.numpy(), model(val_data.x).data.numpy(), label = 'maximum iterations')
plt.plot(val_data.x.numpy(), val_data.y.numpy(), 'rx', label = 'true line')
plt.legend()
plt.show()
We can see that the model obtained via early stopping fits the data much better. For more variations of early stopping, see:
Prechelt, Lutz. "Early stopping - but when?" Neural Networks: Tricks of the Trade. Springer, Berlin, Heidelberg, 1998. 55-69.
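For example, one common variation (not used in this lab) stops training once the validation loss has failed to improve for a fixed number of epochs, often called the "patience". Here is a minimal sketch, assuming the same model, optimizer, criterion, trainloader, and val_data defined above; the patience value of 5 is an arbitrary choice for illustration.
In [ ]:
# A sketch of patience-based early stopping: stop once the validation
# loss has not improved for `patience` consecutive epochs
def train_with_patience(epochs, patience = 5):
    best_val = float('inf')
    epochs_without_improvement = 0
    for epoch in range(epochs):
        for x, y in trainloader:
            yhat = model(x)
            loss = criterion(yhat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Check the validation loss once per epoch
        val_loss = criterion(model(val_data.x), val_data.y).item()
        if val_loss < best_val:
            best_val = val_loss
            epochs_without_improvement = 0
            torch.save(model.state_dict(), 'best_model.pt')
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training early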
Joseph Santarcangelo has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.
Other contributors: Michelle Carey, Mavis Zhou
Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.