Test Sigmoid, Tanh, and ReLU Activation Functions on the MNIST Dataset

In this lab, you will test the sigmoid, tanh, and ReLU activation functions on the MNIST dataset.


You'll need the following libraries:


In [1]:
!conda install -y torchvision
import torch 
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
import torch.nn.functional as F
import matplotlib.pylab as plt
import numpy as np


Solving environment: done

# All requested packages already installed.

Neural Network Module and Training Function

Define the neural network class with the sigmoid activation function:


In [2]:
class Net(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        # hidden layer with sigmoid activation
        x = torch.sigmoid(self.linear1(x))
        # output layer returns raw logits
        x = self.linear2(x)
        return x
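
As a quick sanity check, you can pass a random batch of inputs through the network and confirm that it produces one logit per class (a minimal sketch; the dimensions match the MNIST setup used later in this lab):


In [ ]:
# illustrative sanity check: a batch of 4 flattened 28x28 images maps to 10 class logits
net = Net(28 * 28, 100, 10)
x = torch.randn(4, 28 * 28)
print(net(x).shape)  # expected: torch.Size([4, 10])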

Define the class with the Tanh activation function:


In [3]:
class NetTanh(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(NetTanh, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        x = torch.tanh(self.linear1(x))
        x = self.linear2(x)
        return x

Define the class for the ReLU activation function:


In [4]:
class NetRelu(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(NetRelu, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = self.linear2(x)
        return x

Define a function to train the model. The function returns a Python dictionary that stores the training loss and the accuracy on the validation data.


In [5]:
def train(model, criterion, train_loader, validation_loader, optimizer, epochs=100):
    useful_stuff = {'training_loss': [], 'validation_accuracy': []}

    for epoch in range(epochs):
        for x, y in train_loader:
            # clear the gradients
            optimizer.zero_grad()
            # make a prediction (logits)
            z = model(x.view(-1, 28 * 28))
            # calculate the loss
            loss = criterion(z, y)
            # calculate the gradients of the parameters
            loss.backward()
            # update the parameters
            optimizer.step()
            useful_stuff['training_loss'].append(loss.item())

        correct = 0
        for x, y in validation_loader:
            # perform a prediction on the validation data
            yhat = model(x.view(-1, 28 * 28))
            _, label = torch.max(yhat, 1)
            correct += (label == y).sum().item()

        accuracy = 100 * (correct / len(validation_loader.dataset))
        useful_stuff['validation_accuracy'].append(accuracy)

    return useful_stuff

Prepare Data

Load the training dataset by setting the parameter train to True, and convert it to a tensor by passing a transform object to the argument transform.


In [6]:
train_dataset = dsets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())

Load the validation dataset (the MNIST test split) by setting the parameter train to False, and convert it to a tensor by passing a transform object to the argument transform.


In [7]:
validation_dataset = dsets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

Create the training-data loader and the validation-data loader objects:


In [8]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=2000, shuffle=True)
validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=5000, shuffle=False)
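
Each batch from the training loader is a tensor of shape (batch_size, 1, 28, 28), which is why the training function flattens the images with x.view(-1, 28 * 28). You can confirm this with a quick check (illustrative):


In [ ]:
# illustrative: inspect the shape of one batch of images and labels
x, y = next(iter(train_loader))
print(x.shape)  # torch.Size([2000, 1, 28, 28])
print(y.shape)  # torch.Size([2000])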

Define the Neural Network, Criterion Function, Optimizer, and Train the Model

Create the criterion function:


In [9]:
criterion = nn.CrossEntropyLoss()
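
Note that nn.CrossEntropyLoss combines LogSoftmax and NLLLoss internally, which is why the forward methods above return raw logits rather than probabilities. A minimal illustration:


In [ ]:
# illustrative: CrossEntropyLoss takes raw logits and integer class labels
logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three class logits
target = torch.tensor([0])                 # index of the correct class
print(criterion(logits, target))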

Create the model with 100 neurons in the hidden layer:


In [10]:
input_dim = 28 * 28
hidden_dim = 100
output_dim = 10

model = Net(input_dim, hidden_dim, output_dim)

Print the model parameters:
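
One way to do this is with the model's named_parameters method (a minimal sketch):


In [ ]:
# illustrative: print the name and shape of each parameter tensor
for name, param in model.named_parameters():
    print(name, param.shape)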

Test Sigmoid, Tanh, and ReLU

Train the network by using the sigmoid activation function:


In [11]:
model = Net(input_dim, hidden_dim, output_dim)

learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
training_results = train(model, criterion, train_loader, validation_loader, optimizer, epochs=30)

Train the network by using the Tanh activation function:


In [ ]:
model_Tanh = NetTanh(input_dim, hidden_dim, output_dim)
optimizer = torch.optim.SGD(model_Tanh.parameters(), lr=learning_rate)
training_results_tanh = train(model_Tanh, criterion, train_loader, validation_loader, optimizer, epochs=30)

Train the network by using the ReLU activation function:


In [ ]:
modelRelu = NetRelu(input_dim, hidden_dim, output_dim)
optimizer = torch.optim.SGD(modelRelu.parameters(), lr=learning_rate)
training_results_relu = train(modelRelu, criterion, train_loader, validation_loader, optimizer, epochs=30)

Analyze Results

Compare the training loss for each activation:


In [ ]:
plt.plot(training_results_tanh['training_loss'], label='tanh')
plt.plot(training_results['training_loss'], label='sigmoid')
plt.plot(training_results_relu['training_loss'], label='relu')
plt.ylabel('loss')
plt.xlabel('iterations')
plt.title('training loss vs iterations')
plt.legend()


Out[ ]:
<matplotlib.legend.Legend at 0x7fe956c08978>

Compare the validation accuracy for each model:


In [ ]:
plt.plot(training_results_tanh['validation_accuracy'], label='tanh')
plt.plot(training_results['validation_accuracy'], label='sigmoid')
plt.plot(training_results_relu['validation_accuracy'], label='relu')
plt.ylabel('validation accuracy')
plt.xlabel('epochs')
plt.legend()


Out[ ]:
<matplotlib.legend.Legend at 0x7fe95353edd8>
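
To summarize the comparison numerically, you can also print the final validation accuracy reached by each model (illustrative; variable names as defined above):


In [ ]:
# illustrative: final validation accuracy after 30 epochs for each activation
for name, results in [('sigmoid', training_results),
                      ('tanh', training_results_tanh),
                      ('relu', training_results_relu)]:
    print(name, results['validation_accuracy'][-1])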

About the Authors:

Joseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition.

Other contributors: Michelle Carey, Mavis Zhou


Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.