Central to all neural networks in PyTorch is the autograd package. Let's briefly visit it first, and then move on to training our first neural network.
The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the .grad attribute.
Define the following computation:
x = [[1, 1], [1, 1]]
y = x + 2
z = y**2 * 3
out = z.mean()
In [1]:
import torch
# create a tensor, setting its .requires_grad to True
x = torch.ones(2, 2, requires_grad=True)
print(x)
x1 = torch.ones(2, 2, requires_grad=False)
# x1.requires_grad_(True)  # uncomment to start tracking gradients in-place
print(x1)
In [2]:
y = x + 2
print(y)
y1 = x1 + 2
print(y1)
y was created as a result of an operation, so it has a grad_fn. y1 does not, because x1 does not require gradients.
In [3]:
print(y.grad_fn)
print(y1.grad_fn)
In [4]:
z = y * y * 3
z1 = y1 * y1 * 3
out = z.mean() # calculate the mean of z
out1 = z1.mean() # calculate the mean of z1
print(z, out)
print(z1, out1)
.requires_grad_( ... ) changes an existing Tensor's requires_grad flag in-place. The input flag defaults to True if not given.
Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user - their grad_fn is None).
In [5]:
a = torch.randn(2, 2) # a is created by the user, so its .grad_fn is None
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True) # change the .requires_grad attribute of a in-place
print(a.requires_grad)
b = (a * a).sum() # b is the sum of all elements of a * a
print(b.grad_fn)
In [6]:
out.backward()
# equivalent to out.backward(torch.tensor(1.)) because out is a scalar
# out1.backward()  # would fail: out1 does not require gradients
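To check this result by hand: $out = \frac{1}{4}\sum_i z_i$ with $z_i = 3(x_i+2)^2$, so $\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$, which evaluates to $4.5$ at $x_i = 1$ and matches the printed gradient below.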
You can then read the gradients as shown below. Note that .grad is only populated for leaf tensors created with requires_grad=True; y and z are intermediate (non-leaf) results, so their .grad is None:
In [7]:
x_grad = x.grad
y_grad = y.grad
z_grad = z.grad
print(x_grad)
print(y_grad)
print(z_grad)
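If you do need the gradient of an intermediate tensor, you can ask autograd to keep it by calling .retain_grad() on that tensor before the backward pass. A minimal sketch:

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()  # keep the gradient of the non-leaf tensor y
out = (y * y * 3).mean()
out.backward()
print(y.grad)  # now populated instead of None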
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds a single element), you don't need to specify any arguments to backward(); however, if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
Define the following computation:
x = [1, 1, 1]
y = x + [1, 2, 3]
z = y**3
In [8]:
x = torch.ones(3, requires_grad=True)
y = x + torch.tensor([1., 2., 3.])
z = y * y * y
print(z)
v = torch.tensor([1., 0.1, 0.01])
# z is a vector, so you must pass a gradient argument with the same shape as z
z.backward(v)
print(x.grad)
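To verify: $y = x + [1, 2, 3] = [2, 3, 4]$ and $\frac{\partial z_i}{\partial x_i} = 3y_i^2 = [12, 27, 48]$; weighting component-wise by $v = [1, 0.1, 0.01]$ gives x.grad $= [12, 2.7, 0.48]$.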
When the output is a scalar, the grad_variables argument to .backward() can be omitted. This argument acts as a coefficient on the functional relationship between z and x in the code below, and it defaults to 1.0. The code below differentiates $y=x^2+2x+4$; observe that when the weight is set to 2, the gradient doubles, i.e. it is equivalent to differentiating $y=2x^2+4x+8$.
In [9]:
x = torch.ones(1, requires_grad=True)
y = x * x + 2 * x + 4
y.backward(retain_graph=True)
print(x.grad)
x.grad.data.zero_()
y.backward(torch.tensor([2.0]), retain_graph=True)
print(x.grad)
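As a check: $\frac{dy}{dx} = 2x + 2 = 4$ at $x = 1$, so the first backward() stores 4 in x.grad; after zeroing, passing the weight 2 accumulates $2 \cdot (2x + 2) = 8$.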
When the output is a tensor, the grad_variables argument to .backward() cannot be omitted, and its shape must match that of the output. This argument is again a set of coefficients, giving the weight of each component.
In [10]:
x = torch.ones(2, requires_grad=True)
t = x + torch.tensor([1., 2.])
y = t * t + 2 * t + 4
y.backward(torch.tensor([1., 0.1]), retain_graph=True)
print(x.grad)
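To verify: $t = x + [1, 2] = [2, 3]$ and $\frac{\partial y_i}{\partial x_i} = 2t_i + 2 = [6, 8]$; weighting by $[1, 0.1]$ gives x.grad $= [6, 0.8]$.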
A typical training procedure for a neural network is as follows: define a network with learnable parameters, iterate over a dataset of inputs, process each input through the network, compute the loss, propagate gradients back into the network's parameters, and update the weights. The cells below build each of these pieces for a small point-classification task.
In [11]:
# show all points, you can skip this cell
def show_original_points():
    label_csv = open('./labels/label.csv', 'r')
    label_reader = csv.reader(label_csv)
    class1_point = []
    class2_point = []
    class3_point = []
    for item in label_reader:
        if item[2] == '0':
            class1_point.append([item[0], item[1]])
        elif item[2] == '1':
            class2_point.append([item[0], item[1]])
        else:
            class3_point.append([item[0], item[1]])
    label_csv.close()
    data1 = np.array(class1_point, dtype=float)
    data2 = np.array(class2_point, dtype=float)
    data3 = np.array(class3_point, dtype=float)
    x1, y1 = data1.T
    x2, y2 = data2.T
    x3, y3 = data3.T
    plt.figure()
    plt.scatter(x1, y1, c='b', marker='.')
    plt.scatter(x2, y2, c='r', marker='.')
    plt.scatter(x3, y3, c='g', marker='.')
    plt.axis()
    plt.title('scatter')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
When you define a network, your class must inherit from nn.Module, and you should override the __init__ and forward methods. Printing the network defined below shows its structure:
Network(
(hidden): Linear(in_features=2, out_features=5, bias=True)
(sigmoid): Sigmoid()
(predict): Linear(in_features=5, out_features=3, bias=True)
)
In [12]:
import numpy as np
import matplotlib.pyplot as plt
import torchvision
import torch
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import time
import csv
In [13]:
class Network(nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        '''
        Args:
            n_feature(int): size of input tensor
            n_hidden(int): size of hidden layer
            n_output(int): size of output tensor
        '''
        super(Network, self).__init__()
        # define a linear layer
        self.hidden = nn.Linear(n_feature, n_hidden)
        # define sigmoid activation
        self.sigmoid = nn.Sigmoid()
        self.predict = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        '''
        Args:
            x(tensor): inputs of the network
        '''
        # hidden layer
        h1 = self.hidden(x)
        # activation function
        h2 = self.sigmoid(h1)
        # output layer
        out = self.predict(h2)
        '''
        A linear classifier is often followed by softmax to output
        probabilities; however, the CrossEntropyLoss we use already performs
        this operation, so we do not apply softmax here.
        '''
        return out
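As a quick sanity check of the class above, a minimal sketch that instantiates the network and runs a forward pass on random data (shapes chosen for illustration only):

net = Network(2, 5, 3)
print(net)  # prints the module structure shown above
dummy = torch.randn(4, 2)  # a batch of 4 two-dimensional points
logits = net(dummy)
print(logits.shape)  # torch.Size([4, 3]): one raw score per class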
CrossEntropyLoss in PyTorch: https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss
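As the linked documentation notes, CrossEntropyLoss combines LogSoftmax and NLLLoss in one module, which is why forward returns raw logits. A small illustration with made-up logits and labels:

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # raw scores, no softmax applied
labels = torch.tensor([0, 1])  # ground-truth class indices
loss = criterion(logits, labels)
# equivalent to: nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)
print(loss.item())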
In [14]:
class PointDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        '''
        Args:
            csv_file(string): path of label file
            transform (callable, optional): Optional transform to be applied
                on a sample.
        '''
        self.frame = pd.read_csv(csv_file, encoding='utf-8', header=None)
        print('csv_file source ---->', csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        x = self.frame.iloc[idx, 0]
        y = self.frame.iloc[idx, 1]
        point = np.array([x, y])
        label = int(self.frame.iloc[idx, 2])
        if self.transform is not None:
            point = self.transform(point)
        sample = {'point': point, 'label': label}
        return sample
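A quick way to sanity-check the dataset before training, assuming the ./labels/train.csv file used in the cells below:

trainset = PointDataset('./labels/train.csv', transform=torch.tensor)
print(len(trainset))  # number of points
sample = trainset[0]
print(sample['point'], sample['label'])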
In [15]:
def train(classifier_net, trainloader, testloader, device, lr, optimizer):
    '''
    Args:
        classifier_net(nn.Module): model to train
        trainloader(torch.utils.data.DataLoader): train loader
        testloader(torch.utils.data.DataLoader): test loader
        device(torch.device): the device your model trains on
        lr(float): learning rate
        optimizer(torch.optim.Optimizer): optimizer that updates the weights
    '''
    # loss function
    criterion = nn.CrossEntropyLoss().to(device)
    # save the mean value of loss in each epoch
    running_loss = []
    running_accuracy = []
    # accumulate loss within an epoch
    temp_loss = 0.0
    # count the number of iterations in an epoch
    iteration = 0
    for epoch in range(epoches):  # note: epoches is a global defined below
        '''
        adjust learning rate while training the model
        '''
        # adjust learning rate
        # if epoch % 100 == 0 and epoch != 0:
        #     lr = lr * 0.1
        #     for param_group in optimizer.param_groups:
        #         param_group['lr'] = lr
        for i, data in enumerate(trainloader):
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            # clear the gradients accumulated in the optimizer
            optimizer.zero_grad()
            # calculate the loss value
            loss = criterion(outputs, label)
            # back propagation
            loss.backward()
            # update the parameters in the optimizer (update weights)
            optimizer.step()
            # accumulate the loss
            temp_loss += loss.item()
            iteration += 1
            # print loss value
            # print('[{0:d},{1:5.0f}] loss {2:.5f}'.format(epoch + 1, i, loss.item()))
            # slow down the print rate
            # time.sleep(0.5)
        running_loss.append(temp_loss / iteration)
        temp_loss = 0
        iteration = 0
        # print('test {}:----------------------------------------------------------------'.format(epoch))
        # call the test function and record accuracy
        running_accuracy.append(predict(classifier_net, testloader, device))
    # show loss curve
    show_running_loss(running_loss)
    # show accuracy curve
    show_accuracy(running_accuracy)
    return classifier_net
In [16]:
# show running loss curve, you can skip this cell.
def show_running_loss(running_loss):
    # generate x values
    x = np.array([i for i in range(len(running_loss))])
    # generate y values
    y = np.array(running_loss)
    # define a figure
    plt.figure()
    # plot the curve
    plt.plot(x, y, c='b')
    # show axis
    plt.axis()
    # define the title
    plt.title('loss curve')
    # define the axis labels
    plt.xlabel('step')
    plt.ylabel('loss value')
    # show the figure
    plt.show()
In [17]:
def predict(classifier_net, testloader, device):
    correct = 0
    total = 0
    with torch.no_grad():
        '''
        You can also stop autograd from tracking history on tensors with
        .requires_grad=True by wrapping the code block in torch.no_grad():
        '''
        for data in testloader:
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            '''
            If you want the probability of each model prediction, you can
            apply a softmax here to transform the outputs into probabilities.
            '''
            # take the index of the largest logit as the predicted class
            _, predicted = torch.max(outputs, 1)
            # print('model prediction: ', predicted)
            # print('ground truth:', label, '\n')
            correct += (predicted == label).sum()
            total += label.size(0)
            # print('current correct is:', correct.item())
            # print('current total is:', total)
    # print('the accuracy of the model is {0:5f}'.format(correct.item()/total))
    return correct.item() / total
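If you do want the prediction probabilities mentioned in the comment above, a softmax over the logits inside the loop would look like the sketch below; accuracy itself does not need it, because softmax does not change the argmax:

probs = torch.softmax(outputs, dim=1)  # each row sums to 1
_, predicted = torch.max(probs, 1)     # same argmax as on the raw logits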
In [18]:
# show accuracy curve, you can skip this cell.
def show_accuracy(running_accuracy):
    x = np.array([i for i in range(len(running_accuracy))])
    y = np.array(running_accuracy)
    plt.figure()
    plt.plot(x, y, c='b')
    plt.axis()
    plt.title('accuracy curve')
    plt.xlabel('step')
    plt.ylabel('accuracy value')
    plt.show()
In [19]:
if __name__ == '__main__':
    '''
    change train epoches here
    '''
    # number of training epochs
    epoches = 100
    '''
    change learning rate here
    '''
    # learning rate
    # 1e-3 = 10^-3
    lr = 1e-3
    '''
    change batch size here
    '''
    # batch size
    batch_size = 16
    # define a transform to preprocess the data
    transform = torch.tensor
    # define a gpu device
    device = torch.device('cuda:0')
    # define a trainset
    trainset = PointDataset('./labels/train.csv', transform=transform)
    # define a trainloader
    trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
    # define a testset
    testset = PointDataset('./labels/test.csv', transform=transform)
    # define a testloader
    testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
    show_original_points()
    # define a network
    classifier_net = Network(2, 5, 3).to(device)
    '''
    change optimizer here
    '''
    # define an optimizer
    optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
    # get the trained model
    classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
Next we vary the learning rate.
A higher learning rate makes the loss fall and the accuracy rise faster; a lower learning rate slows both down. With a higher learning rate each weight update is larger, so the loss and accuracy change more quickly and training soon reaches the neighborhood of the optimum.
If the learning rate is too high, however, the loss and accuracy oscillate: the large updates make the weights swing back and forth around the minimum. If it is too low, the model may still fall short of the desired performance when training ends.
On this dataset, the learning rate of 1e-3 used above is too low, leaving the final loss around 0.2-0.4; raising it to 0.01 not only makes the loss fall faster but also gives a slightly better final result.
In [20]:
epoches = 100
lr = 0.01
batch_size = 16
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
Batch size affects both convergence speed and data-processing speed.
With a larger batch, each epoch takes less time, because there are fewer batches to process; however, convergence slows down considerably, since fewer parameter updates are made per epoch, so the loss and accuracy converge more slowly.
With a smaller batch, each epoch takes longer, but the loss and accuracy converge faster, because more updates are made per epoch.
In [21]:
epoches = 100
lr = 1e-3
batch_size = 1
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
In [22]:
epoches = 100
lr = 1e-3
batch_size = 30
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
In [23]:
epoches = 100
lr = 1e-3
batch_size = 210
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
The smaller the momentum factor, the more likely training is to get stuck in a local minimum, leaving the loss high and the accuracy low. The larger the momentum factor, the more likely training is to escape local minima and move toward the global minimum, lowering the loss and raising the accuracy.
In [24]:
epoches = 100
lr = 1e-3
batch_size = 16
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.0)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
In [25]:
epoches = 100
lr = 1e-3
batch_size = 16
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
In this experiment Adam is the least efficient, SGD comes second, and Rprop is the most efficient, reaching a lower loss and higher accuracy more quickly.
In [26]:
epoches = 100
lr = 1e-3
batch_size = 16
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.Adam(classifier_net.parameters(), lr=lr)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
In [27]:
epoches = 100
lr = 1e-3
batch_size = 16
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.Rprop(classifier_net.parameters(), lr=lr)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
With a learning rate of 0.01 and a batch size of 30, the model using the Rprop optimizer converges fastest, and its wall-clock speed is also reasonable.
During training, a well-tuned learning rate speeds up model convergence, while a larger batch size shortens training time but slows convergence and may even hurt final accuracy. Among the three optimizers, Rprop works best here, SGD comes second, and Adam performs worst.
In [28]:
epoches = 100
lr = 0.01
batch_size = 30
transform = torch.tensor
device = torch.device('cuda:0')
trainset = PointDataset('./labels/train.csv', transform=transform)
trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
testset = PointDataset('./labels/test.csv', transform=transform)
testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
show_original_points()
classifier_net = Network(2, 5, 3).to(device)
optimizer = optim.Rprop(classifier_net.parameters(), lr=lr)
classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
A lot of effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, your code more readable. In this tutorial, we will see how to load and preprocess/augment data from a non-trivial dataset.
scikit-image: For image io and transforms
sudo apt-get install python-numpy
sudo apt-get install python-scipy
sudo apt-get install python-matplotlib
sudo pip install scikit-image
pandas: For easier csv parsing
sudo pip install pandas
In [29]:
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
plt.ion() # interactive mode
In [30]:
# read a csv file with pandas
landmarks_frame = pd.read_csv('data/faces/face_landmarks.csv')
n = 0
# read the image name, stored in the first column
img_name = landmarks_frame.iloc[n, 0]
# the landmark points are stored in the remaining columns
landmarks = landmarks_frame.iloc[n, 1:].to_numpy()  # as_matrix() was removed in pandas 1.0
# reshape the points into (x, y) pairs
landmarks = landmarks.astype('float').reshape(-1, 2)
print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))
In [31]:
def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('data/faces/', img_name)),
               landmarks)
plt.show()
In [32]:
class FaceLandmarksDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        # build the full path of the image
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy()
        landmarks = landmarks.astype('float').reshape(-1, 2)
        # save everything we may need during training in a dict
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample
In [33]:
face_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                    root_dir='data/faces/')
fig = plt.figure()
for i in range(len(face_dataset)):
    sample = face_dataset[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
    # create a subplot for each of the first four samples
    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)
    if i == 3:
        plt.show()
        break