In this lab, you will construct a basic dataset by using PyTorch and learn how to apply basic transformations to it.
Estimated Time Needed: 30 min
The following libraries are used in this lab. Calling torch.manual_seed() fixes the seed of PyTorch's random number generator, so that random operations produce the same values every time the notebook is rerun.
In [ ]:
# These are the libraries that will be used for this lab.
import torch
from torch.utils.data import Dataset
torch.manual_seed(1)
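As a quick illustration of what the seed does, the sketch below reseeds the generator before each of two draws and checks that the draws are identical. The variable names first_draw and second_draw are made up for this example and are not part of the lab.

```python
import torch

# Seed the generator, draw three random numbers, then reseed and draw again.
torch.manual_seed(1)
first_draw = torch.rand(3)

torch.manual_seed(1)
second_draw = torch.rand(3)

# Both draws start from the same generator state, so they match exactly.
print(torch.equal(first_draw, second_draw))
```

Without the second call to torch.manual_seed(1), the generator would continue from where it left off and the two draws would generally differ.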
Let us try to create our own dataset class.
In [ ]:
# Define class for dataset
class toy_set(Dataset):

    # Constructor with default values
    def __init__(self, length = 100, transform = None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    # Getter
    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    # Get Length
    def __len__(self):
        return self.len
Now, let us create a toy_set object, and find out the value at index 0 and the length of the initial dataset:
In [ ]:
# Create Dataset Object. Find out the value on index 0. Find out the length of Dataset Object.
our_dataset = toy_set()
print("Our toy_set object: ", our_dataset)
print("Value on index 0 of our toy_set object: ", our_dataset[0])
print("Our toy_set length: ", len(our_dataset))
As a result, we can use the same indexing convention as a list, and apply the function len to the toy_set object. The indexing and length behavior are customized by defining the methods __getitem__(self, index) and __len__(self).
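To see these two special methods in isolation, here is a minimal sketch in plain Python, without PyTorch. The class squares is hypothetical and exists only for this illustration: Python calls __getitem__ when an object is indexed with square brackets, and __len__ when the object is passed to len.

```python
# Hypothetical class, used only to illustrate the two special methods.
class squares:
    def __init__(self, length):
        self.length = length

    # Called when the object is indexed, e.g. s[3]
    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return index ** 2

    # Called by len(s)
    def __len__(self):
        return self.length

s = squares(5)
print(s[3])    # 9
print(len(s))  # 5
```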
Now, let us loop over the first 3 elements, unpack each one into x and y, and print them:
In [ ]:
# Use loop to print out first 3 elements in dataset
for i in range(3):
    x, y = our_dataset[i]
    print("index: ", i, '; x:', x, '; y:', y)
Try to create a toy_set object with length 50. Print out the length of your object.
In [ ]:
# Practice: Create a new object with length 50, and print the length of object out.
# Type your code here
Double-click here for the solution.
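One possible solution, given as a minimal sketch: the toy_set class is repeated so that the cell runs on its own, and the name my_dataset is an arbitrary choice for this example.

```python
import torch
from torch.utils.data import Dataset

# Same toy_set class as defined earlier in the lab.
class toy_set(Dataset):
    def __init__(self, length=100, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len

# Pass the desired length to the constructor instead of using the default of 100.
my_dataset = toy_set(length=50)
print("My toy_set length: ", len(my_dataset))
```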
You can also create a class for transforming the data. In this case, we will try to add 1 to x and multiply y by 2:
In [ ]:
# Create transform class add_mult
class add_mult(object):

    # Constructor
    def __init__(self, addx = 1, muly = 2):
        self.addx = addx
        self.muly = muly

    # Executor
    def __call__(self, sample):
        x = sample[0]
        y = sample[1]
        x = x + self.addx
        y = y * self.muly
        sample = x, y
        return sample
Now, create a transform object:
In [ ]:
# Create an add_mult transform object, and a toy_set object
a_m = add_mult()
data_set = toy_set()
Assign the outputs of the original dataset to x and y. Then, apply the transform add_mult to each sample and assign the outputs to x_ and y_, respectively:
In [ ]:
# Use loop to print out first 10 elements in dataset
for i in range(10):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = a_m(data_set[i])
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
As a result, 1 has been added to x and y has been multiplied by 2: [2, 2] + 1 = [3, 3] and [1] x 2 = [2].
We can also have the transform applied automatically by any new toy_set object. Remember, the constructor of the toy_set class has the parameter transform = None.
When we create a new object using the constructor, we can assign the transform object to the parameter transform, as the following code demonstrates.
In [ ]:
# Create a new data_set object with add_mult object as transform
cust_data_set = toy_set(transform = a_m)
The dataset now applies the a_m object (a transform) to each element of cust_data_set whenever that element is accessed. Let us print out the first 10 elements of cust_data_set to check that a_m has been applied:
In [ ]:
# Use loop to print out first 10 elements in dataset
for i in range(10):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = cust_data_set[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
The result is the same as the previous method.
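The claim that both approaches agree can also be checked programmatically rather than by reading the printed output. The sketch below repeats the class definitions so that it runs on its own; the flag all_match and the names x_manual, y_manual, x_auto and y_auto are introduced only for this check.

```python
import torch
from torch.utils.data import Dataset

class toy_set(Dataset):
    def __init__(self, length=100, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len

class add_mult(object):
    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly

a_m = add_mult()
data_set = toy_set()
cust_data_set = toy_set(transform=a_m)

# Compare calling the transform by hand with letting the dataset call it.
all_match = True
for i in range(len(data_set)):
    x_manual, y_manual = a_m(data_set[i])
    x_auto, y_auto = cust_data_set[i]
    if not (torch.equal(x_manual, x_auto) and torch.equal(y_manual, y_auto)):
        all_match = False
print("Both approaches agree: ", all_match)
```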
Try to construct your own my_add_mult class that adds 2 to both x and y and then multiplies both x and y by 10. Apply it on a new toy_set object, and print out the first 3 elements from the transformed dataset.
In [ ]:
# Practice: Construct your own my_add_mult transform. Apply my_add_mult on a new toy_set object. Print out the first three elements from the transformed dataset.
# Type your code here.
Double-click here for the solution.
<!--
class my_add_mult(object):
    # Constructor
    def __init__(self, add = 2, mul = 10):
        self.add = add
        self.mul = mul
    # Executor
    def __call__(self, sample):
        x = sample[0]
        y = sample[1]
        x = x + self.add
        y = y + self.add
        x = x * self.mul
        y = y * self.mul
        sample = x, y
        return sample

my_dataset = toy_set(transform = my_add_mult())
for i in range(3):
    x_, y_ = my_dataset[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
-->
You can compose multiple transforms on the dataset object. First, import transforms from torchvision:
In [ ]:
# Run the command below when you do not have torchvision installed
# !conda install -y torchvision
from torchvision import transforms
Then, create a new transform class that multiplies each of the elements by 100:
In [ ]:
# Create transform class mult
class mult(object):

    # Constructor
    def __init__(self, mult = 100):
        self.mult = mult

    # Executor
    def __call__(self, sample):
        x = sample[0]
        y = sample[1]
        x = x * self.mult
        y = y * self.mult
        sample = x, y
        return sample
Now let us try to combine the transforms add_mult and mult:
In [ ]:
# Combine the add_mult() and mult()
data_transform = transforms.Compose([add_mult(), mult()])
print("The combination of transforms (Compose): ", data_transform)
The new Compose object applies the transforms sequentially, in the order they appear in the list (add_mult first, then mult), as shown in this figure:
Now we can pass the new Compose object (the combination of the transforms add_mult() and mult()) to the constructor to create a toy_set object.
In [ ]:
# Create a new toy_set object with compose object as transform
compose_data_set = toy_set(transform = data_transform)
Let us print out the first 3 elements of the different toy_set datasets in order to compare the outputs after the different transforms have been applied:
In [ ]:
# Use loop to print out first 3 elements in dataset
for i in range(3):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = cust_data_set[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
    x_co, y_co = compose_data_set[i]
    print('Index: ', i, 'Compose Transformed x_co: ', x_co, 'Compose Transformed y_co: ', y_co)
Let us see what happened at index 0. The original value of x is [2, 2], and the original value of y is [1]. If we only apply add_mult() to the original dataset, then x becomes [3, 3] and y becomes [2]. Now let us see the values after applying both add_mult() and mult(). The result for x is [300, 300] and for y is [200]. The calculation equivalent to the compose is x = ([2, 2] + 1) x 100 = [300, 300], y = ([1] x 2) x 100 = [200].
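The same arithmetic can be reproduced step by step on plain tensors, without the dataset or the transform classes. This is only a sketch of the calculation described above; the intermediate names x_step1, y_step1, x_step2 and y_step2 are introduced for illustration.

```python
import torch

# Values of one sample in the original dataset
x = torch.tensor([2.0, 2.0])
y = torch.tensor([1.0])

# Step 1 (what add_mult does with its defaults): add 1 to x, multiply y by 2
x_step1 = x + 1
y_step1 = y * 2

# Step 2 (what mult does with its default): multiply both by 100
x_step2 = x_step1 * 100
y_step2 = y_step1 * 100

print(x_step2)  # tensor([300., 300.])
print(y_step2)  # tensor([200.])
```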
Try to combine mult() and add_mult() so that mult() is executed first. Apply the composed transform to a new toy_set dataset, and print out the first 3 elements of the transformed dataset.
In [ ]:
# Practice: Make a compose as mult() execute first and then add_mult(). Apply the compose on toy_set dataset. Print out the first 3 elements in the transformed dataset.
# Type your code here.
Double-click here for the solution.
Joseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.
Other contributors: Michelle Carey, Mavis Zhou
Copyright © 2018 cognitiveclass.ai. This notebook and its source code are released under the terms of the MIT License.