Pascal VOC Dataset

In this notebook you're going to implement the dataloader for the Pascal-VOC dataset using PyTorch. As introduced in the last talk, Pascal-VOC is a dataset of roughly ten thousand images (9,963 in VOC2007 and 11,530 in the VOC2012 train/val set) which need to be classified into 20 different categories.

Preparing the dataset

Before we begin to write a dataloader, we need to place the data in the proper directories so that our program can access it. Instructions for downloading the PASCAL-VOC dataset can be found here.

After you've extracted the dataset, put the data in the correct directories, and followed the instructions for generating the train.txt file, you're ready to write the dataloader.


In [ ]:
import os, sys 
import matplotlib.pyplot as plt 
from PIL import Image
import numpy as np

import torch 
import torch.utils.data as data 

#### Pascal-VOC Dataset Loader ##### 



class VOCLoader(data.Dataset):
    
    def __init__(self, root, transforms=None):
        ####################################################
        # As explained in the previous notebook 
        # this method is used for defining the 
        # variables that are useful for dataloading
        # Here we define the following variables:
        # root: The 'root' directory of your dataset
        # transforms: The 'transforms' that need to be 
        # done to the dataset before they can be used.
        ######################################################
        self.root = root 
        self.transforms = transforms 
        self.train_f  = os.path.join(self.root, 'train.txt')
        self._ids = self._build_dataset(self.train_f)
    
    
    def _build_dataset(self, fname):
        ##########################################################
        # In order to load a dataset we
        # need a list of files which we can
        # then load. Notice the self._ids 
        # variable. This function reads the text file and builds 
        # the list which is then put in that variable.
        ##########################################################
        ids = []
        ###############################################
        #
        #             YOUR CODE HERE 
        #
        ###############################################
        
        return ids 
    
    def _load_labels(self, txt_file):
        ####################################################
        # When we prepared the data we ran a script which put 
        # text label files in the "labels" directory. In the 
        # task of image classification we're concerned only with 
        # the class id of the image. This is the first number in
        # every line of the text file. This function takes the
        # text file as the argument and returns the number of 
        # classes and a list containing the classes 
        #####################################################
        lines = [] 
        ###############################################
        #
        #             YOUR CODE HERE 
        #
        ###############################################
        return len(lines), lines
    
    
    def __getitem__(self, index):
        #################################################
        # Once we have the list of files in the dataset, 
        # we can then load the data as we want to. In our 
        # case we are dealing with images and their labels
        # we can load images using PIL.Image.open.
        # The labels will be loaded using the helper function
        # you wrote above
        ##################################################
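        im = Image.open(self._ids[index])
        # NOTE: _load_labels needs the label file that matches this image path
        length, labels = self._load_labels(self._ids[index])
        if self.transforms is not None:
            im = self.transforms(im)
        # Image transforms do not apply to labels; store class ids as a tensor
        labels = torch.tensor(labels, dtype=torch.long)
        
        return im, labels

    def __len__(self):
        # DataLoader needs the number of samples, e.g. for shuffling
        return len(self._ids)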
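
As a rough guide to the two exercises above, the file parsing can be sketched with plain Python, outside the class. This is only a sketch under stated assumptions: `read_id_list` and `read_class_ids` are hypothetical names, train.txt is assumed to hold one image path per line, and each label file is assumed to hold one object per line with the class id as the first whitespace-separated number. If your data preparation script produced a different layout, the parsing will need to change accordingly.

```python
def read_id_list(fname):
    """Return the non-empty lines of a list file such as train.txt.

    Assumption: the file holds one image path per line.
    """
    with open(fname) as f:
        return [line.strip() for line in f if line.strip()]


def read_class_ids(txt_file):
    """Return (count, class_ids) for a single label file.

    Assumption: each line describes one object and starts with an
    integer class id; everything after the first field is ignored.
    """
    class_ids = []
    with open(txt_file) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                class_ids.append(int(fields[0]))
    return len(class_ids), class_ids
```

Inside VOCLoader, `_build_dataset` and `_load_labels` could follow the same pattern, with `_load_labels` first mapping the image path to its label file in whatever way your directory layout requires.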

Once this class is written, we can test it using the following code. You will need to set the root variable to your dataset directory for the code to run correctly. In the last test code we saw how to load CIFAR-10 data, which was a collection of matrices. In Pascal VOC we're dealing with image files, so the test code is a little different. The transform in the test code is essential to the pipeline: ToTensor converts a PIL image into a float tensor in channels-first (C x H x W) layout with values in [0, 1], which the DataLoader can then batch.


In [ ]:
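from torchvision import transforms

root = 'path/to/dataset'  # placeholder: change to the directory containing train.txt
tf = transforms.ToTensor()
ds = VOCLoader(root, transforms=tf)
train_loader = data.DataLoader(ds, batch_size=1, shuffle=True, num_workers=4)

data_iter = iter(train_loader)
img, labels = next(data_iter)
npimg = img.numpy()
npimg = npimg.squeeze(0)          # drop the batch dimension: (1, C, H, W) -> (C, H, W)
npimg = npimg.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C), the layout imshow expects
plt.imshow(npimg)
plt.title('Train Set image')
plt.show()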
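
The layout change between an image tensor and what matplotlib displays can be illustrated with NumPy alone. The sketch below mimics the documented effect of ToTensor (height x width x channel bytes in [0, 255] become channel x height x width floats in [0, 1]) and then reverses the layout for plotting. `to_chw_float` and `to_hwc` are hypothetical helper names used only for illustration; they are not part of torchvision.

```python
import numpy as np


def to_chw_float(hwc_uint8):
    """Mimic ToTensor's layout and scaling change using NumPy only."""
    return hwc_uint8.transpose(2, 0, 1).astype(np.float32) / 255.0


def to_hwc(chw):
    """Move channels last so that matplotlib's imshow can display the array."""
    return chw.transpose(1, 2, 0)


# A tiny synthetic image: height 4, width 6, three colour channels.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[..., 0] = 255  # fill the red channel

chw = to_chw_float(image)
print(chw.shape)          # (3, 4, 6)
print(to_hwc(chw).shape)  # (4, 6, 3)
```

Transposing with (1, 2, 0) restores the original height and width order, whereas (2, 1, 0) would also swap the two spatial axes and display the image transposed.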