CIFAR-10 Different Classifiers

Authors: Sonika, Rohit, Nishchal

The aim of this notebook is to preprocess the CIFAR-10 images and apply data augmentation techniques. The processed images are then passed into several classical classifiers to compare their performance against CNNs.

Importing the modules


In [1]:
# Import Files
import os
import sys
import time
import random
import numpy as np
import sklearn as sk
import pandas as pd
import keras
import tensorflow as tf
import matplotlib
%matplotlib inline


/home/prodigal-son/miniconda2/envs/sim/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/prodigal-son/miniconda2/envs/sim/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

In [2]:
# Setting the random seed, to make the results reproducible
np.random.seed(123)

CIFAR-10 data files


In [3]:
# List of files
files = os.listdir("./cifar_data")
print(files)


['HadSSP_daily_qc.txt', 'cifar-10-test-inputs.npz', 'cifar-10-valid.npz', 'cifar-10-train.npz', 'cifar-10-test-targets.npz']

Preparing the input data


In [4]:
train_data = np.load("./cifar_data/cifar-10-train.npz")

In [5]:
# Checking the files inside the data
train_data.files


Out[5]:
['label_map', 'inputs', 'targets']

In [6]:
# Train data
cifar_labels = train_data.f.label_map
cifar_train_inputs = train_data.f.inputs
cifar_train_targets = train_data.f.targets

In [7]:
# Shape of training data inputs
print("Training Input Shape: ",end="")
print(cifar_train_inputs.shape)

# 3072 = 32 * 32 * 3, image dimensions


Training Input Shape: (40000, 3072)

In [8]:
# Shape of training data targets
print("Training Target Shape: ",end="")
print(cifar_train_targets.shape)


Training Target Shape: (40000,)

In [9]:
# Labels CIFAR-10
cifar_labels


Out[9]:
array([b'airplane', b'automobile', b'bird', b'cat', b'deer', b'dog',
       b'frog', b'horse', b'ship', b'truck'], dtype='|S10')
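The label map above stores NumPy byte strings (dtype '|S10'). A small sketch of decoding them into plain Python strings for readable reports; the array is reproduced inline so the snippet runs on its own:

```python
import numpy as np

# Byte-string labels as they appear in the label_map above.
cifar_labels = np.array([b'airplane', b'automobile', b'bird', b'cat', b'deer',
                         b'dog', b'frog', b'horse', b'ship', b'truck'],
                        dtype='|S10')

# Decode each b'...' entry to a regular str for plot titles and reports.
label_names = [label.decode('ascii') for label in cifar_labels]
print(label_names)
```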

Working on Test Data


In [10]:
# Loading test file
test_data_inp = np.load("./cifar_data/cifar-10-test-inputs.npz")
test_data_tar = np.load("./cifar_data/cifar-10-test-targets.npz")

In [11]:
# Getting the data
print(test_data_inp.files)
test_data_inputs = test_data_inp.f.inputs
print(test_data_tar.files)
test_data_targets = test_data_tar.f.targets


['inputs']
['targets']

In [12]:
# Shape of data
print("Test Input Shape: ",end="")
print(test_data_inputs.shape)
print("Test Target Shape: ",end="")
print(test_data_targets.shape)


Test Input Shape: (10000, 3072)
Test Target Shape: (10000,)
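Each input row has 3072 values. If the files follow the standard CIFAR-10 row layout (1024 red values, then 1024 green, then 1024 blue, each a 32x32 plane; this is an assumption worth verifying by plotting a few images), a flattened row can be reshaped back into an image like so:

```python
import numpy as np

# Stand-in for one flattened input row of 3072 pixel values.
row = np.arange(3072, dtype=np.float32)

# Assumed layout: 3 colour planes of 32x32 each. Reshape to (channel, h, w),
# then move the channel axis last to get a conventional (32, 32, 3) image.
image = row.reshape(3, 32, 32).transpose(1, 2, 0)
print(image.shape)
```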

Working on Validation Set


In [13]:
# Loading Validation files
val_data = np.load("./cifar_data/cifar-10-valid.npz")

In [14]:
# Checking Validation Files
val_data.files


Out[14]:
['label_map', 'inputs', 'targets']

In [15]:
# Processing the data
val_data_labels = val_data.f.label_map
val_data_inputs = val_data.f.inputs
val_data_targets = val_data.f.targets

In [16]:
# Getting the shapes
print("Shape of Validation Inputs: ", end="")
print(val_data_inputs.shape)
print("Shape of Validation Targets: ", end="")
print(val_data_targets.shape)
print("Validation Data Labels")
print(val_data_labels)


Shape of Validation Inputs: (10000, 3072)
Shape of Validation Targets: (10000,)
Validation Data Labels
[b'airplane' b'automobile' b'bird' b'cat' b'deer' b'dog' b'frog' b'horse'
 b'ship' b'truck']

Using classifiers without preprocessing

1. Logistic Regression


In [55]:
from sklearn.linear_model import LogisticRegression

In [56]:
clf_logistic = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial').fit(cifar_train_inputs, cifar_train_targets)

In [57]:
val_predictions = clf_logistic.predict(val_data_inputs)


/home/prodigal-son/miniconda2/envs/sim/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims, initial)

In [22]:
from sklearn.metrics import accuracy_score

In [65]:
logreg_acc = accuracy_score(val_data_targets, val_predictions)

In [73]:
print("Logistic Regression Validation Accuracy: ")
print(logreg_acc)


Logistic Regression Validation Accuracy: 
0.4064

In [74]:
test_predictions = clf_logistic.predict(test_data_inputs)


In [75]:
logreg_acc = accuracy_score(test_data_targets, test_predictions)
print("Logistic Regression Test Accuracy: ")
print(logreg_acc)


Logistic Regression Test Accuracy: 
0.4022
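The overflow warnings above come from summing raw, unscaled float32 pixel values. Standardizing the inputs first usually silences them and helps lbfgs converge; a minimal sketch on synthetic stand-in data (shapes and parameters here are illustrative, not tuned for CIFAR-10):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.uniform(0, 255, size=(200, 3072)).astype(np.float32)  # stand-in pixels
y = rng.randint(0, 10, size=200)                              # stand-in labels

# StandardScaler centres each feature and scales it to unit variance before
# the classifier sees it, avoiding the float32 overflow seen above.
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(solver='lbfgs', max_iter=200,
                                       random_state=0))
clf.fit(X, y)
print(clf.predict(X[:5]).shape)
```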

2. SVM


In [69]:
from sklearn.svm import SVC

In [71]:
clf_svm = SVC(gamma='auto')

In [76]:
clf_svm.fit(cifar_train_inputs, cifar_train_targets)


Out[76]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [77]:
val_predictions = clf_svm.predict(val_data_inputs)

In [78]:
svm_acc = accuracy_score(val_data_targets, val_predictions)

In [79]:
print("SVM Validation Accuracy: ")
print(svm_acc)


SVM Validation Accuracy: 
0.4255

In [80]:
test_predictions = clf_svm.predict(test_data_inputs)

In [81]:
svm_acc = accuracy_score(test_data_targets, test_predictions)
print("SVM Test Accuracy: ")
print(svm_acc)


SVM Test Accuracy: 
0.4194

3. Random Forest Classifier


In [82]:
from sklearn.ensemble import RandomForestClassifier


/home/prodigal-son/miniconda2/envs/sim/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d

In [83]:
clf_randforest = RandomForestClassifier(n_estimators=100, max_depth=2,random_state=0)

In [84]:
clf_randforest.fit(cifar_train_inputs, cifar_train_targets)


Out[84]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)

In [85]:
val_predictions = clf_randforest.predict(val_data_inputs)

In [86]:
randforest_acc = accuracy_score(val_data_targets, val_predictions)

In [88]:
print("Random Forest Validation Accuracy: ")
print(randforest_acc)


Random Forest Validation Accuracy: 
0.2657

In [89]:
test_predictions = clf_randforest.predict(test_data_inputs)

In [90]:
randforest_acc = accuracy_score(test_data_targets, test_predictions)
print("Random Forest Test Accuracy: ")
print(randforest_acc)


Random Forest Test Accuracy: 
0.2574
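max_depth=2 caps each tree at four leaves, which goes a long way toward explaining the ~26% accuracy above. A quick sketch on synthetic data showing the effect of tree depth (numbers are illustrative only, not a CIFAR-10 result):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 20)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # simple synthetic target

# Identical forests except for tree depth.
shallow = RandomForestClassifier(n_estimators=100, max_depth=2,
                                 random_state=0).fit(X, y)
deep = RandomForestClassifier(n_estimators=100, max_depth=None,
                              random_state=0).fit(X, y)
print(shallow.score(X, y), deep.score(X, y))
```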

4. Multi-Layer Perceptron (MLP)


In [91]:
from sklearn.neural_network import MLPClassifier

In [92]:
clf_mlp =  MLPClassifier(alpha=1)

In [93]:
clf_mlp.fit(cifar_train_inputs, cifar_train_targets)


Out[93]:
MLPClassifier(activation='relu', alpha=1, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [94]:
val_predictions = clf_mlp.predict(val_data_inputs)


In [95]:
mlp_acc = accuracy_score(val_data_targets, val_predictions)

In [96]:
print("MLP Validation Accuracy: ")
print(mlp_acc)


MLP Validation Accuracy: 
0.4352

In [97]:
test_predictions = clf_mlp.predict(test_data_inputs)


In [98]:
mlp_acc = accuracy_score(test_data_targets, test_predictions)
print("MLP Test Accuracy: ")
print(mlp_acc)


MLP Test Accuracy: 
0.4334

5. KNN Classifier


In [17]:
from sklearn.neighbors import KNeighborsClassifier

In [18]:
clf_knn = KNeighborsClassifier(n_neighbors=3)

In [19]:
clf_knn.fit(cifar_train_inputs, cifar_train_targets)


Out[19]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='uniform')

In [20]:
val_predictions = clf_knn.predict(val_data_inputs)


In [23]:
knn_acc = accuracy_score(val_data_targets, val_predictions)

In [24]:
print("KNN Validation Accuracy: ")
print(knn_acc)


KNN Validation Accuracy: 
0.3194

In [25]:
test_predictions = clf_knn.predict(test_data_inputs)


In [26]:
knn_acc = accuracy_score(test_data_targets, test_predictions)
print("KNN Test Accuracy: ")
print(knn_acc)


KNN Test Accuracy: 
0.3137

6. AdaBoost Classifier


In [17]:
from sklearn.ensemble import AdaBoostClassifier


In [18]:
clf_ada = AdaBoostClassifier()

In [19]:
clf_ada.fit(cifar_train_inputs, cifar_train_targets)


Out[19]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=1.0, n_estimators=50, random_state=None)

In [20]:
val_predictions = clf_ada.predict(val_data_inputs)

In [23]:
ada_acc = accuracy_score(val_data_targets, val_predictions)

In [24]:
print("AdaBoost Validation Accuracy: ")
print(ada_acc)


AdaBoost Validation Accuracy: 
0.3067

In [25]:
test_predictions = clf_ada.predict(test_data_inputs)

In [26]:
ada_acc = accuracy_score(test_data_targets, test_predictions)
print("AdaBoost Test Accuracy: ")
print(ada_acc)


AdaBoost Test Accuracy: 
0.3034

7. Naive Bayes Classifier


In [27]:
from sklearn.naive_bayes import GaussianNB

In [28]:
clf_nb = GaussianNB()

In [29]:
clf_nb.fit(cifar_train_inputs, cifar_train_targets)


Out[29]:
GaussianNB(priors=None)

In [30]:
val_predictions = clf_nb.predict(val_data_inputs)


In [31]:
nb_acc = accuracy_score(val_data_targets, val_predictions)

In [32]:
print("NB Validation Accuracy: ")
print(nb_acc)


NB Validation Accuracy: 
0.2877

In [33]:
test_predictions = clf_nb.predict(test_data_inputs)


In [34]:
nb_acc = accuracy_score(test_data_targets, test_predictions)
print("Naive Bayes Test Accuracy: ")
print(nb_acc)


Naive Bayes Test Accuracy: 
0.2839
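Collecting the test accuracies reported above in one place makes the comparison explicit (values copied from the outputs in this notebook):

```python
# Test-set accuracies as reported in the cells above.
test_acc = {
    'Logistic Regression': 0.4022,
    'SVM (RBF)': 0.4194,
    'Random Forest (max_depth=2)': 0.2574,
    'MLP': 0.4334,
    'KNN (k=3)': 0.3137,
    'AdaBoost': 0.3034,
    'Gaussian Naive Bayes': 0.2839,
}

# Rank the classifiers from best to worst.
for name, acc in sorted(test_acc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {acc:.4f}")
```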
